Software Testing and Verification - Fall 2024
Project Guidelines
Detailed Guidelines of Each Phase:
Github Classroom Invitation
Backgrounds:
In software testing, paths from Control Flow Graphs (CFG) and Data Flow Graphs (DFG) are commonly used for tasks like
test coverage analysis. However, constructing these graphs can be complex and time-consuming due to elements like
loops,
conditionals, and nested structures. For instance, loops in a CFG can introduce paths that repeat many times, leading
to
a combinatorial explosion of scenarios to analyze, especially when nested or combined with other conditions. To
simplify
the process, we will instead extract paths from the Abstract Syntax Tree (AST), which represents the code’s structure
without needing to analyze complex runtime behavior. This approach reduces complexity while still capturing the essential idea of traversing a particular code representation.
Overall Description:
In this project, you will implement an AST Path and Path-context Extractor for Java grammar. You will mainly use
Tree-sitter — a popular AST parser for multiple languages, and its Python binding to finish the project. The goal is to
create a tool that can traverse an AST, extract AST paths, and output in the given format.
Definitions: (Ref: Section4.1)
- Abstract Syntax Tree (AST). It consists of:
- A set of non-terminal nodes N.
- A set of terminal leaf nodes T.
- A set of terminal leaf values X.
- A root node in N.
- A function that maps from non-terminal nodes to non-terminal leaf nodes or terminal leaf nodes.
- A function that maps from terminal leaf nodes to terminal values.
- AST Path.Within the scope of this project, we define an AST path as a path from one terminal leaf node to another terminal leaf node. A path can be partitioned
into three parts:
- All the up-directed edges and nodes between the starting terminal leaf node (inclusive) to the lowest common prefix
node
(exclusive).
- Lowest common prefix node.
- All the down-directed edges and nodes between the lowest common prefix node (exclusive) to the ending terminal leaf
node
(inclusive).
- To simplify, we include the terminal values associated with the terminal leaf nodes on both
ends of the AST path defined
above as the ASTPath data structure in our project.
- Path Length. The number of edges between two leaf nodes of the AST path.
- Path Width. The difference between the sibling nodes that are children of the lowest common prefix node of
the path. Specifically, these sibling nodes must also be ancestors of the terminal nodes that form the endpoints of
the path.
- An example:
In this example, an AST path is highlighted in red.
- The AST path is (n1(v4)-up-n2-down-n3(v7))
- The Path length is 2. (1 up edge + 1 down edge. We print path with terminal values attached, but we do not include them in the length calculating.)
- The Path width is 2. (Distance between n1 and n3, abs(child_id(n1)-child_id(n3)))
Implementation Overview:
The project is divided into three stages:
- Environment setup and terminal leaf nodes collection.
- AST path construction given the two terminal leaf nodes.
- AST paths extraction from every two AST nodes of every function within the given code, with feature constraints.
-
In Project Phase I, you will first set up the GitHub Classroom and Python Tree-sitter environment. You will
implement:
LeavesCollector
for traversing collecting the leaf nodes (terminal leaf nodes) given an AST.
NodeProperty
, to maintain user-defined properties other than original AST node properties. These
properties will be collected simultaneously with the traversal of LeavesCollector.
-
In Project Phase II, you will implement:
-
AstPath
data structure, which represent a single path in AST (explained in the example above).
-
ExtractPath(source: Node, target: Node)
function as an essential component of our goal
PathExtractor.
-
FunctionVisitor
: A visitor similar to leaves collector implemented in stage I. But it will collect
information from all method nodes during traversal.
Implementation of this module build the foundation for the next phase.
-
In Project Phase III, you will implement:
-
ASTPathSet
. It encapsulates multiple ASTPath
s within a given method and include the
method name.
-
Complete
PathExtractor
. It can batch process methods in the given code.
It can also extract paths under certain constraints such us path length and path width we mentioned above.
A final report is required for the last phase.
For each phase, we will use GitHub Classroom to submit the code.
You will need to complete the code by addressing TODOs given in the template.
We will release a detailed guideline for each phase.
The functionalities and intentions are explained in the template comments as well.
We provide two tests for each project phase. You can modify them as needed.
You can run them by executing the main function of each module or using pytest.
This is a fast and simple way to check the correctness and functionalities of your implementation.
After submitting, we will test your code based on our in-house test cases. The result will be shown as "pass" or "fail",
and you can submit repeatedly until the deadline.
Reference and Hints:
-
Tree-sitter:
-
Use of LLMs:
- You can use LLM tools for programming within the project.
- Be sure to include references to the use of LLMs as comments in your code.
- Recommended Tools: Cursor,
Copilot (free for student accounts),
ChatGPT.
Grading:
Phase I: 10% if all tests are passed.
Phase II: 10% if all tests are passed.
Phase III: 10% if all tests are passed + 5% final report.
The percentage contributes to your final grade.