Project Phase II
Due 11/22
Update the Template for Phase II
git checkout -b UCR-CS182-main main
git pull https://github.com/UCR-CS182/ucr-cs182-24f-project-template1_test.git main
git checkout main
git merge --no-ff UCR-CS182-main
git push origin main
Structure of Phase II
Files added:
ast_path: Defines the AstPath class, which represents a single path in the AST. It has 4 fields:
- source and target are the tree-sitter nodes at the start and end of the path, respectively. Both are terminal leaf nodes.
- path is a formatted string of the path that excludes the terminal leaf values of the source and target nodes.
- hashed_path stores the hash of the path string; it is not used in this project.
path_extractor.py: Defines the PathExtractor class, which you need to implement.
For this phase, focus on implementing the extract_path function, which extracts the path between two terminal leaf nodes.
The start node of the path is referred to as the “source node,” and the end node as the “target node.”
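As a rough picture of the four fields described above, AstPath could look like the dataclass below. This is only a sketch: the template ships its own definition, and the field types here (in particular the type and default of hashed_path) are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AstPath:
    """Sketch of a single AST path; types are assumptions, not the template's."""

    source: Any  # tree-sitter node where the path starts (a terminal leaf)
    target: Any  # tree-sitter node where the path ends (a terminal leaf)
    path: str    # formatted path string, without the leaf values
    hashed_path: Optional[int] = None  # hash of `path`; unused in this project
```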
Implementation:
- __get_path_stack(node): Given a terminal leaf node, this helper function walks from the leaf up to the root and returns the visited nodes as a list in that order. This list (path_stack) is used to compare the paths of two nodes.
- __len_common_prefix: This method compares two stacks of nodes, each representing a path from a terminal leaf to the root. It calculates the length of the common prefix shared by the two stacks and identifies the index, in each stack, of the first node at which they differ.
- extract_path: This method generates a string representation of the path between two terminal leaf nodes, source and target. The path is a sequence of abstract types and child IDs, separated by defined symbols (UpSymbol and DownSymbol). The method includes the common prefix only once and checks path length and width constraints to limit path complexity.
  - Initialize Stacks
    - The source_stack and target_stack are constructed using the helper method __get_path_stack. These stacks represent the hierarchical path from each leaf node (source and target) to the root of the tree.
    - Using __len_common_prefix, the method calculates the common prefix length between the two stacks and identifies the indices of the first nodes that differ. This avoids redundancy when constructing the path representation.
  - Check Path Feature Constraints
    - The method checks that the length of the path (based on the nodes in source_stack and target_stack) is within a specified limit (MaxPathLen). If the path length exceeds this limit, the function returns an empty string.
    - It also calculates the path width, which is the difference between the child IDs of the first differing nodes in the two stacks. If the width exceeds a specified maximum (MaxPathWidth), the function likewise returns an empty string.
  - Construct the Path
    - Source Half: The first half of the path string is built by iterating through source_stack, excluding the common prefix nodes. For each node in this part, the method decides whether to add a child ID based on specific criteria, such as the node type and its parent type. Where necessary, the child ID is capped at MaxChildId using the helper method __saturate_id.
    - Common Prefix: The common prefix node, representing the shared part of the source and target paths, is added only once to the path string.
    - Target Half: The method then builds the target half of the path by iterating through target_stack in reverse order, excluding the common prefix nodes, and appends each node’s abstract type and child ID.
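
The steps above can be sketched end to end as follows. This is a minimal illustration under several assumptions, not the reference solution: Node is a hypothetical stand-in for a tree-sitter node; the symbol values and limits are made up; path length is taken as the two stack lengths minus twice the common prefix; shared ancestors are compared from the root end of each leaf-to-root stack; and a child ID is appended to every node, because the exact criteria for adding one are not spelled out here.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed values; the project template defines its own constants.
UP_SYMBOL, DOWN_SYMBOL = "^", "_"
MAX_PATH_LEN, MAX_PATH_WIDTH, MAX_CHILD_ID = 8, 2, 3


@dataclass(eq=False)  # eq=False: nodes compare by identity
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    children: list[Node] = field(default_factory=list)
    parent: Node | None = None

    def __post_init__(self):
        for child in self.children:
            child.parent = self


def child_id(node):
    """Index of node among its parent's children (0 for the root)."""
    if node.parent is None:
        return 0
    return next(i for i, c in enumerate(node.parent.children) if c == node)


def saturate_id(cid):
    """Cap a child ID at MAX_CHILD_ID."""
    return min(cid, MAX_CHILD_ID)


def get_path_stack(node):
    """Nodes from the leaf up to the root, leaf first."""
    stack = []
    while node is not None:
        stack.append(node)
        node = node.parent
    return stack


def len_common_prefix(source_stack, target_stack):
    """Count shared ancestors, walking down from the root end of each stack.

    Returns (common_len, i, j); i and j index the first differing nodes.
    """
    common, i, j = 0, len(source_stack) - 1, len(target_stack) - 1
    while i >= 0 and j >= 0 and source_stack[i] == target_stack[j]:
        common, i, j = common + 1, i - 1, j - 1
    return common, i, j


def label(node):
    return f"{node.type}{saturate_id(child_id(node))}"


def extract_path(source, target):
    source_stack, target_stack = get_path_stack(source), get_path_stack(target)
    common, i, j = len_common_prefix(source_stack, target_stack)
    if common == 0:  # the leaves do not belong to the same tree
        return ""
    if len(source_stack) + len(target_stack) - 2 * common > MAX_PATH_LEN:
        return ""
    if i >= 0 and j >= 0:
        width = child_id(target_stack[j]) - child_id(source_stack[i])
        if width > MAX_PATH_WIDTH:
            return ""
    n_src, n_tgt = len(source_stack) - common, len(target_stack) - common
    up = [label(n) + UP_SYMBOL for n in source_stack[:n_src]]
    top = label(source_stack[n_src])  # lowest shared ancestor, added once
    down = [DOWN_SYMBOL + label(n) for n in reversed(target_stack[:n_tgt])]
    return "".join(up) + top + "".join(down)


# Tiny tree: module -> assignment -> (identifier, "=", integer)
a, b = Node("identifier"), Node("integer")
root = Node("module", [Node("assignment", [a, Node("="), b])])
print(extract_path(a, b))  # identifier0^assignment0_integer2
```

With real tree-sitter nodes, the same structure should carry over as long as nodes are compared with == rather than by object identity, since the bindings may hand out a fresh wrapper object each time a node is accessed.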
Tests
For this phase, no local tests are provided.
You are encouraged to write your own tests to validate your implementation.
As in Phase I, you can test by implementing the main function in path_extractor.py or by adding pytest tests under the "tests" folder.
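
One possible shape for such a pytest file is sketched below. The file name, the make_node helper, and the path_stack stand-in are all hypothetical. In your own tests you would call your PathExtractor methods instead of the stand-in, and could parse real code with tree-sitter instead of building fake nodes.

```python
# tests/test_path_extractor.py (hypothetical file name)
from types import SimpleNamespace


def make_node(node_type, children=()):
    """Build a fake node with the type/children/parent attributes used below."""
    node = SimpleNamespace(type=node_type, children=list(children), parent=None)
    for child in node.children:
        child.parent = node
    return node


def path_stack(node):
    """Stand-in for the helper under test; swap in your own implementation."""
    stack = []
    while node is not None:
        stack.append(node)
        node = node.parent
    return stack


def test_path_stack_runs_from_leaf_to_root():
    leaf = make_node("identifier")
    make_node("module", [make_node("assignment", [leaf])])
    types = [n.type for n in path_stack(leaf)]
    assert types == ["identifier", "assignment", "module"]
```

Running pytest from the project root should collect any function whose name starts with test_ in files named test_*.py.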