Unified: Add schema checking and corpus-style tests #21848
Draft
asgerf wants to merge 21 commits into
This adds tests consisting of source code and a printout of its rewritten AST.
One-shot desugaring rules now skip unnamed nodes (punctuation, keywords, etc.) since rules are intended to target named nodes only.
Also prevent infinite recursion when a capture refers to the root node of the matched tree (e.g. an @_ capture on the pattern root).
Additionally fix the swift.rs add_phase call to match the updated 3-arg signature introduced by the one-shot phase kind commit.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s framework
Add ast_types.yml defining the unified output AST schema with supertypes (expr, stmt, condition, pattern) and named nodes (top_level, binary_expr, name_expr, etc.).
Rewrite swift translation rules to map from tree-sitter Swift grammar to the unified AST, using one-shot phase rules.
Update the generator to use the output AST schema for dbscheme/QL generation, and normalize the extraction table prefix to 'unified'.
Improve the corpus test framework to include raw tree-sitter parse output, type-error checking against the output schema, and better failure reporting.
Regenerate Ast.qll, unified.dbscheme, and update BasicTest accordingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add corpus test cases for Swift covering closures, collections, control flow, functions, literals, loops, operators, optionals/errors, types, and variables.
Update existing desugar.txt with raw parse sections.
Note: operator nodes currently render their node ID instead of the actual operator text (e.g. operator "3" instead of operator "+"). This will be fixed in the next commit.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce NodeRef as a typed wrapper around node arena IDs. Captures in
desugaring rules are now bound as NodeRef instead of raw usize, which
prevents accidental misuse and enables source-text-aware rendering.
Add the YeastDisplay trait as an alternative to Display: its
yeast_to_string method receives the Ast, allowing NodeRef to resolve to
the captured node's source text instead of printing a numeric ID.
Store the original source bytes in the Ast so that NodeContent::Range
values (from synthesized literal nodes) can be resolved back to text.
Update yeast-macros to emit NodeRef-typed capture bindings and use
Into::<usize>::into where raw IDs are needed. The #{expr} template
syntax now uses YeastDisplay instead of Display.
The effect is visible in the corpus tests: operator nodes now correctly
render as e.g. operator "+" instead of operator "3".
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
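The NodeRef / YeastDisplay idea described in this commit message can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Ast` layout (source bytes plus one byte range per arena ID), the `sample` helper, and the field names are hypothetical and are not taken from the actual yeast code; only the names `NodeRef`, `YeastDisplay`, and `yeast_to_string` come from the commit message.

```rust
use std::ops::Range;

/// Hypothetical, pared-down AST: the original source bytes plus one byte
/// range per node, indexed by arena ID. The real `Ast` holds much more.
pub struct Ast {
    source: Vec<u8>,
    ranges: Vec<Range<usize>>,
}

/// Typed wrapper around an arena ID, so a capture cannot be confused
/// with an ordinary integer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeRef(usize);

/// Explicit escape hatch for code that still needs the raw ID.
impl From<NodeRef> for usize {
    fn from(node: NodeRef) -> usize {
        node.0
    }
}

/// Like `Display`, except that rendering is given access to the `Ast`.
pub trait YeastDisplay {
    fn yeast_to_string(&self, ast: &Ast) -> String;
}

impl YeastDisplay for NodeRef {
    /// Resolves the node to the source text it spans.
    fn yeast_to_string(&self, ast: &Ast) -> String {
        let range = ast.ranges[self.0].clone();
        String::from_utf8_lossy(&ast.source[range]).into_owned()
    }
}

/// Builds a tiny AST for `a + b` in which the operator has arena ID 3.
/// (Illustrative helper, not part of the PR.)
pub fn sample() -> (Ast, NodeRef) {
    let ast = Ast {
        source: b"a + b".to_vec(),
        // IDs: 0 = root, 1 = binary expression, 2 = `a`, 3 = `+`, 4 = `b`
        ranges: vec![0..5, 0..5, 0..1, 2..3, 4..5],
    };
    (ast, NodeRef(3))
}
```

With `let (ast, op) = sample();`, calling `op.yeast_to_string(&ast)` resolves the operator to its source text, whereas `Into::<usize>::into(op)` yields the raw arena ID, which is the value that leaked into the earlier test output.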
…n in field patterns
Two changes to parse_query_fields:
- Allow `field: (kind)* @cap` (repetition + optional capture) in field
  position, mirroring how it works for bare children.
- When the same field name is declared multiple times in a query
  (e.g. `condition: (foo) condition: (bar)`), merge them into a single
  ordered list of children rather than emitting duplicate field entries
  (which at runtime restart the iterator for the field and cause the
  second declaration to re-match from the first child).
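The merging step described above might look roughly like the following sketch. The `FieldDecl` type and the `merge_fields` function are hypothetical simplifications (patterns are plain strings here); the real parse_query_fields operates on macro token streams and is not reproduced.

```rust
/// Hypothetical, simplified form of one `field: (kind)` entry in a query.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldDecl {
    pub name: String,
    pub pattern: String,
}

/// Groups declarations by field name. Fields keep the order in which they
/// were first seen, and the patterns of a field keep declaration order, so
/// `condition: (foo) condition: (bar)` becomes a single `condition` entry
/// whose children are matched in sequence.
pub fn merge_fields(decls: Vec<FieldDecl>) -> Vec<(String, Vec<String>)> {
    let mut merged: Vec<(String, Vec<String>)> = Vec::new();
    for FieldDecl { name, pattern } in decls {
        match merged.iter_mut().find(|(existing, _)| *existing == name) {
            // Field already declared: append to its ordered child list.
            Some((_, patterns)) => patterns.push(pattern),
            // First declaration of this field: start a new entry.
            None => merged.push((name, vec![pattern])),
        }
    }
    merged
}
```

A linear scan is used instead of a map so that the first-seen order of fields is preserved without extra bookkeeping; queries are small enough that this should not matter.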
…d mapping
ast_types.yml additions:
- tuple_pattern { element*: pattern } in the pattern supertype.
- sequence_condition { stmt*: stmt, condition: condition } in the
condition supertype.
swift.rs:
- Map Swift tuple destructuring (e.g. `let (a, b) = pair`) to the new
tuple_pattern instead of synthesizing an apply_pattern.
- if-let / guard-let: explicitly match the value_binding_pattern
(the `let` keyword) and bind the source expression as the next
condition child, so `let` no longer leaks into the output.
The branch was rebased on the grammar changes, but rewriting the history was too difficult, so I'm just updating the test output here.
The output is not very interesting, since the mapping removes most nodes from the current test file. I added a name_expr.swift test so at least one NameExpr makes it through.
Corpus tests with schema checking
This PR adds tests modelled after tree-sitter's "corpus tests". A test file contains a set of triples (source code, raw tree, final tree), where the trees are indentation-printed ASTs.
The "final tree" is also rendered with schema violations printed inline. Schema violations would cause a QL extractor crash, but catching them in the corpus tests gives a tighter feedback loop (the tests run faster and report more informative errors).
For example, one test looks like:
However, if I were to introduce a type error in the mapping:
the last section of the test output would look like:
Since YEAST is not strongly typed, we currently have to rely on testing to catch these errors.
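To make the idea of inline schema violations concrete, here is a minimal sketch of a checker that prints a tree with indentation and flags any field whose child kind the schema does not allow. All names (`Node`, `Schema`, `allow`, `render`) and the `<-- type error` marker are assumptions for illustration and do not reflect the actual test framework's types or output format. The schema is also simplified to a flat list of allowed kinds per field, whereas ast_types.yml uses supertypes.

```rust
use std::collections::HashMap;

/// Hypothetical output-AST node: a kind plus named child fields.
pub struct Node {
    kind: String,
    fields: Vec<(String, Node)>,
}

impl Node {
    pub fn new(kind: &str, fields: Vec<(&str, Node)>) -> Node {
        Node {
            kind: kind.to_string(),
            fields: fields.into_iter().map(|(f, n)| (f.to_string(), n)).collect(),
        }
    }
}

/// Hypothetical schema: (parent kind, field name) -> allowed child kinds.
pub struct Schema {
    allowed: HashMap<(String, String), Vec<String>>,
}

impl Schema {
    pub fn new() -> Schema {
        Schema { allowed: HashMap::new() }
    }

    pub fn allow(&mut self, parent: &str, field: &str, kinds: &[&str]) {
        let kinds = kinds.iter().map(|k| k.to_string()).collect();
        self.allowed.insert((parent.to_string(), field.to_string()), kinds);
    }
}

/// Renders the tree with two-space indentation, appending a marker to
/// every field line that violates the schema.
pub fn render(root: &Node, schema: &Schema) -> String {
    let mut out = format!("{}\n", root.kind);
    render_children(root, schema, 1, &mut out);
    out
}

fn render_children(parent: &Node, schema: &Schema, depth: usize, out: &mut String) {
    for (field, child) in &parent.fields {
        out.push_str(&"  ".repeat(depth));
        out.push_str(&format!("{}: {}", field, child.kind));
        let key = (parent.kind.clone(), field.clone());
        match schema.allowed.get(&key) {
            // Child kind is permitted: nothing to report.
            Some(kinds) if kinds.contains(&child.kind) => {}
            Some(kinds) => {
                out.push_str(&format!("  <-- type error: expected {}", kinds.join(" | ")));
            }
            None => out.push_str("  <-- type error: unknown field"),
        }
        out.push('\n');
        render_children(child, schema, depth + 1, out);
    }
}
```

Because the violation is attached to the offending line of the printed tree, a diff against the expected test output points directly at the mapping rule that produced the wrong node kind.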
Why is this PR so big?
This PR ends up bundling many changes that are unfortunately hard to disentangle.