Unified: Add schema checking and corpus-style tests #21848

Draft
asgerf wants to merge 21 commits into github:main from asgerf:asgerf/swift-yeast

@asgerf asgerf commented May 13, 2026

Corpus tests with schema checking

This PR adds tests modelled after tree-sitter's "corpus tests". A test file contains a set of triples (source code, raw tree, final tree), where the trees are indentation-printed ASTs.

The "final tree" is also rendered with schema violations printed inline. Schema violations would otherwise cause a QL extractor crash, but surfacing them in corpus tests gives a tighter feedback loop (the tests run faster and produce more informative errors).

For example, one test looks like:

===
Additive expression is desugared
===

1 + 2

---

source_file
  additive_expression
    lhs: integer_literal "1"
    op: +
    rhs: integer_literal "2"

---

top_level
  body:
    binary_expr
      operator: operator "+"
      left: int_literal "1"
      right: int_literal "2"
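
To make the layout concrete, here is a minimal sketch of how such a file could be split into triples. It is not the parser used in this PR: `CorpusCase`, `parse_corpus` and `is_rule` are hypothetical names, and the sketch assumes that headers are delimited by lines of `=` and sections by lines of `-`, as in the example above.

```rust
/// One corpus test case: a title plus the three sections of the triple.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct CorpusCase {
    pub name: String,
    pub source: String,
    pub raw_tree: String,
    pub final_tree: String,
}

/// True if the line consists only of `ch`, repeated at least three times.
fn is_rule(line: &str, ch: char) -> bool {
    let t = line.trim_end();
    t.len() >= 3 && t.chars().all(|c| c == ch)
}

/// Split a corpus file into test cases.
/// Assumption: source code never contains a line made up only of `=` or `-`.
pub fn parse_corpus(text: &str) -> Vec<CorpusCase> {
    let lines: Vec<&str> = text.lines().collect();
    let mut cases = Vec::new();
    let mut i = 0;
    while i < lines.len() {
        // Find the opening `===` rule of a header.
        if !is_rule(lines[i], '=') {
            i += 1;
            continue;
        }
        i += 1;
        // Title lines run until the closing `===` rule.
        let mut title: Vec<&str> = Vec::new();
        while i < lines.len() && !is_rule(lines[i], '=') {
            title.push(lines[i]);
            i += 1;
        }
        i += 1; // skip the closing rule
        // Body sections are separated by `---` and end at the next header or EOF.
        let mut sections: Vec<Vec<&str>> = vec![Vec::new()];
        while i < lines.len() && !is_rule(lines[i], '=') {
            if is_rule(lines[i], '-') {
                sections.push(Vec::new());
            } else {
                sections.last_mut().unwrap().push(lines[i]);
            }
            i += 1;
        }
        // Drop surrounding blank lines but keep indentation inside each section.
        let mut parts = sections
            .into_iter()
            .map(|s| s.join("\n").trim_start_matches('\n').trim_end().to_string());
        cases.push(CorpusCase {
            name: title.join("\n").trim().to_string(),
            source: parts.next().unwrap_or_default(),
            raw_tree: parts.next().unwrap_or_default(),
            final_tree: parts.next().unwrap_or_default(),
        });
    }
    cases
}
```

A test runner built on this would re-run the parser and the mapping on `source` and compare the printed trees against `raw_tree` and `final_tree`.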

However, if I were to introduce a type error in the mapping:

        rule!(
            (integer_literal) @lit
            =>
            (block_stmt) // Deliberate error
        ),

the last section of the test output would look like:

top_level
  body:
    binary_expr
      operator: operator "+"
      left: block_stmt "1" <-- ERROR: The field binary_expr.left should contain expr, but got block_stmt
      right: block_stmt "2" <-- ERROR: The field binary_expr.right should contain expr, but got block_stmt

Since YEAST is not strongly typed, we currently have to rely on testing to catch these errors.
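
The check behind those inline annotations can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: the `Schema` struct, its two maps, and `check_field` are hypothetical, and the real schema comes from `ast_types.yml`. Only the error wording is taken from the output shown above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified output schema.
pub struct Schema {
    /// Supertype name -> concrete node kinds that belong to it.
    pub supertypes: HashMap<String, HashSet<String>>,
    /// (parent kind, field name) -> expected type (a kind or a supertype).
    pub fields: HashMap<(String, String), String>,
}

impl Schema {
    /// True if `kind` is `expected` itself or a member of the supertype `expected`.
    fn accepts(&self, expected: &str, kind: &str) -> bool {
        expected == kind
            || self
                .supertypes
                .get(expected)
                .map_or(false, |members| members.contains(kind))
    }

    /// Returns the inline annotation for a schema violation, or `None` if the
    /// child fits. Fields missing from the schema are not checked in this
    /// sketch; a real checker would probably report them as well.
    pub fn check_field(&self, parent: &str, field: &str, child_kind: &str) -> Option<String> {
        let key = (parent.to_string(), field.to_string());
        let expected = self.fields.get(&key)?;
        if self.accepts(expected, child_kind) {
            None
        } else {
            Some(format!(
                "<-- ERROR: The field {}.{} should contain {}, but got {}",
                parent, field, expected, child_kind
            ))
        }
    }
}
```

A tree printer would call `check_field` for each child it prints and append the returned annotation to that line.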

Why is this PR so big?

This PR ends up doing a lot of stuff, which is unfortunately hard to disentangle.

  • The main objective is to get corpus tests with schema checking in there ASAP.
  • Unfortunately this can only be exercised by changing the output schema, which causes the QL extractor to panic until the mapping successfully replaces the tree with an output that fits the target schema.
  • In the process of building up such a mapping, a bunch of problems with YEAST were discovered and fixed along the way.
  • These fixes are not that easy to understand without corresponding corpus tests to visualise their effect. Cherry-picking and merging in isolation was technically possible but not necessarily easier to review.

asgerf and others added 20 commits May 13, 2026 10:35
This adds tests consisting of source code and a printout of its rewritten AST.
One-shot desugaring rules now skip unnamed nodes (punctuation, keywords,
etc.) since rules are intended to target named nodes only.

Also prevent infinite recursion when a capture refers to the root node of
the matched tree (e.g. an @_ capture on the pattern root).

Additionally fix the swift.rs add_phase call to match the updated 3-arg
signature introduced by the one-shot phase kind commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s framework

Add ast_types.yml defining the unified output AST schema with supertypes
(expr, stmt, condition, pattern) and named nodes (top_level, binary_expr,
name_expr, etc.).

Rewrite swift translation rules to map from tree-sitter Swift grammar to
the unified AST, using one-shot phase rules.

Update the generator to use the output AST schema for dbscheme/QL
generation, and normalize the extraction table prefix to 'unified'.

Improve the corpus test framework to include raw tree-sitter parse output,
type-error checking against the output schema, and better failure
reporting.

Regenerate Ast.qll, unified.dbscheme, and update BasicTest accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add corpus test cases for Swift covering closures, collections, control
flow, functions, literals, loops, operators, optionals/errors, types,
and variables. Update existing desugar.txt with raw parse sections.

Note: operator nodes currently render their node ID instead of the actual
operator text (e.g. operator "3" instead of operator "+"). This will be
fixed in the next commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce NodeRef as a typed wrapper around node arena IDs. Captures in
desugaring rules are now bound as NodeRef instead of raw usize, which
prevents accidental misuse and enables source-text-aware rendering.

Add the YeastDisplay trait as an alternative to Display: its
yeast_to_string method receives the Ast, allowing NodeRef to resolve to
the captured node's source text instead of printing a numeric ID.

Store the original source bytes in the Ast so that NodeContent::Range
values (from synthesized literal nodes) can be resolved back to text.

Update yeast-macros to emit NodeRef-typed capture bindings and use
Into::<usize>::into where raw IDs are needed. The #{expr} template
syntax now uses YeastDisplay instead of Display.

The effect is visible in the corpus tests: operator nodes now correctly
render as e.g. operator "+" instead of operator "3".
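
A minimal sketch of the idea, under stated assumptions: the names `NodeRef`, `YeastDisplay` and `yeast_to_string` come from the commit message, but the `Ast` struct here (source bytes plus one byte range per node) and the method bodies are hypothetical simplifications, not the actual yeast types.

```rust
/// Hypothetical minimal AST: the original source plus a byte range per node.
pub struct Ast {
    pub source: Vec<u8>,
    /// Node ID -> half-open byte range [start, end) into `source`.
    pub ranges: Vec<(usize, usize)>,
}

/// Typed wrapper around a node arena ID, so a capture cannot be confused
/// with an arbitrary integer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeRef(usize);

impl NodeRef {
    pub fn new(id: usize) -> Self {
        NodeRef(id)
    }
}

/// Explicit escape hatch for code that needs the raw ID.
impl From<NodeRef> for usize {
    fn from(node: NodeRef) -> usize {
        node.0
    }
}

/// Alternative to `Display` whose rendering method can consult the `Ast`.
pub trait YeastDisplay {
    fn yeast_to_string(&self, ast: &Ast) -> String;
}

impl YeastDisplay for NodeRef {
    /// Resolve the node to its source text instead of printing its ID.
    fn yeast_to_string(&self, ast: &Ast) -> String {
        let (start, end) = ast.ranges[self.0];
        String::from_utf8_lossy(&ast.source[start..end]).into_owned()
    }
}
```

With an `Ast` built from the source `1 + 2`, a `NodeRef` to the operator node renders as `+` through `yeast_to_string`, whereas formatting the raw ID obtained via `Into::<usize>::into` would print the number.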

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n in field patterns

Two changes to parse_query_fields:

- Allow `field: (kind)* @cap` (repetition + optional capture) in field
  position, mirroring how it works for bare children.
- When the same field name is declared multiple times in a query (e.g.
  `condition: (foo) condition: (bar)`), merge them into a single
  ordered list of children rather than emitting duplicate field
  entries (which at runtime restart the iterator for the field and
  cause the second declaration to re-match from the first child).
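
The merging step can be illustrated with a small sketch. `merge_fields` is a hypothetical helper rather than the code in `parse_query_fields`; it assumes fields keep the position of their first declaration and children keep declaration order.

```rust
/// Merge repeated field declarations into one ordered child list per field.
/// Field order follows first occurrence; children keep declaration order.
pub fn merge_fields<T>(decls: Vec<(String, Vec<T>)>) -> Vec<(String, Vec<T>)> {
    let mut merged: Vec<(String, Vec<T>)> = Vec::new();
    for (name, children) in decls {
        match merged.iter_mut().find(|entry| entry.0 == name) {
            // Field already seen: append to its existing child list.
            Some(entry) => entry.1.extend(children),
            // First declaration of this field: start a new entry.
            None => merged.push((name, children)),
        }
    }
    merged
}
```

For `condition: (foo) condition: (bar)`, this yields a single `condition` entry with the children `foo` then `bar`, so a matcher walks the field once instead of restarting at its first child.
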
…d mapping

ast_types.yml additions:
- tuple_pattern { element*: pattern } in the pattern supertype.
- sequence_condition { stmt*: stmt, condition: condition } in the
  condition supertype.

swift.rs:
- Map Swift tuple destructuring (e.g. `let (a, b) = pair`) to the new
  tuple_pattern instead of synthesizing an apply_pattern.
- if-let / guard-let: explicitly match the value_binding_pattern
  (the `let` keyword) and bind the source expression as the next
  condition child, so `let` no longer leaks into the output.
The branch was rebased on the grammar changes, but rewriting the history was too difficult, so I'm just updating the test output here.
The output is not very interesting, as the mapping removes most nodes from the current test file.

I added a name_expr.swift test so at least one NameExpr makes it through.
asgerf added the no-change-note-required label (This PR does not need a change note) May 13, 2026

asgerf commented May 13, 2026

Rerun has been triggered: 2 restarted 🚀

Labels

documentation no-change-note-required This PR does not need a change note
