The implementation splits into a shared frontend, a shared Core IR, and two independent backends. The split point is Core IR: everything above it is implemented once in OCaml; everything below diverges.
Source
└─ Lexer / Parser (OCaml)
└─ Type Checker (OCaml)
├─ Keyword-first / kind-first
├─ Type inference
├─ Fork detection rewrite
├─ Hook resolution
├─ Definition-site ambiguity check
└─ Z3 dimension checking (OCaml, Z3 bindings)
└─ Program T Expansion (OCaml, fixed-point)
└─ Core Lowering (OCaml)
├─ Treewalk Interpreter (OCaml 5 effects)
└─ Whole-Program Passes (OCaml)
├─ Specialization / monomorphization
├─ Effect-based specialization
├─ Layer 1 — algebraic rewriting
├─ Layer 2 — SOAC fusion
├─ Layer 3 — generator-consumer fusion
└─ Derived DAG construction
└─ ANF / Closure Conversion (OCaml)
└─ C Emission (OCaml → C)
└─ Primitive Kernels (C + inline assembly)
The layout rule must be resolved before the token stream is fed to the Menhir
grammar. A dedicated preprocessor pass runs between the raw lexer and the parser,
inserting virtual {, ;, and } tokens according to the indentation rules in
SYNTAX.md §1.7. This pass is deliberately isolated so it can be tested
independently from the grammar.
Non-trivial cases to cover in the preprocessor test suite before building the parser on top:
iso blocks and their nested indentation'handle blocks with multiple clauses on separate lines( suppressing layout-block openingattr '(' decl* ')' which are not layout-governedSee FRONTEND.md §1 for the full layout rule specification.
Builds on the preprocessor token stream. Produces an untyped Surface.Ast
from source text. Handles Unicode/ASCII duality, attribute parsing, and the
iso_def block form. No semantic analysis.
Two grammar features need extra care before writing Menhir rules:
f g h = h(g(f)). Juxtaposition
associates left but the rightmost element is the function, which is the
reverse of standard ML-style application. Left-recursive expression rules in
Menhir need adjustment for this call-direction convention.gen, from, iso) — these appear inside
expression position and are annotated differently in the AST from the
input-type dispatch hooks. The grammar must distinguish them syntactically so
the type checker receives the right hook_head variant.See FRONTEND.md §1 for full design.
The type checker is the most complex single piece of the frontend. Its architecture must be designed before implementation begins because several decisions have wide-ranging consequences:
This phase produces an architecture document (not code). It runs in parallel with Phase 1b and should be complete before Phase 3 begins.
A pipeline of sub-passes: keyword-first/kind-first disambiguation, bidirectional type inference, fork detection rewrite, hook resolution with specificity ordering and import-boundary finalization, definition-site ambiguity checking, and Z3 dimension checking batched per definition.
See FRONTEND.md §2 for full design.
Evaluates compile-time splices in a three-pass fixed-point loop. Hooks are resolved at quote time; the splice produces typed AST; no re-resolution at splice time. A cycle is a compile error.
See FRONTEND.md §3 for full design.
Desugars the typed AST to Core IR — no sugar, no implicit coercions, all dispatch resolved to direct references. Core IR is the QCheck boundary between the frontend and both backends.
Start with simple fragments — plain lambdas, let bindings, basic case — and
wire them end-to-end through the pipeline (parser → TC → lowering → treewalk)
as early as possible. Validating the round-trip on a small fragment before the
full hook surface is implemented catches architectural mismatches cheaply.
Expand incrementally to the full hook surface: project/assign sugar, effect
handle/perform elaboration, iso desugaring to two from hooks.
See FRONTEND.md §4 and CORE.md for full design.
Interprets Core IR directly using OCaml 5’s native effect system. Naively correct, zero optimization. The QCheck oracle for all compiler passes.
See BACKEND.md §1 for full design.
Specialization / monomorphization, effect-based specialization, Layer 1–3 fusion, derived DAG construction, and kernel selection. All run over Core IR before C emission.
See BACKEND.md §2 for full design.
ANF puts Core into administrative normal form; closure conversion converts lambdas to explicit closure records. Both are required before C emission.
See BACKEND.md §4 for full design.
Emits C for control flow, closures, and effect handlers. Hot array primitives are not emitted as C loops — kernel selection (Phase 6) has already matched fused sequences to hand-written kernels; the emitted C calls into them directly.
See BACKEND.md §5 for full design.
Hand-written, SIMD-optimized kernels for the most frequent fused operation patterns. The ground truth for performance.
See BACKEND.md §6 for full design.
Generates well-typed Core IR programs and checks that treewalk and compiler agree on all outputs. The correctness guarantee for every backend optimization.
The Core IR generator is the first deliverable: a QCheck Gen that produces
arbitrary well-typed Core IR programs. This is independent of the parser and can
be built as soon as the treewalk is stable. It provides much broader coverage of
the evaluator than hand-written tests and is a prerequisite for the
treewalk-vs-compiler differential testing that validates every subsequent
optimization pass.
See BACKEND.md §7 for full design.
The REPL plugs in after Core Lowering and runs on the treewalk interpreter. Each expression is parsed and type-checked in the context of the accumulated session state, lowered to Core IR, and interpreted directly. No fusion, no kernel selection, no C emission. This gives a correct, low-latency REPL for free once the treewalk exists. Large array computations will be slow — the performance gap relative to the compiled backend is expected and acceptable for interactive use.
The REPL requires incremental type checker state: a persistent typing environment that accumulates definitions, hooks, and trait implementations across expressions without re-checking prior entries from scratch. Each REPL expression extends the environment; bindings introduced at the prompt are visible to subsequent expressions. This is the same requirement as an LSP server doing incremental checking on keystroke, so the incremental state machinery is shared between both.
The incremental type checker state is worth designing carefully from the start:
A later REPL mode can compile expressions through the full pipeline and load the resulting code natively, closing the performance gap for large array workloads. The incremental type checker state is reused unchanged for this mode.
'scan horizontal fusion (currently deferred in the spec).Build bottom-up: Core IR is the QCheck boundary between the treewalk and compiler backends, and the type checker’s primary job is producing it. Designing Core first prevents frontend/backend mismatches and gives an early correctness oracle.
Gen for arbitrary well-typed Core IR
programs. Independent of the parser; can be built now. Exercises the treewalk
far beyond what hand-written tests can reach and is the prerequisite for
treewalk-vs-compiler differential testing.{, ;, } tokens. Test independently before building the
grammar on top. Cover the non-trivial cases: iso blocks, 'handle clauses,
same-line ( suppression, attribute-scope blocks.Surface.Ast. Builds
on the preprocessor token stream. Take care with juxtaposition call-direction
(rightmost element is the function) and result-type dispatch hooks (gen,
from, iso in expression position).