Vector Labs

Arra — Roadmap

Architecture Overview

The implementation splits into a shared frontend, a shared Core IR, and two independent backends. The split point is Core IR: everything above it is implemented once in OCaml; everything below diverges.

Source
  └─ Lexer / Parser                           (OCaml)
       └─ Type Checker                        (OCaml)
            ├─ Keyword-first / kind-first
            ├─ Type inference
            ├─ Fork detection rewrite
            ├─ Hook resolution
            ├─ Definition-site ambiguity check
            └─ Z3 dimension checking          (OCaml, Z3 bindings)
                 └─ Program T Expansion       (OCaml, fixed-point)
                      └─ Core Lowering        (OCaml)
                           ├─ Treewalk Interpreter         (OCaml 5 effects)
                           └─ Whole-Program Passes         (OCaml)
                                ├─ Specialization / monomorphization
                                ├─ Effect-based specialization
                                ├─ Layer 1 — algebraic rewriting
                                ├─ Layer 2 — SOAC fusion
                                ├─ Layer 3 — generator-consumer fusion
                                └─ Derived DAG construction
                                     └─ ANF / Closure Conversion    (OCaml)
                                          └─ C Emission              (OCaml → C)
                                               └─ Primitive Kernels  (C + inline assembly)

Phase 1a — Layout Preprocessor

The layout rule must be resolved before the token stream is fed to the Menhir grammar. A dedicated preprocessor pass runs between the raw lexer and the parser, inserting virtual {, ;, and } tokens according to the indentation rules in SYNTAX.md §1.7. This pass is deliberately isolated so it can be tested independently from the grammar.

Non-trivial cases to cover in the preprocessor test suite before building the parser on top:

See FRONTEND.md §1 for the full layout rule specification.


Phase 1b — Lexer / Menhir Grammar

Builds on the preprocessor token stream. Produces an untyped Surface.Ast from source text. Handles Unicode/ASCII duality, attribute parsing, and the iso_def block form. No semantic analysis.

Two grammar features need extra care before writing Menhir rules:

See FRONTEND.md §1 for full design.


Phase 2 — Type Checker Architecture Sketch

The type checker is the most complex single piece of the frontend. Its architecture must be designed before implementation begins because several decisions have wide-ranging consequences:

This phase produces an architecture document (not code). It runs in parallel with Phase 1b and should be complete before Phase 3 begins.


Phase 3 — Type Checker

A pipeline of sub-passes: keyword-first/kind-first disambiguation, bidirectional type inference, fork detection rewrite, hook resolution with specificity ordering and import-boundary finalization, definition-site ambiguity checking, and Z3 dimension checking batched per definition.

See FRONTEND.md §2 for full design.


Phase 4 — Program T Expansion

Evaluates compile-time splices in a three-pass fixed-point loop. Hooks are resolved at quote time; the splice produces typed AST; no re-resolution at splice time. A cycle is a compile error.

See FRONTEND.md §3 for full design.


Phase 5 — Core Lowering

Desugars the typed AST to Core IR — no sugar, no implicit coercions, all dispatch resolved to direct references. Core IR is the QCheck boundary between the frontend and both backends.

Start with simple fragments — plain lambdas, let bindings, basic case — and wire them end-to-end through the pipeline (parser → TC → lowering → treewalk) as early as possible. Validating the round-trip on a small fragment before the full hook surface is implemented catches architectural mismatches cheaply. Expand incrementally to the full hook surface: project/assign sugar, effect handle/perform elaboration, iso desugaring to two from hooks.

See FRONTEND.md §4 and CORE.md for full design.


Phase 6 — Treewalk Backend

Interprets Core IR directly using OCaml 5’s native effect system. Naively correct, zero optimization. The QCheck oracle for all compiler passes.

See BACKEND.md §1 for full design.


Phase 7 — Whole-Program Passes (Compiler only)

Specialization / monomorphization, effect-based specialization, Layer 1–3 fusion, derived DAG construction, and kernel selection. All run over Core IR before C emission.

See BACKEND.md §2 for full design.


Phase 8 — ANF / Closure Conversion (Compiler only)

ANF puts Core into administrative normal form; closure conversion converts lambdas to explicit closure records. Both are required before C emission.

See BACKEND.md §4 for full design.


Phase 9 — C Emission (Compiler only)

Emits C for control flow, closures, and effect handlers. Hot array primitives are not emitted as C loops — kernel selection (Phase 6) has already matched fused sequences to hand-written kernels; the emitted C calls into them directly.

See BACKEND.md §5 for full design.


Primitive Kernel Library

Hand-written, SIMD-optimized kernels for the most frequent fused operation patterns. The ground truth for performance.

See BACKEND.md §6 for full design.


QCheck Harness

Generates well-typed Core IR programs and checks that treewalk and compiler agree on all outputs. The correctness guarantee for every backend optimization.

The Core IR generator is the first deliverable: a QCheck Gen that produces arbitrary well-typed Core IR programs. This is independent of the parser and can be built as soon as the treewalk is stable. It provides much broader coverage of the evaluator than hand-written tests and is a prerequisite for the treewalk-vs-compiler differential testing that validates every subsequent optimization pass.

See BACKEND.md §7 for full design.


REPL

The REPL plugs in after Core Lowering and runs on the treewalk interpreter. Each expression is parsed and type-checked in the context of the accumulated session state, lowered to Core IR, and interpreted directly. No fusion, no kernel selection, no C emission. This gives a correct, low-latency REPL for free once the treewalk exists. Large array computations will be slow — the performance gap relative to the compiled backend is expected and acceptable for interactive use.

The REPL requires incremental type checker state: a persistent typing environment that accumulates definitions, hooks, and trait implementations across expressions without re-checking prior entries from scratch. Each REPL expression extends the environment; bindings introduced at the prompt are visible to subsequent expressions. This is the same requirement as an LSP server doing incremental checking on keystroke, so the incremental state machinery is shared between both.

The incremental type checker state is worth designing carefully from the start:

A later REPL mode can compile expressions through the full pipeline and load the resulting code natively, closing the performance gap for large array workloads. The incremental type checker state is reused unchanged for this mode.


Later


Implementation Order

Build bottom-up: Core IR is the QCheck boundary between the treewalk and compiler backends, and the type checker’s primary job is producing it. Designing Core first prevents frontend/backend mismatches and gives an early correctness oracle.