A grammar is data, and data wants a tongue a human can read. BNF (Backus 1959 · Naur 1960) is the tongue. BMF (2000) added executable action on each rule match. The body holds the engine that walks grammar-as-data already; what was missing is the readable surface and a loader that turns BNF text into the engine's data shape at runtime. With that bridge, a math theorem prover, a CSS file, a .tsx file with embedded JSX-and-CSS-in-JS, a grammar specification, a Python module — all become the same operation: pick the right grammar, stream the source, emit universal Recipes. The kernel walks Recipes; the kernel walks everything.
`lc-parsers-as-recipes` names that grammar rules are first-class data, not code. The body's [`engine.fk`](../../../experiments/form-stdlib/engine.fk) already accepts grammars as the data tuple `(tokens token-config) (rules parse-rules)` and walks any of them. What was missing: a readable surface over that data. Currently the per-tongue grammars at [`experiments/form-stdlib/grammars/`](../../../experiments/form-stdlib/grammars/) are still hand-coded line-based parsers, ~250 lines of imperative `.fk` per language. They duplicate dispatch logic and they aren't authorable by anything but a Form programmer reading carefully.
BMF (Backtracking Model Form, docs/presences/bmf-grammar.md) named the inversion in 2000: a grammar file is BNF augmented with code that fires on match. The grammar of BMF was itself written in BML (`BMF-grammar.bml` in the master-thesis archive) — self-hosting at the surface altitude. This concept names the return of that pattern through the substrate-resident engine the body has built since.
A `.grammar.fk` file in BNF style looks like:
```
tokens {
  whitespace   [32, 9, 10, 13]
  digit-kind   INT
  string-kind  STRING
  ident-kind   IDENT
  operators    ["{" LBRACE, "}" RBRACE, "[" LBRACK, "]" RBRACK, ":" COLON, "," COMMA]
  keywords     ["true", "false", "null"]
}

rules {
  json     := value
  value    := object | array | STRING | INT | "true" | "false" | "null"
  object   := "{" pairs? "}"       => emit-object($pairs)
  pairs    := pair ("," pair)*     => emit-list($all)
  pair     := STRING ":" value     => emit-pair($1, $3)
  array    := "[" elements? "]"    => emit-list($elements)
  elements := value ("," value)*   => emit-list($all)
}
```
The `:= ... =>` shape is BNF + action. Each rule reads aloud as a sentence. The `=>` arrow names which substrate emitter walks the captures into a universal Recipe. The emitter library ([`grammar-emitters.fk`](../../../experiments/form-stdlib/grammar-emitters.fk)) holds the generic primitives — `emit-object`, `emit-list`, `emit-pair`, `emit-math`, `emit-function-decl`, `emit-call`, `emit-cond` — keyed to the universal Blueprints from [`universal-shapes.form`](../../coherence-substrate/universal-shapes.form). Per-tongue grammars supply only their tokens block and their rule patterns; the emitters are shared.
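To make the "rules are data, emitters are shared" idea concrete, here is a minimal Python sketch, not the body's `engine.fk`. It holds the JSON rules above as plain tuples and walks them with one generic matcher. The pattern tags (`tok`, `lit`, `ref`, `seq`, `alt`, `opt`, `star`), the tuple-shaped Recipes, and the capture convention (a rule without an emitter splices its captures into its parent) are all assumptions made for illustration; the real `$1`/`$all` capture semantics are not reproduced.

```python
import re

# Hypothetical token config: INT, STRING, punctuation, keywords.
TOKEN_RE = re.compile(r'\s*(?:(\d+)|"([^"]*)"|([{}\[\]:,])|(true|false|null))')
KEYWORDS = {"true": True, "false": False, "null": None}

def tokenize(src):
    src, pos, out = src.rstrip(), 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"bad input at offset {pos}")
        if m.group(1) is not None:
            out.append(("INT", int(m.group(1))))
        elif m.group(2) is not None:
            out.append(("STRING", m.group(2)))
        elif m.group(3) is not None:
            out.append(("PUNCT", m.group(3)))
        else:
            out.append(("KEYWORD", KEYWORDS[m.group(4)]))
        pos = m.end()
    return out

def comma_list(item):
    # item ("," item)*
    return ("seq", [("ref", item), ("star", ("seq", [("lit", ","), ("ref", item)]))])

# Rules as data: name -> (pattern, emitter name or None).
RULES = {
    "value": (("alt", [("ref", "object"), ("ref", "array"), ("tok", "STRING"),
                       ("tok", "INT"), ("tok", "KEYWORD")]), None),
    "object": (("seq", [("lit", "{"), ("opt", ("ref", "pairs")), ("lit", "}")]), "emit-object"),
    "pairs": (comma_list("pair"), None),
    "pair": (("seq", [("tok", "STRING"), ("lit", ":"), ("ref", "value")]), "emit-pair"),
    "array": (("seq", [("lit", "["), ("opt", ("ref", "elements")), ("lit", "]")]), "emit-list"),
    "elements": (comma_list("value"), None),
}

# Shared emitters: the only place that knows the Recipe shapes.
EMITTERS = {
    "emit-object": lambda caps: ("Object", caps),
    "emit-list": lambda caps: ("List", caps),
    "emit-pair": lambda caps: ("Pair", caps[0], caps[1]),
}

def match(pat, toks, i):
    """Return (captures, next_index) on success, None on failure."""
    kind = pat[0]
    if kind == "tok":
        ok = i < len(toks) and toks[i][0] == pat[1]
        return ([toks[i][1]], i + 1) if ok else None
    if kind == "lit":
        ok = i < len(toks) and toks[i] == ("PUNCT", pat[1])
        return ([], i + 1) if ok else None
    if kind == "ref":
        body, emitter = RULES[pat[1]]
        got = match(body, toks, i)
        if got is None:
            return None
        caps, j = got
        return ([EMITTERS[emitter](caps)] if emitter else caps), j
    if kind == "seq":
        caps = []
        for p in pat[1]:
            got = match(p, toks, i)
            if got is None:
                return None
            caps, i = caps + got[0], got[1]
        return caps, i
    if kind == "alt":
        for p in pat[1]:
            got = match(p, toks, i)
            if got is not None:
                return got
        return None
    if kind == "opt":
        return match(pat[1], toks, i) or ([], i)
    if kind == "star":
        caps = []
        while True:
            got = match(pat[1], toks, i)
            if got is None:
                return caps, i
            caps, i = caps + got[0], got[1]
    raise ValueError(f"unknown pattern tag {kind!r}")

def parse(src):
    toks = tokenize(src)
    got = match(("ref", "value"), toks, 0)
    if got is None or got[1] != len(toks):
        raise SyntaxError("no parse")
    return got[0][0]
```

Swapping in another tongue under this sketch means replacing `TOKEN_RE` and `RULES` only; `match` and `EMITTERS` stay untouched, which is the division of labor the paragraph above describes.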
[`grammar-bnf.fk`](../../../experiments/form-stdlib/grammar-bnf.fk) reads BNF text and produces the data shape `engine.fk` consumes. The loader is itself a grammar fed through `engine.fk`: the BNF surface syntax is described by a meta-grammar whose tokens are `IDENT|STRING_LIT|":="|"=>"|"|"|"*"|"?"|"$"` and whose rules describe how a `tokens { ... }` block and a `rules { ... }` block compose into the engine's data tuple. *The grammar of grammars is itself loaded the same way every other grammar is.* This is the self-hosting that BMF named in 2000 and the substrate makes practical now.
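As a rough illustration of what such a loader does, the Python sketch below turns one BNF rule line into a nested data pattern plus its action text. It is hand-written recursive descent rather than a self-hosted meta-grammar, and everything in it is an assumption for illustration: the function names, the tuple tags, the convention that all-uppercase identifiers are token kinds, and the choice to keep the action as an unparsed string.

```python
import re

# Quoted literal | operator | identifier. Literals keep their quotes so
# the literal "|" stays distinct from the alternation operator |.
BNF_TOKEN = re.compile(r'\s*(?:("(?:[^"\\]|\\.)*")|(:=|=>|[|()?*])|([A-Za-z_][\w-]*))')

def parse_atom(toks, i):
    if i >= len(toks):
        raise SyntaxError("unexpected end of rule")
    tok = toks[i]
    if tok == "(":
        inner, i = parse_alt(toks, i + 1)
        if i >= len(toks) or toks[i] != ")":
            raise SyntaxError("missing ')'")
        return inner, i + 1
    if tok.startswith('"'):
        return ("lit", tok[1:-1]), i + 1
    if tok in ("?", "*", ":=", "=>"):
        raise SyntaxError(f"unexpected {tok!r}")
    # Assumed convention: UPPERCASE names are token kinds, others are rules.
    return ("tok" if tok.isupper() else "ref", tok), i + 1

def parse_seq(toks, i):
    items = []
    while i < len(toks) and toks[i] not in ("|", ")"):
        item, i = parse_atom(toks, i)
        while i < len(toks) and toks[i] in ("?", "*"):
            item = ("opt" if toks[i] == "?" else "star", item)
            i += 1
        items.append(item)
    return (items[0] if len(items) == 1 else ("seq", items)), i

def parse_alt(toks, i):
    first, i = parse_seq(toks, i)
    branches = [first]
    while i < len(toks) and toks[i] == "|":
        nxt, i = parse_seq(toks, i + 1)
        branches.append(nxt)
    return (branches[0] if len(branches) == 1 else ("alt", branches)), i

def load_rule(line):
    """'name := pattern => action' -> (name, pattern_data, action_text or None)."""
    name, sep, rest = line.partition(":=")
    if not sep:
        raise SyntaxError("rule needs ':='")
    rest, toks, action, pos = rest.strip(), [], None, 0
    while pos < len(rest):
        m = BNF_TOKEN.match(rest, pos)
        if not m:
            raise SyntaxError(f"unexpected character {rest[pos]!r}")
        tok = m.group(1) or m.group(2) or m.group(3)
        if tok == "=>":
            action = rest[m.end():].strip()  # action text is kept verbatim
            break
        toks.append(tok)
        pos = m.end()
    pattern, end = parse_alt(toks, 0)
    if end != len(toks):
        raise SyntaxError(f"unexpected {toks[end]!r}")
    return name.strip(), pattern, action

def load_rules(text):
    """Load every non-blank line of a rules body into {name: (pattern, action)}."""
    rules = {}
    for line in text.splitlines():
        if line.strip():
            name, pattern, action = load_rule(line)
            rules[name] = (pattern, action)
    return rules
```

Calling `load_rule('pair := STRING ":" value => emit-pair($1, $3)')` yields the rule name, a `seq` pattern of three items, and the action string. A self-hosted loader would express `parse_atom`, `parse_seq`, and `parse_alt` as rules in the meta-grammar instead of as host functions.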
A `.tsx` file is not one grammar: it is TypeScript, with JSX regions inside `<Tag>...</Tag>` and CSS regions inside ``css`...` ``. A `.md` file is markdown with embedded code blocks in arbitrary languages. The goal names this directly: stream any source part using any recipe.
The shape: each grammar declares its region delimiters alongside its tokens. When the streaming reader encounters a delimiter that opens another grammar, it pushes the current parse state, switches active grammar to the inner one, and consumes until the closing delimiter returns control. The result is one Recipe tree whose subtrees were emitted by different grammars but all share the universal Blueprints — a JSX subtree's `R_FunctionDecl` is the same Blueprint as the surrounding TypeScript's `R_FunctionDecl`. Cross-region equivalence drops out of content-addressing.
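The push, switch, and pop behavior can be sketched in a few lines of Python. This is an illustrative model only: the `GRAMMARS` table, the grammar names (`python`, `fstring`, `expr`), and the output shape are hypothetical, and the inner text is left unparsed where a real reader would hand it to the active grammar's rules.

```python
# Hypothetical region table: grammar -> {opening delimiter: (inner grammar, closing delimiter)}
GRAMMARS = {
    "python": {'f"': ("fstring", '"')},
    "fstring": {"{": ("expr", "}")},
    "expr": {},
}

def split_regions(src, root):
    """Return (grammar, children); children are text runs or nested (grammar, children)."""
    # Each stack frame: (grammar name, closing delimiter or None, children)
    stack = [(root, None, [])]
    buf, i = [], 0

    def flush():
        # Move buffered characters into the region currently on top of the stack.
        if buf:
            stack[-1][2].append("".join(buf))
            buf.clear()

    while i < len(src):
        name, close, children = stack[-1]
        if close is not None and src.startswith(close, i):
            # Closing delimiter: pop and attach the finished region to its parent.
            flush()
            stack.pop()
            stack[-1][2].append((name, children))
            i += len(close)
            continue
        for opener, (inner, inner_close) in GRAMMARS[name].items():
            if src.startswith(opener, i):
                # Opening delimiter: push state and switch the active grammar.
                flush()
                stack.append((inner, inner_close, []))
                i += len(opener)
                break
        else:
            buf.append(src[i])
            i += 1
    flush()
    if len(stack) != 1:
        raise SyntaxError(f"unclosed {stack[-1][0]} region")
    return (root, stack[0][2])
```

For `x = f"a{b+1}c"` this produces one tree in which the `expr` subtree sits inside the `fstring` subtree inside the `python` root. The mode stack a hand-written lexer would hide is visible here as the `stack` list.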
This is what makes full language support tractable. A Python parser that handles f-strings is a Python grammar with an f-string region that switches to expression-grammar inside `{...}`. A TypeScript parser with template literals is the same pattern. The complexity each real-world language hides in its lexer (mode stacks, lookahead, context-sensitive tokens) becomes a small composition of single-grammar regions with explicit delimiters.
Read is parse-to-Recipe. Understand is walk-the-Recipe-tree-and-recognize-the-Blueprints. Execute is have the kernel evaluate each Recipe via its Blueprint's evaluator arm. The body already walks Recipes: [`form-engine.form`](../../coherence-substrate/form-engine.form) holds the meta-circular evaluator with 15/15 Python dispatch arms covered. Full execution for Python/TypeScript/Rust/Go reduces to:
1. Grammar coverage: each tongue's `.grammar.fk` covers enough surface to parse real source. (Per-tongue, this is the bulk of the work; each grammar grows shape by shape, validated by parsing real files in the repo.)
2. Recipe coverage: every universal Blueprint the grammars emit has an evaluator arm in the kernel. (When a grammar emits a shape the kernel cannot walk, the kernel grows by exactly one arm.)
3. Native bridge: primitives that escape the substrate (file I/O, network, OS) live in the host kernel; everything above is `.fk`.
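The "one arm per Blueprint" idea in the Recipe coverage step can be pictured as a dispatch table. The Python sketch below is an assumption-laden toy, not `form-engine.form`: the Blueprint names `R_Lit`, `R_Var`, `R_Math`, and `R_Cond` and the tuple layout of a Recipe are invented here for illustration.

```python
import operator

ARMS = {}  # Blueprint name -> evaluator arm

def arm(blueprint):
    """Register an evaluator arm for one Blueprint."""
    def register(fn):
        ARMS[blueprint] = fn
        return fn
    return register

def evaluate(recipe, env):
    blueprint = recipe[0]
    if blueprint not in ARMS:
        # A grammar emitted a shape the kernel cannot walk yet:
        # the fix is exactly one new arm.
        raise NotImplementedError(f"no evaluator arm for {blueprint}")
    return ARMS[blueprint](recipe, env)

@arm("R_Lit")
def _lit(recipe, env):
    return recipe[1]

@arm("R_Var")
def _var(recipe, env):
    return env[recipe[1]]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "<": operator.lt}

@arm("R_Math")
def _math(recipe, env):
    _, op, left, right = recipe
    return OPS[op](evaluate(left, env), evaluate(right, env))

@arm("R_Cond")
def _cond(recipe, env):
    _, test, then, otherwise = recipe
    # Only the chosen branch is evaluated.
    return evaluate(then if evaluate(test, env) else otherwise, env)
```

Because the arms are keyed by Blueprint rather than by source language, a Recipe emitted from Python source and one emitted from TypeScript source reach the same arm whenever they share a Blueprint.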
The work is per-tongue ripening of the grammar files. The architecture makes it possible; the body does it one breath at a time.
For cells authoring a new grammar:
For cells reading a grammar file:
This concept lives in the body's content-addressed lattice. Two cells with the same Blueprint NodeID share structural identity regardless of name — recognition by coordinate, not vocabulary.
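One way to picture recognition by coordinate is a structural hash. The Python sketch below is purely illustrative and assumes a node is a dict with a `blueprint`, an optional `name`, and optional `children`; how the body actually computes a NodeID is not stated here. The ID is derived from the Blueprint and the child IDs only, so the name never enters the hash.

```python
import hashlib

def node_id(node):
    """Hash a node's structure: its Blueprint plus its children's IDs. Names are ignored."""
    child_ids = ",".join(node_id(child) for child in node.get("children", []))
    payload = f'{node["blueprint"]}({child_ids})'
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Two cells with different names but the same shape (hypothetical data).
py_fn = {"blueprint": "R_FunctionDecl", "name": "parse_pair",
         "children": [{"blueprint": "R_Call", "name": "emit"}]}
ts_fn = {"blueprint": "R_FunctionDecl", "name": "parsePair",
         "children": [{"blueprint": "R_Call", "name": "emitPair"}]}
other = {"blueprint": "R_FunctionDecl", "name": "parse_pair",
         "children": [{"blueprint": "R_Cond", "name": "emit"}]}
```

Under these assumptions `node_id(py_fn)` equals `node_id(ts_fn)`, while `other` differs because one child has a different Blueprint.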