How DUUMBI Compiles JSON-LD to Native Code
Most compilers start with text. DUUMBI starts with a graph. This post walks through the full compilation pipeline — from JSON-LD source files to a running native binary — and explains the engineering decisions behind each stage.
Stage 1: JSON-LD parsing
DUUMBI programs live in .jsonld files inside a .duumbi/graph/ directory. Each file is a standard JSON-LD document with @type annotations from the duumbi: namespace.
{
  "@context": "https://duumbi.dev/ns/core#",
  "@type": "duumbi:Program",
  "duumbi:functions": [{
    "@type": "duumbi:Function",
    "@id": "duumbi:main/main",
    "duumbi:name": "main",
    "duumbi:returnType": "i64",
    "duumbi:blocks": [{
      "@type": "duumbi:Block",
      "@id": "duumbi:main/main/entry",
      "duumbi:label": "entry",
      "duumbi:ops": [
        { "@type": "duumbi:Const", "duumbi:value": 3, "duumbi:valueType": "i64" },
        { "@type": "duumbi:Const", "duumbi:value": 5, "duumbi:valueType": "i64" },
        { "@type": "duumbi:Add" },
        { "@type": "duumbi:Print" },
        { "@type": "duumbi:Return" }
      ]
    }]
  }]
}
Parsing uses serde_json to deserialize into Rust types. The parser validates the JSON-LD structure, resolves @id references, and produces a typed AST where every node has a known @type and validated fields. Invalid ops (E002), missing fields (E003), and duplicate IDs (E005) are caught here.
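The example program is easiest to read as a small dataflow sequence: each op consumes the results of earlier ops. As an illustration of those semantics only — DUUMBI compiles rather than interprets, and the "ops consume the most recent results" convention is inferred from the data-flow edges shown in Stage 2 — a minimal Rust evaluator for the five ops above:

```rust
// Illustrative only: evaluates the example op sequence the way the
// data-flow edges in Stage 2 wire it up. The real pipeline lowers these
// ops to Cranelift IR instead of interpreting them.
#[derive(Debug)]
enum Op {
    Const(i64),
    Add,
    Print,
    Return,
}

fn eval(ops: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut printed = None;
    for op in ops {
        match op {
            Op::Const(v) => stack.push(*v),
            Op::Add => {
                let right = stack.pop().expect("Add needs two operands");
                let left = stack.pop().expect("Add needs two operands");
                stack.push(left + right);
            }
            Op::Print => {
                let v = *stack.last().expect("Print needs an operand");
                println!("{v}");
                printed = Some(v);
            }
            Op::Return => break,
        }
    }
    printed
}

fn main() {
    // The entry block from the JSON-LD example: 3 + 5, printed.
    let ops = [Op::Const(3), Op::Const(5), Op::Add, Op::Print, Op::Return];
    assert_eq!(eval(&ops), Some(8)); // prints 8
}
```

Running this prints 8, which is exactly what the compiled binary for the example program outputs.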
Why JSON-LD?
JSON-LD brings three properties we exploit:
- Explicit typing — every node has @type, so we never need type inference at the syntax level
- URI-based identity — @id references are unambiguous, no scope resolution needed
- Linked data semantics — the graph is self-describing, which makes AI mutation dramatically simpler
Stage 2: Graph construction
The typed AST feeds into a petgraph::StableGraph<Node, Edge>. We chose StableGraph over DiGraph because node indices must survive mutation — when an AI agent adds or removes nodes, existing references can’t be invalidated.
Each JSON-LD node becomes a graph node with typed metadata:
NodeId("duumbi:main/main/entry/0") → Node { op: Const(3), type: I64 }
NodeId("duumbi:main/main/entry/1") → Node { op: Const(5), type: I64 }
NodeId("duumbi:main/main/entry/2") → Node { op: Add, type: I64 }
Edges represent data flow and control flow:
entry/0 --[DataFlow]--> entry/2 (left operand)
entry/1 --[DataFlow]--> entry/2 (right operand)
entry/2 --[DataFlow]--> entry/3 (print input)
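The op-level node IDs follow a mechanical scheme visible in the examples: the function's @id, then the block label, then the op's index within the block. A sketch of that derivation (inferred from the examples above, not from a published spec):

```rust
// Builds op-level node IDs like "duumbi:main/main/entry/0" by joining
// the function @id, the block label, and the op's position in the block.
// The exact scheme is inferred from the examples and may differ.
fn op_node_id(function_id: &str, block_label: &str, op_index: usize) -> String {
    format!("{function_id}/{block_label}/{op_index}")
}

fn main() {
    let id = op_node_id("duumbi:main/main", "entry", 0);
    assert_eq!(id, "duumbi:main/main/entry/0");
}
```

Because IDs are derived rather than chosen, two tools that see the same graph always agree on what every node is called.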
Validation
After graph construction, two validation passes run:
- Schema validation — each node is checked against core.schema.json (E009 if invalid)
- Type checking — operand types must match (E001), all references must resolve (E004), no cycles in data flow (E007)
For heap types (strings, arrays, structs), a third pass runs:
- Ownership validation — the borrow checker verifies single ownership (E020), no use-after-move (E021), borrow exclusivity (E022), and complete cleanup (E024)
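The use-after-move check (E021) is a good example of how these rules reduce to a traversal. A simplified, stdlib-only sketch, assuming each op records which value IDs it reads and which it consumes by move — the real borrow checker walks the petgraph structure and covers E020–E024:

```rust
use std::collections::HashSet;

// One op's effect on ownership, reduced to the minimum needed for E021.
// Field shapes here are illustrative, not DUUMBI's actual types.
struct OwnershipOp<'a> {
    id: &'a str,          // node ID of this op
    uses: Vec<&'a str>,   // value IDs this op reads
    moves: Vec<&'a str>,  // value IDs this op consumes (ownership transfer)
}

// Returns (offending op ID, moved value ID) pairs: any access to a value
// that was already moved is a use-after-move (E021).
fn check_use_after_move(ops: &[OwnershipOp]) -> Vec<(String, String)> {
    let mut moved: HashSet<&str> = HashSet::new();
    let mut errors = Vec::new();
    for op in ops {
        for v in op.uses.iter().chain(op.moves.iter()) {
            if moved.contains(v) {
                errors.push((op.id.to_string(), v.to_string()));
            }
        }
        for v in &op.moves {
            moved.insert(v);
        }
    }
    errors
}

fn main() {
    // "s" is moved at entry/1, then read again at entry/2: one E021.
    let ops = [
        OwnershipOp { id: "entry/0", uses: vec![], moves: vec![] },
        OwnershipOp { id: "entry/1", uses: vec![], moves: vec!["s"] },
        OwnershipOp { id: "entry/2", uses: vec!["s"], moves: vec![] },
    ];
    let errors = check_use_after_move(&ops);
    assert_eq!(errors, vec![("entry/2".to_string(), "s".to_string())]);
}
```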
All errors are collected (the validator doesn’t short-circuit) and reported as structured JSONL with the node ID, error code, and contextual details.
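A JSONL error record might look like the following — the exact field names are an assumption, since the post specifies only that each record carries the node ID, error code, and contextual details:

```rust
// Emits one JSON object per line (JSONL) for each collected error.
// Field names are illustrative; the compiler guarantees only that each
// record carries a node ID, an error code, and contextual details.
struct ValidationError {
    node_id: String,
    code: String,
    detail: String,
}

fn to_jsonl_line(e: &ValidationError) -> String {
    // Hand-rolled JSON to keep the sketch dependency-free; a real
    // implementation would use serde_json and escape the strings.
    format!(
        "{{\"node\":\"{}\",\"code\":\"{}\",\"detail\":\"{}\"}}",
        e.node_id, e.code, e.detail
    )
}

fn main() {
    let e = ValidationError {
        node_id: "duumbi:main/main/entry/2".into(),
        code: "E001".into(),
        detail: "Add operands have mismatched types: i64 vs f64".into(),
    };
    println!("{}", to_jsonl_line(&e));
}
```

One object per line keeps the output trivially machine-parseable, which matters when the primary consumer is an AI agent rather than a human reading a terminal.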
Stage 3: Cranelift IR lowering
This is where the graph becomes machine code. DUUMBI uses Cranelift — the code generator from the Wasmtime project — as its compilation backend.
Each duumbi:Function in the graph becomes a Cranelift function. The lowering walks the graph in topological order (using petgraph::visit::Topo) and emits Cranelift IR for each op:
// Simplified lowering for the Add op
match op {
    Op::Add => {
        let left = get_value(left_node);   // Cranelift Value
        let right = get_value(right_node); // Cranelift Value
        match value_type {
            DuumbiType::I64 => builder.ins().iadd(left, right),
            DuumbiType::F64 => builder.ins().fadd(left, right),
            _ => return Err(CompileError::type_mismatch(...)),
        }
    }
    // ...
}
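The topological order matters because an op's operands must already have Cranelift Values by the time the op itself is lowered. petgraph::visit::Topo provides this in the real compiler; the idea can be shown with a stdlib Kahn's algorithm over the example block's data-flow edges:

```rust
use std::collections::VecDeque;

// Kahn's algorithm over a small adjacency list: returns node indices in
// an order where every edge source precedes its target, mirroring what
// petgraph::visit::Topo gives the real lowering pass.
fn topo_order(n: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    let mut indegree = vec![0usize; n];
    let mut succ = vec![Vec::new(); n];
    for &(from, to) in edges {
        indegree[to] += 1;
        succ[from].push(to);
    }
    let mut queue: VecDeque<usize> =
        (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(i) = queue.pop_front() {
        order.push(i);
        for &j in &succ[i] {
            indegree[j] -= 1;
            if indegree[j] == 0 {
                queue.push_back(j);
            }
        }
    }
    order
}

fn main() {
    // Data-flow edges from the example: 0→2 and 1→2 (Add operands), 2→3 (Print).
    let order = topo_order(4, &[(0, 2), (1, 2), (2, 3)]);
    assert_eq!(order, vec![0, 1, 2, 3]);
}
```

This is also why cycle detection (E007) runs before lowering: a cyclic data-flow graph has no topological order at all.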
Key decisions in the Cranelift integration:
- One function per duumbi:Function — no inlining at this stage, keeping the mapping between graph and IR transparent
- SSA form via FunctionBuilder — Cranelift's FunctionBuilder handles SSA construction. We declare variables with declare_var and use def_var/use_var for Store/Load ops
- Branch handling — duumbi:Branch maps to Cranelift's brif with two successor blocks; duumbi:Compare maps to icmp/fcmp
- Function calls — duumbi:Call becomes a Cranelift call instruction; the callee is looked up by function ID in the module
Heap type compilation
String, array, and struct operations compile to runtime function calls:
duumbi:ConstString("hello") → call duumbi_string_new(ptr, len)
duumbi:StringConcat(a, b) → call duumbi_string_concat(a, b)
duumbi:ArrayPush(arr, val) → call duumbi_array_push(arr, val)
duumbi:FieldGet(obj, "x") → call duumbi_struct_field_get(obj, field_idx)
The runtime functions are defined in duumbi_runtime.c, a small C file that provides heap allocation, string operations, array management, and struct field access. It’s compiled separately and linked in the final stage.
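For illustration, here is one plausible shape for the string side of that runtime, sketched in Rust rather than C. The layout and behavior are assumptions; only the function names duumbi_string_new and duumbi_string_concat come from the mapping above, and the real duumbi_runtime.c may differ:

```rust
// A plausible heap string representation for the runtime, sketched in
// safe Rust for readability. The actual runtime is a small C file; its
// layout is not specified in this post.
struct DuumbiString {
    bytes: Vec<u8>,
}

// Corresponds to the duumbi_string_new(ptr, len) call emitted for
// duumbi:ConstString: copy the literal's bytes onto the heap.
fn duumbi_string_new(bytes: &[u8]) -> Box<DuumbiString> {
    Box::new(DuumbiString { bytes: bytes.to_vec() })
}

// Corresponds to duumbi_string_concat(a, b): allocate a fresh string
// holding a's bytes followed by b's.
fn duumbi_string_concat(a: &DuumbiString, b: &DuumbiString) -> Box<DuumbiString> {
    let mut bytes = a.bytes.clone();
    bytes.extend_from_slice(&b.bytes);
    Box::new(DuumbiString { bytes })
}

fn main() {
    let hello = duumbi_string_new(b"hello, ");
    let world = duumbi_string_new(b"world");
    let joined = duumbi_string_concat(&hello, &world);
    assert_eq!(joined.bytes, b"hello, world");
}
```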
Ownership ops
Ownership ops have varying compilation strategies:
- Alloc → calls type-specific allocators (duumbi_string_new, duumbi_array_new, etc.)
- Move/Borrow/BorrowMut → zero runtime cost (pointer copies); the safety comes from graph-level validation, not runtime checks
- Drop → calls type-specific deallocators (duumbi_string_free, duumbi_array_free, etc.)
This means the ownership model is truly zero-cost at runtime — all the checking happens at compile time (graph validation), and the compiled code only does allocations and deallocations.
Stage 4: Linking
After Cranelift emits the .o object file, DUUMBI links it with the runtime:
cc output.o duumbi_runtime.o -o output -lc
The linker is selected by checking $CC first, then falling back to cc on PATH. The runtime object is built from duumbi_runtime.c which provides:
- duumbi_print_i64, duumbi_print_f64, duumbi_print_bool — basic output
- duumbi_string_new, duumbi_string_concat, etc. — string operations
- duumbi_array_new, duumbi_array_push, etc. — array operations
- duumbi_struct_new, duumbi_struct_field_get, etc. — struct operations
- duumbi_alloc and type-specific free functions — memory management
The result is a standalone native binary with no runtime dependencies beyond libc.
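The $CC-then-cc selection is a small but easily testable policy. A sketch of that decision as a pure function, with the actual subprocess invocation left to std::process::Command:

```rust
use std::env;

// Picks the linker driver: honor $CC when set and non-empty, otherwise
// fall back to `cc` on PATH, matching the policy described above.
fn choose_linker(cc_env: Option<String>) -> String {
    match cc_env {
        Some(cc) if !cc.is_empty() => cc,
        _ => "cc".to_string(),
    }
}

fn main() {
    let linker = choose_linker(env::var("CC").ok());
    // The real build then runs: <linker> output.o duumbi_runtime.o -o output -lc
    println!("selected linker: {linker}");
}
```

Keeping the choice a pure function of the environment value makes it trivial to unit-test without touching the filesystem or spawning processes.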
Performance characteristics
The compilation pipeline is designed for fast iteration:
- Parsing is essentially JSON deserialization — microseconds for typical programs
- Graph construction is linear in the number of nodes
- Validation is a few graph traversals — milliseconds
- Cranelift lowering is the most expensive step, but Cranelift is optimized for compile speed (it’s designed for JIT compilation)
- Linking is a single cc invocation
For a typical DUUMBI program (10–50 functions, 100–500 ops), the full pipeline from .jsonld to running binary takes under a second.
What makes this different
The key insight is that by starting from a semantic graph instead of text, we eliminate an entire class of problems:
- No ambiguity — every node has a single interpretation
- No parsing errors — the structure is always valid JSON
- Trivial AI mutation — adding a function is adding nodes and edges, not generating syntactically correct text
- Structural validation — type errors and ownership violations are graph properties, checkable by traversal
The tradeoff is that humans don’t write JSON-LD by hand. That’s the point — DUUMBI is an AI-first compiler. The AI writes the graph, the compiler validates it, and the developer describes intent.
Try it yourself
cargo install duumbi
duumbi init hello
cd hello
duumbi add "create a function that computes the absolute value of an integer"
duumbi build && ./output -7
The source code is at github.com/hgahub/duumbi, and the documentation is at docs.duumbi.dev. We welcome contributions and feedback.