How DUUMBI Compiles JSON-LD to Native Code
Most compilers start with text. DUUMBI starts with a graph. This post walks through the full compilation pipeline — from JSON-LD source files to a running native binary — and explains the engineering decisions behind each stage.
Stage 1: JSON-LD parsing
DUUMBI programs live in .jsonld files inside a .duumbi/graph/ directory. Each file is a standard JSON-LD document with @type annotations from the duumbi: namespace.
{
  "@context": "https://duumbi.dev/ns/core#",
  "@type": "duumbi:Program",
  "duumbi:functions": [{
    "@type": "duumbi:Function",
    "@id": "duumbi:main/main",
    "duumbi:name": "main",
    "duumbi:returnType": "i64",
    "duumbi:blocks": [{
      "@type": "duumbi:Block",
      "@id": "duumbi:main/main/entry",
      "duumbi:label": "entry",
      "duumbi:ops": [
        { "@type": "duumbi:Const", "duumbi:value": 3, "duumbi:valueType": "i64" },
        { "@type": "duumbi:Const", "duumbi:value": 5, "duumbi:valueType": "i64" },
        { "@type": "duumbi:Add" },
        { "@type": "duumbi:Print" },
        { "@type": "duumbi:Return" }
      ]
    }]
  }]
}
Parsing uses serde_json to deserialize into Rust types. The parser validates the JSON-LD structure, resolves @id references, and produces a typed AST where every node has a known @type and validated fields. Invalid ops (E002), missing fields (E003), and duplicate IDs (E005) are caught here.
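The example program is easiest to read as a small dataflow sequence: each op consumes the results of earlier ops. As an illustration of those semantics only — DUUMBI compiles rather than interprets, and the "ops consume the most recent results" convention is inferred from the data-flow edges shown in Stage 2 — a minimal Rust evaluator for the five ops above:

```rust
// Illustrative only: evaluates the example op sequence the way the
// data-flow edges in Stage 2 wire it up. The real pipeline lowers these
// ops to Cranelift IR instead of interpreting them.
#[derive(Debug)]
enum Op {
    Const(i64),
    Add,
    Print,
    Return,
}

fn eval(ops: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut printed = None;
    for op in ops {
        match op {
            Op::Const(v) => stack.push(*v),
            Op::Add => {
                let right = stack.pop().expect("Add needs two operands");
                let left = stack.pop().expect("Add needs two operands");
                stack.push(left + right);
            }
            Op::Print => {
                let v = *stack.last().expect("Print needs an operand");
                println!("{v}");
                printed = Some(v);
            }
            Op::Return => break,
        }
    }
    printed
}

fn main() {
    // The entry block from the JSON-LD example: 3 + 5, printed.
    let ops = [Op::Const(3), Op::Const(5), Op::Add, Op::Print, Op::Return];
    assert_eq!(eval(&ops), Some(8)); // prints 8
}
```

Running this prints 8, which is exactly what the compiled binary for the example program outputs.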
Why JSON-LD?
JSON-LD brings three properties we exploit:
- Explicit typing — every node has @type, so we never need type inference at the syntax level
- URI-based identity — @id references are unambiguous, no scope resolution needed
- Linked data semantics — the graph is self-describing, which makes AI mutation dramatically simpler
Stage 2: Graph construction
The typed AST feeds into a petgraph::StableGraph<Node, Edge>. We chose StableGraph over DiGraph because node indices must survive mutation — when an AI agent adds or removes nodes, existing references can’t be invalidated.
Each JSON-LD node becomes a graph node with typed metadata:
NodeId("duumbi:main/main/entry/0") → Node { op: Const(3), type: I64 }
NodeId("duumbi:main/main/entry/1") → Node { op: Const(5), type: I64 }
NodeId("duumbi:main/main/entry/2") → Node { op: Add, type: I64 }
Edges represent data flow and control flow:
entry/0 --[DataFlow]--> entry/2 (left operand)
entry/1 --[DataFlow]--> entry/2 (right operand)
entry/2 --[DataFlow]--> entry/3 (print input)
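The op-level node IDs follow a mechanical scheme visible in the examples: the function's @id, then the block label, then the op's index within the block. A sketch of that derivation (inferred from the examples above, not from a published spec):

```rust
// Builds op-level node IDs like "duumbi:main/main/entry/0" by joining
// the function @id, the block label, and the op's position in the block.
// The exact scheme is inferred from the examples and may differ.
fn op_node_id(function_id: &str, block_label: &str, op_index: usize) -> String {
    format!("{function_id}/{block_label}/{op_index}")
}

fn main() {
    let id = op_node_id("duumbi:main/main", "entry", 0);
    assert_eq!(id, "duumbi:main/main/entry/0");
}
```

Because IDs are derived rather than chosen, two tools that see the same graph always agree on what every node is called.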
Validation
After graph construction, two validation passes run:
- Schema validation — each node is checked against core.schema.json (E009 if invalid)
- Type checking — operand types must match (E001), all references must resolve (E004), no cycles in data flow (E007)
For heap types (strings, arrays, structs), a third pass runs:
- Ownership validation — the borrow checker verifies single ownership (E020), no use-after-move (E021), borrow exclusivity (E022), and complete cleanup (E024)
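The use-after-move check (E021) is a good example of how these rules reduce to a traversal. A simplified, stdlib-only sketch, assuming each op records which value IDs it reads and which it consumes by move — the real borrow checker walks the petgraph structure and covers E020–E024:

```rust
use std::collections::HashSet;

// One op's effect on ownership, reduced to the minimum needed for E021.
// Field shapes here are illustrative, not DUUMBI's actual types.
struct OwnershipOp<'a> {
    id: &'a str,          // node ID of this op
    uses: Vec<&'a str>,   // value IDs this op reads
    moves: Vec<&'a str>,  // value IDs this op consumes (ownership transfer)
}

// Returns (offending op ID, moved value ID) pairs: any access to a value
// that was already moved is a use-after-move (E021).
fn check_use_after_move(ops: &[OwnershipOp]) -> Vec<(String, String)> {
    let mut moved: HashSet<&str> = HashSet::new();
    let mut errors = Vec::new();
    for op in ops {
        for v in op.uses.iter().chain(op.moves.iter()) {
            if moved.contains(v) {
                errors.push((op.id.to_string(), v.to_string()));
            }
        }
        for v in &op.moves {
            moved.insert(v);
        }
    }
    errors
}

fn main() {
    // "s" is moved at entry/1, then read again at entry/2: one E021.
    let ops = [
        OwnershipOp { id: "entry/0", uses: vec![], moves: vec![] },
        OwnershipOp { id: "entry/1", uses: vec![], moves: vec!["s"] },
        OwnershipOp { id: "entry/2", uses: vec!["s"], moves: vec![] },
    ];
    let errors = check_use_after_move(&ops);
    assert_eq!(errors, vec![("entry/2".to_string(), "s".to_string())]);
}
```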
All errors are collected (the validator doesn’t short-circuit) and reported as structured JSONL with the node ID, error code, and contextual details.
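A JSONL error record might look like the following — the exact field names are an assumption, since the post specifies only that each record carries the node ID, error code, and contextual details:

```rust
// Emits one JSON object per line (JSONL) for each collected error.
// Field names are illustrative; the compiler guarantees only that each
// record carries a node ID, an error code, and contextual details.
struct ValidationError {
    node_id: String,
    code: String,
    detail: String,
}

fn to_jsonl_line(e: &ValidationError) -> String {
    // Hand-rolled JSON to keep the sketch dependency-free; a real
    // implementation would use serde_json and escape the strings.
    format!(
        "{{\"node\":\"{}\",\"code\":\"{}\",\"detail\":\"{}\"}}",
        e.node_id, e.code, e.detail
    )
}

fn main() {
    let e = ValidationError {
        node_id: "duumbi:main/main/entry/2".into(),
        code: "E001".into(),
        detail: "Add operands have mismatched types: i64 vs f64".into(),
    };
    println!("{}", to_jsonl_line(&e));
}
```

One object per line keeps the output trivially machine-parseable, which matters when the primary consumer is an AI agent rather than a human reading a terminal.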
Stage 3: Cranelift IR lowering
This is where the graph becomes machine code. DUUMBI uses Cranelift — the code generator from the Wasmtime project — as its compilation backend.
Each duumbi:Function in the graph becomes a Cranelift function. The lowering walks the graph in topological order (using petgraph::visit::Topo) and emits Cranelift IR for each op:
// Simplified lowering for the Add op
match op {
    Op::Add => {
        let left = get_value(left_node);   // Cranelift Value
        let right = get_value(right_node); // Cranelift Value
        match value_type {
            DuumbiType::I64 => builder.ins().iadd(left, right),
            DuumbiType::F64 => builder.ins().fadd(left, right),
            _ => return Err(CompileError::type_mismatch(...)),
        }
    }
    // ...
}
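The topological order matters because an op's operands must already have Cranelift Values by the time the op itself is lowered. petgraph::visit::Topo provides this in the real compiler; the idea can be shown with a stdlib Kahn's algorithm over the example block's data-flow edges:

```rust
use std::collections::VecDeque;

// Kahn's algorithm over a small adjacency list: returns node indices in
// an order where every edge source precedes its target, mirroring what
// petgraph::visit::Topo gives the real lowering pass.
fn topo_order(n: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    let mut indegree = vec![0usize; n];
    let mut succ = vec![Vec::new(); n];
    for &(from, to) in edges {
        indegree[to] += 1;
        succ[from].push(to);
    }
    let mut queue: VecDeque<usize> =
        (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(i) = queue.pop_front() {
        order.push(i);
        for &j in &succ[i] {
            indegree[j] -= 1;
            if indegree[j] == 0 {
                queue.push_back(j);
            }
        }
    }
    order
}

fn main() {
    // Data-flow edges from the example: 0→2 and 1→2 (Add operands), 2→3 (Print).
    let order = topo_order(4, &[(0, 2), (1, 2), (2, 3)]);
    assert_eq!(order, vec![0, 1, 2, 3]);
}
```

This is also why cycle detection (E007) runs before lowering: a cyclic data-flow graph has no topological order at all.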
Key decisions in the Cranelift integration:
- One function per duumbi:Function — no inlining at this stage, keeping the mapping between graph and IR transparent
- SSA form via FunctionBuilder — Cranelift's FunctionBuilder handles SSA construction. We declare variables with declare_var and use def_var/use_var for Store/Load ops
- Branch handling — duumbi:Branch maps to Cranelift's brif with two successor blocks; duumbi:Compare maps to icmp/fcmp
- Function calls — duumbi:Call becomes a Cranelift call instruction; the callee is looked up by function ID in the module
Heap type compilation
String, array, and struct operations compile to runtime function calls:
duumbi:ConstString("hello") → call duumbi_string_new(ptr, len)
duumbi:StringConcat(a, b) → call duumbi_string_concat(a, b)
duumbi:ArrayPush(arr, val) → call duumbi_array_push(arr, val)
duumbi:FieldGet(obj, "x") → call duumbi_struct_field_get(obj, field_idx)
The runtime functions are defined in duumbi_runtime.c, a small C file that provides heap allocation, string operations, array management, and struct field access. It’s compiled separately and linked in the final stage.
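For illustration, here is one plausible shape for the string side of that runtime, sketched in Rust rather than C. The layout and behavior are assumptions; only the function names duumbi_string_new and duumbi_string_concat come from the mapping above, and the real duumbi_runtime.c may differ:

```rust
// A plausible heap string representation for the runtime, sketched in
// safe Rust for readability. The actual runtime is a small C file; its
// layout is not specified in this post.
struct DuumbiString {
    bytes: Vec<u8>,
}

// Corresponds to the duumbi_string_new(ptr, len) call emitted for
// duumbi:ConstString: copy the literal's bytes onto the heap.
fn duumbi_string_new(bytes: &[u8]) -> Box<DuumbiString> {
    Box::new(DuumbiString { bytes: bytes.to_vec() })
}

// Corresponds to duumbi_string_concat(a, b): allocate a fresh string
// holding a's bytes followed by b's.
fn duumbi_string_concat(a: &DuumbiString, b: &DuumbiString) -> Box<DuumbiString> {
    let mut bytes = a.bytes.clone();
    bytes.extend_from_slice(&b.bytes);
    Box::new(DuumbiString { bytes })
}

fn main() {
    let hello = duumbi_string_new(b"hello, ");
    let world = duumbi_string_new(b"world");
    let joined = duumbi_string_concat(&hello, &world);
    assert_eq!(joined.bytes, b"hello, world");
}
```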
Ownership ops
Ownership ops have varying compilation strategies:
- Alloc → calls type-specific allocators (duumbi_string_new, duumbi_array_new, etc.)
- Move/Borrow/BorrowMut → zero runtime cost (pointer copies); the safety comes from graph-level validation, not runtime checks
- Drop → calls type-specific deallocators (duumbi_string_free, duumbi_array_free, etc.)
This means the ownership model is truly zero-cost at runtime — all the checking happens at compile time (graph validation), and the compiled code only does allocations and deallocations.
Stage 4: Linking
After Cranelift emits the .o object file, DUUMBI links it with the runtime:
cc output.o duumbi_runtime.o -o output -lc
The linker is selected by checking $CC first, then falling back to cc on PATH. The runtime object is built from duumbi_runtime.c which provides:
- duumbi_print_i64, duumbi_print_f64, duumbi_print_bool — basic output
- duumbi_string_new, duumbi_string_concat, etc. — string operations
- duumbi_array_new, duumbi_array_push, etc. — array operations
- duumbi_struct_new, duumbi_struct_field_get, etc. — struct operations
- duumbi_alloc and type-specific free functions — memory management
The result is a standalone native binary with no runtime dependencies beyond libc.
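The $CC-then-cc selection is a small but easily testable policy. A sketch of that decision as a pure function, with the actual subprocess invocation left to std::process::Command:

```rust
use std::env;

// Picks the linker driver: honor $CC when set and non-empty, otherwise
// fall back to `cc` on PATH, matching the policy described above.
fn choose_linker(cc_env: Option<String>) -> String {
    match cc_env {
        Some(cc) if !cc.is_empty() => cc,
        _ => "cc".to_string(),
    }
}

fn main() {
    let linker = choose_linker(env::var("CC").ok());
    // The real build then runs: <linker> output.o duumbi_runtime.o -o output -lc
    println!("selected linker: {linker}");
}
```

Keeping the choice a pure function of the environment value makes it trivial to unit-test without touching the filesystem or spawning processes.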
Performance characteristics
The compilation pipeline is designed for fast iteration:
- Parsing is essentially JSON deserialization — microseconds for typical programs
- Graph construction is linear in the number of nodes
- Validation is a few graph traversals — milliseconds
- Cranelift lowering is the most expensive step, but Cranelift is optimized for compile speed (it’s designed for JIT compilation)
- Linking is a single cc invocation
For a typical DUUMBI program (10–50 functions, 100–500 ops), the full pipeline from .jsonld to running binary takes under a second.
What makes this different
The key insight is that by starting from a semantic graph instead of text, we eliminate an entire class of problems:
- No ambiguity — every node has a single interpretation
- No parsing errors — the structure is always valid JSON
- Trivial AI mutation — adding a function is adding nodes and edges, not generating syntactically correct text
- Structural validation — type errors and ownership violations are graph properties, checkable by traversal
The tradeoff is that humans don’t write JSON-LD by hand. That’s the point — DUUMBI is an AI-first compiler. The AI writes the graph, the compiler validates it, and the developer describes intent.
Try it yourself
cargo install duumbi
duumbi init hello
cd hello
duumbi add "create a function that computes the absolute value of an integer"
duumbi build && ./output -7
The source code is at github.com/hgahub/duumbi, and the documentation is at docs.duumbi.dev. We welcome contributions and feedback.