Will Anderson 48b72843e1 feat: package manager, build system, native cross-compilation, plugin system
Add three new crates and extend the compiler and CLI toolchain:

- el-manifest: el.toml manifest parser using serde + toml crate; supports
  package info, registry/path/version deps, build config with seal key
  sources, cross targets, and plugins; Manifest::find_manifest() walks up
  the directory tree

- el-registry: HTTP registry client (reqwest + tokio) for
  packages.neurontechnologies.ai; PackageMetadata, fetch/download/publish/
  search, BLAKE3 checksum verification, local cache at ~/.engram/packages/

- el-build: build orchestrator with incremental builds (BLAKE3 file hashes
  in .el/build-cache.json), cross-compilation target tagging, dep resolution,
  plugin registry with on_ast/on_typed_ast/on_bytecode hooks, test runner,
  fmt/check/clean commands

- CrossTarget and NativeTarget enums with triple() and artifact_extension()
  methods; NativeTarget::Host detects compile-time platform via cfg! macros

- Plugin system: CompilerPlugin trait + PluginRegistry; dynamic loading is
  a marked TODO with clear extension point for libloading

- CLI extended with: new, add, remove, update, build --cross, run, test,
  check, fmt, clean, publish, search, plugin add/remove/list; old
  single-file commands moved to build-file/seal/unseal subcommands

- Fix pre-existing debugger.rs borrow error (unwrap_or temporary lifetime)
- Fix checker.rs and codegen.rs to handle TestDef/Seed/Assert Stmt variants
- Add spec/language.md sections 12-14: package system, build system,
  plugin system, cross-compilation targets table

130 tests passing, zero warnings
2026-04-27 19:08:25 -05:00


Engram Language Specification

Version 0.1.0 — April 2026


Overview

Engram is a statically-typed programming language designed from first principles around a knowledge graph type system. Its three defining properties:

  1. Types are Engram nodes. Every named type in the language is a node in a knowledge graph. Type compatibility is not purely structural — it is also semantic. Two types are compatible if their Engram node embeddings are similar enough in meaning-space.

  2. Autocomplete is spreading activation. The language server (LSP) uses spreading activation over the type graph to suggest completions. You get concepts semantically related to what you're building, not just methods on the current type.

  3. The prod compilation target is quantum-sealed. Bytecode compiled with --target prod is encrypted with AES-256-GCM and authenticated with a BLAKE3 keyed MAC (stored in the artifact's signature field). Without the deployment key, the ciphertext is indistinguishable from random bytes, so static analysis tools cannot recover or decompile the bytecode.


1. Syntax Reference

1.1 Comments

// Single-line comment — extends to end of line

Block comments are not supported in v0.1. Use // on each line.

1.2 Variable Declarations

let name: Type = expression
let name = expression    // type inferred

Variables are immutable by default. All bindings are block-scoped.

1.3 Functions

fn name(param1: Type1, param2: Type2) -> ReturnType {
    // body
    return expression
}

Functions are first-class values. The type of a function is:

fn(Type1, Type2) -> ReturnType

1.4 Types (Structs)

type TypeName {
    field1: Type1
    field2: Type2
    // ...
}

Every type definition registers a node in the Engram type graph. The type name becomes searchable via spreading activation.

1.5 Enums

enum EnumName {
    Variant1
    Variant2
    VariantWithPayload(PayloadType)
}

Enum variants without parentheses carry no payload. Variants with parentheses carry exactly one value of the given type.

1.6 Pattern Matching

match expression {
    Pattern1 => result_expr1
    Pattern2 => result_expr2
    // ...
}

Pattern forms:

  • EnumName::Variant — unit enum variant
  • EnumName::Variant(binding) — enum variant with payload binding
  • literal — exact literal (42, "str", true)
  • name — binding (captures the subject into name)
  • _ — wildcard (always matches, discards)

All arms must produce the same type. The match expression evaluates to that type.

1.7 Control Flow

If/else:

if condition {
    // then branch
} else {
    // else branch
}

Both branches must produce the same type. The else branch is optional; if it is omitted, the missing branch produces Void.

For loops:

for item in collection {
    // body
}

1.8 Field Access

value.field_name

Field access is type-checked at compile time. Accessing a field that does not exist in the type definition is a compile error.

1.9 Array Literals

let numbers: [Int] = [1, 2, 3]
let empty: [String] = []

1.10 Index Access

let first: Int = numbers[0]

Index expressions require an Int index. Bounds checking is runtime behavior.


2. Type System

2.1 Primitive Types

| Type | Description | Example literal |
|---|---|---|
| Int | 64-bit signed integer | 42, -7, 1_000 |
| Float | 64-bit IEEE 754 double | 3.14, 0.5 |
| String | UTF-8 string | "hello" |
| Bool | Boolean | true, false |
| Uuid | RFC 4122 UUID | (runtime only) |
| Void | Unit type; no value | (none) |

2.2 Composite Types

| Type form | Description |
|---|---|
| [T] | Array of T |
| T? | Optional T (may be absent) |
| Named | User-defined struct or enum |

2.3 Numeric Coercions

  • Int is implicitly coercible to Float.
  • Float is not coercible to Int (use explicit conversion when available).
  • String + String uses concatenation (the + operator is overloaded).

2.4 Structural vs. Semantic Compatibility

Standard structural compatibility:

  • Named("User") is compatible with Named("User") (same name).
  • [Int] is compatible with [Int].
  • T is compatible with T? (non-optional can be used as optional).
  • Int is compatible with Float (widening).

Semantic compatibility (novel):

When two named types have registered Engram node type mappings that refer to the same node class, they are considered semantically compatible:

// Register User and Customer as both mapping to the "Entity" Engram node type
// → User and Customer are semantically compatible

This is computed via cosine similarity over node embeddings when an Engram database is connected. Without a database, the comparison is symbolic (same node type string = compatible).

Semantic compatibility threshold: cosine similarity ≥ 0.85 (configurable).
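Below is a minimal Rust sketch of the compatibility rule above. It assumes that each named type carries a symbolic node type string and, when a database is connected, an embedding vector. NodeMapping, semantically_compatible, and the f32 representation are hypothetical names chosen for illustration, not the compiler's API. Only the 0.85 default threshold comes from this section.

```rust
/// Hypothetical view of a named type's Engram mapping.
/// `node_type` is the symbolic node class; `embedding` is present only
/// when an Engram database is connected.
pub struct NodeMapping<'a> {
    pub node_type: &'a str,
    pub embedding: Option<&'a [f32]>,
}

/// Default threshold from the spec (configurable in the real compiler).
pub const DEFAULT_THRESHOLD: f32 = 0.85;

/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Embedding comparison when both types have embeddings; otherwise the
/// symbolic fallback (same node type string = compatible).
pub fn semantically_compatible(a: &NodeMapping, b: &NodeMapping, threshold: f32) -> bool {
    match (a.embedding, b.embedding) {
        (Some(ea), Some(eb)) => cosine_similarity(ea, eb) >= threshold,
        _ => a.node_type == b.node_type,
    }
}
```

With no database, User and Customer mapped to "Entity" compare equal symbolically. With embeddings, the threshold decides.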

2.5 Type Inference

The compiler infers types for let bindings without annotations:

let x = 42       // inferred: Int
let s = "hello"  // inferred: String
let b = true     // inferred: Bool

Function return types and parameter types must always be annotated. This is intentional: function signatures are documentation.


3. The activate Construct

3.1 Syntax

activate TypeName where "semantic query string"

3.2 What It Does

activate is a first-class language construct that performs a spreading activation query over the Engram knowledge graph and returns a typed array of results.

The query string is a natural language description. At runtime, the Engram runtime:

  1. Embeds the query string into the same vector space as node embeddings.
  2. Starts activation at the TypeName node and all nodes semantically related to it (cosine similarity above threshold).
  3. Spreads activation outward through graph edges, attenuated by edge weight and node salience.
  4. Returns all nodes whose activation level exceeds the minimum threshold, projected back to the TypeName schema.
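Steps 3 and 4 can be sketched as a bounded breadth-first propagation. This is a hypothetical illustration, not the Engram runtime. It assumes integer node ids, an adjacency list with edge weights, a per-node salience in [0, 1], a global decay factor, and a fixed number of propagation rounds. Steps 1 and 2 (embedding the query and choosing the starting nodes) are assumed to have already produced the seeds argument.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory graph: `edges[n]` lists (target, weight) pairs and
/// `salience[n]` is a per-node attenuation factor.
pub struct Graph {
    pub edges: Vec<Vec<(usize, f32)>>,
    pub salience: Vec<f32>,
}

/// Spread activation from `seeds` for `steps` rounds and return every node
/// whose accumulated activation exceeds `min_activation`, sorted by id.
pub fn spread(
    graph: &Graph,
    seeds: &[(usize, f32)],
    decay: f32,
    steps: usize,
    min_activation: f32,
) -> Vec<(usize, f32)> {
    let mut activation: HashMap<usize, f32> = HashMap::new();
    for &(node, level) in seeds {
        *activation.entry(node).or_insert(0.0) += level;
    }
    let mut frontier: Vec<(usize, f32)> = seeds.to_vec();
    for _ in 0..steps {
        let mut next: HashMap<usize, f32> = HashMap::new();
        for &(node, level) in &frontier {
            for &(target, weight) in &graph.edges[node] {
                // Attenuate by edge weight, target salience, and global decay.
                let pulse = level * weight * graph.salience[target] * decay;
                *next.entry(target).or_insert(0.0) += pulse;
            }
        }
        for (&node, &pulse) in &next {
            *activation.entry(node).or_insert(0.0) += pulse;
        }
        frontier = next.into_iter().collect();
    }
    let mut result: Vec<(usize, f32)> = activation
        .into_iter()
        .filter(|&(_, level)| level > min_activation)
        .collect();
    result.sort_by_key(|&(node, _)| node);
    result
}
```

The fixed round count and a decay below 1.0 keep cycles from amplifying activation indefinitely. The real runtime may use a convergence test instead.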

3.3 Static Typing

The result type of activate TypeName where "..." is always [TypeName].

let users: [User] = activate User where "recent premium subscribers"
//  ↑ type-checked: [User]

If TypeName is not a registered type, the compiler emits an error. The query string is opaque to the type checker — it only passes the string through to the runtime.

3.4 Without an Engram Database

When compiled without an Engram database (CompilerOptions::engram_db_path is None), the activate construct emits a stub instruction. At runtime, the interpreter emits a diagnostic and returns an empty array [].

This allows programs using activate to compile and run in development without requiring a live Engram instance.

3.5 Compile-time Behavior

At --target prod, activate is compiled to an ACTIVATE bytecode instruction. The query string is embedded in the bytecode. The sealed artifact protects the query from being read by static analysis.


4. Sealed Blocks

4.1 Syntax

sealed {
    let api_key: String = env("API_KEY")
    // ...
}

4.2 What's Protected

Code inside a sealed {} block is subject to additional runtime protection:

  • In debug builds: The SEALED_BEGIN / SEALED_END bytecode markers are emitted. The debugger is notified not to expose values in this region.
  • In release builds: Same as debug, with no source map entries for the sealed region.
  • In prod builds: The entire bytecode (including the sealed section) is AES-256-GCM encrypted in the sealed artifact. There is no additional treatment of the sealed section — the entire prod artifact is the sealed section.

4.3 Intent

The sealed {} block communicates developer intent: "this section handles sensitive material." It is especially meaningful during development when debug builds are used, since it signals to the runtime and any attached debugger to redact values from inspection.

In prod builds, the sealed {} annotation is redundant (the whole artifact is sealed), but it is preserved for documentation and future tooling that can enforce stricter runtime isolation.


5. Compilation Targets

5.1 debug

el build file.el --target debug

Produces:

  • file.elc — JSON-serialized bytecode instructions
  • file.map.json — source map: JSON array of {instruction, start, end, line, col} objects

The source map allows debuggers and error reporters to translate bytecode offsets back to exact source positions (file + line + column).
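The following sketch shows how a debugger might resolve a bytecode offset using the source map. It assumes the following:

- instruction is the bytecode index.
- start and end are byte offsets into the source.
- Entries are sorted by instruction.

JSON decoding (for example with serde_json) is omitted to keep the example dependency-free. SourceMapEntry and lookup are hypothetical names.

```rust
/// One source map entry, mirroring the JSON object
/// {instruction, start, end, line, col} (field meanings assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct SourceMapEntry {
    pub instruction: usize,
    pub start: usize,
    pub end: usize,
    pub line: u32,
    pub col: u32,
}

/// Find the entry for `instruction` by binary search. If the exact index has
/// no entry, fall back to the nearest preceding entry.
pub fn lookup(map: &[SourceMapEntry], instruction: usize) -> Option<&SourceMapEntry> {
    match map.binary_search_by_key(&instruction, |e| e.instruction) {
        Ok(i) => Some(&map[i]),
        Err(0) => None,
        Err(i) => Some(&map[i - 1]),
    }
}
```

Debug builds promise full coverage, so the fallback should only matter for hand-edited or truncated maps.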

Debug builds:

  • No dead-code elimination
  • No constant folding
  • Full source map coverage
  • Type errors are reported as warnings (compilation continues)

5.2 release

el build file.el --target release

Produces:

  • file.elc — JSON-serialized bytecode instructions

Release builds:

  • No source map
  • Minor dead-code pruning (unreachable after return)
  • Type errors are warnings (compilation continues)

5.3 prod

el build file.el --target prod
ENGRAM_SEAL_KEY=my-secret el build file.el --target prod

Produces:

  • file.sealed — quantum-sealed artifact

Prod builds:

  • Type errors are fatal — the compiler refuses to produce a sealed artifact from a program with type errors
  • The output is encrypted and cannot be decompiled without the deployment key
  • All debug information is stripped before sealing
  • Source maps are never produced

6. The Sealed Artifact Format

6.1 Wire Format

Offset  Size    Field
──────  ──────  ────────────────────────────────────────────
0       8       Magic: b"ENGRAM01"
8       2       Format version: u16 big-endian (currently 1)
10      *       JSON body: SealedArtifact struct

The JSON body is a SealedArtifact:

{
  "algorithm_id": "aes256gcm-v1",
  "signature":         "...(base64)...",
  "encapsulated_key":  "...(base64)...",
  "nonce":             "...(base64)...",
  "ciphertext":        "...(base64)...",
  "deployment_fingerprint": "...(base64 or null)..."
}
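The framing described in the wire format table can be sketched as follows. Here the JSON body is treated as opaque bytes; a real implementation would serialize SealedArtifact, for example with serde_json. frame, unframe, and FrameError are hypothetical names.

```rust
pub const MAGIC: &[u8; 8] = b"ENGRAM01";
pub const FORMAT_VERSION: u16 = 1;

#[derive(Debug, PartialEq)]
pub enum FrameError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u16),
}

/// Serialize: magic ‖ version (u16 big-endian) ‖ JSON body.
pub fn frame(json_body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(10 + json_body.len());
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&FORMAT_VERSION.to_be_bytes());
    out.extend_from_slice(json_body);
    out
}

/// Validate the 10-byte header and return the JSON body slice.
pub fn unframe(bytes: &[u8]) -> Result<&[u8], FrameError> {
    if bytes.len() < 10 {
        return Err(FrameError::TooShort);
    }
    if bytes[..8] != MAGIC[..] {
        return Err(FrameError::BadMagic);
    }
    let version = u16::from_be_bytes([bytes[8], bytes[9]]);
    if version != FORMAT_VERSION {
        return Err(FrameError::UnsupportedVersion(version));
    }
    Ok(&bytes[10..])
}
```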

6.2 Field Descriptions

| Field | Description |
|---|---|
| algorithm_id | The encryption algorithm. Currently aes256gcm-v1. Reserved for ML-KEM upgrade. |
| signature | BLAKE3 keyed MAC over (algorithm_id ‖ nonce ‖ ciphertext). Detects tampering before decryption attempt. |
| encapsulated_key | 32 bytes: symmetric_key XOR BLAKE3(deployment_binding_material). Requires knowledge of the deployment secret to recover the symmetric key. |
| nonce | 12-byte (96-bit) AES-GCM nonce. Randomly generated per seal operation. |
| ciphertext | AES-256-GCM ciphertext of the bytecode, including the 128-bit GCM authentication tag. |
| deployment_fingerprint | BLAKE3 hash of the deployment binding material. Stored so the unsealer can verify it is running in the correct environment before attempting decryption. null for DeploymentBinding::None. |

6.3 Sealing Process

  1. Generate a cryptographically random 256-bit symmetric key K.
  2. Encrypt bytecode: ciphertext = AES-256-GCM(K, nonce=random_96bit, plaintext=bytecode).
  3. Derive the deployment binding hash: H = BLAKE3(deployment_material).
  4. Encapsulate: encapsulated_key = K XOR H (32 bytes).
  5. Compute MAC: signature = BLAKE3-keyed(K, algorithm_id ‖ nonce ‖ ciphertext).
  6. Serialize: ENGRAM01 ‖ version_u16be ‖ JSON(artifact).

6.4 Unsealing Process

  1. Parse magic and version; reject if not ENGRAM01 / version 1.
  2. Derive deployment hash: H = BLAKE3(provided_binding_key).
  3. Verify fingerprint: if deployment_fingerprint is present, assert BLAKE3(binding_key) == fingerprint. Fail with BindingMismatch if not.
  4. Recover symmetric key: K = encapsulated_key XOR H.
  5. Verify MAC: compute BLAKE3-keyed(K, ...) and compare to signature. Fail with SignatureInvalid if mismatch.
  6. Decrypt: bytecode = AES-256-GCM-Decrypt(K, nonce, ciphertext). The GCM auth tag is verified here automatically.
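Sealing step 4 and unsealing steps 3 and 4 reduce to a fingerprint check and a 32-byte XOR. In this sketch, H is assumed to be computed by the caller (for example with the blake3 crate), because the hash is not in the Rust standard library. The constant-time comparison is an addition that the spec does not state. The function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum UnsealError {
    BindingMismatch,
}

/// Byte-wise XOR of two 32-byte values; XOR is its own inverse.
pub fn xor32(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for ((o, x), y) in out.iter_mut().zip(a.iter()).zip(b.iter()) {
        *o = x ^ y;
    }
    out
}

/// Comparison intended not to short-circuit on the first differing byte.
fn ct_eq(a: &[u8; 32], b: &[u8; 32]) -> bool {
    a.iter().zip(b.iter()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Sealing step 4: encapsulated_key = K XOR H.
pub fn encapsulate(k: &[u8; 32], h: &[u8; 32]) -> [u8; 32] {
    xor32(k, h)
}

/// Unsealing steps 3 and 4: check the optional fingerprint against H,
/// then recover K = encapsulated_key XOR H.
pub fn recover_key(
    encapsulated_key: &[u8; 32],
    h: &[u8; 32],
    fingerprint: Option<&[u8; 32]>,
) -> Result<[u8; 32], UnsealError> {
    if let Some(fp) = fingerprint {
        if !ct_eq(fp, h) {
            return Err(UnsealError::BindingMismatch);
        }
    }
    Ok(xor32(encapsulated_key, h))
}
```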

6.5 Security Properties

Why "quantum-sealed":

AES-256 is quantum-resistant at the 256-bit key length. Grover's algorithm provides a quadratic speedup in key search, reducing effective security from 2^256 to 2^128. 128-bit quantum security is considered sufficient by NIST for the foreseeable future.

The algorithm_id field is forward-compatible: when ml-kem (CRYSTALS-Kyber ML-KEM-768 or ML-KEM-1024) crates stabilize, the upgrade is:

  1. Implement SealAlgorithm::MlKem768 in el-seal.
  2. The encapsulated_key field becomes the KEM-encapsulated ciphertext.
  3. Old artifacts retain their aes256gcm-v1 algorithm_id and continue to unseal via the existing code path.

Decompilation resistance:

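Without the deployment key, K cannot be recovered (doing so requires knowing deployment_material), so the ciphertext is indistinguishable from random bytes. Static analysis tools, disassemblers, and decompilers see only AES-GCM ciphertext, which exposes no program structure. Tampering with algorithm_id, nonce, or ciphertext is detected by the MAC check (unsealing step 5) before decryption is attempted. The GCM authentication tag then provides a second integrity check during decryption (step 6).

This property also requires that H cannot be derived from the artifact itself. As specified in 6.2 and 6.4, deployment_fingerprint is the BLAKE3 hash of the same deployment material, so it equals H. An artifact that stores a fingerprint therefore exposes K = encapsulated_key XOR deployment_fingerprint. Deriving the fingerprint and H with distinct BLAKE3 keys or contexts would close this gap.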


7. Deployment Binding Modes

| Mode | Description | Security |
|---|---|---|
| EnvironmentKey(var) | Derives binding from the value of an environment variable. Default: ENGRAM_SEAL_KEY. | High: key must be provisioned at runtime |
| MachineFingerprint | Derives binding from hostname + OS + architecture. Artifact can only run on the same machine. | Medium: fingerprint is observable |
| None | No binding (zero vector). Testing and development only. | None |
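The sketch below shows how the three modes could produce the raw binding material that is later hashed into H. Only the mode names come from the table. The following are all assumptions:

- The "|" separator used for the machine fingerprint.
- The 32-byte zero vector used for None.
- The caller-supplied hostname, since the Rust standard library has no portable hostname API.

```rust
use std::env;

/// Hypothetical mirror of the deployment binding modes.
pub enum DeploymentBinding {
    EnvironmentKey(String),
    MachineFingerprint,
    None,
}

impl DeploymentBinding {
    /// Return the raw binding material that would be hashed into H.
    pub fn material(&self, hostname: &str) -> Result<Vec<u8>, String> {
        match self {
            DeploymentBinding::EnvironmentKey(var) => env::var(var)
                .map(String::into_bytes)
                .map_err(|_| format!("environment variable {} is not set", var)),
            // Assumed layout: hostname|os|arch.
            DeploymentBinding::MachineFingerprint => Ok(format!(
                "{}|{}|{}",
                hostname,
                env::consts::OS,
                env::consts::ARCH
            )
            .into_bytes()),
            // Assumed length of the "zero vector".
            DeploymentBinding::None => Ok(vec![0u8; 32]),
        }
    }
}
```

Because hostname, OS, and architecture are all observable, anyone who can inspect the target machine can reproduce the MachineFingerprint material. That matches the Medium rating.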

8. Operators

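| Operator | Operand types | Result |
|---|---|---|
| `+` | Int, Float, String | same as operands (String: concatenation) |
| `-` | Int, Float | same as operands |
| `*` | Int, Float | same as operands |
| `/` | Int, Float | same as operands |
| `==` | any compatible pair | Bool |
| `!=` | any compatible pair | Bool |
| `<` `>` `<=` `>=` | Int, Float | Bool |
| `&&` | Bool, Bool | Bool |
| `\|\|` | Bool, Bool | Bool |
| `!` | Bool | Bool |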

Operator precedence (high to low):

  1. ! (unary)
  2. * /
  3. + -
  4. < > <= >=
  5. == !=
  6. &&
  7. ||
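The precedence list maps directly onto a precedence-climbing parser. Each binary level gets a binding power, and every operator is left-associative. The sketch is hypothetical and is not the el parser. It works on whitespace-separated tokens and omits parentheses and postfix forms. It prints a fully parenthesized S-expression so the resulting tree is easy to check.

```rust
/// Binding power per binary operator (higher binds tighter); unary `!`
/// is handled separately and binds tightest of all.
fn binary_precedence(op: &str) -> Option<u8> {
    match op {
        "*" | "/" => Some(6),
        "+" | "-" => Some(5),
        "<" | ">" | "<=" | ">=" => Some(4),
        "==" | "!=" => Some(3),
        "&&" => Some(2),
        "||" => Some(1),
        _ => None,
    }
}

pub struct Parser<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(src: &'a str) -> Self {
        Parser { tokens: src.split_whitespace().collect(), pos: 0 }
    }

    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    fn bump(&mut self) -> Option<&'a str> {
        let tok = self.peek();
        self.pos += 1;
        tok
    }

    /// unary_expr = "!" unary_expr | atom
    fn unary(&mut self) -> String {
        match self.bump() {
            Some("!") => format!("(! {})", self.unary()),
            Some(atom) => atom.to_string(),
            None => panic!("unexpected end of input"),
        }
    }

    /// Parse an expression whose binary operators bind at least `min_prec`.
    pub fn expr(&mut self, min_prec: u8) -> String {
        let mut lhs = self.unary();
        while let Some(op) = self.peek() {
            let prec = match binary_precedence(op) {
                Some(p) if p >= min_prec => p,
                _ => break,
            };
            self.pos += 1;
            // Parsing the right side at prec + 1 makes operators left-associative.
            let rhs = self.expr(prec + 1);
            lhs = format!("({} {} {})", op, lhs, rhs);
        }
        lhs
    }
}
```

For example, Parser::new("1 + 2 * 3").expr(0) yields "(+ 1 (* 2 3))".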

9. Escape Sequences in String Literals

| Sequence | Character |
|---|---|
| `\n` | Newline |
| `\t` | Tab |
| `\r` | Carriage return |
| `\"` | Double quote |
| `\\` | Backslash |
| `\0` | Null byte |
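Below is a sketch of decoding these escapes inside a string literal body. The table does not say how the lexer handles an unknown escape (such as `\q`) or a trailing backslash; treating both as errors is an assumption. unescape is a hypothetical name.

```rust
/// Decode the escape sequences from the table inside a string literal body
/// (the text between the quotes).
pub fn unescape(body: &str) -> Result<String, String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            Some('0') => out.push('\0'),
            Some(other) => return Err(format!("unknown escape sequence \\{}", other)),
            None => return Err("trailing backslash in string literal".to_string()),
        }
    }
    Ok(out)
}
```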

10. CLI Reference

el build <file.el> [--target debug|release|prod] [-o <output>]
el run   <file.el>
el check <file.el>
el seal  <artifact> [-o <output>]
el unseal <artifact> [-o <output>]

el build — Compile a source file. Default target is debug.

el run — Compile with debug target and execute immediately in the built-in interpreter. Does not write an output file.

el check — Type-check only. Exits with code 0 if no errors, 1 if errors. Useful for CI.

el seal — Take an existing release artifact and seal it. Reads ENGRAM_SEAL_KEY from the environment if set.

el unseal — Decrypt a sealed artifact. Reads ENGRAM_SEAL_KEY from the environment. Writes decrypted bytecode to the output path.


11. Grammar (EBNF)

program    = stmt* EOF

stmt       = let_stmt
           | return_stmt
           | fn_def
           | type_def
           | enum_def
           | for_stmt
           | expr_stmt

let_stmt   = "let" IDENT (":" type_expr)? "=" expr ";"?
return_stmt = "return" expr ";"?
for_stmt   = "for" IDENT "in" expr "{" stmt* "}"
expr_stmt  = expr ";"?

fn_def     = "fn" IDENT "(" param_list ")" "->" type_expr "{" stmt* "}"
type_def   = "type" IDENT "{" (IDENT ":" type_expr ","? ";"?)* "}"
enum_def   = "enum" IDENT "{" variant* "}"
variant    = IDENT ("(" type_expr ")")? ","?

param_list = (param ("," param)*)?
param      = IDENT ":" type_expr

type_expr  = IDENT
           | "[" type_expr "]"
           | type_expr "?"
           | "fn" "(" (type_expr ("," type_expr)*)? ")" "->" type_expr

expr       = or_expr
or_expr    = and_expr ("||" and_expr)*
and_expr   = eq_expr ("&&" eq_expr)*
eq_expr    = cmp_expr (("==" | "!=") cmp_expr)*
cmp_expr   = add_expr (("<" | ">" | "<=" | ">=") add_expr)*
add_expr   = mul_expr (("+" | "-") mul_expr)*
mul_expr   = unary_expr (("*" | "/") unary_expr)*
unary_expr = "!" unary_expr | postfix_expr
postfix_expr = primary ("." IDENT | "(" arg_list ")" | "[" expr "]")*

primary    = INT | FLOAT | STRING | BOOL
           | "(" expr ")"
           | "[" arg_list "]"
           | "{" stmt* "}"
           | "if" expr primary ("else" primary)?
           | "match" expr "{" match_arm* "}"
           | "activate" IDENT "where" STRING
           | "sealed" "{" stmt* "}"
           | IDENT ("::" IDENT)*

arg_list   = (expr ("," expr)*)?
match_arm  = pattern "=>" expr ","?

pattern    = "_"
           | IDENT "::" IDENT ("(" IDENT ")")?
           | INT | STRING | BOOL
           | IDENT

12. Package System

12.1 Project Manifest — el.toml

Every Engram project has an el.toml at its root. The manifest is parsed by the el-manifest crate.

[package]
name = "my-service"
version = "0.1.0"
description = "What this does"
authors = ["Will Anderson <will@neurontechnologies.ai>"]
license = "MIT"
edition = "2026"

[dependencies]
engram-http = "1.2"
engram-auth = "0.8.1"
some-local = { path = "../some-local" }

[dev-dependencies]
el-test = "0.1"

[build]
target = "prod"              # debug | release | prod (default: debug)
entry = "src/main.el"        # main entry point (default: src/main.el)
output = "dist/"             # output directory (default: dist/)
seal_key = "env:ENGRAM_SEAL_KEY"   # key source for prod sealed artifacts

[cross]
targets = ["x86_64-linux", "aarch64-linux", "x86_64-macos", "aarch64-macos", "wasm32"]

[plugins]
el-fmt = "1.0"               # code formatter plugin
el-doc = "0.3"               # documentation generator

Dependency specifiers

| Form | Example | Meaning |
|---|---|---|
| String | "1.2" | Version requirement from default registry |
| Path table | { path = "../lib" } | Local path dependency |
| Registry table | { version = "1.0", registry = "https://..." } | Private registry |

Seal key sources

| Form | Example | Meaning |
|---|---|---|
| env:VAR | env:ENGRAM_SEAL_KEY | Read from environment variable at build time |
| file:path | file:/etc/engram/key.bin | Read raw bytes from a file |
| Literal | my-secret-key | Inline key (development only) |
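The following sketch classifies and resolves a seal_key value using the prefixes in the table. SealKeySource and its methods are hypothetical; resolution uses only std::env and std::fs.

```rust
use std::{env, fs};

/// Hypothetical parsed form of the `[build].seal_key` value.
#[derive(Debug, PartialEq)]
pub enum SealKeySource {
    Env(String),
    File(String),
    Literal(String),
}

impl SealKeySource {
    /// Classify a seal_key string by its prefix.
    pub fn parse(raw: &str) -> Self {
        if let Some(var) = raw.strip_prefix("env:") {
            SealKeySource::Env(var.to_string())
        } else if let Some(path) = raw.strip_prefix("file:") {
            SealKeySource::File(path.to_string())
        } else {
            SealKeySource::Literal(raw.to_string())
        }
    }

    /// Resolve the key bytes at build time.
    pub fn resolve(&self) -> Result<Vec<u8>, String> {
        match self {
            SealKeySource::Env(var) => env::var(var)
                .map(String::into_bytes)
                .map_err(|e| format!("{}: {}", var, e)),
            SealKeySource::File(path) => fs::read(path).map_err(|e| format!("{}: {}", path, e)),
            SealKeySource::Literal(key) => Ok(key.as_bytes().to_vec()),
        }
    }
}
```

With prefix matching, a literal key that itself begins with env: or file: cannot be expressed. That is one more reason to keep literal keys to development.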

12.2 Dependency Resolution

Dependencies are resolved by the el-registry crate, which talks to the registry at https://packages.neurontechnologies.ai.

Resolution algorithm:

  1. For each [dependencies] entry, fetch all available versions from the registry.
  2. Pick the highest version satisfying the version requirement (semver).
  3. Download the tarball and verify the BLAKE3 checksum.
  4. Cache in ~/.engram/packages/{name}/{version}/.
  5. Path dependencies bypass the registry entirely.
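Here is a dependency-free sketch of step 2 for caret requirements, assuming plain major.minor.patch versions (no pre-release tags). It follows Cargo's documented caret rules. Under those rules, "1.2" means >=1.2.0, <2.0.0, and a 0.x requirement such as "0.8.1" means >=0.8.1, <0.9.0. Range and wildcard requirements are not handled. In practice the semver crate's VersionReq covers all forms. Version, Caret, and pick_highest here are hypothetical.

```rust
/// Plain major.minor.patch; derived Ord compares fields left to right.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version(pub u64, pub u64, pub u64);

/// Half-open range [lower, upper) produced by a caret requirement.
pub struct Caret {
    lower: Version,
    upper: Version,
}

impl Caret {
    /// Parse "1", "1.2", or "1.2.3" (a leading '^' is allowed). The upper
    /// bound bumps the leftmost non-zero component, or the last written
    /// component if all are zero.
    pub fn parse(req: &str) -> Option<Caret> {
        let req = req.trim().trim_start_matches('^');
        let parts: Vec<u64> = req
            .split('.')
            .map(|p| p.parse().ok())
            .collect::<Option<Vec<u64>>>()?;
        if parts.is_empty() || parts.len() > 3 {
            return None;
        }
        let mut lower = [0u64; 3];
        lower[..parts.len()].copy_from_slice(&parts);
        let bump = parts.iter().position(|&p| p != 0).unwrap_or(parts.len() - 1);
        let mut upper = [0u64; 3];
        upper[..bump].copy_from_slice(&parts[..bump]);
        upper[bump] = parts[bump] + 1;
        Some(Caret {
            lower: Version(lower[0], lower[1], lower[2]),
            upper: Version(upper[0], upper[1], upper[2]),
        })
    }

    pub fn matches(&self, v: Version) -> bool {
        self.lower <= v && v < self.upper
    }
}

/// Step 2: the highest available version satisfying the requirement.
pub fn pick_highest(req: &Caret, available: &[Version]) -> Option<Version> {
    available.iter().copied().filter(|&v| req.matches(v)).max()
}
```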

12.3 Version Requirements

Engram uses the semver crate's version requirement syntax (identical to Cargo's):

| Requirement | Interpreted as | Matches |
|---|---|---|
| "1.2" | ^1.2.0 (caret) | 1.2.0, 1.3.0, but not 2.0.0 |
| ">=1.0, <2.0" | range | any version from 1.0.0 up to, but not including, 2.0.0 |
| "*" | wildcard | any version |

13. Build System

13.1 Build Targets

| Target | Artifact | Notes |
|---|---|---|
| debug | .elc + .map.json | Full debug info, source maps |
| release | .elc | Optimized, no debug info |
| prod | .sealed | AES-256-GCM encrypted, tamper-evident |

13.2 CLI Commands

el new <name>              scaffold a new project
el add <pkg>[@ver]         add a dependency to el.toml
el remove <pkg>            remove a dependency
el update                  update all deps to latest compatible
el build [--target prod]   build (reads el.toml)
el build --cross           build for all cross targets
el run                     build debug and run
el test                    run tests
el check                   type-check only
el fmt                     format source files
el clean                   clean build artifacts
el publish                 publish to registry
el search <query>          search registry
el plugin add <plugin>     add a compiler plugin
el plugin remove <plugin>  remove a compiler plugin
el plugin list             list installed compiler plugins

13.3 Incremental Builds

The build system tracks a BLAKE3 hash of every source file in .el/build-cache.json. On subsequent builds, only files whose hashes have changed (and their dependents) are recompiled. The cache is invalidated by el clean.
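The following sketch computes the rebuild set. It makes two assumptions:

- The cache maps each source path to a hex digest (produced by BLAKE3, not shown here).
- A reverse-dependency map (each file to the files that import it) is available.

files_to_rebuild is a hypothetical name. New files count as changed; deleted files are ignored here.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Return the set of files to recompile: every file whose hash is new or
/// differs from the cache, plus all of its transitive dependents.
pub fn files_to_rebuild(
    cached: &HashMap<String, String>,
    current: &HashMap<String, String>,
    dependents: &HashMap<String, Vec<String>>,
) -> BTreeSet<String> {
    // Seed the work queue with files whose content hash changed.
    let mut queue: VecDeque<String> = current
        .iter()
        .filter(|(path, hash)| cached.get(*path) != Some(*hash))
        .map(|(path, _)| path.clone())
        .collect();

    // Walk the reverse-dependency graph so dependents are rebuilt too.
    let mut dirty = BTreeSet::new();
    while let Some(path) = queue.pop_front() {
        if !dirty.insert(path.clone()) {
            continue; // already scheduled
        }
        if let Some(users) = dependents.get(&path) {
            queue.extend(users.iter().cloned());
        }
    }
    dirty
}
```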

13.4 Cross-Compilation

The [cross].targets list specifies which native targets to produce when running el build --cross. Each cross build produces a separate artifact tagged with the target triple.

| Target name | Triple | Notes |
|---|---|---|
| x86_64-linux | x86_64-unknown-linux-gnu | Standard Linux 64-bit |
| aarch64-linux | aarch64-unknown-linux-gnu | ARM64 Linux |
| x86_64-macos | x86_64-apple-darwin | Intel Mac |
| aarch64-macos | aarch64-apple-darwin | Apple Silicon |
| wasm32 | wasm32-unknown-unknown | WebAssembly |

Cross-compilation currently emits bytecode tagged with the target triple. A native LLVM backend (future work) will use the triple to select the correct code generation backend. The LLVM extension point is clearly marked in the el-build crate source.

13.5 Artifact Names

| Target | Cross | Artifact name |
|---|---|---|
| debug | none | {name}.elc |
| release | none | {name}.elc |
| prod | none | {name}.sealed |
| any | wasm32 | {name}-wasm32.wasm |
| any | other | {name}-{triple-short}.elc |
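Below is a sketch of the cross-target table and the naming rules. el-build is described as having a CrossTarget enum with triple() and artifact_extension() methods. The signatures and the from_name/short_name helpers here are assumptions. So is reading {triple-short} as the target name from [cross].targets.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CrossTarget {
    X86_64Linux,
    Aarch64Linux,
    X86_64Macos,
    Aarch64Macos,
    Wasm32,
}

impl CrossTarget {
    /// Parse a name as written in [cross].targets.
    pub fn from_name(name: &str) -> Option<CrossTarget> {
        match name {
            "x86_64-linux" => Some(CrossTarget::X86_64Linux),
            "aarch64-linux" => Some(CrossTarget::Aarch64Linux),
            "x86_64-macos" => Some(CrossTarget::X86_64Macos),
            "aarch64-macos" => Some(CrossTarget::Aarch64Macos),
            "wasm32" => Some(CrossTarget::Wasm32),
            _ => None,
        }
    }

    /// Short target name (assumed meaning of {triple-short}).
    pub fn short_name(self) -> &'static str {
        match self {
            CrossTarget::X86_64Linux => "x86_64-linux",
            CrossTarget::Aarch64Linux => "aarch64-linux",
            CrossTarget::X86_64Macos => "x86_64-macos",
            CrossTarget::Aarch64Macos => "aarch64-macos",
            CrossTarget::Wasm32 => "wasm32",
        }
    }

    /// Full triple from the cross-compilation table.
    pub fn triple(self) -> &'static str {
        match self {
            CrossTarget::X86_64Linux => "x86_64-unknown-linux-gnu",
            CrossTarget::Aarch64Linux => "aarch64-unknown-linux-gnu",
            CrossTarget::X86_64Macos => "x86_64-apple-darwin",
            CrossTarget::Aarch64Macos => "aarch64-apple-darwin",
            CrossTarget::Wasm32 => "wasm32-unknown-unknown",
        }
    }

    /// wasm32 yields .wasm; other cross targets yield tagged bytecode.
    pub fn artifact_extension(self) -> &'static str {
        match self {
            CrossTarget::Wasm32 => "wasm",
            _ => "elc",
        }
    }
}

/// Artifact naming per the artifact-name table.
pub fn artifact_name(name: &str, build_target: &str, cross: Option<CrossTarget>) -> String {
    match cross {
        Some(t) => format!("{}-{}.{}", name, t.short_name(), t.artifact_extension()),
        None if build_target == "prod" => format!("{}.sealed", name),
        None => format!("{}.elc", name),
    }
}
```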

14. Plugin System

14.1 Overview

Compiler plugins are Rust dynamic libraries (.dylib on macOS, .so on Linux) that implement the CompilerPlugin trait. They are intended to be loaded at compile time via dlopen. Dynamic loading is currently a stub, left as a marked TODO in the el-build crate. Each plugin receives hooks at three points in the compilation pipeline.

14.2 Plugin Trait

pub trait CompilerPlugin: Send + Sync {
    fn name(&self) -> &str;
    fn version(&self) -> &str;

    /// Called after parsing, before type checking.
    fn on_ast(&self, program: &mut Program) -> Result<(), PluginError>;

    /// Called after type checking, before code generation.
    fn on_typed_ast(&self, program: &Program, types: &TypeEnv) -> Result<(), PluginError>;

    /// Called after code generation, before sealing.
    fn on_bytecode(&self, bytecode: &mut Vec<u8>) -> Result<(), PluginError>;
}

14.3 Lifecycle Hooks

  1. on_ast — mutate or observe the AST after parsing. Use for: AST macros, synthetic node injection, linting.
  2. on_typed_ast — observe the type-checked AST. Use for: documentation generation, type-aware linting.
  3. on_bytecode — mutate or observe the final bytecode. Use for: instrumentation, size analysis.
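Below is a sketch of how a plugin registry might drive these hooks. Each hook runs for every registered plugin in registration order, and the first error stops the pipeline with the plugin's name attached. Program and TypeEnv are empty placeholders, and PluginError is assumed to wrap a message. This PluginRegistry is illustrative rather than the el-build type of the same name.

```rust
/// Placeholders standing in for the compiler's AST and type environment.
pub struct Program;
pub struct TypeEnv;

#[derive(Debug, PartialEq)]
pub struct PluginError(pub String);

/// The trait from 14.2, repeated so this sketch compiles on its own.
pub trait CompilerPlugin: Send + Sync {
    fn name(&self) -> &str;
    fn version(&self) -> &str;
    fn on_ast(&self, program: &mut Program) -> Result<(), PluginError>;
    fn on_typed_ast(&self, program: &Program, types: &TypeEnv) -> Result<(), PluginError>;
    fn on_bytecode(&self, bytecode: &mut Vec<u8>) -> Result<(), PluginError>;
}

#[derive(Default)]
pub struct PluginRegistry {
    plugins: Vec<Box<dyn CompilerPlugin>>,
}

impl PluginRegistry {
    pub fn register(&mut self, plugin: Box<dyn CompilerPlugin>) {
        self.plugins.push(plugin);
    }

    /// Run on_ast for each plugin in order; stop at the first failure.
    pub fn run_on_ast(&self, program: &mut Program) -> Result<(), PluginError> {
        for p in &self.plugins {
            p.on_ast(program)
                .map_err(|e| PluginError(format!("{}: {}", p.name(), e.0)))?;
        }
        Ok(())
    }

    /// Run on_bytecode for each plugin in order; stop at the first failure.
    pub fn run_on_bytecode(&self, bytecode: &mut Vec<u8>) -> Result<(), PluginError> {
        for p in &self.plugins {
            p.on_bytecode(bytecode)
                .map_err(|e| PluginError(format!("{}: {}", p.name(), e.0)))?;
        }
        Ok(())
    }
}
```

Running plugins strictly in registration order keeps on_ast and on_bytecode mutations deterministic. The order in [plugins] would then matter.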

14.4 Writing a Plugin

use el_build::{CompilerPlugin, PluginError};
// Program and TypeEnv must also be in scope; import them from the crates that define them.

pub struct MyPlugin;

impl CompilerPlugin for MyPlugin {
    fn name(&self) -> &str { "my-plugin" }
    fn version(&self) -> &str { "0.1.0" }

    fn on_ast(&self, _program: &mut Program) -> Result<(), PluginError> {
        // Observe or mutate the AST
        Ok(())
    }

    fn on_typed_ast(&self, _program: &Program, _types: &TypeEnv) -> Result<(), PluginError> {
        Ok(())
    }

    fn on_bytecode(&self, _bytecode: &mut Vec<u8>) -> Result<(), PluginError> {
        Ok(())
    }
}

// Required export symbol for dynamic loading:
#[no_mangle]
pub extern "C" fn engram_plugin_init() -> Box<dyn CompilerPlugin> {
    Box::new(MyPlugin)
}

14.5 Installing Plugins

Add to [plugins] in el.toml:

[plugins]
el-fmt = "1.0"
el-doc = "0.3"

Or use the CLI:

el plugin add el-fmt@1.0

Plugins are looked up in the system plugin directory. The el-registry crate fetches and installs them like regular packages.


15. Future Directions

  • ML-KEM sealed artifacts — upgrade el-seal to CRYSTALS-Kyber when the ml-kem crate stabilizes (drop-in: same format, new algorithm_id).
  • LSP server — spreading activation for autocomplete using the Engram database as the type graph backend.
  • Engram DB integration — live connection to an Engram database for activate at compile time (semantic type checking) and at runtime (actual node retrieval).
  • Struct construction syntax: User { id: uuid, name: "Alice", ... }.
  • Generics: fn identity<T>(x: T) -> T { return x }.
  • Trait system — behavioral interfaces that interact with the Engram type graph.
  • Pattern matching on struct fields: match user { User { name: "admin" } => ... }.