Add three new crates and extend the compiler and CLI toolchain:
- el-manifest: el.toml manifest parser using serde + toml crate; supports package info, registry/path/version deps, build config with seal key sources, cross targets, and plugins; Manifest::find_manifest() walks up the directory tree
- el-registry: HTTP registry client (reqwest + tokio) for packages.neurontechnologies.ai; PackageMetadata, fetch/download/publish/search, BLAKE3 checksum verification, local cache at ~/.engram/packages/
- el-build: build orchestrator with incremental builds (BLAKE3 file hashes in .el/build-cache.json), cross-compilation target tagging, dep resolution, plugin registry with on_ast/on_typed_ast/on_bytecode hooks, test runner, fmt/check/clean commands
- CrossTarget and NativeTarget enums with triple() and artifact_extension() methods; NativeTarget::Host detects compile-time platform via cfg! macros
- Plugin system: CompilerPlugin trait + PluginRegistry; dynamic loading is a marked TODO with clear extension point for libloading
- CLI extended with: new, add, remove, update, build --cross, run, test, check, fmt, clean, publish, search, plugin add/remove/list; old single-file commands moved to build-file/seal/unseal subcommands
- Fix pre-existing debugger.rs borrow error (unwrap_or temporary lifetime)
- Fix checker.rs and codegen.rs to handle TestDef/Seed/Assert Stmt variants
- Add spec/language.md sections 12-14: package system, build system, plugin system, cross-compilation targets table
130 tests passing, zero warnings
Engram Language Specification
Version 0.1.0 — April 2026
Overview
Engram is a statically-typed programming language designed from first principles around a knowledge graph type system. Its three defining properties:
- Types are Engram nodes. Every named type in the language is a node in a knowledge graph. Type compatibility is not purely structural; it is also semantic. Two types are compatible if their Engram node embeddings are similar enough in meaning-space.
- Autocomplete is spreading activation. The language server (LSP) uses spreading activation over the type graph to suggest completions. You get concepts semantically related to what you're building, not just methods on the current type.
- The prod compilation target is quantum-sealed. Bytecode compiled with --target prod is encrypted with AES-256-GCM and signed. Without the deployment key, the encrypted bytecode is indistinguishable from random bytes, so static analysis tools have nothing to decompile.
1. Syntax Reference
1.1 Comments
// Single-line comment — extends to end of line
Block comments are not supported in v0.1. Use // on each line.
1.2 Variable Declarations
let name: Type = expression
let name = expression // type inferred
Variables are immutable by default. All bindings are block-scoped.
1.3 Functions
fn name(param1: Type1, param2: Type2) -> ReturnType {
    // body
    return expression
}
Functions are first-class values. The type of a function is:
fn(Type1, Type2) -> ReturnType
1.4 Types (Structs)
type TypeName {
    field1: Type1
    field2: Type2
    // ...
}
Every type definition registers a node in the Engram type graph. The type
name becomes searchable via spreading activation.
1.5 Enums
enum EnumName {
    Variant1
    Variant2
    VariantWithPayload(PayloadType)
}
Enum variants without parentheses carry no payload. Variants with parentheses carry exactly one value of the given type.
1.6 Pattern Matching
match expression {
    Pattern1 => result_expr1
    Pattern2 => result_expr2
    // ...
}
Pattern forms:
- EnumName::Variant: unit enum variant
- EnumName::Variant(binding): enum variant with payload binding
- literal: exact literal (42, "str", true)
- name: binding (captures the subject into name)
- _: wildcard (always matches, discards)
All arms must produce the same type. The match expression evaluates to that type.
1.7 Control Flow
If/else:
if condition {
    // then branch
} else {
    // else branch
}
Both branches must produce the same type. The else branch is optional; an if without an else produces Void.
For loops:
for item in collection {
    // body
}
1.8 Field Access
value.field_name
Field access is type-checked at compile time. Accessing a field that does not exist in the type definition is a compile error.
1.9 Array Literals
let numbers: [Int] = [1, 2, 3]
let empty: [String] = []
1.10 Index Access
let first: Int = numbers[0]
Index expressions require an Int index. Bounds checking is runtime behavior.
2. Type System
2.1 Primitive Types
| Type | Description | Example literal |
|---|---|---|
| Int | 64-bit signed integer | 42, -7, 1_000 |
| Float | 64-bit IEEE 754 double | 3.14, 0.5 |
| String | UTF-8 string | "hello" |
| Bool | Boolean | true, false |
| Uuid | RFC 4122 UUID | (runtime only) |
| Void | Unit type; no value | (none) |
2.2 Composite Types
| Type form | Description |
|---|---|
| [T] | Array of T |
| T? | Optional T (may be absent) |
| Named | User-defined struct or enum |
2.3 Numeric Coercions
- Int is implicitly coercible to Float.
- Float is not coercible to Int (use explicit conversion when available).
- String + String uses concatenation (the + operator is overloaded).
2.4 Structural vs. Semantic Compatibility
Standard structural compatibility:
- Named("User") is compatible with Named("User") (same name).
- [Int] is compatible with [Int].
- T is compatible with T? (non-optional can be used as optional).
- Int is compatible with Float (widening).
Semantic compatibility (novel):
When two named types have registered Engram node type mappings that refer to the same node class, they are considered semantically compatible:
// Register User and Customer as both mapping to the "Entity" Engram node type
// → User and Customer are semantically compatible
This is computed via cosine similarity over node embeddings when an Engram database is connected. Without a database, the comparison is symbolic (same node type string = compatible).
Semantic compatibility threshold: cosine similarity ≥ 0.85 (configurable).
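The threshold check can be pictured with a minimal sketch. The function names, the use of f64 slices for embeddings, and the handling of mismatched or zero vectors are all assumptions for illustration; the spec does not say how the compiler represents embeddings.

```rust
/// Default threshold quoted in the text above.
const DEFAULT_THRESHOLD: f64 = 0.85;

/// Cosine similarity of two embeddings.
/// Returns None for empty, mismatched-length, or zero-norm vectors
/// (an assumption; the spec does not define these cases).
fn cosine_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Hypothetical helper: two node embeddings are compatible when their
/// similarity meets the threshold.
fn semantically_compatible(a: &[f64], b: &[f64], threshold: f64) -> bool {
    cosine_similarity(a, b).map_or(false, |s| s >= threshold)
}
```

With this sketch, two nearly parallel embeddings pass the default threshold while orthogonal ones do not.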
2.5 Type Inference
The compiler infers types for let bindings without annotations:
let x = 42 // inferred: Int
let s = "hello" // inferred: String
let b = true // inferred: Bool
Function return types and parameter types must always be annotated. This is intentional: function signatures are documentation.
3. The activate Construct
3.1 Syntax
activate TypeName where "semantic query string"
3.2 What It Does
activate is a first-class language construct that performs a spreading
activation query over the Engram knowledge graph and returns a typed array of
results.
The query string is a natural language description. At runtime, the Engram runtime:
- Embeds the query string into the same vector space as node embeddings.
- Starts activation at the TypeName node and all nodes semantically related to it (cosine similarity above threshold).
- Spreads activation outward through graph edges, attenuated by edge weight and node salience.
- Returns all nodes whose activation level exceeds the minimum threshold, projected back to the TypeName schema.
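The spreading step can be sketched as follows. This is an illustrative model, not the Engram runtime: the Graph struct, the fixed hop count, and the rule that a node keeps the strongest activation it has received are assumptions. The embedding step is omitted; the caller supplies the seed nodes directly.

```rust
/// Hypothetical in-memory graph: nodes are indices into `salience`,
/// edges are (from, to, weight) triples.
struct Graph {
    salience: Vec<f64>,
    edges: Vec<(usize, usize, f64)>,
}

/// Spread activation from `seeds` for a fixed number of hops and return
/// (node, activation) pairs at or above `min_activation`, strongest first.
fn spread(
    graph: &Graph,
    seeds: &[(usize, f64)],
    hops: usize,
    min_activation: f64,
) -> Vec<(usize, f64)> {
    let mut activation = vec![0.0; graph.salience.len()];
    for &(node, level) in seeds {
        activation[node] = level;
    }
    for _ in 0..hops {
        // Each hop reads the previous levels and writes into a copy,
        // so activation travels one edge per iteration.
        let mut next = activation.clone();
        for &(from, to, weight) in &graph.edges {
            // Attenuate by edge weight and by the salience of the target.
            let incoming = activation[from] * weight * graph.salience[to];
            if incoming > next[to] {
                next[to] = incoming;
            }
        }
        activation = next;
    }
    let mut result: Vec<(usize, f64)> = activation
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_, a)| a >= min_activation)
        .collect();
    result.sort_by(|x, y| y.1.total_cmp(&x.1));
    result
}
```

In a three-node chain with edge weights 0.8 and 0.5 and a salience of 0.5 on the last node, a seed of 1.0 reaches the second node at 0.8 and the third at 0.2, so a minimum threshold of 0.3 returns only the first two nodes.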
3.3 Static Typing
The result type of activate TypeName where "..." is always [TypeName].
let users: [User] = activate User where "recent premium subscribers"
// ↑ type-checked: [User]
If TypeName is not a registered type, the compiler emits an error. The
query string is opaque to the type checker — it only passes the string through
to the runtime.
3.4 Without an Engram Database
When compiled without an Engram database (CompilerOptions::engram_db_path is
None), the activate construct emits a stub instruction. At runtime, the
interpreter emits a diagnostic and returns an empty array [].
This allows programs using activate to compile and run in development without
requiring a live Engram instance.
3.5 Compile-time Behavior
At --target prod, activate is compiled to an ACTIVATE bytecode instruction.
The query string is embedded in the bytecode. The sealed artifact protects the
query from being read by static analysis.
4. Sealed Blocks
4.1 Syntax
sealed {
    let api_key: String = env("API_KEY")
    // ...
}
4.2 What's Protected
Code inside a sealed {} block is subject to additional runtime protection:
- In debug builds: The SEALED_BEGIN/SEALED_END bytecode markers are emitted. The debugger is notified not to expose values in this region.
- In release builds: Same as debug, with no source map entries for the sealed region.
- In prod builds: The entire bytecode (including the sealed section) is AES-256-GCM encrypted in the sealed artifact. There is no additional treatment of the sealed section, because the entire prod artifact is the sealed section.
4.3 Intent
The sealed {} block communicates developer intent: "this section handles
sensitive material." It is especially meaningful during development when
debug builds are used, since it signals to the runtime and any attached
debugger to redact values from inspection.
In prod builds, the sealed {} annotation is redundant (the whole artifact
is sealed), but it is preserved for documentation and future tooling that can
enforce stricter runtime isolation.
5. Compilation Targets
5.1 debug
el build file.el --target debug
Produces:
- file.elc: JSON-serialized bytecode instructions
- file.map.json: source map, a JSON array of {instruction, start, end, line, col} objects
The source map allows debuggers and error reporters to translate bytecode offsets back to exact source positions (file + line + column).
Debug builds:
- No dead-code elimination
- No constant folding
- Full source map coverage
- Type errors are reported as warnings (compilation continues)
5.2 release
el build file.el --target release
Produces:
- file.elc: JSON-serialized bytecode instructions
Release builds:
- No source map
- Minor dead-code pruning (unreachable code after return)
- Type errors are warnings (compilation continues)
5.3 prod
el build file.el --target prod
ENGRAM_SEAL_KEY=my-secret el build file.el --target prod
Produces:
- file.sealed: quantum-sealed artifact
Prod builds:
- Type errors are fatal — the compiler refuses to produce a sealed artifact from a program with type errors
- The output is encrypted and cannot be decompiled
- All debug information is stripped before sealing
- Source maps are never produced
6. The Sealed Artifact Format
6.1 Wire Format
Offset Size Field
────── ────── ────────────────────────────────────────────
0 8 Magic: b"ENGRAM01"
8 2 Format version: u16 big-endian (currently 1)
10 * JSON body: SealedArtifact struct
The JSON body is a SealedArtifact:
{
  "algorithm_id": "aes256gcm-v1",
  "signature": "...(base64)...",
  "encapsulated_key": "...(base64)...",
  "nonce": "...(base64)...",
  "ciphertext": "...(base64)...",
  "deployment_fingerprint": "...(base64 or null)..."
}
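The header layout can be sketched as a pair of helpers that write and strip the 10-byte prefix. The function and error names here are hypothetical, and the JSON body is treated as opaque bytes; the actual el-seal API may look different.

```rust
const MAGIC: &[u8; 8] = b"ENGRAM01";
const FORMAT_VERSION: u16 = 1;

/// Hypothetical error type for header validation.
#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u16),
}

/// Prefix a serialized JSON body with the magic and big-endian version.
fn encode_artifact(json_body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(10 + json_body.len());
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&FORMAT_VERSION.to_be_bytes());
    out.extend_from_slice(json_body);
    out
}

/// Validate the header and return the JSON body that follows it.
fn split_artifact(bytes: &[u8]) -> Result<&[u8], HeaderError> {
    if bytes.len() < 10 {
        return Err(HeaderError::TooShort);
    }
    if bytes[..8] != MAGIC[..] {
        return Err(HeaderError::BadMagic);
    }
    let version = u16::from_be_bytes([bytes[8], bytes[9]]);
    if version != FORMAT_VERSION {
        return Err(HeaderError::UnsupportedVersion(version));
    }
    Ok(&bytes[10..])
}
```

Checking the magic before the version means a file that is not an Engram artifact at all is reported as such, rather than as an unsupported version.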
6.2 Field Descriptions
| Field | Description |
|---|---|
| algorithm_id | The encryption algorithm. Currently aes256gcm-v1. Reserved for ML-KEM upgrade. |
| signature | BLAKE3 keyed MAC over (algorithm_id ‖ nonce ‖ ciphertext). Detects tampering before decryption attempt. |
| encapsulated_key | 32 bytes: symmetric_key XOR BLAKE3(deployment_binding_material). Requires knowledge of the deployment secret to recover the symmetric key. |
| nonce | 12-byte (96-bit) AES-GCM nonce. Randomly generated per seal operation. |
| ciphertext | AES-256-GCM ciphertext of the bytecode, including the 128-bit GCM authentication tag. |
| deployment_fingerprint | BLAKE3 hash of the deployment binding material. Stored so the unsealer can verify it is running in the correct environment before attempting decryption. null for DeploymentBinding::None. |
6.3 Sealing Process
- Generate a cryptographically random 256-bit symmetric key K.
- Encrypt bytecode: ciphertext = AES-256-GCM(K, nonce=random_96bit, plaintext=bytecode).
- Derive the deployment binding hash: H = BLAKE3(deployment_material).
- Encapsulate: encapsulated_key = K XOR H (32 bytes).
- Compute MAC: signature = BLAKE3-keyed(K, algorithm_id ‖ nonce ‖ ciphertext).
- Serialize: ENGRAM01 ‖ version_u16be ‖ JSON(artifact).
6.4 Unsealing Process
- Parse magic and version; reject if not ENGRAM01 / version 1.
- Derive deployment hash: H = BLAKE3(provided_binding_key).
- Verify fingerprint: if deployment_fingerprint is present, assert BLAKE3(binding_key) == fingerprint. Fail with BindingMismatch if not.
- Recover symmetric key: K = encapsulated_key XOR H.
- Verify MAC: compute BLAKE3-keyed(K, ...) and compare to signature. Fail with SignatureInvalid if mismatch.
- Decrypt: bytecode = AES-256-GCM-Decrypt(K, nonce, ciphertext). The GCM auth tag is verified here automatically.
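The encapsulate and recover steps are the same XOR applied twice. The sketch below covers only those two steps: the 32-byte binding hash H is passed in by the caller, and BLAKE3 hashing, the MAC, and AES-256-GCM are left out because they need external crates. The function names are hypothetical.

```rust
/// XOR two 32-byte values.
fn xor32(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Sealing step: encapsulated_key = K XOR H.
fn encapsulate_key(symmetric_key: &[u8; 32], binding_hash: &[u8; 32]) -> [u8; 32] {
    xor32(symmetric_key, binding_hash)
}

/// Unsealing step: K = encapsulated_key XOR H.
fn recover_key(encapsulated_key: &[u8; 32], binding_hash: &[u8; 32]) -> [u8; 32] {
    xor32(encapsulated_key, binding_hash)
}
```

Recovering with a different binding hash yields a different key, and it is the later MAC check that rejects it. With an all-zero binding (the None mode), the encapsulated key equals the symmetric key itself.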
6.5 Security Properties
Why "quantum-sealed":
AES-256 is quantum-resistant at the 256-bit key length. Grover's algorithm provides a quadratic speedup in key search, reducing effective security from 2^256 to 2^128. 128-bit quantum security is considered sufficient by NIST for the foreseeable future.
The algorithm_id field is forward-compatible: when the ml-kem crates
(CRYSTALS-Kyber, as ML-KEM-768 or ML-KEM-1024) stabilize, the upgrade path is:
- Implement SealAlgorithm::MlKem768 in el-seal.
- The encapsulated_key field becomes the KEM-encapsulated ciphertext.
- Old artifacts retain their aes256gcm-v1 algorithm_id and continue to unseal via the existing code path.
Decompilation resistance:
Without the deployment key, K cannot be recovered (requires knowing
deployment_material), so the ciphertext is indistinguishable from random
bytes. Static analysis tools, disassemblers, and decompilers see only the
AES-GCM ciphertext, which carries no recoverable structure. Any tampering with
algorithm_id, nonce, or ciphertext causes the BLAKE3 MAC check to fail before
decryption is attempted, and the GCM authentication tag provides a second
check during decryption.
7. Deployment Binding Modes
| Mode | Description | Security |
|---|---|---|
| EnvironmentKey(var) | Derives binding from the value of an environment variable. Default: ENGRAM_SEAL_KEY. | High: key must be provisioned at runtime |
| MachineFingerprint | Derives binding from hostname + OS + architecture. Artifact can only run on the same machine. | Medium: fingerprint is observable |
| None | No binding (zero vector). Testing and development only. | None |
8. Operators
| Operator | Types | Result |
|---|---|---|
| + | Int, Float, String | same as operands (String: concatenation) |
| - | Int, Float | same |
| * | Int, Float | same |
| / | Int, Float | same |
| == | any compatible pair | Bool |
| != | any compatible pair | Bool |
| < > <= >= | Int, Float | Bool |
| && | Bool, Bool | Bool |
| \|\| | Bool, Bool | Bool |
| ! | Bool | Bool |
Operator precedence (high to low):
- ! (unary)
- * /
- + -
- < > <= >=
- == !=
- &&
- ||
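To see how these levels group an expression, here is a small precedence-climbing sketch. It is not the Engram parser: it works on whitespace-separated tokens, treats every operand as an opaque name, and returns a fully parenthesized string. The numeric levels are arbitrary labels that only preserve the ordering listed above, and binary operators are taken to be left-associative, as the grammar's repetition rules suggest.

```rust
/// Binding strength of a binary operator; higher binds tighter.
fn precedence(op: &str) -> Option<u8> {
    match op {
        "*" | "/" => Some(6),
        "+" | "-" => Some(5),
        "<" | ">" | "<=" | ">=" => Some(4),
        "==" | "!=" => Some(3),
        "&&" => Some(2),
        "||" => Some(1),
        _ => None,
    }
}

/// Parse an operand, applying any leading unary `!` (tightest binding).
fn parse_unary(tokens: &[&str], pos: &mut usize) -> String {
    let tok = tokens.get(*pos).copied().unwrap_or("<missing>");
    *pos += 1;
    if tok == "!" {
        format!("(!{})", parse_unary(tokens, pos))
    } else {
        tok.to_string()
    }
}

/// Precedence climbing: consume operators that bind at least as tightly
/// as `min_prec`, recursing with a higher minimum for the right operand.
fn parse_expr(tokens: &[&str], pos: &mut usize, min_prec: u8) -> String {
    let mut lhs = parse_unary(tokens, pos);
    while *pos < tokens.len() {
        let op = tokens[*pos];
        let prec = match precedence(op) {
            Some(p) if p >= min_prec => p,
            _ => break,
        };
        *pos += 1;
        let rhs = parse_expr(tokens, pos, prec + 1);
        lhs = format!("({} {} {})", lhs, op, rhs);
    }
    lhs
}

/// Return `src` with explicit parentheses showing operator grouping.
fn group(src: &str) -> String {
    let tokens: Vec<&str> = src.split_whitespace().collect();
    let mut pos = 0;
    parse_expr(&tokens, &mut pos, 1)
}
```

For example, group("a + b * c") yields "(a + (b * c))" and group("! a && b || c") yields "(((!a) && b) || c)".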
9. Escape Sequences in String Literals
| Sequence | Character |
|---|---|
| \n | Newline |
| \t | Tab |
| \r | Carriage return |
| \" | Double quote |
| \\ | Backslash |
| \0 | Null byte |
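A lexer could decode these sequences roughly as follows. This is a sketch under one assumption that the table does not settle: an unknown escape or a trailing backslash is treated as an error. The function name is hypothetical.

```rust
/// Decode the escape sequences from the table in the body of a string
/// literal (the text between the quotes).
fn unescape(raw: &str) -> Result<String, String> {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash must be followed by one of the six known characters.
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            Some('0') => out.push('\0'),
            Some(other) => return Err(format!("unknown escape sequence: \\{}", other)),
            None => return Err("dangling backslash at end of literal".to_string()),
        }
    }
    Ok(out)
}
```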
10. CLI Reference
el build <file.el> [--target debug|release|prod] [-o <output>]
el run <file.el>
el check <file.el>
el seal <artifact> [-o <output>]
el unseal <artifact> [-o <output>]
el build — Compile a source file. Default target is debug.
el run — Compile with debug target and execute immediately in the
built-in interpreter. Does not write an output file.
el check — Type-check only. Exits with code 0 if no errors, 1 if errors. Useful for CI.
el seal — Take an existing release artifact and seal it. Reads
ENGRAM_SEAL_KEY from the environment if set.
el unseal — Decrypt a sealed artifact. Reads ENGRAM_SEAL_KEY from the
environment. Writes decrypted bytecode to the output path.
11. Grammar (EBNF)
program = stmt* EOF
stmt = let_stmt
| return_stmt
| fn_def
| type_def
| enum_def
| expr_stmt
let_stmt = "let" IDENT (":" type_expr)? "=" expr ";"?
return_stmt = "return" expr ";"?
expr_stmt = expr ";"?
fn_def = "fn" IDENT "(" param_list ")" "->" type_expr "{" stmt* "}"
type_def = "type" IDENT "{" (IDENT ":" type_expr ","? ";"?)* "}"
enum_def = "enum" IDENT "{" variant* "}"
variant = IDENT ("(" type_expr ")")? ","?
param_list = (param ("," param)*)?
param = IDENT ":" type_expr
type_expr = IDENT
| "[" type_expr "]"
| type_expr "?"
| "fn" "(" (type_expr ("," type_expr)*)? ")" "->" type_expr
expr = or_expr
or_expr = and_expr ("||" and_expr)*
and_expr = eq_expr ("&&" eq_expr)*
eq_expr = cmp_expr (("==" | "!=") cmp_expr)*
cmp_expr = add_expr (("<" | ">" | "<=" | ">=") add_expr)*
add_expr = mul_expr (("+" | "-") mul_expr)*
mul_expr = unary_expr (("*" | "/") unary_expr)*
unary_expr = "!" unary_expr | postfix_expr
postfix_expr = primary ("." IDENT | "(" arg_list ")" | "[" expr "]")*
primary = INT | FLOAT | STRING | BOOL
| "(" expr ")"
| "[" arg_list "]"
| "{" stmt* "}"
| "if" expr primary ("else" primary)?
| "match" expr "{" match_arm* "}"
| "activate" IDENT "where" STRING
| "sealed" "{" stmt* "}"
| IDENT ("::" IDENT)*
arg_list = (expr ("," expr)*)?
match_arm = pattern "=>" expr ","?
pattern = "_"
| IDENT "::" IDENT ("(" IDENT ")")?
| INT | STRING | BOOL
| IDENT
12. Package System
12.1 Project Manifest — el.toml
Every Engram project has an el.toml at its root. The manifest is parsed by
the el-manifest crate.
[package]
name = "my-service"
version = "0.1.0"
description = "What this does"
authors = ["Will Anderson <will@neurontechnologies.ai>"]
license = "MIT"
edition = "2026"
[dependencies]
engram-http = "1.2"
engram-auth = "0.8.1"
some-local = { path = "../some-local" }
[dev-dependencies]
el-test = "0.1"
[build]
target = "prod" # debug | release | prod (default: debug)
entry = "src/main.el" # main entry point (default: src/main.el)
output = "dist/" # output directory (default: dist/)
seal_key = "env:ENGRAM_SEAL_KEY" # key source for prod sealed artifacts
[cross]
targets = ["x86_64-linux", "aarch64-linux", "x86_64-macos", "aarch64-macos", "wasm32"]
[plugins]
el-fmt = "1.0" # code formatter plugin
el-doc = "0.3" # documentation generator
Dependency specifiers
| Form | Example | Meaning |
|---|---|---|
| String | "1.2" | Version requirement from default registry |
| Path table | { path = "../lib" } | Local path dependency |
| Registry table | { version = "1.0", registry = "https://..." } | Private registry |
Seal key sources
| Form | Example | Meaning |
|---|---|---|
| env:VAR | env:ENGRAM_SEAL_KEY | Read from environment variable at build time |
| file:path | file:/etc/engram/key.bin | Read raw bytes from a file |
| Literal | my-secret-key | Inline key (development only) |
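The three forms can be told apart by prefix. The sketch below shows one way to parse and resolve a seal_key value; the SealKeySource enum and both function names are hypothetical and not taken from el-manifest. A literal containing neither prefix is used as-is.

```rust
use std::io;
use std::path::PathBuf;

/// Hypothetical representation of the seal_key setting.
#[derive(Debug, PartialEq)]
enum SealKeySource {
    Env(String),
    File(PathBuf),
    Literal(String),
}

/// Classify a seal_key string by its prefix.
fn parse_seal_key(spec: &str) -> SealKeySource {
    if let Some(var) = spec.strip_prefix("env:") {
        SealKeySource::Env(var.to_string())
    } else if let Some(path) = spec.strip_prefix("file:") {
        SealKeySource::File(PathBuf::from(path))
    } else {
        SealKeySource::Literal(spec.to_string())
    }
}

/// Load the raw key bytes from the chosen source.
fn resolve_seal_key(source: &SealKeySource) -> io::Result<Vec<u8>> {
    match source {
        SealKeySource::Env(var) => std::env::var(var)
            .map(String::into_bytes)
            .map_err(|e| io::Error::new(io::ErrorKind::NotFound, e)),
        SealKeySource::File(path) => std::fs::read(path),
        SealKeySource::Literal(key) => Ok(key.as_bytes().to_vec()),
    }
}
```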
12.2 Dependency Resolution
Dependencies are resolved by the el-registry crate, which talks to the
registry at https://packages.neurontechnologies.ai.
Resolution algorithm:
- For each [dependencies] entry, fetch all available versions from the registry.
- Pick the highest version satisfying the version requirement (semver).
- Download the tarball and verify the BLAKE3 checksum.
- Cache in ~/.engram/packages/{name}/{version}/.
- Path dependencies bypass the registry entirely.
12.3 Version Requirements
Engram uses the semver crate's version requirement syntax (identical to
Cargo's):
| Requirement | Interpreted as | Matches |
|---|---|---|
| "1.2" | ^1.2.0 (caret) | 1.2.0, 1.3.0, but not 2.0.0 |
| ">=1.0, <2.0" | range | explicit range |
| "*" | wildcard | any version |
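As an illustration of the caret rule and of picking the highest matching version, here is a std-only sketch. It is a simplified stand-in for the semver crate, not its API: it handles only bare caret requirements such as "1.2" or "0.8.1", ignores pre-release and build metadata, and all names in it are hypothetical.

```rust
/// (major, minor, patch); tuples compare lexicographically.
type Version = (u64, u64, u64);

/// Parse a full "x.y.z" version.
fn parse_version(s: &str) -> Option<Version> {
    let parts: Vec<u64> = s.split('.').map(|p| p.parse().ok()).collect::<Option<_>>()?;
    match parts.as_slice() {
        [major, minor, patch] => Some((*major, *minor, *patch)),
        _ => None,
    }
}

/// Inclusive lower and exclusive upper bound of a caret requirement:
/// the leftmost non-zero component must not change.
fn caret_bounds(req: &str) -> Option<(Version, Version)> {
    let parts: Vec<u64> = req.split('.').map(|p| p.parse().ok()).collect::<Option<_>>()?;
    match parts.as_slice() {
        [major] => Some(((*major, 0, 0), (major + 1, 0, 0))),
        [major, minor] if *major > 0 => Some(((*major, *minor, 0), (major + 1, 0, 0))),
        [_, minor] => Some(((0, *minor, 0), (0, minor + 1, 0))),
        [major, minor, patch] if *major > 0 => {
            Some(((*major, *minor, *patch), (major + 1, 0, 0)))
        }
        [_, minor, patch] if *minor > 0 => Some(((0, *minor, *patch), (0, minor + 1, 0))),
        [_, _, patch] => Some(((0, 0, *patch), (0, 0, patch + 1))),
        _ => None,
    }
}

/// Highest available version that satisfies the caret requirement.
fn pick_highest(req: &str, available: &[&str]) -> Option<Version> {
    let (low, high) = caret_bounds(req)?;
    available
        .iter()
        .filter_map(|v| parse_version(v))
        .filter(|v| *v >= low && *v < high)
        .max()
}
```

Given the versions 1.1.9, 1.2.0, 1.3.0, and 2.0.0, the requirement "1.2" selects 1.3.0. For a 0.x requirement such as "0.8.1", the minor version is the one that must stay fixed, so 0.9.0 is excluded.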
13. Build System
13.1 Build Targets
| Target | Artifact | Notes |
|---|---|---|
| debug | .elc + .map.json | Full debug info, source maps |
| release | .elc | Optimized, no debug info |
| prod | .sealed | AES-256-GCM encrypted, tamper-evident |
13.2 CLI Commands
el new <name>              scaffold a new project
el add <pkg>[@ver]         add a dependency to el.toml
el remove <pkg>            remove a dependency
el update                  update all deps to latest compatible
el build [--target prod]   build (reads el.toml)
el build --cross           build for all cross targets
el run                     build debug and run
el test                    run tests
el check                   type-check only
el fmt                     format source files
el clean                   clean build artifacts
el publish                 publish to registry
el search <query>          search registry
el plugin add <plugin>     add a compiler plugin
13.3 Incremental Builds
The build system tracks a BLAKE3 hash of every source file in
.el/build-cache.json. On subsequent builds, only files whose hashes have
changed (and their dependents) are recompiled. The cache is invalidated by
el clean.
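The change-detection part can be sketched with an in-memory cache. Two simplifications are assumed here: the standard library's DefaultHasher stands in for BLAKE3 (which needs an external crate), and dependents of changed files are not tracked. The BuildCache type and function names are hypothetical, and reading or writing .el/build-cache.json is left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical cache: source path -> content hash from the last build.
type BuildCache = BTreeMap<String, u64>;

/// Stand-in content hash (the real build system uses BLAKE3).
fn content_hash(contents: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    hasher.finish()
}

/// Return the paths that are new or whose contents changed since the
/// last call, updating the cache along the way.
fn changed_files(cache: &mut BuildCache, sources: &[(&str, &[u8])]) -> Vec<String> {
    let mut changed = Vec::new();
    for &(path, contents) in sources {
        let hash = content_hash(contents);
        // `insert` returns the previous hash, if any, for comparison.
        if cache.insert(path.to_string(), hash) != Some(hash) {
            changed.push(path.to_string());
        }
    }
    changed
}
```

On the first call every file is reported as changed. A second call with identical contents reports nothing, and after an edit to one file only that path is returned.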
13.4 Cross-Compilation
The [cross].targets list specifies which native targets to produce when
running el build --cross. Each cross build produces a separate artifact
tagged with the target triple.
| Target name | Triple | Notes |
|---|---|---|
| x86_64-linux | x86_64-unknown-linux-gnu | Standard Linux 64-bit |
| aarch64-linux | aarch64-unknown-linux-gnu | ARM64 Linux |
| x86_64-macos | x86_64-apple-darwin | Intel Mac |
| aarch64-macos | aarch64-apple-darwin | Apple Silicon |
| wasm32 | wasm32-unknown-unknown | WebAssembly |
Cross-compilation currently emits bytecode tagged with the target triple. A
native LLVM backend (future work) will use the triple to select the correct
code generation backend. The LLVM extension point is clearly marked in the
el-build crate source.
13.5 Artifact Names
| Target | Cross | Artifact name |
|---|---|---|
| debug | none | {name}.elc |
| release | none | {name}.elc |
| prod | none | {name}.sealed |
| any | wasm32 | {name}-wasm32.wasm |
| any | other | {name}-{triple-short}.elc |
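The target and naming tables can be expressed as a small lookup. The enum and variant names below are hypothetical and may not match the el-build types. The sketch also assumes that {triple-short} means the short target name from the cross-compilation table (for example x86_64-linux), which the spec does not state explicitly.

```rust
/// Hypothetical cross target enum modeled on the target table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CrossTarget {
    LinuxX64,
    LinuxArm64,
    MacosX64,
    MacosArm64,
    Wasm32,
}

impl CrossTarget {
    const ALL: [CrossTarget; 5] = [
        CrossTarget::LinuxX64,
        CrossTarget::LinuxArm64,
        CrossTarget::MacosX64,
        CrossTarget::MacosArm64,
        CrossTarget::Wasm32,
    ];

    /// Short name as written in [cross].targets.
    fn short_name(self) -> &'static str {
        match self {
            CrossTarget::LinuxX64 => "x86_64-linux",
            CrossTarget::LinuxArm64 => "aarch64-linux",
            CrossTarget::MacosX64 => "x86_64-macos",
            CrossTarget::MacosArm64 => "aarch64-macos",
            CrossTarget::Wasm32 => "wasm32",
        }
    }

    /// Full target triple from the table.
    fn triple(self) -> &'static str {
        match self {
            CrossTarget::LinuxX64 => "x86_64-unknown-linux-gnu",
            CrossTarget::LinuxArm64 => "aarch64-unknown-linux-gnu",
            CrossTarget::MacosX64 => "x86_64-apple-darwin",
            CrossTarget::MacosArm64 => "aarch64-apple-darwin",
            CrossTarget::Wasm32 => "wasm32-unknown-unknown",
        }
    }

    fn from_name(name: &str) -> Option<CrossTarget> {
        Self::ALL.iter().copied().find(|t| t.short_name() == name)
    }
}

#[derive(Debug, Clone, Copy)]
enum BuildTarget {
    Debug,
    Release,
    Prod,
}

/// Artifact file name for a package, following the naming table.
fn artifact_name(name: &str, target: BuildTarget, cross: Option<CrossTarget>) -> String {
    match (cross, target) {
        (Some(CrossTarget::Wasm32), _) => format!("{}-wasm32.wasm", name),
        (Some(other), _) => format!("{}-{}.elc", name, other.short_name()),
        (None, BuildTarget::Prod) => format!("{}.sealed", name),
        (None, BuildTarget::Debug) | (None, BuildTarget::Release) => format!("{}.elc", name),
    }
}
```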
14. Plugin System
14.1 Overview
Compiler plugins are Rust dynamic libraries (.dylib on macOS, .so on
Linux) that implement the CompilerPlugin trait. They are loaded at compile
time via dlopen (stub — full dynamic loading is a TODO) and receive hooks at
three points in the compilation pipeline.
14.2 Plugin Trait
pub trait CompilerPlugin: Send + Sync {
    fn name(&self) -> &str;
    fn version(&self) -> &str;

    /// Called after parsing, before type checking.
    fn on_ast(&self, program: &mut Program) -> Result<(), PluginError>;

    /// Called after type checking, before code generation.
    fn on_typed_ast(&self, program: &Program, types: &TypeEnv) -> Result<(), PluginError>;

    /// Called after code generation, before sealing.
    fn on_bytecode(&self, bytecode: &mut Vec<u8>) -> Result<(), PluginError>;
}
14.3 Lifecycle Hooks
- on_ast: mutate or observe the AST after parsing. Use for: AST macros, synthetic node injection, linting.
- on_typed_ast: observe the type-checked AST. Use for: documentation generation, type-aware linting.
- on_bytecode: mutate or observe the final bytecode. Use for: instrumentation, size analysis.
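To show how a registry might drive these hooks in order, here is a self-contained sketch. Everything in it is a simplified stand-in: Program, TypeEnv, and PluginError are placeholder types, the trait gives the hooks default bodies for brevity, and run_hooks calls the three stages back to back. In a real build, type checking and code generation would run between the stages.

```rust
// Placeholder types; the real ones live in the compiler crates.
struct Program {
    statements: Vec<String>,
}
struct TypeEnv;

#[derive(Debug, PartialEq)]
struct PluginError(String);

/// Simplified stand-in for the CompilerPlugin trait.
trait CompilerPlugin {
    fn name(&self) -> &str;
    fn on_ast(&self, _program: &mut Program) -> Result<(), PluginError> {
        Ok(())
    }
    fn on_typed_ast(&self, _program: &Program, _types: &TypeEnv) -> Result<(), PluginError> {
        Ok(())
    }
    fn on_bytecode(&self, _bytecode: &mut Vec<u8>) -> Result<(), PluginError> {
        Ok(())
    }
}

/// Hypothetical registry that runs plugins in registration order.
struct PluginRegistry {
    plugins: Vec<Box<dyn CompilerPlugin>>,
}

impl PluginRegistry {
    fn new() -> Self {
        PluginRegistry { plugins: Vec::new() }
    }

    fn register(&mut self, plugin: Box<dyn CompilerPlugin>) {
        self.plugins.push(plugin);
    }

    /// Run each stage across all plugins; the first error aborts the run.
    fn run_hooks(
        &self,
        program: &mut Program,
        types: &TypeEnv,
        bytecode: &mut Vec<u8>,
    ) -> Result<(), PluginError> {
        for plugin in &self.plugins {
            plugin.on_ast(program)?;
        }
        for plugin in &self.plugins {
            plugin.on_typed_ast(program, types)?;
        }
        for plugin in &self.plugins {
            plugin.on_bytecode(bytecode)?;
        }
        Ok(())
    }
}

/// Example lint plugin: rejects programs with no statements.
struct RejectEmpty;
impl CompilerPlugin for RejectEmpty {
    fn name(&self) -> &str {
        "reject-empty"
    }
    fn on_ast(&self, program: &mut Program) -> Result<(), PluginError> {
        if program.statements.is_empty() {
            return Err(PluginError(format!("{}: program is empty", self.name())));
        }
        Ok(())
    }
}

/// Example instrumentation plugin: appends a marker byte.
struct Marker;
impl CompilerPlugin for Marker {
    fn name(&self) -> &str {
        "marker"
    }
    fn on_bytecode(&self, bytecode: &mut Vec<u8>) -> Result<(), PluginError> {
        bytecode.push(0xEE);
        Ok(())
    }
}
```

Because an on_ast failure returns early, later stages never see the program. An empty program is rejected by RejectEmpty before Marker touches the bytecode.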
14.4 Writing a Plugin
use el_build::{CompilerPlugin, PluginError};

pub struct MyPlugin;

impl CompilerPlugin for MyPlugin {
    fn name(&self) -> &str { "my-plugin" }
    fn version(&self) -> &str { "0.1.0" }

    fn on_ast(&self, _program: &mut Program) -> Result<(), PluginError> {
        // Observe or mutate the AST
        Ok(())
    }

    fn on_typed_ast(&self, _program: &Program, _types: &TypeEnv) -> Result<(), PluginError> {
        Ok(())
    }

    fn on_bytecode(&self, _bytecode: &mut Vec<u8>) -> Result<(), PluginError> {
        Ok(())
    }
}

// Required export symbol for dynamic loading:
#[no_mangle]
pub extern "C" fn engram_plugin_init() -> Box<dyn CompilerPlugin> {
    Box::new(MyPlugin)
}
14.5 Installing Plugins
Add to [plugins] in el.toml:
[plugins]
el-fmt = "1.0"
el-doc = "0.3"
Or use the CLI:
el plugin add el-fmt@1.0
Plugins are looked up in the system plugin directory. The el-registry fetches
and installs them like regular packages.
15. Future Directions
- ML-KEM sealed artifacts: upgrade el-seal to CRYSTALS-Kyber when the ml-kem crate stabilizes (drop-in: same format, new algorithm_id).
- LSP server: spreading activation for autocomplete using the Engram database as the type graph backend.
- Engram DB integration: live connection to an Engram database for activate at compile time (semantic type checking) and at runtime (actual node retrieval).
- Struct construction syntax: User { id: uuid, name: "Alice", ... }.
- Generics: fn identity<T>(x: T) -> T { return x }.
- Trait system: behavioral interfaces that interact with the Engram type graph.
- Pattern matching on struct fields: match user { User { name: "admin" } => ... }.