el/lang/BOOTSTRAP.md
2026-05-05 01:38:51 -05:00


El Language Bootstrap Guide

This document is the authoritative guide for reconstructing the El compiler toolchain from scratch. If the bootstrap binary at dist/platform/elc is ever lost, this document is the path back.


1. The Bootstrap Chain (Current State)

The Trust Chain

El is a self-hosting language. The compiler is written in El. This creates a circular dependency: you need an El compiler to compile the El compiler. The chain is resolved by a seed binary:

dist/platform/elc   (Mach-O arm64 native binary)
        ↓
   compiles elc-cli.el
        ↓
   new self-hosted elc binary
        ↓
   compiles itself again (identity check)
        ↓
   stable self-hosted compiler

The binary at dist/platform/elc is a Mach-O 64-bit arm64 executable. The elc.preselfhost and elc.legacy files in the same directory are older snapshots kept as fallback checkpoints.

The key property: every binary in dist/platform/ was produced by compiling the El source in el-compiler/src/ using a previous version of that same binary. The chain is auditable: the source is the ground truth, not the binary.

The Self-Hosting Pipeline

elc-cli.el
  imports → el-compiler/src/compiler.el
               imports → el-compiler/src/lexer.el
               imports → el-compiler/src/parser.el
               imports → el-compiler/src/codegen.el
               imports → el-compiler/src/codegen-js.el

Import resolution is textual. compiler.el recursively inlines all imported .el files before lex/parse. The result is one large unified source string that the compiler then processes in a single pass.
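A minimal sketch of textual import inlining, written in Python for a from-scratch bootstrap. It is not the code in compiler.el. It handles only the `import "file.el"` form, uses paths exactly as written, and inlines each file at most once; the last two behaviours are assumptions of this sketch, as are the names `IMPORT_RE` and `read_file`.

```python
import re

# Hypothetical pattern: matches only the `import "file.el"` statement form.
IMPORT_RE = re.compile(r'^\s*import\s+"([^"]+)"\s*$')

def resolve_imports(path, read_file, seen=None):
    """Return the source at `path` with each import line replaced by the
    recursively resolved text of the imported file."""
    seen = set() if seen is None else seen
    if path in seen:
        return ""  # assumption: a file is inlined only the first time
    seen.add(path)
    out = []
    for line in read_file(path).split("\n"):
        m = IMPORT_RE.match(line)
        if m:
            out.append(resolve_imports(m.group(1), read_file, seen))
        else:
            out.append(line)
    return "\n".join(out)

# Usage with an in-memory stand-in for the filesystem.
files = {
    "main.el": 'import "a.el"\nfn main() { }',
    "a.el": 'import "b.el"\nfn a() { }',
    "b.el": "fn b() { }",
}
unified = resolve_imports("main.el", files.__getitem__)
```

Passing `read_file` as a parameter keeps the function testable without touching disk; a real bootstrap would pass a function that opens and reads the file.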

elc-combined.el in the repo root is a pre-merged single-file edition used during early bootstrap iterations.

What the Bootstrap Binary Actually Is

The dist/platform/elc binary is a compiled El program that was produced by running an earlier version of itself on elc-cli.el. It is not a Rust binary. The elc.legacy and elc.preselfhost checkpoints suggest the chain has been continuously self-hosting and re-stamped. The original genesis compiler (referenced in the language spec as a "Rust genesis compiler") was used to produce the first self-hosted binary; that Rust binary is not present in this repo.

To rebuild the current binary from source using the current binary:

cd /path/to/el
./dist/platform/elc elc-cli.el elc-new.c
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
   -o dist/platform/elc-new \
   elc-new.c el-compiler/runtime/el_runtime.c

Verify self-hosting by using elc-new to recompile itself and diffing the outputs.


2. The Language

2.1 Lexical Structure

El source is UTF-8. File extension .el. Comments are single-line only: // to end of line.

Token representation: every token is a map { "kind": String, "value": String }.

Keywords — from keyword_kind() in lexer.el:

Keyword Token Kind Notes
let Let variable binding
fn Fn function definition
type Type struct definition
enum Enum enum definition
match Match pattern match
return Return function return
if If conditional
else Else
for For iteration
in In used in for x in list
while While loop
import Import module import
from From from mod import { Name }
as As (reserved, no parse form)
with With (reserved)
sealed Sealed (reserved)
activate Activate (reserved)
where Where (reserved)
test Test (reserved)
seed Seed (reserved)
assert Assert (reserved)
protocol Protocol (reserved)
impl Impl (reserved)
retry Retry reserved / soft keyword in expr position
times Times reserved / soft keyword
fallback Fallback reserved / soft keyword
reason Reason reserved / soft keyword
parallel Parallel reserved / soft keyword
trace Trace reserved / soft keyword
requires Requires reserved / soft keyword
deploy Deploy reserved / soft keyword
to To reserved / soft keyword
via Via reserved / soft keyword
target Target RESERVED — cannot use as identifier
true Bool literal value true
false Bool literal value false
cgi Cgi CGI identity block
service Service service declaration block
manager Manager VBD role decorator / soft keyword
engine Engine VBD role decorator / soft keyword
accessor Accessor VBD role decorator / soft keyword
vessel Vessel soft keyword
extern Extern extern fn forward declaration

Soft keywords (to, via, deploy, reason, times, fallback, retry, parallel, trace, requires, where, as, with, manager, engine, accessor, vessel): these have dedicated token kinds, but the parser re-interprets them as Ident nodes when they appear in expression position (e.g., as parameter names or local variable names). target is not a soft keyword: it is fully reserved and cannot be used as an identifier (see section 2.4).

All token kinds:

Kind Pattern
Int [0-9]+
Float [0-9]+ '.' [0-9]+
Str "…" with \", \n, \t, \r, \\ escapes
Bool true or false
Ident [a-zA-Z_][a-zA-Z0-9_]* (not a keyword)
keyword tokens one per keyword above
Eq =
EqEq ==
NotEq !=
Not !
Lt / LtEq / Gt / GtEq < <= > >=
And && (single & is consumed and discarded)
Or ||
Pipe |
PipeOp |>
Plus / Minus / Star / Slash + - * /
Percent %
Arrow ->
FatArrow =>
Colon / ColonColon : ::
LParen / RParen ( )
LBrace / RBrace { }
LBracket / RBracket [ ]
Comma / Dot / Semicolon , . ;
At @
QuestionMark ?
Eof end-of-input sentinel

String comment stripping: the lexer contains a special heuristic for string literals that embed JavaScript or CSS (looks_like_code). If a string contains <script, <style, or function + ;, the lexer strips // and /* */ comments from the string value before producing the Str token. This is a compile-time content sanitization pass.

2.2 AST Node Types

Every AST node is a Map<String, Any>. The "expr" or "stmt" key names the node type.

Expression nodes:

| expr value | Fields | Meaning |
|---|---|---|
| Int | value: String | integer literal |
| Float | value: String | float literal |
| Str | value: String | string literal |
| Bool | value: String | "true" or "false" |
| Nil | | null / missing |
| Ident | name: String | identifier reference |
| BinOp | op: String, left, right | binary operation |
| Not | inner | unary ! |
| Neg | inner | unary - |
| Call | func, args: [expr] | function call |
| Field | object, field: String | obj.field |
| Index | object, index | obj[idx] |
| Array | elems: [expr] | [e1, e2, …] |
| Map | pairs: [{ key: String, value: expr }] | { "k": v, … } |
| If | cond, then: [stmt], else: [stmt], has_else: Bool | conditional expression |
| For | item: String, list, body: [stmt] | for-in expression |
| Match | subject, arms: [{ pattern, body }] | pattern match |
| DurationLit | count: String, unit: String | 30.seconds, 1.hour |
| Try | inner | postfix ? (no-op passthrough today) |

Binary operators (op field values): Plus, Minus, Star, Slash, EqEq, NotEq, Lt, Gt, LtEq, GtEq, And, Or.

Operator precedence (higher = tighter binding):

Level Operators
6 Star, Slash
5 Plus, Minus
4 Lt, Gt, LtEq, GtEq
3 EqEq, NotEq
2 And
1 Or
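The table above maps directly onto a precedence-climbing loop. The sketch below is one way a bootstrap parser could apply it; `parse_expr`, `parse_atom`, and `tok` are hypothetical names, only Int and Ident atoms are handled, and left-associativity is an assumption (the table does not state associativity).

```python
PRECEDENCE = {
    "Star": 6, "Slash": 6,
    "Plus": 5, "Minus": 5,
    "Lt": 4, "Gt": 4, "LtEq": 4, "GtEq": 4,
    "EqEq": 3, "NotEq": 3,
    "And": 2,
    "Or": 1,
}

def tok(kind, value=""):
    """Build a token in the { kind, value } shape from section 2.1."""
    return {"kind": kind, "value": value}

def parse_atom(tokens, pos):
    t = tokens[pos]
    if t["kind"] == "Int":
        return {"expr": "Int", "value": t["value"]}, pos + 1
    if t["kind"] == "Ident":
        return {"expr": "Ident", "name": t["value"]}, pos + 1
    raise SyntaxError(f"unexpected token {t['kind']}")

def parse_expr(tokens, pos=0, min_prec=1):
    """Parse a binary expression starting at `pos`; return (ast, next_pos)."""
    left, pos = parse_atom(tokens, pos)
    while pos < len(tokens):
        kind = tokens[pos]["kind"]
        prec = PRECEDENCE.get(kind)
        if prec is None or prec < min_prec:
            break
        # Assumed left-associative: the right side must bind strictly tighter.
        right, pos = parse_expr(tokens, pos + 1, prec + 1)
        left = {"expr": "BinOp", "op": kind, "left": left, "right": right}
    return left, pos

# 1 + 2 * 3 groups as 1 + (2 * 3).
ast, _ = parse_expr([tok("Int", "1"), tok("Plus"), tok("Int", "2"),
                     tok("Star"), tok("Int", "3"), tok("Eof")])
```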

Pattern nodes (used inside Match arms):

| pattern value | Fields | Meaning |
|---|---|---|
| Wildcard | | _ (always matches) |
| Binding | name: String | binds subject to name |
| LitInt | value: String | integer literal pattern |
| LitStr | value: String | string literal pattern |
| LitBool | value: String | boolean literal pattern |

Statement nodes:

| stmt value | Fields | Meaning |
|---|---|---|
| Let | name: String, value: expr, type: String | variable binding |
| Assign | name: String, value: expr | bare reassignment name = expr |
| Return | value: expr | return statement |
| While | cond: expr, body: [stmt] | while loop |
| For | item: String, list: expr, body: [stmt] | for-in loop |
| FnDef | name: String, params: [param], body: [stmt], ret_type: String, decorator?: String | function definition |
| ExternFn | name: String, params: [param], ret_type: String | forward declaration |
| TypeDef | name: String, fields: [{ name: String }] | struct type definition |
| EnumDef | name: String, variants: [{ name: String }] | enum definition |
| Import | path: String | import "file.el" or from mod import { … } |
| CgiBlock | name, dharma_id, principal, network, engram, has_*: Bool | CGI identity declaration |
| ServiceBlock | name, sponsor, domain | service declaration |
| Expr | value: expr | bare expression statement |

Param nodes: { "name": String, "type": String } where type is the leading identifier of the type annotation (e.g., "Int", "String", "Map") or "" if unannotated.

2.3 The Type System

Type annotations are parsed and stored but not type-checked at compile time. They serve as documentation and as hints to the codegen for arithmetic dispatch.

Built-in types:

| Type | C representation | Notes |
|---|---|---|
| String | const char* cast to el_val_t | via EL_STR() macro |
| Int | int64_t | direct |
| Bool | int64_t | 0 = false, nonzero = true |
| Float | int64_t | bit-cast double via el_from_float() |
| Void | void | functions returning nothing |
| Any | void* cast to el_val_t | generic containers |
| [T] | el_val_t | pointer to ElList struct |
| Map<K,V> | el_val_t | pointer to ElMap struct |
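To make the Float row concrete: bit-casting means the eight bytes of the double are reinterpreted as a 64-bit integer, not converted numerically. The Python sketch below mimics that with the `struct` module. It only illustrates the idea; the runtime's actual C implementation is not shown in this document, and `el_to_float` is a hypothetical name for the inverse.

```python
import struct

def el_from_float(x):
    """Reinterpret the 8 bytes of a double as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

def el_to_float(v):
    """Hypothetical inverse: reinterpret a signed 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<q", v))[0]

bits = el_from_float(1.0)      # IEEE 754 bit pattern of 1.0
back = el_to_float(bits)       # round-trips to the original double
```

A bootstrap codegen written in another language does not need this at compile time (it can emit `el_from_float(<value>)` and let C do the work), but the round trip explains why a Float stored in an el_val_t must never be used in plain integer arithmetic.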

Temporal types (first-class in codegen):

| Type | Representation | Notes |
|---|---|---|
| Instant | nanoseconds since Unix epoch as int64_t | now() returns this |
| Duration | signed nanoseconds as int64_t | 30.seconds = 30 * 1000000000 |
| Calendar | pointer to heap-allocated struct | earth_calendar(zone) |
| CalendarTime | pointer to heap-allocated struct | now_in(cal) |
| LocalDate | pointer to heap-allocated struct | local_date(y, m, d) |
| LocalTime | nanoseconds since midnight, direct int64_t | local_time(h, m, s, ns) |
| Zone | pointer to heap-allocated struct | zone("America/New_York") |
| Rhythm | pointer to heap-allocated struct | recurrence pattern |

The codegen tracks type-annotated variable names in per-function process state (__int_names, __instant_names, __duration_names, etc.) to dispatch arithmetic and comparisons through the correct runtime wrappers. Type-mismatched operations (e.g., Instant + Instant) are emitted as #error directives.
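A rough sketch of how such type-directed dispatch for `+` could look in a bootstrap codegen. The dictionary `var_types` stands in for the per-function name sets (`__int_names`, `__instant_names`, and so on); `emit_add` and the `#error` message text are invented for this sketch, and only a few operand combinations are covered.

```python
TEMPORAL = {"Instant", "Duration"}

def emit_add(left, right, var_types):
    """Choose the C expression for `left + right` from tracked annotations.
    `left` and `right` are variable names; unannotated names map to ""."""
    lt = var_types.get(left, "")
    rt = var_types.get(right, "")
    if lt == "Instant" and rt == "Duration":
        return f"el_instant_add_dur({left}, {right})"
    if lt == "Duration" and rt == "Duration":
        return f"el_duration_add({left}, {right})"
    if lt == "Instant" and rt == "Instant":
        # Type-mismatched operation: surface it when the C is compiled.
        return '#error "Instant + Instant is not a valid operation"'
    if lt in TEMPORAL or rt in TEMPORAL:
        raise NotImplementedError(f"{lt} + {rt} is outside this sketch")
    if lt == "Int" and rt == "Int":
        return f"({left} + {right})"
    return f"el_str_concat({left}, {right})"  # default fallback for +

types = {"start": "Instant", "timeout": "Duration", "n": "Int", "m": "Int"}
deadline_c = emit_add("start", "timeout", types)
```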

Duration postfix literals: 30.seconds, 1.hour, 500.millis, 30.nanos are parsed as DurationLit AST nodes and compiled to el_duration_from_nanos(count * multiplier). The multipliers:

Unit Nanoseconds
nano / nanos 1
milli / millis / millisecond / milliseconds 1,000,000
second / seconds 1,000,000,000
minute / minutes 60,000,000,000
hour / hours 3,600,000,000,000
day / days 86,400,000,000,000
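The multiplier table can be carried into a bootstrap codegen as a lookup. In the sketch below, `emit_duration_lit` is a hypothetical helper. Folding the product at compile time and adding the `LL` suffix are choices of this sketch, not confirmed behaviour of codegen.el; they avoid emitting `30 * 1000000000` as a C `int` multiplication, which would overflow.

```python
NANOS_PER_UNIT = {
    "nano": 1, "nanos": 1,
    "milli": 1_000_000, "millis": 1_000_000,
    "millisecond": 1_000_000, "milliseconds": 1_000_000,
    "second": 1_000_000_000, "seconds": 1_000_000_000,
    "minute": 60_000_000_000, "minutes": 60_000_000_000,
    "hour": 3_600_000_000_000, "hours": 3_600_000_000_000,
    "day": 86_400_000_000_000, "days": 86_400_000_000_000,
}

def emit_duration_lit(node):
    """Compile a DurationLit AST node to a C expression."""
    nanos = int(node["count"]) * NANOS_PER_UNIT[node["unit"]]
    return f"el_duration_from_nanos({nanos}LL)"

c_expr = emit_duration_lit(
    {"expr": "DurationLit", "count": "30", "unit": "seconds"})
```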

2.4 Key Language Semantics

Implicit return. The final expression in a function body becomes the return value if it is not a control-flow construct (If, For). The codegen's transform_implicit_return rewrites the last Expr statement into a Return statement before emitting.
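A minimal sketch of that rewrite over the AST maps from section 2.2. It reuses the name `transform_implicit_return` for readability but is not the code in codegen.el.

```python
def transform_implicit_return(body):
    """Return a copy of `body` in which a trailing bare expression
    statement has been rewritten into a Return statement."""
    if not body:
        return body
    last = body[-1]
    if last.get("stmt") != "Expr":
        return body
    if last["value"].get("expr") in ("If", "For"):
        return body  # control-flow constructs are left as statements
    return body[:-1] + [{"stmt": "Return", "value": last["value"]}]

# fn answer() { 42 }  becomes  fn answer() { return 42 }
body = [{"stmt": "Expr", "value": {"expr": "Int", "value": "42"}}]
rewritten = transform_implicit_return(body)
```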

Let-rebinding, not mutation. El uses let for both initial binding and rebinding:

let count = 0
let count = count + 1   // NOT mutation: creates a new binding in the same scope

The codegen tracks declared names per C scope. When count is already in declared, it emits count = count + 1; (plain assignment). When it is new, it emits el_val_t count = 0;. This means El does not have mutable variables in the traditional sense — every let is a potential redeclaration. The practical effect is that shadowing and in-place update use identical syntax.
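The declared-name check can be sketched in a few lines. `emit_let` is a hypothetical helper, and `declared` is a plain set standing in for whatever per-scope structure the real codegen keeps.

```python
def emit_let(name, c_value, declared):
    """Emit C for `let name = value`. `declared` holds the names already
    declared in the current C scope and is updated in place."""
    if name in declared:
        return f"{name} = {c_value};"         # rebinding: plain assignment
    declared.add(name)
    return f"el_val_t {name} = {c_value};"    # first binding: declaration

scope = set()
first = emit_let("count", "0", scope)
second = emit_let("count", "(count + 1)", scope)
```

A fresh set is needed for every new C scope (function body, loop body, and so on); reusing one set across scopes would emit an assignment to a name that was never declared there.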

Bare reassignment. The parser also handles name = expr (without let) when an Ident is immediately followed by Eq. This emits a plain C assignment.

target is reserved. The word target is lexed as the Target token kind — it cannot be used as a variable or parameter name. Use tgt or another name instead. This is a live gotcha in compiler.el itself, which uses tgt for exactly this reason.

__no_block_expr guard. The parser uses process state key __no_block_expr to suppress Map-literal parsing when parsing the condition of if, while, for, and match. This prevents a stray { (the start of the then-block) from being parsed as a Map literal.

Arena memory model. The runtime includes an arena allocator that is activated in server/long-running contexts. In CLI mode (elc, elb) the arena is inactive. Memory is managed via ARC (reference counting): el_retain() and el_release() on Lists and Maps. Strings and ints are not refcounted — the retain/release functions are safe no-ops on non-tagged values.


3. The Runtime API

All runtime functions are declared in el-compiler/runtime/el_runtime.h. Every compiled El program links against el-compiler/runtime/el_runtime.c.

All values are el_val_t (int64_t). Strings are pointers cast through int64_t using EL_STR(s) / EL_CSTR(v) macros.

Canonical compile command:

cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
   -o <out> <prog>.c el-compiler/runtime/el_runtime.c

I/O

Function Signature Description
println (s) -> Void print string + newline to stdout
print (s) -> Void print string without newline
readline () -> String read one line from stdin

String Operations

Function Signature Description
el_str_concat (a, b) -> String concatenate two strings
str_concat (a, b) -> String alias for el_str_concat
str_eq (a, b) -> Bool string equality comparison
str_starts_with (s, prefix) -> Bool prefix test
str_ends_with (s, suffix) -> Bool suffix test
str_contains (s, sub) -> Bool substring test
str_len (s) -> Int byte length
str_slice (s, start, end) -> String substring (byte offsets)
str_replace (s, from, to) -> String replace all occurrences
str_to_upper / str_upper (s) -> String uppercase
str_to_lower / str_lower (s) -> String lowercase
str_trim (s) -> String strip leading/trailing whitespace
str_lstrip / str_rstrip (s) -> String one-sided strip
str_index_of (s, sub) -> Int position of substring; -1 if absent
str_last_index_of (s, sub) -> Int last position
str_index_of_all (s, sub) -> [Int] all byte offsets (non-overlapping)
str_find_chars (s, any_of) -> Int first index of any char in set
str_split (s, sep) -> [String] split on separator
str_split_lines (s) -> [String] split on newlines
str_split_chars (s) -> [String] split into individual characters
str_split_n (s, sep, n) -> [String] split at most n times
str_join (list, sep) -> String join list with separator
str_char_at (s, i) -> String character at byte index
str_char_code (s, i) -> Int Unicode code point at index
str_pad_left (s, width, pad) -> String left-pad to width
str_pad_right (s, width, pad) -> String right-pad to width
str_format (fmt, data) -> String {key} interpolation
str_repeat (s, n) -> String repeat string n times
str_reverse (s) -> String reverse by codepoint
str_strip_prefix (s, prefix) -> String remove prefix if present
str_strip_suffix (s, suffix) -> String remove suffix if present
str_strip_chars (s, chars) -> String strip characters from both ends
str_count (s, sub) -> Int count non-overlapping occurrences
str_count_chars (s) -> Int codepoint count
str_count_bytes (s) -> Int alias for str_len
str_count_lines (s) -> Int line count
str_count_words (s) -> Int word count
str_count_letters (s) -> Int ASCII letter count
str_count_digits (s) -> Int ASCII digit count
is_letter / is_digit / is_alphanumeric (s) -> Bool ASCII char classification
is_whitespace / is_punctuation (s) -> Bool
is_uppercase / is_lowercase (s) -> Bool
int_to_str (n) -> String format integer
str_to_int (s) -> Int parse integer
str_to_float (s) -> Float parse float
parse_int (s, default) -> Int parse with fallback
bool_to_str (b) -> String format bool

Integer/Float Math

Function Description
el_abs(n) absolute value
el_max(a, b) maximum
el_min(a, b) minimum
float_to_str(f) format float as string
int_to_float(n) widen Int to Float
float_to_int(f) truncate Float to Int
format_float(f, decimals) format with N decimal places
decimal_round(f, decimals) round to N decimals
math_sqrt(f) square root
math_log(f) / math_ln(f) logarithms
math_sin(f) / math_cos(f) / math_pi() trigonometry

List Operations

Function Description
el_list_empty() create empty list
el_list_new(count, …) create list from N values (varargs)
el_list_len(list) length
el_list_get(list, i) element at index; 0 on out-of-bounds
el_list_append(list, e) append; returns updated list
el_list_clone(list) shallow copy
list_push(list, e) alias for el_list_append
list_push_front(list, e) prepend
list_join(list, sep) join to string
list_range(start, end) integer range [start, end)
native_list_empty() alias for el_list_empty (used in compiler source)
native_list_append(l, v) alias for el_list_append
native_list_get(l, idx) alias for el_list_get
native_list_len(l) alias for el_list_len
native_list_clone(l) alias for el_list_clone
append(l, e) method-call alias: list.append(e)
len(l) method-call alias: list.len()
get(l, i) method-call alias: list.get(i)

Map Operations

Function Description
el_map_new(count, …) create map from key/value pairs (varargs)
el_map_get(map, key) get value by key
el_map_set(map, key, value) set key; returns map
el_get_field(map, key) alias; emitted for .field access
map_get(map, key) method-call alias
map_set(map, key, value) method-call alias

ARC (Reference Counting)

Function Description
el_retain(v) increment refcount; no-op for non-heap values
el_release(v) decrement refcount; free when zero

In-Process State

Function Description
state_set(key, value) store in process-global key/value table
state_get(key) retrieve; "" if absent
state_del(key) delete key
state_keys() all keys as [String]

Filesystem

Function Description
fs_read(path) read file to string; "" on error
fs_write(path, content) write string; returns 1 on success
fs_write_bytes(path, bytes, length) write raw bytes of known length
fs_list(path) list directory entries
fs_exists(path) check if path exists
fs_mkdir(path) mkdir -p

HTTP Client

Function Description
http_get(url) GET; returns body string
http_post(url, body) POST; returns body string
http_post_json(url, json_body) POST with Content-Type: application/json
http_get_with_headers(url, headers_map) GET with custom headers
http_post_with_headers(url, body, headers_map) POST with custom headers
http_post_form_auth(url, form_body, auth_header) POST with auth
http_delete(url) DELETE
http_get_to_file(url, headers_map, output_path) stream response to file
http_post_to_file(url, body, headers_map, output_path) stream POST response to file
http_response(status, headers_json, body) build response envelope
url_encode(s) RFC 3986 percent-encoding
url_decode(s) URL decode
el_html_sanitize(html, allowlist_json) allowlist HTML sanitizer

HTTP Server

Function Description
http_serve(port, handler) start server; handler: (method, path, body) -> String
http_serve_v2(port, handler) start server; handler: (method, path, headers_map, body) -> String
http_set_handler(name) set handler by symbol name
http_set_handler_v2(name) v2 variant

JSON

Function Description
json_get(json, key) substring lookup of "key": value
json_parse(s) parse JSON string to List/Map
json_stringify(v) serialize Any to JSON string
json_get_string(j, key) typed extract: String
json_get_int(j, key) typed extract: Int
json_get_float(j, key) typed extract: Float
json_get_bool(j, key) typed extract: Bool
json_get_raw(j, key) extract nested object/array as JSON string
json_set(j, key, value) update field, return new JSON string
json_array_len(j) length of JSON array string
json_array_get(j, index) element at index
json_array_get_string(j, index) string element at index

Time (Epoch-Based)

Function Description
time_now() Unix epoch milliseconds
time_now_utc() same, explicit UTC
time_format(ts, fmt) format timestamp
time_to_parts(ts) decompose to Map of fields
time_from_parts(secs, ns, tz) construct timestamp
time_add(ts, n, unit) add duration
time_diff(ts1, ts2, unit) difference
unix_timestamp() Unix seconds as Int
sleep_secs(secs) sleep N seconds
sleep_ms(ms) sleep N milliseconds

Time (First-Class Instant/Duration)

Function Description
now() / el_now_instant() current time as Instant (nanoseconds)
unix_seconds(n) construct Instant from Unix seconds
unix_millis(n) construct Instant from Unix milliseconds
instant_from_iso8601(s) parse ISO 8601 string
instant_to_unix_seconds(i) extract Unix seconds
instant_to_unix_millis(i) extract Unix milliseconds
instant_to_iso8601(i) format as ISO 8601
el_duration_from_nanos(ns) construct Duration from nanoseconds
duration_seconds(n) Duration from seconds
duration_millis(n) Duration from milliseconds
duration_nanos(n) Duration from nanoseconds
duration_to_seconds(d) extract seconds
duration_to_millis(d) extract milliseconds
duration_to_nanos(d) extract nanoseconds
el_instant_add_dur(inst, dur) Instant + Duration
el_instant_sub_dur(inst, dur) Instant - Duration
el_instant_diff(a, b) Instant - Instant = Duration
el_duration_add/sub/scale/div Duration arithmetic
el_instant_lt/le/gt/ge/eq/ne Instant comparison
el_duration_lt/le/gt/ge/eq/ne Duration comparison
el_sleep_duration(dur) sleep for a Duration
ttl_cache_set(key, value) store with TTL
ttl_cache_get(key, max_age) retrieve if within max_age
ttl_cache_age(key) age of cached value as Duration

Calendar System

Function Description
zone(id) IANA zone or fixed offset
zone_utc() / zone_local() UTC and local zone
zone_offset(hours, minutes) fixed offset zone
earth_calendar(z) Gregorian calendar in zone
earth_calendar_default() system default
mars_calendar() / cycle_calendar(period) non-Earth calendars
no_cycle_calendar() / relative_calendar(epoch) abstract calendars
now_in(cal) current time as CalendarTime
in_calendar(inst, cal) project Instant into Calendar
cal_format(ct, pattern) format CalendarTime
cal_to_instant(ct) extract underlying Instant
cal_cycle_phase(ct) / cal_in(ct, cal) calendar ops
local_date(y, m, d) construct LocalDate
local_time(h, m, s, ns) construct LocalTime
local_datetime(date, time) construct LocalDateTime
zoned(date, time, cal) zoned datetime
local_date_year/month/day LocalDate accessors
local_time_hour/minute/second/nanos LocalTime accessors
el_local_date_add_dur / el_local_time_add_dur date/time arithmetic
el_local_date_lt / el_local_date_eq date comparison
rhythm_* recurrence patterns (cycle_start, weekday, weekly_at, next_after, matches, …)

Process / Execution

Function Description
args() command-line arguments as [String] (excludes argv[0])
env(key) read environment variable; "" if unset
exit(code) exit process with code
exit_program(code) alias for exit
getpid_now() current process ID
exec_command(cmd) run shell command; return exit code
exec_capture(cmd) run shell command; capture and return stdout
uuid_new() / uuid_v4() generate UUID v4
native_int_to_str(n) format integer (alias, used in compiler source)
native_string_chars(s) split string into [String] of single characters

Crypto

Function Description
sha256_hex(input) SHA-256, hex output
sha256_bytes(input) SHA-256, raw bytes
hmac_sha256_hex(key, msg) HMAC-SHA-256, hex
hmac_sha256_bytes(key, msg) HMAC-SHA-256, raw bytes
base64_encode(input) / base64_decode(input) standard base64
base64url_encode(input) / base64url_decode(input) URL-safe base64
sha3_256_hex(input) SHA3-256 (Keccak)
pq_keygen_signature() Dilithium-3 key pair
pq_sign(sk_hex, msg) / pq_verify(pk_hex, msg, sig_hex) PQ signatures
pq_kem_keygen() / pq_kem_encaps(pk) / pq_kem_decaps(sk, ct) Kyber-768 KEM
pq_hybrid_keygen() / pq_hybrid_handshake(remote_pub) X25519 + Kyber hybrid
aead_encrypt(key_hex, plaintext) AES-256-GCM encrypt
aead_decrypt(key_hex, nonce_hex, ct_hex) AES-256-GCM decrypt

DHARMA Network (CGI programs only)

Function Description
el_cgi_init(name, dharma_id, principal, network, engram) initialize CGI identity (called by generated main())
dharma_connect(cgi_id) open channel to peer
dharma_send(channel, content) send message; blocks for response
dharma_activate(query) spreading activation across DHARMA network
dharma_emit(event_type, payload) emit network event (@manager only)
dharma_field(event_type) wait for event (@manager only)
dharma_strengthen(cgi_id, weight) Hebbian potentiation
dharma_relationship(cgi_id) current relationship weight
dharma_peers() all connected peers sorted by weight

Engram Knowledge Graph

Function Description
engram_node(content, type, salience) create node; returns ID
engram_node_full(content, type, label, salience, importance, confidence, tier, tags) full node creation
engram_node_layered(…, layer_id) create node in specific layer
engram_get_node(id) retrieve node by ID
engram_strengthen(node_id) Hebbian potentiation
engram_forget(node_id) delete node and edges
engram_node_count() total node count
engram_edge_count() total edge count
engram_search(query, limit) full-text search
engram_scan_nodes(limit, offset) paginated node scan
engram_connect(from, to, weight, relation) create directed edge
engram_edge_between(from, to) get edge
engram_neighbors(node_id) BFS neighbors
engram_neighbors_filtered(node_id, max_depth, direction) filtered BFS
engram_activate(query, depth) spreading activation
engram_save(path) / engram_load(path) snapshot to/from disk
engram_add_layer(name, priority, suppressible, transparent, injectable) add consciousness layer
engram_remove_layer(layer_id) / engram_list_layers() layer management
engram_*_json variants JSON-string versions of search/scan/activate
engram_compile_layered_json(intent, depth) prompt-ready context block

LLM (Anthropic API)

Function Description
llm_call(model, prompt) single-turn call
llm_call_system(model, system, user) call with system prompt
llm_call_agentic(model, system, user, tools) agentic call with tools (CGI only)
llm_vision(model, system, prompt, image) vision call
llm_models() list available models
llm_register_tool(name, handler_fn_name) register tool handler (CGI only)

Observability

Function Description
emit_log(level, msg, fields_json) emit OTLP log
emit_metric(name, value, tags_json) emit OTLP metric
trace_span_start(name) start trace span
trace_span_end(span_handle) end trace span
emit_event(name, duration_ms) emit event

4. How to Re-Bootstrap from Zero

This section assumes the bootstrap binary is gone. Everything else (source files, runtime) is intact.

What You Need to Implement

A minimal El compiler has three parts: lexer, parser, codegen. Each can be written in any language. The goal is to compile elc-cli.el into a working elc binary, after which El is self-hosting again.

Step 1: Write a Minimal Lexer

The lexer must produce a list of { "kind": String, "value": String } maps (or equivalent structures). Required token kinds: Int, Float, Str, Bool, Ident, Eof, and all keywords and operators listed in section 2.1.

The minimal subset needed to compile the compiler itself:

  • Keywords: let, fn, return, if, else, while, for, in, import, from, true, false, extern
  • Literals: Int, Str, Bool, Ident
  • Operators: =, ==, !=, !, <, >, <=, >=, &&, ||, +, -, *, /, ->, =>, :, ,, ., (, ), {, }, [, ], @, ?
  • Special: Eof

The lexer in lexer.el walks a char array using native_list_get to avoid O(n²) string slicing. A Python implementation can use a simple index into a string. Escapes to handle: \", \n, \t, \r, \\.
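A compact Python sketch of such a lexer, covering the minimal subset listed above. Storing the keyword text as the token value, and lexing `30.seconds` as Int, Dot, Ident (because the Float pattern requires a digit after the dot), are assumptions of this sketch.

```python
import string

DIGITS = string.digits
IDENT_START = string.ascii_letters + "_"
IDENT_REST = IDENT_START + DIGITS
KEYWORDS = {"let": "Let", "fn": "Fn", "return": "Return", "if": "If",
            "else": "Else", "while": "While", "for": "For", "in": "In",
            "import": "Import", "from": "From", "extern": "Extern",
            "true": "Bool", "false": "Bool"}
TWO_CHAR = {"==": "EqEq", "!=": "NotEq", "<=": "LtEq", ">=": "GtEq",
            "&&": "And", "||": "Or", "->": "Arrow", "=>": "FatArrow"}
ONE_CHAR = {"=": "Eq", "!": "Not", "<": "Lt", ">": "Gt", "+": "Plus",
            "-": "Minus", "*": "Star", "/": "Slash", ":": "Colon",
            ",": "Comma", ".": "Dot", "(": "LParen", ")": "RParen",
            "{": "LBrace", "}": "RBrace", "[": "LBracket", "]": "RBracket",
            "@": "At", "?": "QuestionMark"}
ESCAPES = {'"': '"', "n": "\n", "t": "\t", "r": "\r", "\\": "\\"}

def lex(src):
    """Turn El source text into a list of { kind, value } token maps."""
    toks, i, n = [], 0, len(src)
    while i < n:
        c = src[i]
        if c.isspace():
            i += 1
        elif src.startswith("//", i):          # comment runs to end of line
            while i < n and src[i] != "\n":
                i += 1
        elif c in DIGITS:
            j, kind = i, "Int"
            while j < n and src[j] in DIGITS:
                j += 1
            if j + 1 < n and src[j] == "." and src[j + 1] in DIGITS:
                j, kind = j + 1, "Float"
                while j < n and src[j] in DIGITS:
                    j += 1
            toks.append({"kind": kind, "value": src[i:j]})
            i = j
        elif c in IDENT_START:
            j = i
            while j < n and src[j] in IDENT_REST:
                j += 1
            word = src[i:j]
            toks.append({"kind": KEYWORDS.get(word, "Ident"), "value": word})
            i = j
        elif c == '"':
            i, buf = i + 1, []
            while i < n and src[i] != '"':
                if src[i] == "\\" and i + 1 < n:
                    buf.append(ESCAPES.get(src[i + 1], src[i + 1]))
                    i += 2
                else:
                    buf.append(src[i])
                    i += 1
            toks.append({"kind": "Str", "value": "".join(buf)})
            i += 1                             # skip the closing quote
        elif src[i:i + 2] in TWO_CHAR:
            toks.append({"kind": TWO_CHAR[src[i:i + 2]], "value": src[i:i + 2]})
            i += 2
        elif c in ONE_CHAR:
            toks.append({"kind": ONE_CHAR[c], "value": c})
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r} at offset {i}")
    toks.append({"kind": "Eof", "value": ""})
    return toks

kinds = [t["kind"] for t in lex("let x = 30.seconds // note\n")]
```

Two-character operators are tested before single-character ones so that `->` and `==` are not split, and the comment check comes before the Slash entry for the same reason.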

Step 2: Write a Minimal Parser

The parser is a standard recursive descent parser. It produces AST maps as described in section 2.2.

The minimal statement forms needed to compile the compiler:

  • let name [: Type] = expr
  • fn name(params) [-> Type] { body }
  • extern fn name(params) [-> Type]
  • return expr
  • while cond { body }
  • for item in list { body }
  • if cond { body } [else [if] { body }]
  • import "path"
  • from module import { … }
  • @decorator stmt
  • name = expr (bare assignment)
  • bare expression statement

The minimal expression forms:

  • Integer, float, string, bool literals
  • Identifier
  • Binary operations with the precedence table from section 2.2
  • Unary ! and -
  • Function call: f(a, b, …)
  • Method call: obj.method(args) (parsed as Call with Field func)
  • Field access: obj.field
  • Index access: obj[i]
  • Array literal: [e1, e2, …]
  • Map literal: { "key": value, … }
  • if as expression
  • match expression
  • Postfix ? (can be a no-op)
  • Duration literal: N.unit

The __no_block_expr guard (section 2.4) is important: without it, if a || b { ... } will incorrectly parse { as a Map literal.

Step 3: Write a Minimal Codegen

The codegen emits C11 source. Required output structure:

#include <stdint.h>
#include <stdlib.h>
#include "el_runtime.h"

// Forward declarations for all non-main functions
el_val_t fn_name(el_val_t p1, el_val_t p2);
...

// File-scope let bindings (if any)
el_val_t GLOBAL_NAME;

// Function bodies
el_val_t fn_name(el_val_t p1, el_val_t p2) {
    ...
    return 0;
}

// Entry point
int main(int _argc, char** _argv) {
    el_runtime_init_args(_argc, _argv);
    ...
    return 0;
}

Critical codegen rules:

  1. All values are el_val_t. Every parameter, local variable, and return type is el_val_t unless the function has ret_type == "Void" (use void).

  2. Let-rebinding: track declared names per C scope. Emit el_val_t name = val; on first occurrence; emit name = val; on subsequent occurrences of the same name in the same scope.

  3. + dispatch: if either operand is a string literal → el_str_concat(a, b). If both are provably integers → (a + b). Default fallback → el_str_concat.

  4. == dispatch: if either operand is a string or identifier → str_eq(a, b). If both are integer literals or provably Int → (a == b).

  5. String literals: wrap in EL_STR("…") and escape: \" → \\\", \n → \\n, \t → \\t, \\ → \\\\.

  6. Map literals: el_map_new(N, "k1", v1, "k2", v2, …). Empty map: el_map_new(0).

  7. Array literals: el_list_new(N, e1, e2, …). Empty: el_list_empty().

  8. Index access: string-literal index → el_get_field(obj, EL_STR("key")). Integer index → el_list_get(obj, idx).

  9. Field access obj.field → el_get_field(obj, EL_STR("field")).

  10. Method call obj.method(args) → method(obj, args).

  11. for item in list → emit:

    { el_val_t _el_lst = <list>; el_val_t _el_len = el_list_len(_el_lst);
      for (el_val_t _el_i = 0; _el_i < _el_len; _el_i++) {
        el_val_t item = el_list_get(_el_lst, _el_i);
        <body>
      }
    }
    
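  12. match → GCC/Clang statement expression with goto. C labels have function scope, so a function that contains more than one match needs a distinct label for each (_done below stands for that unique name):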

    ({ el_val_t _s = <subject>; el_val_t _r = 0;
       if (_s == 42) { _r = <arm_body>; goto _done; }
       if (str_eq(_s, EL_STR("str"))) { _r = <arm_body>; goto _done; }
       { _r = <wildcard_body>; goto _done; }
       _done:; _r; })
    
  13. if as expression → similarly wrapped in a GCC/Clang statement expression.

  14. Implicit return: if the last statement in a function body is a bare Expr (not If or For), emit it as return <expr>; instead of <expr>;.

  15. Float literals: emit as el_from_float(<value>).

  16. Bool literals: true → 1, false → 0.

  17. fn main(): do not emit as a regular el_val_t function. Instead, fold its body into C's int main() after any top-level statements.

  18. extern fn: emit only a forward declaration (no body).

  19. Forward declarations: scan for all FnDef nodes before emitting bodies. This enables mutual recursion.
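Several of the rules above (3, 5, 6, 7, 9, 10, 15, 16) fit into one small expression emitter. The Python sketch below is one possible shape for a bootstrap codegen, not a transcription of codegen.el; `emit_expr`, `c_escape`, `is_int`, and the `int_names` parameter are hypothetical names, and "provably integer" is reduced to "Int literal or annotated name".

```python
def c_escape(s):
    """Rule 5: escape backslash first, then quote, newline, and tab."""
    s = s.replace("\\", "\\\\").replace('"', '\\"')
    return s.replace("\n", "\\n").replace("\t", "\\t")

def is_int(node, int_names):
    return node["expr"] == "Int" or (
        node["expr"] == "Ident" and node["name"] in int_names)

def emit_expr(node, int_names=frozenset()):
    kind = node["expr"]
    if kind == "Int":
        return node["value"]
    if kind == "Float":                                   # rule 15
        return f"el_from_float({node['value']})"
    if kind == "Bool":                                    # rule 16
        return "1" if node["value"] == "true" else "0"
    if kind == "Str":                                     # rule 5
        return f'EL_STR("{c_escape(node["value"])}")'
    if kind == "Ident":
        return node["name"]
    if kind == "Array":                                   # rule 7
        if not node["elems"]:
            return "el_list_empty()"
        items = ", ".join(emit_expr(e, int_names) for e in node["elems"])
        return f"el_list_new({len(node['elems'])}, {items})"
    if kind == "Map":                                     # rule 6
        parts = [str(len(node["pairs"]))]
        for p in node["pairs"]:
            parts.append(f'"{c_escape(p["key"])}"')
            parts.append(emit_expr(p["value"], int_names))
        return f"el_map_new({', '.join(parts)})"
    if kind == "Field":                                   # rule 9
        obj = emit_expr(node["object"], int_names)
        return f'el_get_field({obj}, EL_STR("{c_escape(node["field"])}"))'
    if kind == "Call":
        args = [emit_expr(a, int_names) for a in node["args"]]
        func = node["func"]
        if func["expr"] == "Field":                       # rule 10
            args.insert(0, emit_expr(func["object"], int_names))
            return f"{func['field']}({', '.join(args)})"
        return f"{func['name']}({', '.join(args)})"
    if kind == "BinOp" and node["op"] == "Plus":          # rule 3
        a = emit_expr(node["left"], int_names)
        b = emit_expr(node["right"], int_names)
        if is_int(node["left"], int_names) and is_int(node["right"], int_names):
            return f"({a} + {b})"
        return f"el_str_concat({a}, {b})"
    raise NotImplementedError(kind)

# xs.len() compiles to len(xs)
call = {"expr": "Call", "args": [],
        "func": {"expr": "Field", "field": "len",
                 "object": {"expr": "Ident", "name": "xs"}}}
c_call = emit_expr(call)
```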

Step 4: Compile the El Compiler

Using your minimal implementation, compile elc-cli.el (which imports the entire compiler chain):

# Your minimal compiler
python3 minimal_elc.py elc-cli.el > elc-new.c

# Build with the runtime
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
   -o elc-new elc-new.c el-compiler/runtime/el_runtime.c

Step 5: Verify Self-Hosting

# Compile elc-cli.el with the new compiler
./elc-new elc-cli.el elc-v2.c
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
   -o elc-v2 elc-v2.c el-compiler/runtime/el_runtime.c

# Compile again with the second-generation compiler
./elc-v2 elc-cli.el elc-v3.c

# The outputs should be identical
diff elc-v2.c elc-v3.c

A clean diff confirms you have a stable fixed point: the compiler reproduces itself exactly.

Step 6: Replace the Bootstrap Binary

cp elc-v2 dist/platform/elc

You are bootstrapped.

Minimal El Subset for the Compiler Itself

The El compiler source (lexer.el, parser.el, codegen.el, compiler.el) uses:

  • fn, let, while, if/else, return, for/in, import
  • extern fn (for .elh headers)
  • String, Int, Bool, Void, Any, Map<String, Any>, [String], [Map<String, Any>]
  • Map literals { "key": val }
  • Array literals [...] (and native_list_empty())
  • List operations: native_list_empty(), native_list_append(), native_list_get(), native_list_len(), native_list_clone()
  • String operations: str_join(), str_eq(), str_contains(), str_starts_with(), str_slice(), str_trim(), str_split(), str_index_of(), str_len(), str_to_int(), native_string_chars(), native_int_to_str()
  • state_get(), state_set()
  • println(), fs_read(), fs_write(), exit()
  • el_release() (ARC cleanup)

The compiler does not use: HTTP, engram, dharma, LLM, crypto, UUID, float arithmetic.


5. The Long-Term Solution: elvm

Why a VM Makes Bootstrapping More Auditable

The current bootstrap chain relies on trusting a seed binary that cannot be fully audited by inspection. This is the classic "trusting trust" problem (Ken Thompson, "Reflections on Trusting Trust", 1984). A virtual machine breaks the chain:

  • elc targets elvm bytecode (instead of C)
  • elvm is a minimal interpreter hand-written in ~500 lines of C
  • The hand-written C is small enough to audit completely
  • Anyone can compile elvm.c with any C compiler
  • From there: elvm interprets elc.elvm → elc compiles El → cc builds native binaries

The benefit: the trusted base shrinks from "a Mach-O binary" to "500 lines of straightforward C code that anyone can read in an afternoon."

The elvm Design

A minimal elvm needs:

  • A stack or register machine (stack is simpler)
  • Instructions: push, pop, add, sub, mul, div, cmp, jump, call, return, load, store
  • A string table (El strings are mostly literals)
  • A heap for ElList and ElMap
  • An FFI table mapping El runtime builtins to C functions

The El compiler would gain a --target=elvm flag in compile_dispatch(). Codegen would emit bytecode instead of C text. The runtime interface stays the same — builtins map to FFI slots by name.
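Since elvm does not exist yet, the following is only an illustration of how small a stack-machine dispatch loop can be, written in Python for readability rather than in the planned C. The opcode names `push_str`, `jump_if_zero`, and `call_ffi`, the `(opcode, operand)` encoding, and the named variable slots are all assumptions made for this sketch; user-level `call` is omitted.

```python
def run(program, strings, ffi):
    """Execute a list of (opcode, operand) pairs and return the final value."""
    stack = []   # operand stack
    slots = {}   # variable slots used by load/store
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "push_str":
            stack.append(strings[arg])          # index into the string table
        elif op == "pop":
            stack.pop()
        elif op in ("add", "sub", "mul", "div", "cmp"):
            b = stack.pop()
            a = stack.pop()
            if op == "add":
                stack.append(a + b)
            elif op == "sub":
                stack.append(a - b)
            elif op == "mul":
                stack.append(a * b)
            elif op == "div":
                stack.append(a // b)            # floor division in this sketch
            else:
                stack.append((a > b) - (a < b))  # cmp pushes -1, 0, or 1
        elif op == "load":
            stack.append(slots[arg])
        elif op == "store":
            slots[arg] = stack.pop()
        elif op == "jump":
            pc = arg
        elif op == "jump_if_zero":
            if stack.pop() == 0:
                pc = arg
        elif op == "call_ffi":
            name, argc = arg
            args = [stack.pop() for _ in range(argc)][::-1]
            stack.append(ffi[name](*args))      # builtin looked up by name
        elif op == "return":
            return stack.pop() if stack else None
        else:
            raise ValueError(f"unknown opcode: {op}")
    return None

# Sums 5 + 4 + 3 + 2 + 1 with a loop built from load/store and jumps.
SUM_TO_FIVE = [
    ("push", 5), ("store", "i"),
    ("push", 0), ("store", "acc"),
    ("load", "i"), ("jump_if_zero", 15),       # loop start is index 4
    ("load", "acc"), ("load", "i"), ("add", None), ("store", "acc"),
    ("load", "i"), ("push", 1), ("sub", None), ("store", "i"),
    ("jump", 4),
    ("load", "acc"), ("return", None),
]
```

The FFI table is a plain name-to-function mapping here, for example `{"str_len": len}`, which mirrors the idea of builtins mapping to FFI slots by name.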

This is the planned path. It does not exist yet.


6. Compiler Source Map

| File | Role | Lines |
|---|---|---|
| elc-cli.el | Entry point; imports compiler.el | 7 |
| el-compiler/src/compiler.el | Pipeline wiring: lex → parse → codegen. Import resolution, --emit-header, fn main(). Defines compile(), compile_js(), compile_dispatch(), resolve_imports() | 298 |
| el-compiler/src/lexer.el | Tokenizer. lex(source) → token list. Char helpers, keyword lookup, scan_digits, scan_ident, scan_string, strip_code_comments | 747 |
| el-compiler/src/parser.el | Recursive descent parser. parse(tokens) → AST. All statement and expression forms | 1071 |
| el-compiler/src/codegen.el | C code emitter. codegen(stmts, source) → (streams to stdout). Expression codegen, statement codegen, function codegen, type tracking, capability enforcement, temporal type dispatch | 2721 |
| el-compiler/src/codegen-js.el | JavaScript backend. codegen_js(stmts, source) → JS source | ~500 |
| el-compiler/runtime/el_runtime.h | Full runtime API declaration | 755 |
| el-compiler/runtime/el_runtime.c | Full runtime implementation | large |
| el-compiler/runtime/el_runtime.js | JS runtime | |
| elb.el | Build coordinator. Reads manifest.el, walks import graph, compiles modules, links binary. The .NET-style incremental build model | 367 |
| elc-combined.el | Pre-merged single-file bootstrap edition (for early bootstrap iterations) | large |
| spec/language.md | Language specification v1.2.0 | |
| dist/platform/elc | Current bootstrap binary (Mach-O arm64) | |

7. Key Decisions and Gotchas

target is a Reserved Keyword

target is lexed as the Target token kind. It cannot be used as a variable or parameter name anywhere in El source. If you write fn compile(target: String), the parameter name will be tokenized as Target, which the parser does not recognize as an Ident in parameter position.

Workaround: use tgt, dest, backend, or any other name. The compiler source uses tgt specifically for this reason. This comes up whenever writing code that handles compilation targets.

let x = x + 1 is Let-Rebinding, Not Mutation

El has no mutable variables. let count = count + 1 re-introduces count into the current scope, shadowing the previous binding. At the C level, the codegen tracks declared names and emits plain assignment for subsequent bindings of the same name:

  • First let count = 0 → el_val_t count = 0;
  • Second let count = count + 1 → count = count + 1;

This means you cannot have two different values named count in the same C scope — the second binding overwrites the first. This is by design. Scoped shadowing works correctly because each block (if body, while body, for body) gets its own copy of the declared list.
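The declared-name tracking described above can be sketched as follows. The function names and the representation of a block as a list of (name, C expression) pairs are assumptions for illustration; only the emitted C shapes come from this guide.

```python
def emit_let(name, c_expr, declared):
    """Emit a C declaration on first binding, a plain assignment afterwards."""
    if name in declared:
        return f"{name} = {c_expr};"
    declared.append(name)
    return f"el_val_t {name} = {c_expr};"

def emit_block(lets, outer_declared):
    """Emit the lets of a nested block (if/while/for body).

    The block works on a copy of the declared list, so names first bound
    inside it do not leak out, while rebinding an outer name still becomes
    an assignment to the outer C variable.
    """
    declared = list(outer_declared)
    return [emit_let(name, expr, declared) for name, expr in lets]
```

The copy is what makes a loop counter work: `let i = i + 1` inside a while body sees `i` as already declared and emits `i = i + 1;`, updating the variable declared outside the loop.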

Arena is Inactive in CLI Mode

The runtime includes an arena allocator designed for long-running server processes. In CLI mode (elc, elb) the arena is not activated. Memory is managed by ARC (reference counting via el_retain/el_release). The compiler source explicitly calls el_release(tokens) after parsing and el_release(stmt) after codegen to prevent memory exhaustion on large source files.

If you are implementing a new runtime or embedding El, be aware that the ARC model expects callers to release values they are done with.

The extern fn / .elh Separate Compilation Model

elb (the build coordinator) supports separate compilation. When a module changes:

  1. elc --emit-header module.el module.c compiles the module and writes module.elh
  2. module.elh contains extern fn declarations for all public functions
  3. Other modules that import module.el use the .elh header instead of re-parsing the source

The resolve_imports function in compiler.el checks for a .elh file before recursively inlining the .el source. If the header exists, it is used (and the .el is marked as seen to prevent double-inclusion).

This is important for bootstrap: if you have pre-compiled headers lying around from a broken build, they may shadow updated source. Delete .elh files (or use elb --clean) when debugging unexpected compilation behavior.

Import Resolution: Depth-First with Deduplication

resolve_imports in compiler.el:

  1. Walks imports depth-first (dependencies before dependents)
  2. Uses state_set("__elc_imp__:" + path, "1") to deduplicate: each file is included exactly once
  3. Builds the combined source string by concatenating import bodies ahead of the entry file's body
  4. If a .elh header exists for an import, uses that instead of recursing into the .el

The result is one large string that gets passed through lex → parse → codegen as a single unit. The codegen emits forward declarations for all functions before any body, so declaration order within the combined source does not matter.
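The four steps can be sketched over an in-memory file table. This is an approximation under stated assumptions: the `import "path"` line syntax and the header contents are guesses made for the example, a Python set stands in for the `state_set("__elc_imp__:" + path, "1")` markers, and a dict stands in for the filesystem.

```python
import os
import re

# Assumed import syntax for this sketch: one `import "path"` per line.
IMPORT_RE = re.compile(r'^\s*import\s+"([^"]+)"\s*$', re.MULTILINE)

def resolve_imports(path, files, seen=None):
    """Return the combined source for `path`, dependencies first.

    `files` maps path -> source text. `seen` records every path that has
    already been included so each file appears exactly once.
    """
    seen = set() if seen is None else seen
    seen.add(path)
    source = files[path]
    parts = []
    for dep in IMPORT_RE.findall(source):
        if dep in seen:
            continue                      # deduplicate: already included
        header = os.path.splitext(dep)[0] + ".elh"
        if header in files:
            seen.add(dep)                 # mark the .el as seen, use the header
            parts.append(files[header])
        else:
            parts.append(resolve_imports(dep, files, seen))  # depth-first
    parts.append(IMPORT_RE.sub("", source))  # entry body goes last
    return "\n".join(p.strip("\n") for p in parts if p.strip())
```

With a diamond-shaped graph (main imports a and b, both import c), the combined source contains c once, ahead of a, b, and main.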

+ Operator Dispatch is Heuristic

El's + operator serves double duty: integer addition and string concatenation. The codegen dispatches based on static analysis of the AST:

  • If either operand is a Str literal → el_str_concat
  • If both operands are provably Int (via is_int_expr) → (a + b)
  • If either operand is a Call or Ident → el_str_concat (conservative fallback)

The is_int_expr predicate recurses through the AST: literal Int, names in __int_names (from : Int annotations), known Int-returning builtins, and arithmetic BinOps over Int operands all count as "provably Int."

If you write let result = some_int_var + 1 and some_int_var is not annotated : Int, the codegen may emit el_str_concat instead of integer addition. Fix by adding : Int to the variable declaration.
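A sketch of this dispatch makes the failure mode concrete. The node shape follows the `{ "expr": ... }` convention used elsewhere in this guide, but the field names ("name", "value", "op", "left", "right") and the contents of `INT_BUILTINS` are assumptions, and only Int, Str, and Ident operands are rendered.

```python
import json

# Assumed set of Int-returning builtins; the real list lives in codegen.el.
INT_BUILTINS = {"str_len", "str_to_int", "str_index_of", "native_list_len"}

def is_int_expr(node, int_names):
    """True when the expression is provably Int by static inspection."""
    kind = node["expr"]
    if kind == "Int":
        return True
    if kind == "Ident":
        return node["name"] in int_names        # annotated `: Int`
    if kind == "Call":
        return node["name"] in INT_BUILTINS
    if kind == "BinOp" and node["op"] in {"+", "-", "*", "/", "%"}:
        return (is_int_expr(node["left"], int_names)
                and is_int_expr(node["right"], int_names))
    return False

def c_of(node):
    """Render a leaf operand as C text (leaves only, for this sketch)."""
    if node["expr"] == "Int":
        return str(node["value"])
    if node["expr"] == "Str":
        return json.dumps(node["value"])
    return node["name"]

def emit_plus(left, right, int_names):
    """Choose between integer addition and string concatenation."""
    a, b = c_of(left), c_of(right)
    if left["expr"] == "Str" or right["expr"] == "Str":
        return f"el_str_concat({a}, {b})"
    if is_int_expr(left, int_names) and is_int_expr(right, int_names):
        return f"({a} + {b})"
    return f"el_str_concat({a}, {b})"           # conservative fallback
```

With `int_names = {"n"}`, the expression `n + 1` becomes `(n + 1)`, while `m + 1` for an unannotated `m` falls through to `el_str_concat(m, 1)`, which is the gotcha described above.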

== Operator Dispatch is Also Heuristic

Similarly, == dispatches between str_eq(a, b) (string comparison) and (a == b) (integer comparison) based on operand types. The codegen tracks Int-typed names in __int_names. Two Ident operands where both are known Int-typed use ==; all other Ident-Ident comparisons use str_eq.

This means comparing two integer variables that were not annotated : Int can silently produce str_eq on what are actually integer values — and str_eq treats them as const char* pointers, producing incorrect results or segfaults.

Rule: always annotate variables : Int when they will participate in == comparisons or + arithmetic.

Capability Kind Enforcement

The codegen classifies programs into three capability tiers based on top-level declarations:

  • cgi block present → full capability (all primitives allowed)
  • service block present → restricted (no llm_call_agentic, llm_register_tool, dharma_emit, dharma_field)
  • Neither → utility (no DHARMA, no LLM)

Violations are collected during codegen and emitted as #error directives at the bottom of the generated C. The downstream cc step then fails with a clear message naming the forbidden call.
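The tier logic and the `#error` emission can be sketched as below. Several details are assumptions: the prefix test used for the utility tier (the guide only says "no DHARMA, no LLM"), the precedence of cgi over service when both appear, and the wording of the error message.

```python
# Calls named in this guide as forbidden inside a service program.
SERVICE_FORBIDDEN = {"llm_call_agentic", "llm_register_tool",
                     "dharma_emit", "dharma_field"}

def capability_tier(top_level_kinds):
    """Classify a program from the kinds of its top-level declarations."""
    if "cgi" in top_level_kinds:
        return "cgi"
    if "service" in top_level_kinds:
        return "service"
    return "utility"

def is_forbidden(tier, call_name):
    if tier == "cgi":
        return False
    if tier == "service":
        return call_name in SERVICE_FORBIDDEN
    # Assumption: utility programs reject anything with these prefixes.
    return call_name.startswith(("llm_", "dharma_"))

def error_directives(tier, calls):
    """Collect violations in call order and render them as C #error lines."""
    violations = []
    for name in calls:
        if is_forbidden(tier, name) and name not in violations:
            violations.append(name)
    return [f'#error "{name} is not allowed in a {tier} program"'
            for name in violations]
```

Appending these lines to the generated C means the compile step itself reports the violation, without the El compiler needing its own diagnostic channel.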

The __no_block_expr Parse Guard

When parsing the condition of if, while, for, and match, the parser sets state_set("__no_block_expr", "1"). This prevents parse_primary from treating a { as the start of a Map literal — instead it returns { "expr": "Nil" } and the caller sees the { and treats it as the block delimiter.

Without this guard, if a || b { ... } would recurse into parse_expr for b, hit {, try to parse it as a Map literal, fail to find string keys, loop in error-recovery mode, and hang.

Codegen Streams Output via println

The codegen does not build the output as a string — it calls println() for each line as it is emitted. The compile() / compile_js() / codegen() functions return "". Output goes to stdout.

This design avoids O(n²) string concatenation for large programs. It also means you cannot capture the compiler's output in a variable within El itself — you must redirect stdout at the OS level (elc source.el > output.c).

When writing to a file, elc detects the output path argument, redirects C's stdout to the file (via freopen in the runtime), and the println calls go there instead.
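The same shape can be mimicked in Python to show how callers interact with a streaming code generator. Here `print` stands in for El's println(), and `contextlib.redirect_stdout` plays the role that freopen plays in the runtime; the function names are illustrative only.

```python
import contextlib

def codegen(lines):
    """Stream each generated line to stdout; return "" like the El codegen."""
    for line in lines:
        print(line)          # stands in for println()
    return ""

def compile_to_file(lines, path):
    """Redirect stdout to `path` for the duration of codegen."""
    with open(path, "w", encoding="utf-8") as fh, contextlib.redirect_stdout(fh):
        return codegen(lines)
```

The return value carries no output in either case: a caller that wants the generated text has to capture or redirect the stream, which is the constraint described above.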