# El Language Bootstrap Guide

This document is the authoritative guide for reconstructing the El compiler toolchain from scratch. If the bootstrap binary at `dist/platform/elc` is ever lost, this document is the path back.

---

## 1. The Bootstrap Chain (Current State)

### The Trust Chain

El is a self-hosting language. The compiler is written in El. This creates a circular dependency: you need an El compiler to compile the El compiler. The chain is resolved by a seed binary:

```
dist/platform/elc (Mach-O arm64 native binary)
    ↓ compiles elc-cli.el
    ↓ new self-hosted elc binary
    ↓ compiles itself again (identity check)
    ↓ stable self-hosted compiler
```

The binary at `dist/platform/elc` is a **Mach-O 64-bit arm64 executable**. The `elc.preselfhost` and `elc.legacy` files in the same directory are older snapshots kept as fallback checkpoints.

The key property: every binary in `dist/platform/` was produced by compiling the El source in `el-compiler/src/` using a previous version of that same binary. The chain is auditable: the source is the ground truth, not the binary.

### The Self-Hosting Pipeline

```
elc-cli.el
  imports → el-compiler/src/compiler.el
  imports → el-compiler/src/lexer.el
  imports → el-compiler/src/parser.el
  imports → el-compiler/src/codegen.el
  imports → el-compiler/src/codegen-js.el
```

Import resolution is textual. `compiler.el` recursively inlines all imported `.el` files before lex/parse. The result is one large unified source string that the compiler then processes in a single pass.

`elc-combined.el` in the repo root is a pre-merged single-file edition used during early bootstrap iterations.

### What the Bootstrap Binary Actually Is

The `dist/platform/elc` binary is a compiled El program that was produced by running an earlier version of itself on `elc-cli.el`. It is not a Rust binary. The `elc.legacy` and `elc.preselfhost` checkpoints suggest the chain has been continuously self-hosting and re-stamped.
The original genesis compiler (referenced in the language spec as a "Rust genesis compiler") was used to produce the first self-hosted binary; that Rust binary is not present in this repo.

To rebuild the current binary from source using the current binary:

```bash
cd /path/to/el
./dist/platform/elc elc-cli.el elc-new.c
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
   -o dist/platform/elc-new \
   elc-new.c el-compiler/runtime/el_runtime.c
```

Verify self-hosting by using `elc-new` to recompile itself and diffing the outputs.

---

## 2. The Language

### 2.1 Lexical Structure

El source is UTF-8. File extension `.el`. Comments are single-line only: `//` to end of line.

**Token representation:** every token is a map `{ "kind": String, "value": String }`.

**Keywords** (from `keyword_kind()` in `lexer.el`):

| Keyword | Token Kind | Notes |
|---------|-----------|-------|
| `let` | `Let` | variable binding |
| `fn` | `Fn` | function definition |
| `type` | `Type` | struct definition |
| `enum` | `Enum` | enum definition |
| `match` | `Match` | pattern match |
| `return` | `Return` | function return |
| `if` | `If` | conditional |
| `else` | `Else` | |
| `for` | `For` | iteration |
| `in` | `In` | used in `for x in list` |
| `while` | `While` | loop |
| `import` | `Import` | module import |
| `from` | `From` | `from mod import { Name }` |
| `as` | `As` | (reserved, no parse form) |
| `with` | `With` | (reserved) |
| `sealed` | `Sealed` | (reserved) |
| `activate` | `Activate` | (reserved) |
| `where` | `Where` | (reserved) |
| `test` | `Test` | (reserved) |
| `seed` | `Seed` | (reserved) |
| `assert` | `Assert` | (reserved) |
| `protocol` | `Protocol` | (reserved) |
| `impl` | `Impl` | (reserved) |
| `retry` | `Retry` | reserved / soft keyword in expr position |
| `times` | `Times` | reserved / soft keyword |
| `fallback` | `Fallback` | reserved / soft keyword |
| `reason` | `Reason` | reserved / soft keyword |
| `parallel` | `Parallel` | reserved / soft keyword |
| `trace` | `Trace` | reserved / soft keyword |
| `requires` | `Requires` | reserved / soft keyword |
| `deploy` | `Deploy` | reserved / soft keyword |
| `to` | `To` | reserved / soft keyword |
| `via` | `Via` | reserved / soft keyword |
| `target` | `Target` | **RESERVED: cannot use as identifier** |
| `true` | `Bool` | literal value `true` |
| `false` | `Bool` | literal value `false` |
| `cgi` | `Cgi` | CGI identity block |
| `service` | `Service` | service declaration block |
| `manager` | `Manager` | VBD role decorator / soft keyword |
| `engine` | `Engine` | VBD role decorator / soft keyword |
| `accessor` | `Accessor` | VBD role decorator / soft keyword |
| `vessel` | `Vessel` | soft keyword |
| `extern` | `Extern` | `extern fn` forward declaration |

**Soft keywords** (`to`, `via`, `deploy`, `reason`, `times`, `fallback`, `retry`, `parallel`, `trace`, `requires`, `where`, `as`, `with`, `manager`, `engine`, `accessor`, `vessel`): these have dedicated token kinds, but the parser re-interprets them as `Ident` nodes when they appear in expression position (e.g., as parameter names or local variable names). `target` is the exception: it is always lexed as `Target` and cannot be used as an identifier (see section 2.4).

**All token kinds:**

| Kind | Pattern |
|------|---------|
| `Int` | `[0-9]+` |
| `Float` | `[0-9]+ '.' [0-9]+` |
| `Str` | `"…"` with `\"`, `\n`, `\t`, `\r`, `\\` escapes |
| `Bool` | `true` or `false` |
| `Ident` | `[a-zA-Z_][a-zA-Z0-9_]*` (not a keyword) |
| keyword tokens | one per keyword above |
| `Eq` | `=` |
| `EqEq` | `==` |
| `NotEq` | `!=` |
| `Not` | `!` |
| `Lt` / `LtEq` / `Gt` / `GtEq` | `<` `<=` `>` `>=` |
| `And` | `&&` (single `&` is consumed and discarded) |
| `Or` | `\|\|` |
| `Pipe` | `\|` |
| `PipeOp` | `\|>` |
| `Plus` / `Minus` / `Star` / `Slash` | `+` `-` `*` `/` |
| `Percent` | `%` |
| `Arrow` | `->` |
| `FatArrow` | `=>` |
| `Colon` / `ColonColon` | `:` `::` |
| `LParen` / `RParen` | `(` `)` |
| `LBrace` / `RBrace` | `{` `}` |
| `LBracket` / `RBracket` | `[` `]` |
| `Comma` / `Dot` / `Semicolon` | `,` `.` `;` |
| `At` | `@` |
| `QuestionMark` | `?` |
| `Eof` | end-of-input sentinel |

**String comment stripping:** the lexer contains a special heuristic for string literals that embed JavaScript or CSS (`looks_like_code`). If a string contains ``. The `"expr"` or `"stmt"` key names the node type.

**Expression nodes:**

| `expr` value | Fields | Meaning |
|-------------|--------|---------|
| `Int` | `value: String` | integer literal |
| `Float` | `value: String` | float literal |
| `Str` | `value: String` | string literal |
| `Bool` | `value: String` | `"true"` or `"false"` |
| `Nil` | (none) | null / missing |
| `Ident` | `name: String` | identifier reference |
| `BinOp` | `op: String`, `left`, `right` | binary operation |
| `Not` | `inner` | unary `!` |
| `Neg` | `inner` | unary `-` |
| `Call` | `func`, `args: [expr]` | function call |
| `Field` | `object`, `field: String` | `obj.field` |
| `Index` | `object`, `index` | `obj[idx]` |
| `Array` | `elems: [expr]` | `[e1, e2, …]` |
| `Map` | `pairs: [{ key: String, value: expr }]` | `{ "k": v, … }` |
| `If` | `cond`, `then: [stmt]`, `else: [stmt]`, `has_else: Bool` | conditional expression |
| `For` | `item: String`, `list`, `body: [stmt]` | for-in expression |
| `Match` | `subject`, `arms: [{ pattern, body }]` | pattern match |
| `DurationLit` | `count: String`, `unit: String` | `30.seconds`, `1.hour` |
| `Try` | `inner` | postfix `?` (no-op passthrough today) |

**Binary operators** (`op` field values): `Plus`, `Minus`, `Star`, `Slash`, `EqEq`, `NotEq`, `Lt`, `Gt`, `LtEq`, `GtEq`, `And`, `Or`.

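
To make the node shapes concrete, here is a small Python sketch that uses dicts in place of El maps. It builds the AST for `count + 1 == limit` by hand and prints it in prefix form. The helper names (`ident`, `int_lit`, `binop`, `show`) are hypothetical and do not come from the compiler source; only the `expr` tags and field names are taken from the table above.

```python
def ident(name):
    return {"expr": "Ident", "name": name}


def int_lit(n):
    # Literal values are stored as strings, per the Fields column.
    return {"expr": "Int", "value": str(n)}


def binop(op, left, right):
    return {"expr": "BinOp", "op": op, "left": left, "right": right}


def show(node):
    """Render a subset of expression nodes in prefix form (illustration only)."""
    kind = node["expr"]
    if kind == "Ident":
        return node["name"]
    if kind in ("Int", "Float", "Bool"):
        return node["value"]
    if kind == "Str":
        return '"' + node["value"] + '"'
    if kind == "BinOp":
        return "(" + node["op"] + " " + show(node["left"]) + " " + show(node["right"]) + ")"
    raise ValueError("node kind not handled in this sketch: " + kind)


# count + 1 == limit
tree = binop("EqEq", binop("Plus", ident("count"), int_lit(1)), ident("limit"))
print(show(tree))
```

The nesting of the two `BinOp` maps is what encodes precedence: `Plus` sits inside the left operand of `EqEq` because it binds tighter.
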
**Operator precedence** (higher = tighter binding):

| Level | Operators |
|-------|-----------|
| 6 | `Star`, `Slash` |
| 5 | `Plus`, `Minus` |
| 4 | `Lt`, `Gt`, `LtEq`, `GtEq` |
| 3 | `EqEq`, `NotEq` |
| 2 | `And` |
| 1 | `Or` |

**Pattern nodes** (used inside `Match` arms):

| `pattern` value | Fields | Meaning |
|----------------|--------|---------|
| `Wildcard` | (none) | `_`; always matches |
| `Binding` | `name: String` | binds subject to name |
| `LitInt` | `value: String` | integer literal pattern |
| `LitStr` | `value: String` | string literal pattern |
| `LitBool` | `value: String` | boolean literal pattern |

**Statement nodes:**

| `stmt` value | Fields | Meaning |
|-------------|--------|---------|
| `Let` | `name: String`, `value: expr`, `type: String` | variable binding |
| `Assign` | `name: String`, `value: expr` | bare reassignment `name = expr` |
| `Return` | `value: expr` | return statement |
| `While` | `cond: expr`, `body: [stmt]` | while loop |
| `For` | `item: String`, `list: expr`, `body: [stmt]` | for-in loop |
| `FnDef` | `name: String`, `params: [param]`, `body: [stmt]`, `ret_type: String`, `decorator?: String` | function definition |
| `ExternFn` | `name: String`, `params: [param]`, `ret_type: String` | forward declaration |
| `TypeDef` | `name: String`, `fields: [{ name: String }]` | struct type definition |
| `EnumDef` | `name: String`, `variants: [{ name: String }]` | enum definition |
| `Import` | `path: String` | `import "file.el"` or `from mod import { … }` |
| `CgiBlock` | `name`, `dharma_id`, `principal`, `network`, `engram`, `has_*: Bool` | CGI identity declaration |
| `ServiceBlock` | `name`, `sponsor`, `domain` | service declaration |
| `Expr` | `value: expr` | bare expression statement |

**Param nodes:** `{ "name": String, "type": String }` where `type` is the leading identifier of the type annotation (e.g., `"Int"`, `"String"`, `"Map"`) or `""` if unannotated.

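
The precedence table maps directly onto a precedence-climbing loop. The following Python sketch is one way a bootstrap parser might apply it; it is not the algorithm used in `parser.el`, and it assumes left-associativity for operators of equal level, which this guide does not state. The function names `parse_expr`, `parse_atom`, and `tok` are hypothetical.

```python
PRECEDENCE = {
    "Star": 6, "Slash": 6,
    "Plus": 5, "Minus": 5,
    "Lt": 4, "Gt": 4, "LtEq": 4, "GtEq": 4,
    "EqEq": 3, "NotEq": 3,
    "And": 2,
    "Or": 1,
}


def parse_expr(tokens, pos=0, min_level=1):
    """Return (ast, next_pos) for the expression starting at tokens[pos]."""
    left, pos = parse_atom(tokens, pos)
    while True:
        kind = tokens[pos]["kind"]
        level = PRECEDENCE.get(kind, 0)  # 0 means "not a binary operator"
        if level < min_level:
            return left, pos
        # Parsing the right side at level + 1 makes equal levels group leftward.
        right, pos = parse_expr(tokens, pos + 1, level + 1)
        left = {"expr": "BinOp", "op": kind, "left": left, "right": right}


def parse_atom(tokens, pos):
    tok_ = tokens[pos]
    if tok_["kind"] == "Int":
        return {"expr": "Int", "value": tok_["value"]}, pos + 1
    if tok_["kind"] == "Ident":
        return {"expr": "Ident", "name": tok_["value"]}, pos + 1
    if tok_["kind"] == "LParen":
        inner, pos = parse_expr(tokens, pos + 1)
        if tokens[pos]["kind"] != "RParen":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    raise SyntaxError("unexpected token " + tok_["kind"])


def tok(kind, value=""):
    return {"kind": kind, "value": value}


# 1 + 2 * 3, terminated by the Eof sentinel the loop relies on
tokens = [tok("Int", "1"), tok("Plus", "+"), tok("Int", "2"),
          tok("Star", "*"), tok("Int", "3"), tok("Eof")]
ast, _ = parse_expr(tokens)
print(ast["op"], ast["right"]["op"])
```

The sketch covers only integer literals, identifiers, and parentheses as atoms. Calls, field access, unary operators, and the other expression forms would be added in `parse_atom` or as a postfix loop around it.
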
### 2.3 The Type System

Type annotations are parsed and stored but not type-checked at compile time. They serve as documentation and as hints to the codegen for arithmetic dispatch.

**Built-in types:**

| Type | C representation | Notes |
|------|-----------------|-------|
| `String` | `const char*` cast to `el_val_t` | via `EL_STR()` macro |
| `Int` | `int64_t` | direct |
| `Bool` | `int64_t` | `0` = false, nonzero = true |
| `Float` | `int64_t` | bit-cast double via `el_from_float()` |
| `Void` | `void` | functions returning nothing |
| `Any` | `void*` cast to `el_val_t` | generic containers |
| `[T]` | `el_val_t` | pointer to ElList struct |
| `Map` | `el_val_t` | pointer to ElMap struct |

**Temporal types** (first-class in codegen):

| Type | Representation | Notes |
|------|---------------|-------|
| `Instant` | nanoseconds since Unix epoch as `int64_t` | `now()` returns this |
| `Duration` | signed nanoseconds as `int64_t` | `30.seconds` = `30 * 1000000000` |
| `Calendar` | pointer to heap-allocated struct | `earth_calendar(zone)` |
| `CalendarTime` | pointer to heap-allocated struct | `now_in(cal)` |
| `LocalDate` | pointer to heap-allocated struct | `local_date(y, m, d)` |
| `LocalTime` | nanoseconds since midnight, direct `int64_t` | `local_time(h, m, s, ns)` |
| `Zone` | pointer to heap-allocated struct | `zone("America/New_York")` |
| `Rhythm` | pointer to heap-allocated struct | recurrence pattern |

The codegen tracks type-annotated variable names in per-function process state (`__int_names`, `__instant_names`, `__duration_names`, etc.) to dispatch arithmetic and comparisons through the correct runtime wrappers. Type-mismatched operations (e.g., `Instant + Instant`) are emitted as `#error` directives.

**Duration postfix literals:** `30.seconds`, `1.hour`, `500.millis`, `30.nanos` are parsed as `DurationLit` AST nodes and compiled to `el_duration_from_nanos(count * multiplier)`.
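
As a rough illustration, the lowering of a `DurationLit` node can be sketched in Python using the unit multipliers this guide lists for duration literals. The function name `lower_duration_lit` is hypothetical, and folding the product into a single constant at compile time is an assumption of this sketch; the real codegen may emit the multiplication instead.

```python
# Unit name -> nanoseconds, transcribed from the guide's multiplier table.
NANOS_PER_UNIT = {}
for names, nanos in [
    (("nano", "nanos"), 1),
    (("milli", "millis", "millisecond", "milliseconds"), 1_000_000),
    (("second", "seconds"), 1_000_000_000),
    (("minute", "minutes"), 60_000_000_000),
    (("hour", "hours"), 3_600_000_000_000),
    (("day", "days"), 86_400_000_000_000),
]:
    for name in names:
        NANOS_PER_UNIT[name] = nanos


def lower_duration_lit(node):
    """Turn a DurationLit AST map into C call text (constant folded: assumption)."""
    unit = node["unit"]
    if unit not in NANOS_PER_UNIT:
        raise ValueError("unknown duration unit: " + unit)
    total = int(node["count"]) * NANOS_PER_UNIT[unit]
    return "el_duration_from_nanos(" + str(total) + ")"


# 30.seconds
print(lower_duration_lit({"expr": "DurationLit", "count": "30", "unit": "seconds"}))
```
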
The multipliers:

| Unit | Nanoseconds |
|------|------------|
| `nano` / `nanos` | 1 |
| `milli` / `millis` / `millisecond` / `milliseconds` | 1,000,000 |
| `second` / `seconds` | 1,000,000,000 |
| `minute` / `minutes` | 60,000,000,000 |
| `hour` / `hours` | 3,600,000,000,000 |
| `day` / `days` | 86,400,000,000,000 |

### 2.4 Key Language Semantics

**Implicit return.** The final expression in a function body becomes the return value if it is not a control-flow construct (`If`, `For`). The codegen's `transform_implicit_return` rewrites the last `Expr` statement into a `Return` statement before emitting.

**Let-rebinding, not mutation.** El uses `let` for both initial binding and rebinding:

```el
let count = 0
let count = count + 1 // NOT mutation: creates a new binding in the same scope
```

The codegen tracks declared names per C scope. When `count` is already in `declared`, it emits `count = count + 1;` (plain assignment). When it is new, it emits `el_val_t count = 0;`. This means **El does not have mutable variables in the traditional sense**: every `let` is a potential redeclaration. The practical effect is that shadowing and in-place update use identical syntax.

**Bare reassignment.** The parser also handles `name = expr` (without `let`) when an `Ident` is immediately followed by `Eq`. This emits a plain C assignment.

**`target` is reserved.** The word `target` is lexed as the `Target` token kind, so it cannot be used as a variable or parameter name. Use `tgt` or another name instead. This is a live gotcha in `compiler.el` itself, which uses `tgt` for exactly this reason.

**`__no_block_expr` guard.** The parser uses process state key `__no_block_expr` to suppress Map-literal parsing when parsing the condition of `if`, `while`, `for`, and `match`. This prevents a stray `{` (the start of the then-block) from being parsed as a Map literal.

**Arena memory model.** The runtime includes an arena allocator that is activated in server/long-running contexts.
In CLI mode (`elc`, `elb`) the arena is inactive. Memory is managed via ARC (reference counting): `el_retain()` and `el_release()` on Lists and Maps. Strings and ints are not refcounted; the retain/release functions are safe no-ops on non-tagged values.

---

## 3. The Runtime API

All runtime functions are declared in `el-compiler/runtime/el_runtime.h`. Every compiled El program links against `el-compiler/runtime/el_runtime.c`.

All values are `el_val_t` (`int64_t`). Strings are pointers cast through `int64_t` using `EL_STR(s)` / `EL_CSTR(v)` macros.

Canonical compile command:

```bash
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
   -o <output> <input>.c el-compiler/runtime/el_runtime.c
```

### I/O

| Function | Signature | Description |
|----------|-----------|-------------|
| `println` | `(s) -> Void` | print string + newline to stdout |
| `print` | `(s) -> Void` | print string without newline |
| `readline` | `() -> String` | read one line from stdin |

### String Operations

| Function | Signature | Description |
|----------|-----------|-------------|
| `el_str_concat` | `(a, b) -> String` | concatenate two strings |
| `str_concat` | `(a, b) -> String` | alias for `el_str_concat` |
| `str_eq` | `(a, b) -> Bool` | string equality comparison |
| `str_starts_with` | `(s, prefix) -> Bool` | prefix test |
| `str_ends_with` | `(s, suffix) -> Bool` | suffix test |
| `str_contains` | `(s, sub) -> Bool` | substring test |
| `str_len` | `(s) -> Int` | byte length |
| `str_slice` | `(s, start, end) -> String` | substring (byte offsets) |
| `str_replace` | `(s, from, to) -> String` | replace all occurrences |
| `str_to_upper` / `str_upper` | `(s) -> String` | uppercase |
| `str_to_lower` / `str_lower` | `(s) -> String` | lowercase |
| `str_trim` | `(s) -> String` | strip leading/trailing whitespace |
| `str_lstrip` / `str_rstrip` | `(s) -> String` | one-sided strip |
| `str_index_of` | `(s, sub) -> Int` | position of substring; `-1` if absent |
| `str_last_index_of` | `(s, sub) -> Int` | last position |
| `str_index_of_all` | `(s, sub) -> [Int]` | all byte offsets (non-overlapping) |
| `str_find_chars` | `(s, any_of) -> Int` | first index of any char in set |
| `str_split` | `(s, sep) -> [String]` | split on separator |
| `str_split_lines` | `(s) -> [String]` | split on newlines |
| `str_split_chars` | `(s) -> [String]` | split into individual characters |
| `str_split_n` | `(s, sep, n) -> [String]` | split at most `n` times |
| `str_join` | `(list, sep) -> String` | join list with separator |
| `str_char_at` | `(s, i) -> String` | character at byte index |
| `str_char_code` | `(s, i) -> Int` | Unicode code point at index |
| `str_pad_left` | `(s, width, pad) -> String` | left-pad to width |
| `str_pad_right` | `(s, width, pad) -> String` | right-pad to width |
| `str_format` | `(fmt, data) -> String` | `{key}` interpolation |
| `str_repeat` | `(s, n) -> String` | repeat string n times |
| `str_reverse` | `(s) -> String` | reverse by codepoint |
| `str_strip_prefix` | `(s, prefix) -> String` | remove prefix if present |
| `str_strip_suffix` | `(s, suffix) -> String` | remove suffix if present |
| `str_strip_chars` | `(s, chars) -> String` | strip characters from both ends |
| `str_count` | `(s, sub) -> Int` | count non-overlapping occurrences |
| `str_count_chars` | `(s) -> Int` | codepoint count |
| `str_count_bytes` | `(s) -> Int` | alias for `str_len` |
| `str_count_lines` | `(s) -> Int` | line count |
| `str_count_words` | `(s) -> Int` | word count |
| `str_count_letters` | `(s) -> Int` | ASCII letter count |
| `str_count_digits` | `(s) -> Int` | ASCII digit count |
| `is_letter` / `is_digit` / `is_alphanumeric` | `(s) -> Bool` | ASCII char classification |
| `is_whitespace` / `is_punctuation` | `(s) -> Bool` | |
| `is_uppercase` / `is_lowercase` | `(s) -> Bool` | |
| `int_to_str` | `(n) -> String` | format integer |
| `str_to_int` | `(s) -> Int` | parse integer |
| `str_to_float` | `(s) -> Float` | parse float |
| `parse_int` | `(s, default) -> Int` | parse with fallback |
| `bool_to_str` | `(b) -> String` | format bool |

### Integer/Float Math

| Function | Description |
|----------|-------------|
| `el_abs(n)` | absolute value |
| `el_max(a, b)` | maximum |
| `el_min(a, b)` | minimum |
| `float_to_str(f)` | format float as string |
| `int_to_float(n)` | widen Int to Float |
| `float_to_int(f)` | truncate Float to Int |
| `format_float(f, decimals)` | format with N decimal places |
| `decimal_round(f, decimals)` | round to N decimals |
| `math_sqrt(f)` | square root |
| `math_log(f)` / `math_ln(f)` | logarithms |
| `math_sin(f)` / `math_cos(f)` / `math_pi()` | trigonometry |

### List Operations

| Function | Description |
|----------|-------------|
| `el_list_empty()` | create empty list |
| `el_list_new(count, …)` | create list from N values (varargs) |
| `el_list_len(list)` | length |
| `el_list_get(list, i)` | element at index; `0` on out-of-bounds |
| `el_list_append(list, e)` | append; returns updated list |
| `el_list_clone(list)` | shallow copy |
| `list_push(list, e)` | alias for `el_list_append` |
| `list_push_front(list, e)` | prepend |
| `list_join(list, sep)` | join to string |
| `list_range(start, end)` | integer range `[start, end)` |
| `native_list_empty()` | alias for `el_list_empty` (used in compiler source) |
| `native_list_append(l, v)` | alias for `el_list_append` |
| `native_list_get(l, idx)` | alias for `el_list_get` |
| `native_list_len(l)` | alias for `el_list_len` |
| `native_list_clone(l)` | alias for `el_list_clone` |
| `append(l, e)` | method-call alias: `list.append(e)` |
| `len(l)` | method-call alias: `list.len()` |
| `get(l, i)` | method-call alias: `list.get(i)` |

### Map Operations

| Function | Description |
|----------|-------------|
| `el_map_new(count, …)` | create map from key/value pairs (varargs) |
| `el_map_get(map, key)` | get value by key |
| `el_map_set(map, key, value)` | set key; returns map |
| `el_get_field(map, key)` | alias; emitted for `.field` access |
| `map_get(map, key)` | method-call alias |
| `map_set(map, key, value)` | method-call alias |

### ARC (Reference Counting)

| Function | Description |
|----------|-------------|
| `el_retain(v)` | increment refcount; no-op for non-heap values |
| `el_release(v)` | decrement refcount; free when zero |

### In-Process State

| Function | Description |
|----------|-------------|
| `state_set(key, value)` | store in process-global key/value table |
| `state_get(key)` | retrieve; `""` if absent |
| `state_del(key)` | delete key |
| `state_keys()` | all keys as `[String]` |

### Filesystem

| Function | Description |
|----------|-------------|
| `fs_read(path)` | read file to string; `""` on error |
| `fs_write(path, content)` | write string; returns `1` on success |
| `fs_write_bytes(path, bytes, length)` | write raw bytes of known length |
| `fs_list(path)` | list directory entries |
| `fs_exists(path)` | check if path exists |
| `fs_mkdir(path)` | mkdir -p |

### HTTP Client

| Function | Description |
|----------|-------------|
| `http_get(url)` | GET; returns body string |
| `http_post(url, body)` | POST; returns body string |
| `http_post_json(url, json_body)` | POST with Content-Type: application/json |
| `http_get_with_headers(url, headers_map)` | GET with custom headers |
| `http_post_with_headers(url, body, headers_map)` | POST with custom headers |
| `http_post_form_auth(url, form_body, auth_header)` | POST with auth |
| `http_delete(url)` | DELETE |
| `http_get_to_file(url, headers_map, output_path)` | stream response to file |
| `http_post_to_file(url, body, headers_map, output_path)` | stream POST response to file |
| `http_response(status, headers_json, body)` | build response envelope |
| `url_encode(s)` | RFC 3986 percent-encoding |
| `url_decode(s)` | URL decode |
| `el_html_sanitize(html, allowlist_json)` | allowlist HTML sanitizer |

### HTTP Server

| Function | Description |
|----------|-------------|
| `http_serve(port, handler)` | start server; handler: `(method, path, body) -> String` |
| `http_serve_v2(port, handler)` | start server; handler: `(method, path, headers_map, body) -> String` |
| `http_set_handler(name)` | set handler by symbol name |
| `http_set_handler_v2(name)` | v2 variant |

### JSON

| Function | Description |
|----------|-------------|
| `json_get(json, key)` | substring lookup of `"key": value` |
| `json_parse(s)` | parse JSON string to List/Map |
| `json_stringify(v)` | serialize Any to JSON string |
| `json_get_string(j, key)` | typed extract: String |
| `json_get_int(j, key)` | typed extract: Int |
| `json_get_float(j, key)` | typed extract: Float |
| `json_get_bool(j, key)` | typed extract: Bool |
| `json_get_raw(j, key)` | extract nested object/array as JSON string |
| `json_set(j, key, value)` | update field, return new JSON string |
| `json_array_len(j)` | length of JSON array string |
| `json_array_get(j, index)` | element at index |
| `json_array_get_string(j, index)` | string element at index |

### Time (Epoch-Based)

| Function | Description |
|----------|-------------|
| `time_now()` | Unix epoch milliseconds |
| `time_now_utc()` | same, explicit UTC |
| `time_format(ts, fmt)` | format timestamp |
| `time_to_parts(ts)` | decompose to Map of fields |
| `time_from_parts(secs, ns, tz)` | construct timestamp |
| `time_add(ts, n, unit)` | add duration |
| `time_diff(ts1, ts2, unit)` | difference |
| `unix_timestamp()` | Unix seconds as Int |
| `sleep_secs(secs)` | sleep N seconds |
| `sleep_ms(ms)` | sleep N milliseconds |

### Time (First-Class Instant/Duration)

| Function | Description |
|----------|-------------|
| `now()` / `el_now_instant()` | current time as Instant (nanoseconds) |
| `unix_seconds(n)` | construct Instant from Unix seconds |
| `unix_millis(n)` | construct Instant from Unix milliseconds |
| `instant_from_iso8601(s)` | parse ISO 8601 string |
| `instant_to_unix_seconds(i)` | extract Unix seconds |
| `instant_to_unix_millis(i)` | extract Unix milliseconds |
| `instant_to_iso8601(i)` | format as ISO 8601 |
| `el_duration_from_nanos(ns)` | construct Duration from nanoseconds |
| `duration_seconds(n)` | Duration from seconds |
| `duration_millis(n)` | Duration from milliseconds |
| `duration_nanos(n)` | Duration from nanoseconds |
| `duration_to_seconds(d)` | extract seconds |
| `duration_to_millis(d)` | extract milliseconds |
| `duration_to_nanos(d)` | extract nanoseconds |
| `el_instant_add_dur(inst, dur)` | Instant + Duration |
| `el_instant_sub_dur(inst, dur)` | Instant - Duration |
| `el_instant_diff(a, b)` | Instant - Instant = Duration |
| `el_duration_add/sub/scale/div` | Duration arithmetic |
| `el_instant_lt/le/gt/ge/eq/ne` | Instant comparison |
| `el_duration_lt/le/gt/ge/eq/ne` | Duration comparison |
| `el_sleep_duration(dur)` | sleep for a Duration |
| `ttl_cache_set(key, value)` | store with TTL |
| `ttl_cache_get(key, max_age)` | retrieve if within max_age |
| `ttl_cache_age(key)` | age of cached value as Duration |

### Calendar System

| Function | Description |
|----------|-------------|
| `zone(id)` | IANA zone or fixed offset |
| `zone_utc()` / `zone_local()` | UTC and local zone |
| `zone_offset(hours, minutes)` | fixed offset zone |
| `earth_calendar(z)` | Gregorian calendar in zone |
| `earth_calendar_default()` | system default |
| `mars_calendar()` / `cycle_calendar(period)` | non-Earth calendars |
| `no_cycle_calendar()` / `relative_calendar(epoch)` | abstract calendars |
| `now_in(cal)` | current time as CalendarTime |
| `in_calendar(inst, cal)` | project Instant into Calendar |
| `cal_format(ct, pattern)` | format CalendarTime |
| `cal_to_instant(ct)` | extract underlying Instant |
| `cal_cycle_phase(ct)` / `cal_in(ct, cal)` | calendar ops |
| `local_date(y, m, d)` | construct LocalDate |
| `local_time(h, m, s, ns)` | construct LocalTime |
| `local_datetime(date, time)` | construct LocalDateTime |
| `zoned(date, time, cal)` | zoned datetime |
| `local_date_year/month/day` | LocalDate accessors |
| `local_time_hour/minute/second/nanos` | LocalTime accessors |
| `el_local_date_add_dur` / `el_local_time_add_dur` | date/time arithmetic |
| `el_local_date_lt` / `el_local_date_eq` | date comparison |
| `rhythm_*` | recurrence patterns (cycle_start, weekday, weekly_at, next_after, matches, …) |

### Process / Execution

| Function | Description |
|----------|-------------|
| `args()` | command-line arguments as `[String]` (excludes argv[0]) |
| `env(key)` | read environment variable; `""` if unset |
| `exit(code)` | exit process with code |
| `exit_program(code)` | alias for `exit` |
| `getpid_now()` | current process ID |
| `exec_command(cmd)` | run shell command; return exit code |
| `exec_capture(cmd)` | run shell command; capture and return stdout |
| `uuid_new()` / `uuid_v4()` | generate UUID v4 |
| `native_int_to_str(n)` | format integer (alias, used in compiler source) |
| `native_string_chars(s)` | split string into `[String]` of single characters |

### Crypto

| Function | Description |
|----------|-------------|
| `sha256_hex(input)` | SHA-256, hex output |
| `sha256_bytes(input)` | SHA-256, raw bytes |
| `hmac_sha256_hex(key, msg)` | HMAC-SHA-256, hex |
| `hmac_sha256_bytes(key, msg)` | HMAC-SHA-256, raw bytes |
| `base64_encode(input)` / `base64_decode(input)` | standard base64 |
| `base64url_encode(input)` / `base64url_decode(input)` | URL-safe base64 |
| `sha3_256_hex(input)` | SHA3-256 (Keccak) |
| `pq_keygen_signature()` | Dilithium-3 key pair |
| `pq_sign(sk_hex, msg)` / `pq_verify(pk_hex, msg, sig_hex)` | PQ signatures |
| `pq_kem_keygen()` / `pq_kem_encaps(pk)` / `pq_kem_decaps(sk, ct)` | Kyber-768 KEM |
| `pq_hybrid_keygen()` / `pq_hybrid_handshake(remote_pub)` | X25519 + Kyber hybrid |
| `aead_encrypt(key_hex, plaintext)` | AES-256-GCM encrypt |
| `aead_decrypt(key_hex, nonce_hex, ct_hex)` | AES-256-GCM decrypt |

### DHARMA Network (CGI programs only)

| Function | Description |
|----------|-------------|
| `el_cgi_init(name, dharma_id, principal, network, engram)` | initialize CGI identity (called by generated `main()`) |
| `dharma_connect(cgi_id)` | open channel to peer |
| `dharma_send(channel, content)` | send message; blocks for response |
| `dharma_activate(query)` | spreading activation across DHARMA network |
| `dharma_emit(event_type, payload)` | emit network event (@manager only) |
| `dharma_field(event_type)` | wait for event (@manager only) |
| `dharma_strengthen(cgi_id, weight)` | Hebbian potentiation |
| `dharma_relationship(cgi_id)` | current relationship weight |
| `dharma_peers()` | all connected peers sorted by weight |

### Engram Knowledge Graph

| Function | Description |
|----------|-------------|
| `engram_node(content, type, salience)` | create node; returns ID |
| `engram_node_full(content, type, label, salience, importance, confidence, tier, tags)` | full node creation |
| `engram_node_layered(…, layer_id)` | create node in specific layer |
| `engram_get_node(id)` | retrieve node by ID |
| `engram_strengthen(node_id)` | Hebbian potentiation |
| `engram_forget(node_id)` | delete node and edges |
| `engram_node_count()` | total node count |
| `engram_edge_count()` | total edge count |
| `engram_search(query, limit)` | full-text search |
| `engram_scan_nodes(limit, offset)` | paginated node scan |
| `engram_connect(from, to, weight, relation)` | create directed edge |
| `engram_edge_between(from, to)` | get edge |
| `engram_neighbors(node_id)` | BFS neighbors |
| `engram_neighbors_filtered(node_id, max_depth, direction)` | filtered BFS |
| `engram_activate(query, depth)` | spreading activation |
| `engram_save(path)` / `engram_load(path)` | snapshot to/from disk |
| `engram_add_layer(name, priority, suppressible, transparent, injectable)` | add consciousness layer |
| `engram_remove_layer(layer_id)` / `engram_list_layers()` | layer management |
| `engram_*_json` variants | JSON-string versions of search/scan/activate |
| `engram_compile_layered_json(intent, depth)` | prompt-ready context block |

### LLM (Anthropic API)

| Function | Description |
|----------|-------------|
| `llm_call(model, prompt)` | single-turn call |
| `llm_call_system(model, system, user)` | call with system prompt |
| `llm_call_agentic(model, system, user, tools)` | agentic call with tools (CGI only) |
| `llm_vision(model, system, prompt, image)` | vision call |
| `llm_models()` | list available models |
| `llm_register_tool(name, handler_fn_name)` | register tool handler (CGI only) |

### Observability

| Function | Description |
|----------|-------------|
| `emit_log(level, msg, fields_json)` | emit OTLP log |
| `emit_metric(name, value, tags_json)` | emit OTLP metric |
| `trace_span_start(name)` | start trace span |
| `trace_span_end(span_handle)` | end trace span |
| `emit_event(name, duration_ms)` | emit event |

---

## 4. How to Re-Bootstrap from Zero

This section assumes the bootstrap binary is gone. Everything else (source files, runtime) is intact.

### What You Need to Implement

A minimal El compiler has three parts: lexer, parser, codegen. Each can be written in any language. The goal is to compile `elc-cli.el` into a working `elc` binary, after which El is self-hosting again.

### Step 1: Write a Minimal Lexer

The lexer must produce a list of `{ "kind": String, "value": String }` maps (or equivalent structures).

Required token kinds: `Int`, `Float`, `Str`, `Bool`, `Ident`, `Eof`, and all keywords and operators listed in section 2.1.

The minimal subset needed to compile the compiler itself:

- Keywords: `let`, `fn`, `return`, `if`, `else`, `while`, `for`, `in`, `import`, `from`, `true`, `false`, `extern`
- Literals: `Int`, `Str`, `Bool`, `Ident`
- Operators: `=`, `==`, `!=`, `!`, `<`, `>`, `<=`, `>=`, `&&`, `||`, `+`, `-`, `*`, `/`, `->`, `=>`, `:`, `,`, `.`, `(`, `)`, `{`, `}`, `[`, `]`, `@`, `?`
- Special: `Eof`

The lexer in `lexer.el` walks a char array using `native_list_get` to avoid O(n²) string slicing.
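
A bootstrap lexer for the minimal subset can be quite small. The Python sketch below is an illustration written for this guide, not a port of `lexer.el`. It covers only the minimal keywords and operators listed in Step 1 and omits `|`, `|>`, `%`, `::`, `;` and the discarded single `&`. Storing the decoded string value in `Str` tokens and passing unknown escapes through unchanged are both assumptions of the sketch.

```python
KEYWORDS = {
    "let": "Let", "fn": "Fn", "return": "Return", "if": "If", "else": "Else",
    "while": "While", "for": "For", "in": "In", "import": "Import",
    "from": "From", "extern": "Extern", "true": "Bool", "false": "Bool",
}
# Two-character operators are tried before their one-character prefixes.
TWO_CHAR = {"==": "EqEq", "!=": "NotEq", "<=": "LtEq", ">=": "GtEq",
            "&&": "And", "||": "Or", "->": "Arrow", "=>": "FatArrow"}
ONE_CHAR = {"=": "Eq", "!": "Not", "<": "Lt", ">": "Gt", "+": "Plus",
            "-": "Minus", "*": "Star", "/": "Slash", ":": "Colon",
            ",": "Comma", ".": "Dot", "(": "LParen", ")": "RParen",
            "{": "LBrace", "}": "RBrace", "[": "LBracket", "]": "RBracket",
            "@": "At", "?": "QuestionMark"}
ESCAPES = {'"': '"', "n": "\n", "t": "\t", "r": "\r", "\\": "\\"}
DIGITS = "0123456789"
IDENT_START = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"


def lex(src):
    tokens, i, n = [], 0, len(src)
    while i < n:
        c = src[i]
        if c in " \t\r\n":
            i += 1
        elif src.startswith("//", i):  # comment runs to end of line
            while i < n and src[i] != "\n":
                i += 1
        elif c in DIGITS:
            j = i
            while j < n and src[j] in DIGITS:
                j += 1
            kind = "Int"
            # Only '.' followed by a digit continues a Float, so `30.seconds`
            # lexes as Int, Dot, Ident.
            if j + 1 < n and src[j] == "." and src[j + 1] in DIGITS:
                kind = "Float"
                j += 1
                while j < n and src[j] in DIGITS:
                    j += 1
            tokens.append({"kind": kind, "value": src[i:j]})
            i = j
        elif c in IDENT_START:
            j = i
            while j < n and (src[j] in IDENT_START or src[j] in DIGITS):
                j += 1
            word = src[i:j]
            tokens.append({"kind": KEYWORDS.get(word, "Ident"), "value": word})
            i = j
        elif c == '"':
            i += 1
            chars = []
            while i < n and src[i] != '"':
                if src[i] == "\\" and i + 1 < n:
                    # Unknown escapes keep the escaped character (assumption).
                    chars.append(ESCAPES.get(src[i + 1], src[i + 1]))
                    i += 2
                else:
                    chars.append(src[i])
                    i += 1
            if i >= n:
                raise SyntaxError("unterminated string literal")
            i += 1  # skip the closing quote
            tokens.append({"kind": "Str", "value": "".join(chars)})
        elif src[i:i + 2] in TWO_CHAR:
            tokens.append({"kind": TWO_CHAR[src[i:i + 2]], "value": src[i:i + 2]})
            i += 2
        elif c in ONE_CHAR:
            tokens.append({"kind": ONE_CHAR[c], "value": c})
            i += 1
        else:
            raise SyntaxError("unexpected character: " + repr(c))
    tokens.append({"kind": "Eof", "value": ""})
    return tokens


for token in lex("let n = 30.seconds // note"):
    print(token["kind"], repr(token["value"]))
```

Indexing into the source string by position keeps each step constant-time in Python, so the char-array workaround used in `lexer.el` is not needed here.
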
A Python implementation can use a simple index into a string. Escapes to handle: `\"`, `\n`, `\t`, `\r`, `\\`. ### Step 2: Write a Minimal Parser The parser is a standard recursive descent parser. It produces AST maps as described in section 2.2. The minimal statement forms needed to compile the compiler: - `let name [: Type] = expr` - `fn name(params) [-> Type] { body }` - `extern fn name(params) [-> Type]` - `return expr` - `while cond { body }` - `for item in list { body }` - `if cond { body } [else [if] { body }]` - `import "path"` - `from module import { … }` - `@decorator stmt` - `name = expr` (bare assignment) - bare expression statement The minimal expression forms: - Integer, float, string, bool literals - Identifier - Binary operations with the precedence table from section 2.2 - Unary `!` and `-` - Function call: `f(a, b, …)` - Method call: `obj.method(args)` (parsed as Call with Field func) - Field access: `obj.field` - Index access: `obj[i]` - Array literal: `[e1, e2, …]` - Map literal: `{ "key": value, … }` - `if` as expression - `match` expression - Postfix `?` (can be a no-op) - Duration literal: `N.unit` The `__no_block_expr` guard (section 2.4) is important: without it, `if a || b { ... }` will incorrectly parse `{` as a Map literal. ### Step 3: Write a Minimal Codegen The codegen emits C11 source. Required output structure: ```c #include #include #include "el_runtime.h" // Forward declarations for all non-main functions el_val_t fn_name(el_val_t p1, el_val_t p2); ... // File-scope let bindings (if any) el_val_t GLOBAL_NAME; // Function bodies el_val_t fn_name(el_val_t p1, el_val_t p2) { ... return 0; } // Entry point int main(int _argc, char** _argv) { el_runtime_init_args(_argc, _argv); ... return 0; } ``` Critical codegen rules: 1. **All values are `el_val_t`**. Every parameter, local variable, and return type is `el_val_t` unless the function has `ret_type == "Void"` (use `void`). 2. **Let-rebinding**: track declared names per C scope. 
Emit `el_val_t name = val;` on first occurrence; emit `name = val;` on subsequent occurrences of the same name in the same scope.
3. **`+` dispatch**: if either operand is a string literal → `el_str_concat(a, b)`. If both are provably integers → `(a + b)`. Default fallback → `el_str_concat`.
4. **`==` dispatch**: if either operand is a string, or an identifier not known to be Int → `str_eq(a, b)`. If both are integer literals or provably Int → `(a == b)`.
5. **String literals**: wrap in `EL_STR("…")` and escape: `\"` → `\\\"`, `\n` → `\\n`, `\t` → `\\t`, `\\` → `\\\\`.
6. **Map literals**: `el_map_new(N, "k1", v1, "k2", v2, …)`. Empty map: `el_map_new(0)`.
7. **Array literals**: `el_list_new(N, e1, e2, …)`. Empty: `el_list_empty()`.
8. **Index access**: string-literal index → `el_get_field(obj, EL_STR("key"))`. Integer index → `el_list_get(obj, idx)`.
9. **Field access** `obj.field` → `el_get_field(obj, EL_STR("field"))`.
10. **Method call** `obj.method(args)` → `method(obj, args)`.
11. **`for item in list`** → emit:
    ```c
    {
        el_val_t _el_lst = <list_expr>;
        el_val_t _el_len = el_list_len(_el_lst);
        for (el_val_t _el_i = 0; _el_i < _el_len; _el_i++) {
            el_val_t item = el_list_get(_el_lst, _el_i);
            <body>
        }
    }
    ```
12. **`match`** → GCC/Clang statement expression with `goto`:
    ```c
    ({
        el_val_t _s = <subject_expr>;
        el_val_t _r = 0;
        if (_s == 42) { _r = <arm_expr>; goto _done; }
        if (str_eq(_s, EL_STR("str"))) { _r = <arm_expr>; goto _done; }
        { _r = <default_expr>; goto _done; }
        _done:;
        _r;
    })
    ```
13. **`if` as expression** → similarly wrapped in a GCC/Clang statement expression.
14. **Implicit return**: if the last statement in a function body is a bare `Expr` (not `If` or `For`), emit it as `return <expr>;` instead of `<expr>;`.
15. **Float literals**: emit as `el_from_float(<literal>)`.
16. **Bool literals**: `true` → `1`, `false` → `0`.
17. **`fn main()`**: do not emit as a regular `el_val_t` function. Instead, fold its body into C's `int main()` after any top-level statements.
18. **`extern fn`**: emit only a forward declaration (no body).
19. **Forward declarations**: scan for all `FnDef` nodes before emitting bodies. This enables mutual recursion.

### Step 4: Compile the El Compiler

Using your minimal implementation, compile `elc-cli.el` (which imports the entire compiler chain):

```bash
# Your minimal compiler
python3 minimal_elc.py elc-cli.el > elc-new.c

# Build with the runtime
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
   -o elc-new elc-new.c el-compiler/runtime/el_runtime.c
```

### Step 5: Verify Self-Hosting

```bash
# Compile elc-cli.el with the new compiler
./elc-new elc-cli.el elc-v2.c
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
   -o elc-v2 elc-v2.c el-compiler/runtime/el_runtime.c

# Compile again with the second-generation compiler
./elc-v2 elc-cli.el elc-v3.c

# The outputs should be identical
diff elc-v2.c elc-v3.c
```

A clean diff confirms you have a stable fixed point: the compiler reproduces itself exactly.

### Step 6: Replace the Bootstrap Binary

```bash
cp elc-v2 dist/platform/elc
```

You are bootstrapped.

### Minimal El Subset for the Compiler Itself

The El compiler source (`lexer.el`, `parser.el`, `codegen.el`, `compiler.el`) uses:

- `fn`, `let`, `while`, `if`/`else`, `return`, `for`/`in`, `import`
- `extern fn` (for `.elh` headers)
- `String`, `Int`, `Bool`, `Void`, `Any`, `Map`, `[String]`, `[Map]`
- Map literals `{ "key": val }`
- Array literals `[...]` (and `native_list_empty()`)
- List operations: `native_list_empty()`, `native_list_append()`, `native_list_get()`, `native_list_len()`, `native_list_clone()`
- String operations: `str_join()`, `str_eq()`, `str_contains()`, `str_starts_with()`, `str_slice()`, `str_trim()`, `str_split()`, `str_index_of()`, `str_len()`, `str_to_int()`, `native_string_chars()`, `native_int_to_str()`
- `state_get()`, `state_set()`
- `println()`, `fs_read()`, `fs_write()`, `exit()`
- `el_release()` (ARC cleanup)

The compiler does not use: HTTP, engram, dharma, LLM, crypto, UUID, float arithmetic.

---

## 5. The Long-Term Solution: elvm

### Why a VM Makes Bootstrapping More Auditable

The current bootstrap chain relies on trusting a binary whose source we cannot fully audit by inspection alone. This is the classic "trusting trust" problem (Ken Thompson, 1984).

A virtual machine breaks the chain:

- `elc` targets `elvm` bytecode (instead of C)
- `elvm` is a minimal interpreter hand-written in ~500 lines of C
- The hand-written C is small enough to audit completely
- Anyone can compile `elvm.c` with any C compiler
- From there: `elvm` interprets `elc.elvm` → `elc` compiles El → `cc` builds native binaries

The benefit: the trusted base shrinks from "a Mach-O binary" to "500 lines of straightforward C code that anyone can read in an afternoon."

### The elvm Design

A minimal elvm needs:

- A stack or register machine (stack is simpler)
- Instructions: push, pop, add, sub, mul, div, cmp, jump, call, return, load, store
- A string table (El strings are mostly literals)
- A heap for ElList and ElMap
- An FFI table mapping El runtime builtins to C functions

The El compiler would gain a `--target=elvm` flag in `compile_dispatch()`. Codegen would emit bytecode instead of C text. The runtime interface stays the same: builtins map to FFI slots by name.

This is the planned path. It does not exist yet.

---

## 6. Compiler Source Map

| File | Role | Lines |
|------|------|-------|
| `elc-cli.el` | Entry point; imports compiler.el | 7 |
| `el-compiler/src/compiler.el` | Pipeline wiring: lex → parse → codegen. Import resolution, `--emit-header`, `fn main()`. Defines `compile()`, `compile_js()`, `compile_dispatch()`, `resolve_imports()` | 298 |
| `el-compiler/src/lexer.el` | Tokenizer. `lex(source)` → token list. Char helpers, keyword lookup, scan_digits, scan_ident, scan_string, strip_code_comments | 747 |
| `el-compiler/src/parser.el` | Recursive descent parser. `parse(tokens)` → AST. All statement and expression forms | 1071 |
| `el-compiler/src/codegen.el` | C code emitter. `codegen(stmts, source)` → (streams to stdout). Expression codegen, statement codegen, function codegen, type tracking, capability enforcement, temporal type dispatch | 2721 |
| `el-compiler/src/codegen-js.el` | JavaScript backend. `codegen_js(stmts, source)` → JS source | ~500 |
| `el-compiler/runtime/el_runtime.h` | Full runtime API declaration | 755 |
| `el-compiler/runtime/el_runtime.c` | Full runtime implementation | large |
| `el-compiler/runtime/el_runtime.js` | JS runtime | - |
| `elb.el` | Build coordinator. Reads `manifest.el`, walks import graph, compiles modules, links binary. Follows a `.NET`-style incremental build model | 367 |
| `elc-combined.el` | Pre-merged single-file bootstrap edition (for early bootstrap iterations) | large |
| `spec/language.md` | Language specification v1.2.0 | - |
| `dist/platform/elc` | Current bootstrap binary (Mach-O arm64) | - |

---

## 7. Key Decisions and Gotchas

### `target` is a Reserved Keyword

`target` is lexed as the `Target` token kind. It cannot be used as a variable or parameter name anywhere in El source. If you write `fn compile(target: String)`, the parameter name will be tokenized as `Target`, which the parser does not recognize as an `Ident` in parameter position.

**Workaround:** use `tgt`, `dest`, `backend`, or any other name. The compiler source uses `tgt` specifically for this reason.

This comes up whenever writing code that handles compilation targets.

### `let x = x + 1` is Let-Rebinding, Not Mutation

El has no mutable variables. `let count = count + 1` re-introduces `count` into the current scope, shadowing the previous binding. At the C level, the codegen tracks declared names and emits plain assignment for subsequent bindings of the same name:

- First `let count = 0` → `el_val_t count = 0;`
- Second `let count = count + 1` → `count = count + 1;`

This means you cannot have two different values named `count` in the same C scope: the second binding overwrites the first. This is by design.
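
The declared-name tracking this section describes can be sketched in Python as follows. This is a minimal illustration under assumptions: `emit_let` and `enter_block` are hypothetical helper names, and a Python set stands in for the `declared` list kept by `codegen.el`, with each nested block starting from its own copy.

```python
def emit_let(name, c_expr, declared):
    """Return the C statement for `let name = <expr>` in the current C scope."""
    if name in declared:
        # Rebinding: the C variable already exists, so emit a plain assignment.
        return f"{name} = {c_expr};"
    declared.add(name)
    return f"el_val_t {name} = {c_expr};"


def enter_block(declared):
    """A nested block works on a copy, so names it declares do not leak out."""
    return set(declared)


outer = set()
lines = [
    emit_let("count", "0", outer),          # first binding: declaration
    emit_let("count", "count + 1", outer),  # second binding: assignment
]
inner = enter_block(outer)
lines.append(emit_let("count", "count + 1", inner))  # outer name: assignment
lines.append(emit_let("tmp", "count", inner))        # new name: block-local
print("\n".join(lines))
```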
Scoped shadowing works correctly because each block (if body, while body, for body) gets its own copy of the `declared` list.

### Arena is Inactive in CLI Mode

The runtime includes an arena allocator designed for long-running server processes. In CLI mode (`elc`, `elb`) the arena is not activated. Memory is managed by ARC (reference counting via `el_retain`/`el_release`). The compiler source explicitly calls `el_release(tokens)` after parsing and `el_release(stmt)` after codegen to prevent memory exhaustion on large source files.

If you are implementing a new runtime or embedding El, be aware that the ARC model expects callers to release values they are done with.

### The `extern fn` / `.elh` Separate Compilation Model

`elb` (the build coordinator) supports separate compilation. When a module changes:

1. `elc --emit-header module.el module.c` compiles the module and writes `module.elh`
2. `module.elh` contains `extern fn` declarations for all public functions
3. Other modules that import `module.el` use the `.elh` header instead of re-parsing the source

The `resolve_imports` function in `compiler.el` checks for a `.elh` file before recursively inlining the `.el` source. If the header exists, it is used (and the `.el` is marked as seen to prevent double-inclusion).

This is important for bootstrap: if you have pre-compiled headers lying around from a broken build, they may shadow updated source. Delete `.elh` files (or use `elb --clean`) when debugging unexpected compilation behavior.

### Import Resolution: Depth-First with Deduplication

`resolve_imports` in `compiler.el`:

1. Walks imports depth-first (dependencies before dependents)
2. Uses `state_set("__elc_imp__:" + path, "1")` to deduplicate: each file is included exactly once
3. Builds the combined source string by concatenating import bodies ahead of the entry file's body
4. If a `.elh` header exists for an import, uses that instead of recursing into the `.el`

The result is one large string that gets passed through `lex` → `parse` → `codegen` as a single unit. The codegen emits forward declarations for all functions before any body, so declaration order within the combined source does not matter.

### `+` Operator Dispatch is Heuristic

El's `+` operator serves double duty: integer addition and string concatenation. The codegen dispatches based on static analysis of the AST:

- If either operand is a `Str` literal → `el_str_concat`
- If both operands are provably `Int` (via `is_int_expr`) → `(a + b)`
- If either operand is a `Call` or `Ident` → `el_str_concat` (conservative fallback)

The `is_int_expr` predicate recurses through the AST: literal `Int`, names in `__int_names` (from `: Int` annotations), known Int-returning builtins, and arithmetic BinOps over Int operands all count as "provably Int."

If you write `let result = some_int_var + 1` and `some_int_var` is not annotated `: Int`, the codegen may emit `el_str_concat` instead of integer addition. Fix by adding `: Int` to the variable declaration.

### `==` Operator Dispatch is Also Heuristic

Similarly, `==` dispatches between `str_eq(a, b)` (string comparison) and `(a == b)` (integer comparison) based on operand types. The codegen tracks Int-typed names in `__int_names`. Two `Ident` operands where both are known Int-typed use `==`; all other Ident-Ident comparisons use `str_eq`.

This means comparing two integer variables that were not annotated `: Int` can silently produce `str_eq` on what are actually integer values, and `str_eq` treats them as `const char*` pointers, producing incorrect results or segfaults.

**Rule:** always annotate variables `: Int` when they will participate in `==` comparisons or `+` arithmetic.
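
To make the two dispatch heuristics concrete, here is a minimal Python sketch of how a bootstrap codegen might implement them. It is not the logic in `codegen.el`: the AST key names (`expr`, `name`, `op`, `left`, `right`, `func`), the contents of `INT_BUILTINS`, and the function names `emit_plus` and `emit_eq` are assumptions made for illustration, and `emit_eq` simplifies every non-Int case to `str_eq`.

```python
# Assumed subset of builtins that return Int (hypothetical list).
INT_BUILTINS = {"str_len", "str_to_int", "native_list_len"}
ARITH_OPS = {"+", "-", "*", "/"}


def is_int_expr(node, int_names):
    """True when the node is provably Int under the rules described above."""
    kind = node["expr"]
    if kind == "Int":
        return True
    if kind == "Ident":
        return node["name"] in int_names
    if kind == "Call":
        func = node["func"]
        return func["expr"] == "Ident" and func["name"] in INT_BUILTINS
    if kind == "BinOp" and node["op"] in ARITH_OPS:
        return (is_int_expr(node["left"], int_names)
                and is_int_expr(node["right"], int_names))
    return False


def emit_plus(node, a, b, int_names):
    """a and b are the already-generated C text for the two operands."""
    if node["left"]["expr"] == "Str" or node["right"]["expr"] == "Str":
        return f"el_str_concat({a}, {b})"
    if is_int_expr(node["left"], int_names) and is_int_expr(node["right"], int_names):
        return f"({a} + {b})"
    return f"el_str_concat({a}, {b})"  # conservative fallback


def emit_eq(node, a, b, int_names):
    if is_int_expr(node["left"], int_names) and is_int_expr(node["right"], int_names):
        return f"({a} == {b})"
    return f"str_eq({a}, {b})"


n = {"expr": "Ident", "name": "n"}
one = {"expr": "Int", "value": "1"}
plus = {"expr": "BinOp", "op": "+", "left": n, "right": one}
print(emit_plus(plus, "n", "1", int_names=set()))  # el_str_concat(n, 1)
print(emit_plus(plus, "n", "1", int_names={"n"}))  # (n + 1)
```

With `n` missing from the Int-typed set, the sketch falls back to `el_str_concat`, which is the failure mode described above; recording `n` as Int (the effect of a `: Int` annotation) switches it to integer addition.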
### Capability Kind Enforcement

The codegen classifies programs into three capability tiers based on top-level declarations:

- `cgi` block present → full capability (all primitives allowed)
- `service` block present → restricted (no `llm_call_agentic`, `llm_register_tool`, `dharma_emit`, `dharma_field`)
- Neither → `utility` (no DHARMA, no LLM)

Violations are collected during codegen and emitted as `#error` directives at the bottom of the generated C. The downstream `cc` step then fails with a clear message naming the forbidden call.

### The `__no_block_expr` Parse Guard

When parsing the condition or subject expression of `if`, `while`, `for`, and `match`, the parser sets `state_set("__no_block_expr", "1")`. This prevents `parse_primary` from treating a `{` as the start of a Map literal; instead it returns `{ "expr": "Nil" }` and the caller sees the `{` and treats it as the block delimiter.

Without this guard, `if a || b { ... }` would recurse into `parse_expr` for `b`, hit `{`, try to parse it as a Map literal, fail to find string keys, loop in error-recovery mode, and hang.

### Codegen Streams Output via `println`

The codegen does not build the output as a string; it calls `println()` for each line as it is emitted. The `compile()` / `compile_js()` / `codegen()` functions return `""`. Output goes to stdout.

This design avoids O(n²) string concatenation for large programs. It also means you cannot capture the compiler's output in a variable within El itself; you must redirect stdout at the OS level (`elc source.el > output.c`).

When writing to a file, `elc` detects the output path argument, redirects C's `stdout` to the file (via `freopen` in the runtime), and the `println` calls go there instead.
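
A bootstrap compiler written in Python can mirror this streaming design. The sketch below is an analogy under assumptions, not how `elc` is implemented: `codegen` and `run` are hypothetical names, the statements are passed in as ready-made C lines, and `contextlib.redirect_stdout` stands in for the runtime's `freopen` call.

```python
import contextlib


def codegen(c_lines):
    """Stream one line per print() call; no output string is accumulated."""
    print('#include "el_runtime.h"')
    for line in c_lines:
        print(line)
    return ""  # like compile()/codegen() in El, the return value carries no output


def run(c_lines, out_path=None):
    """Emit to stdout, or to out_path when an output file is given."""
    if out_path is None:
        return codegen(c_lines)
    # Same print() calls as above; only the destination of stdout changes.
    with open(out_path, "w", encoding="utf-8") as f, contextlib.redirect_stdout(f):
        return codegen(c_lines)
```

Calling `run(lines)` corresponds to `elc source.el > output.c`, and `run(lines, "output.c")` corresponds to passing the output path as an argument.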