# El Language Bootstrap Guide
This document is the authoritative guide for reconstructing the El compiler toolchain from scratch. If the bootstrap binary at `dist/platform/elc` is ever lost, this document is the path back.
---
## 1. The Bootstrap Chain (Current State)
### The Trust Chain
El is a self-hosting language. The compiler is written in El. This creates a circular dependency: you need an El compiler to compile the El compiler. The chain is resolved by a seed binary:
```
dist/platform/elc   (Mach-O arm64 native binary)
    ↓ compiles elc-cli.el
new self-hosted elc binary
    ↓ compiles itself again (identity check)
stable self-hosted compiler
```
The binary at `dist/platform/elc` is a **Mach-O 64-bit arm64 executable**. The `elc.preselfhost` and `elc.legacy` files in the same directory are older snapshots kept as fallback checkpoints.
The key property: every binary in `dist/platform/` was produced by compiling the El source in `el-compiler/src/` using a previous version of that same binary. The chain is auditable: the source is the ground truth, not the binary.
### The Self-Hosting Pipeline
```
elc-cli.el
imports → el-compiler/src/compiler.el
imports → el-compiler/src/lexer.el
imports → el-compiler/src/parser.el
imports → el-compiler/src/codegen.el
imports → el-compiler/src/codegen-js.el
```
Import resolution is textual. `compiler.el` recursively inlines all imported `.el` files before lex/parse. The result is one large unified source string that the compiler then processes in a single pass.
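As an illustration of textual import resolution, here is a minimal Python sketch of recursive inlining. It is not the code in `compiler.el`. It assumes imports appear as `import "path.el"` on a line of their own, that paths are used verbatim as lookup keys, and that a file is inlined at most once; the `from mod import { … }` form is not handled. The names `inline_imports` and `read` are hypothetical.

```python
import re
from typing import Callable, Optional, Set

# Assumption: an import is a whole line of the form  import "path.el"
IMPORT_RE = re.compile(r'^\s*import\s+"([^"]+)"\s*$')


def inline_imports(path: str, read: Callable[[str], str],
                   seen: Optional[Set[str]] = None) -> str:
    """Return the source at `path` with every imported file spliced in."""
    if seen is None:
        seen = set()
    if path in seen:
        # Assumption: a file already inlined is skipped (also breaks cycles).
        return ""
    seen.add(path)
    out = []
    for line in read(path).splitlines():
        match = IMPORT_RE.match(line)
        if match:
            # Replace the import line with the imported file's own text.
            out.append(inline_imports(match.group(1), read, seen))
        else:
            out.append(line)
    return "\n".join(out)
```

Passing `read` as a parameter keeps the sketch independent of the filesystem; a real bootstrap would pass a function that opens the file relative to the importing file's directory.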
`elc-combined.el` in the repo root is a pre-merged single-file edition used during early bootstrap iterations.
### What the Bootstrap Binary Actually Is
The `dist/platform/elc` binary is a compiled El program that was produced by running an earlier version of itself on `elc-cli.el`. It is not a Rust binary. The `elc.legacy` and `elc.preselfhost` checkpoints suggest the chain has been continuously self-hosting and re-stamped. The original genesis compiler (referenced in the language spec as a "Rust genesis compiler") was used to produce the first self-hosted binary; that Rust binary is not present in this repo.
To rebuild the current binary from source using the current binary:
```bash
cd /path/to/el
./dist/platform/elc elc-cli.el elc-new.c
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
    -o dist/platform/elc-new \
    elc-new.c el-compiler/runtime/el_runtime.c
```
Verify self-hosting by using `elc-new` to recompile itself and diffing the outputs.
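One way to make the "diff the outputs" check mechanical is to compare hashes of the C files emitted by two consecutive compiler generations. The Python sketch below assumes you have already produced both files; the helper names and the second file name used in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str) -> str:
    """Hex SHA-256 of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_fixed_point(stage1_c: str, stage2_c: str) -> bool:
    """True when two generations of generated C are byte-identical."""
    return sha256_of(stage1_c) == sha256_of(stage2_c)
```

For example, `is_fixed_point("elc-new.c", "elc-stage2.c")`, where `elc-stage2.c` is the output of running `dist/platform/elc-new` on `elc-cli.el`. If the seed binary was built from older source, the first two generations can differ legitimately; in that case compare the second and third generations instead.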
---
## 2. The Language
### 2.1 Lexical Structure
El source is UTF-8. File extension `.el`. Comments are single-line only: `//` to end of line.
**Token representation:** every token is a map `{ "kind": String, "value": String }`.
**Keywords** — from `keyword_kind()` in `lexer.el`:
| Keyword | Token Kind | Notes |
|---------|-----------|-------|
| `let` | `Let` | variable binding |
| `fn` | `Fn` | function definition |
| `type` | `Type` | struct definition |
| `enum` | `Enum` | enum definition |
| `match` | `Match` | pattern match |
| `return` | `Return` | function return |
| `if` | `If` | conditional |
| `else` | `Else` | |
| `for` | `For` | iteration |
| `in` | `In` | used in `for x in list` |
| `while` | `While` | loop |
| `import` | `Import` | module import |
| `from` | `From` | `from mod import { Name }` |
| `as` | `As` | (reserved, no parse form) |
| `with` | `With` | (reserved) |
| `sealed` | `Sealed` | (reserved) |
| `activate` | `Activate` | (reserved) |
| `where` | `Where` | (reserved) |
| `test` | `Test` | (reserved) |
| `seed` | `Seed` | (reserved) |
| `assert` | `Assert` | (reserved) |
| `protocol` | `Protocol` | (reserved) |
| `impl` | `Impl` | (reserved) |
| `retry` | `Retry` | reserved / soft keyword in expr position |
| `times` | `Times` | reserved / soft keyword |
| `fallback` | `Fallback` | reserved / soft keyword |
| `reason` | `Reason` | reserved / soft keyword |
| `parallel` | `Parallel` | reserved / soft keyword |
| `trace` | `Trace` | reserved / soft keyword |
| `requires` | `Requires` | reserved / soft keyword |
| `deploy` | `Deploy` | reserved / soft keyword |
| `to` | `To` | reserved / soft keyword |
| `via` | `Via` | reserved / soft keyword |
| `target` | `Target` | **RESERVED — cannot use as identifier** |
| `true` | `Bool` | literal value `true` |
| `false` | `Bool` | literal value `false` |
| `cgi` | `Cgi` | CGI identity block |
| `service` | `Service` | service declaration block |
| `manager` | `Manager` | VBD role decorator / soft keyword |
| `engine` | `Engine` | VBD role decorator / soft keyword |
| `accessor` | `Accessor` | VBD role decorator / soft keyword |
| `vessel` | `Vessel` | soft keyword |
| `extern` | `Extern` | `extern fn` forward declaration |
**Soft keywords** (`to`, `via`, `deploy`, `reason`, `times`, `fallback`, `retry`, `parallel`, `trace`, `requires`, `where`, `as`, `with`, `manager`, `engine`, `accessor`, `vessel`): these have dedicated token kinds, but the parser re-interprets them as `Ident` nodes when they appear in expression position (e.g., as parameter names or local variable names). `target` is not in this group: it is fully reserved and cannot be used as an identifier (see section 2.4).
**All token kinds:**
| Kind | Pattern |
|------|---------|
| `Int` | `[0-9]+` |
| `Float` | `[0-9]+ '.' [0-9]+` |
| `Str` | `"…"` with `\"`, `\n`, `\t`, `\r`, `\\` escapes |
| `Bool` | `true` or `false` |
| `Ident` | `[a-zA-Z_][a-zA-Z0-9_]*` (not a keyword) |
| keyword tokens | one per keyword above |
| `Eq` | `=` |
| `EqEq` | `==` |
| `NotEq` | `!=` |
| `Not` | `!` |
| `Lt` / `LtEq` / `Gt` / `GtEq` | `<` `<=` `>` `>=` |
| `And` | `&&` (single `&` is consumed and discarded) |
| `Or` | `\|\|` |
| `Pipe` | `\|` |
| `PipeOp` | `\|>` |
| `Plus` / `Minus` / `Star` / `Slash` | `+` `-` `*` `/` |
| `Percent` | `%` |
| `Arrow` | `->` |
| `FatArrow` | `=>` |
| `Colon` / `ColonColon` | `:` `::` |
| `LParen` / `RParen` | `(` `)` |
| `LBrace` / `RBrace` | `{` `}` |
| `LBracket` / `RBracket` | `[` `]` |
| `Comma` / `Dot` / `Semicolon` | `,` `.` `;` |
| `At` | `@` |
| `QuestionMark` | `?` |
| `Eof` | end-of-input sentinel |
**String comment stripping:** the lexer contains a special heuristic for string literals that embed JavaScript or CSS (`looks_like_code`). If a string contains `<script`, `<style`, or `function` + `;`, the lexer strips `//` and `/* */` comments from the string value before producing the `Str` token. This is a compile-time content sanitization pass.
### 2.2 AST Node Types
Every AST node is a `Map<String, Any>`. The `"expr"` or `"stmt"` key names the node type.
**Expression nodes:**
| `expr` value | Fields | Meaning |
|-------------|--------|---------|
| `Int` | `value: String` | integer literal |
| `Float` | `value: String` | float literal |
| `Str` | `value: String` | string literal |
| `Bool` | `value: String` | `"true"` or `"false"` |
| `Nil` | — | null / missing |
| `Ident` | `name: String` | identifier reference |
| `BinOp` | `op: String`, `left`, `right` | binary operation |
| `Not` | `inner` | unary `!` |
| `Neg` | `inner` | unary `-` |
| `Call` | `func`, `args: [expr]` | function call |
| `Field` | `object`, `field: String` | `obj.field` |
| `Index` | `object`, `index` | `obj[idx]` |
| `Array` | `elems: [expr]` | `[e1, e2, …]` |
| `Map` | `pairs: [{ key: String, value: expr }]` | `{ "k": v, … }` |
| `If` | `cond`, `then: [stmt]`, `else: [stmt]`, `has_else: Bool` | conditional expression |
| `For` | `item: String`, `list`, `body: [stmt]` | for-in expression |
| `Match` | `subject`, `arms: [{ pattern, body }]` | pattern match |
| `DurationLit` | `count: String`, `unit: String` | `30.seconds`, `1.hour` |
| `Try` | `inner` | postfix `?` (no-op passthrough today) |
**Binary operators** (`op` field values): `Plus`, `Minus`, `Star`, `Slash`, `EqEq`, `NotEq`, `Lt`, `Gt`, `LtEq`, `GtEq`, `And`, `Or`.
**Operator precedence** (higher = tighter binding):
| Level | Operators |
|-------|-----------|
| 6 | `Star`, `Slash` |
| 5 | `Plus`, `Minus` |
| 4 | `Lt`, `Gt`, `LtEq`, `GtEq` |
| 3 | `EqEq`, `NotEq` |
| 2 | `And` |
| 1 | `Or` |
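The table above maps directly onto a precedence-climbing loop. The following Python sketch shows one way to parse binary expressions with it. It is not the parser in `parser.el`; it assumes left associativity, handles only `Int` and `Ident` primaries, and expects the token list to end with an `Eof` token.

```python
PRECEDENCE = {
    "Star": 6, "Slash": 6,
    "Plus": 5, "Minus": 5,
    "Lt": 4, "Gt": 4, "LtEq": 4, "GtEq": 4,
    "EqEq": 3, "NotEq": 3,
    "And": 2,
    "Or": 1,
}


def parse_primary(tokens, pos):
    """Parse a single Int or Ident token; return (node, next_pos)."""
    tok = tokens[pos]
    if tok["kind"] == "Int":
        return {"expr": "Int", "value": tok["value"]}, pos + 1
    if tok["kind"] == "Ident":
        return {"expr": "Ident", "name": tok["value"]}, pos + 1
    raise SyntaxError(f"unexpected token {tok['kind']}")


def parse_expr(tokens, pos=0, min_prec=1):
    """Parse a binary expression starting at pos; return (node, next_pos)."""
    left, pos = parse_primary(tokens, pos)
    while True:
        kind = tokens[pos]["kind"]
        prec = PRECEDENCE.get(kind, 0)  # 0 for anything that is not a binary operator
        if prec < min_prec:
            return left, pos
        # prec + 1 makes operators of equal precedence group to the left.
        right, pos = parse_expr(tokens, pos + 1, prec + 1)
        left = {"expr": "BinOp", "op": kind, "left": left, "right": right}
```

With this loop, `1 + 2 * 3` yields a `Plus` node whose right child is the `Star` node, and `1 - 2 - 3` groups as `(1 - 2) - 3`.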
**Pattern nodes** (used inside `Match` arms):
| `pattern` value | Fields | Meaning |
|----------------|--------|---------|
| `Wildcard` | — | `_` — always matches |
| `Binding` | `name: String` | binds subject to name |
| `LitInt` | `value: String` | integer literal pattern |
| `LitStr` | `value: String` | string literal pattern |
| `LitBool` | `value: String` | boolean literal pattern |
**Statement nodes:**
| `stmt` value | Fields | Meaning |
|-------------|--------|---------|
| `Let` | `name: String`, `value: expr`, `type: String` | variable binding |
| `Assign` | `name: String`, `value: expr` | bare reassignment `name = expr` |
| `Return` | `value: expr` | return statement |
| `While` | `cond: expr`, `body: [stmt]` | while loop |
| `For` | `item: String`, `list: expr`, `body: [stmt]` | for-in loop |
| `FnDef` | `name: String`, `params: [param]`, `body: [stmt]`, `ret_type: String`, `decorator?: String` | function definition |
| `ExternFn` | `name: String`, `params: [param]`, `ret_type: String` | forward declaration |
| `TypeDef` | `name: String`, `fields: [{ name: String }]` | struct type definition |
| `EnumDef` | `name: String`, `variants: [{ name: String }]` | enum definition |
| `Import` | `path: String` | `import "file.el"` or `from mod import { … }` |
| `CgiBlock` | `name`, `dharma_id`, `principal`, `network`, `engram`, `has_*: Bool` | CGI identity declaration |
| `ServiceBlock` | `name`, `sponsor`, `domain` | service declaration |
| `Expr` | `value: expr` | bare expression statement |
**Param nodes:** `{ "name": String, "type": String }` where `type` is the leading identifier of the type annotation (e.g., `"Int"`, `"String"`, `"Map"`) or `""` if unannotated.
### 2.3 The Type System
Type annotations are parsed and stored but are not checked by a general type checker. They serve as documentation and as hints to the codegen for arithmetic dispatch; the codegen does reject some mismatched operations on temporal types, as described below.
**Built-in types:**
| Type | C representation | Notes |
|------|-----------------|-------|
| `String` | `const char*` cast to `el_val_t` | via `EL_STR()` macro |
| `Int` | `int64_t` | direct |
| `Bool` | `int64_t` | `0` = false, nonzero = true |
| `Float` | `int64_t` | bit-cast double via `el_from_float()` |
| `Void` | `void` | functions returning nothing |
| `Any` | `void*` cast to `el_val_t` | generic containers |
| `[T]` | `el_val_t` | pointer to ElList struct |
| `Map<K,V>` | `el_val_t` | pointer to ElMap struct |
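To make the `Float` row concrete: storing a double in an `int64_t` slot means reinterpreting its 8 bytes, not converting the number. The Python sketch below models that idea with `struct`. The function names are hypothetical stand-ins, and the claim that `el_from_float()` behaves exactly this way is an assumption based on the table.

```python
import struct


def from_float(x: float) -> int:
    """Reinterpret the 8 bytes of an IEEE 754 double as a signed 64-bit int."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


def to_float(bits: int) -> float:
    """Inverse of from_float: reinterpret a signed 64-bit int as a double."""
    return struct.unpack("<d", struct.pack("<q", bits))[0]
```

Note that `from_float(1.0)` is `0x3FF0000000000000`, not `1`, which is why a codegen has to know a value is a `Float` before emitting arithmetic on it.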
**Temporal types** (first-class in codegen):
| Type | Representation | Notes |
|------|---------------|-------|
| `Instant` | nanoseconds since Unix epoch as `int64_t` | `now()` returns this |
| `Duration` | signed nanoseconds as `int64_t` | `30.seconds` = `30 * 1000000000` |
| `Calendar` | pointer to heap-allocated struct | `earth_calendar(zone)` |
| `CalendarTime` | pointer to heap-allocated struct | `now_in(cal)` |
| `LocalDate` | pointer to heap-allocated struct | `local_date(y, m, d)` |
| `LocalTime` | nanoseconds since midnight, direct `int64_t` | `local_time(h, m, s, ns)` |
| `Zone` | pointer to heap-allocated struct | `zone("America/New_York")` |
| `Rhythm` | pointer to heap-allocated struct | recurrence pattern |
The codegen tracks type-annotated variable names in per-function process state (`__int_names`, `__instant_names`, `__duration_names`, etc.) to dispatch arithmetic and comparisons through the correct runtime wrappers. Type-mismatched operations (e.g., `Instant + Instant`) are emitted as `#error` directives.
**Duration postfix literals:** `30.seconds`, `1.hour`, `500.millis`, `30.nanos` are parsed as `DurationLit` AST nodes and compiled to `el_duration_from_nanos(count * multiplier)`. The multipliers:
| Unit | Nanoseconds |
|------|------------|
| `nano` / `nanos` | 1 |
| `milli` / `millis` / `millisecond` / `milliseconds` | 1,000,000 |
| `second` / `seconds` | 1,000,000,000 |
| `minute` / `minutes` | 60,000,000,000 |
| `hour` / `hours` | 3,600,000,000,000 |
| `day` / `days` | 86,400,000,000,000 |
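A bootstrap codegen can lower `DurationLit` nodes with a lookup table built from the multipliers above. This Python sketch folds the multiplication at compile time, which is an assumption (the original may emit the product as a C expression); folding avoids 32-bit `int` overflow in the generated C. The function name is hypothetical.

```python
NANOS_PER_UNIT = {
    "nano": 1, "nanos": 1,
    "milli": 1_000_000, "millis": 1_000_000,
    "millisecond": 1_000_000, "milliseconds": 1_000_000,
    "second": 1_000_000_000, "seconds": 1_000_000_000,
    "minute": 60_000_000_000, "minutes": 60_000_000_000,
    "hour": 3_600_000_000_000, "hours": 3_600_000_000_000,
    "day": 86_400_000_000_000, "days": 86_400_000_000_000,
}


def duration_lit_to_c(node: dict) -> str:
    """Lower a DurationLit AST node to a C call with a pre-multiplied count."""
    nanos = int(node["count"]) * NANOS_PER_UNIT[node["unit"]]
    return f"el_duration_from_nanos({nanos})"
```

For `30.seconds` this produces `el_duration_from_nanos(30000000000)`.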
### 2.4 Key Language Semantics
**Implicit return.** The final expression in a function body becomes the return value if it is not a control-flow construct (`If`, `For`). The codegen's `transform_implicit_return` rewrites the last `Expr` statement into a `Return` statement before emitting.
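A reimplementation of this rewrite could look like the Python sketch below, which operates on AST maps shaped as in section 2.2. It borrows the name `transform_implicit_return` from the codegen but is only an approximation of its behavior: it checks the last statement alone and treats exactly `If` and `For` as control flow.

```python
CONTROL_FLOW = {"If", "For"}


def transform_implicit_return(body: list) -> list:
    """Turn a trailing bare expression statement into a Return statement."""
    if not body:
        return body
    last = body[-1]
    if last.get("stmt") == "Expr" and last["value"].get("expr") not in CONTROL_FLOW:
        # Build a new list so the caller's AST is not mutated.
        return body[:-1] + [{"stmt": "Return", "value": last["value"]}]
    return body
```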
**Let-rebinding, not mutation.** El uses `let` for both initial binding and rebinding:
```el
let count = 0
let count = count + 1 // NOT mutation: creates a new binding in the same scope
```
The codegen tracks declared names per C scope. When `count` is already in `declared`, it emits `count = count + 1;` (plain assignment). When it is new, it emits `el_val_t count = 0;`. This means **El does not have mutable variables in the traditional sense** — every `let` is a potential redeclaration. The practical effect is that shadowing and in-place update use identical syntax.
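The declare-or-assign decision can be sketched in a few lines of Python. This is an illustration, not the actual codegen: `emit_let` is a hypothetical helper, `declared` stands for the set of names already declared in the current C scope, and the value is assumed to be already lowered to a C expression string.

```python
def emit_let(name: str, value_c: str, declared: set) -> str:
    """Emit a C declaration on first sight of `name`, a plain assignment after."""
    if name in declared:
        return f"{name} = {value_c};"
    declared.add(name)
    return f"el_val_t {name} = {value_c};"
```

Calling it twice for `count` with a fresh set gives `el_val_t count = 0;` followed by `count = count + 1;`. A new set is needed for each nested C scope.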
**Bare reassignment.** The parser also handles `name = expr` (without `let`) when an `Ident` is immediately followed by `Eq`. This emits a plain C assignment.
**`target` is reserved.** The word `target` is lexed as the `Target` token kind — it cannot be used as a variable or parameter name. Use `tgt` or another name instead. This is a live gotcha in `compiler.el` itself, which uses `tgt` for exactly this reason.
**`__no_block_expr` guard.** The parser uses process state key `__no_block_expr` to suppress Map-literal parsing when parsing the condition of `if`, `while`, `for`, and `match`. This prevents a stray `{` (the start of the then-block) from being parsed as a Map literal.
**Arena memory model.** The runtime includes an arena allocator that is activated in server/long-running contexts. In CLI mode (`elc`, `elb`) the arena is inactive. Memory is managed via ARC (reference counting): `el_retain()` and `el_release()` on Lists and Maps. Strings and ints are not refcounted — the retain/release functions are safe no-ops on non-tagged values.
---
## 3. The Runtime API
All runtime functions are declared in `el-compiler/runtime/el_runtime.h`. Every compiled El program links against `el-compiler/runtime/el_runtime.c`.
All values are `el_val_t` (`int64_t`). Strings are pointers cast through `int64_t` using `EL_STR(s)` / `EL_CSTR(v)` macros.
Canonical compile command:
```bash
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
    -o <out> <prog>.c el-compiler/runtime/el_runtime.c
```
### I/O
| Function | Signature | Description |
|----------|-----------|-------------|
| `println` | `(s) -> Void` | print string + newline to stdout |
| `print` | `(s) -> Void` | print string without newline |
| `readline` | `() -> String` | read one line from stdin |
### String Operations
| Function | Signature | Description |
|----------|-----------|-------------|
| `el_str_concat` | `(a, b) -> String` | concatenate two strings |
| `str_concat` | `(a, b) -> String` | alias for `el_str_concat` |
| `str_eq` | `(a, b) -> Bool` | string equality comparison |
| `str_starts_with` | `(s, prefix) -> Bool` | prefix test |
| `str_ends_with` | `(s, suffix) -> Bool` | suffix test |
| `str_contains` | `(s, sub) -> Bool` | substring test |
| `str_len` | `(s) -> Int` | byte length |
| `str_slice` | `(s, start, end) -> String` | substring (byte offsets) |
| `str_replace` | `(s, from, to) -> String` | replace all occurrences |
| `str_to_upper` / `str_upper` | `(s) -> String` | uppercase |
| `str_to_lower` / `str_lower` | `(s) -> String` | lowercase |
| `str_trim` | `(s) -> String` | strip leading/trailing whitespace |
| `str_lstrip` / `str_rstrip` | `(s) -> String` | one-sided strip |
| `str_index_of` | `(s, sub) -> Int` | position of substring; `-1` if absent |
| `str_last_index_of` | `(s, sub) -> Int` | last position |
| `str_index_of_all` | `(s, sub) -> [Int]` | all byte offsets (non-overlapping) |
| `str_find_chars` | `(s, any_of) -> Int` | first index of any char in set |
| `str_split` | `(s, sep) -> [String]` | split on separator |
| `str_split_lines` | `(s) -> [String]` | split on newlines |
| `str_split_chars` | `(s) -> [String]` | split into individual characters |
| `str_split_n` | `(s, sep, n) -> [String]` | split at most `n` times |
| `str_join` | `(list, sep) -> String` | join list with separator |
| `str_char_at` | `(s, i) -> String` | character at byte index |
| `str_char_code` | `(s, i) -> Int` | Unicode code point at index |
| `str_pad_left` | `(s, width, pad) -> String` | left-pad to width |
| `str_pad_right` | `(s, width, pad) -> String` | right-pad to width |
| `str_format` | `(fmt, data) -> String` | `{key}` interpolation |
| `str_repeat` | `(s, n) -> String` | repeat string n times |
| `str_reverse` | `(s) -> String` | reverse by codepoint |
| `str_strip_prefix` | `(s, prefix) -> String` | remove prefix if present |
| `str_strip_suffix` | `(s, suffix) -> String` | remove suffix if present |
| `str_strip_chars` | `(s, chars) -> String` | strip characters from both ends |
| `str_count` | `(s, sub) -> Int` | count non-overlapping occurrences |
| `str_count_chars` | `(s) -> Int` | codepoint count |
| `str_count_bytes` | `(s) -> Int` | alias for `str_len` |
| `str_count_lines` | `(s) -> Int` | line count |
| `str_count_words` | `(s) -> Int` | word count |
| `str_count_letters` | `(s) -> Int` | ASCII letter count |
| `str_count_digits` | `(s) -> Int` | ASCII digit count |
| `is_letter` / `is_digit` / `is_alphanumeric` | `(s) -> Bool` | ASCII char classification |
| `is_whitespace` / `is_punctuation` | `(s) -> Bool` | |
| `is_uppercase` / `is_lowercase` | `(s) -> Bool` | |
| `int_to_str` | `(n) -> String` | format integer |
| `str_to_int` | `(s) -> Int` | parse integer |
| `str_to_float` | `(s) -> Float` | parse float |
| `parse_int` | `(s, default) -> Int` | parse with fallback |
| `bool_to_str` | `(b) -> String` | format bool |
### Integer/Float Math
| Function | Description |
|----------|-------------|
| `el_abs(n)` | absolute value |
| `el_max(a, b)` | maximum |
| `el_min(a, b)` | minimum |
| `float_to_str(f)` | format float as string |
| `int_to_float(n)` | widen Int to Float |
| `float_to_int(f)` | truncate Float to Int |
| `format_float(f, decimals)` | format with N decimal places |
| `decimal_round(f, decimals)` | round to N decimals |
| `math_sqrt(f)` | square root |
| `math_log(f)` / `math_ln(f)` | logarithms |
| `math_sin(f)` / `math_cos(f)` / `math_pi()` | trigonometry |
### List Operations
| Function | Description |
|----------|-------------|
| `el_list_empty()` | create empty list |
| `el_list_new(count, …)` | create list from N values (varargs) |
| `el_list_len(list)` | length |
| `el_list_get(list, i)` | element at index; `0` on out-of-bounds |
| `el_list_append(list, e)` | append; returns updated list |
| `el_list_clone(list)` | shallow copy |
| `list_push(list, e)` | alias for `el_list_append` |
| `list_push_front(list, e)` | prepend |
| `list_join(list, sep)` | join to string |
| `list_range(start, end)` | integer range `[start, end)` |
| `native_list_empty()` | alias for `el_list_empty` (used in compiler source) |
| `native_list_append(l, v)` | alias for `el_list_append` |
| `native_list_get(l, idx)` | alias for `el_list_get` |
| `native_list_len(l)` | alias for `el_list_len` |
| `native_list_clone(l)` | alias for `el_list_clone` |
| `append(l, e)` | method-call alias: `list.append(e)` |
| `len(l)` | method-call alias: `list.len()` |
| `get(l, i)` | method-call alias: `list.get(i)` |
### Map Operations
| Function | Description |
|----------|-------------|
| `el_map_new(count, …)` | create map from key/value pairs (varargs) |
| `el_map_get(map, key)` | get value by key |
| `el_map_set(map, key, value)` | set key; returns map |
| `el_get_field(map, key)` | alias; emitted for `.field` access |
| `map_get(map, key)` | method-call alias |
| `map_set(map, key, value)` | method-call alias |
### ARC (Reference Counting)
| Function | Description |
|----------|-------------|
| `el_retain(v)` | increment refcount; no-op for non-heap values |
| `el_release(v)` | decrement refcount; free when zero |
### In-Process State
| Function | Description |
|----------|-------------|
| `state_set(key, value)` | store in process-global key/value table |
| `state_get(key)` | retrieve; `""` if absent |
| `state_del(key)` | delete key |
| `state_keys()` | all keys as `[String]` |
### Filesystem
| Function | Description |
|----------|-------------|
| `fs_read(path)` | read file to string; `""` on error |
| `fs_write(path, content)` | write string; returns `1` on success |
| `fs_write_bytes(path, bytes, length)` | write raw bytes of known length |
| `fs_list(path)` | list directory entries |
| `fs_exists(path)` | check if path exists |
| `fs_mkdir(path)` | mkdir -p |
### HTTP Client
| Function | Description |
|----------|-------------|
| `http_get(url)` | GET; returns body string |
| `http_post(url, body)` | POST; returns body string |
| `http_post_json(url, json_body)` | POST with Content-Type: application/json |
| `http_get_with_headers(url, headers_map)` | GET with custom headers |
| `http_post_with_headers(url, body, headers_map)` | POST with custom headers |
| `http_post_form_auth(url, form_body, auth_header)` | POST with auth |
| `http_delete(url)` | DELETE |
| `http_get_to_file(url, headers_map, output_path)` | stream response to file |
| `http_post_to_file(url, body, headers_map, output_path)` | stream POST response to file |
| `http_response(status, headers_json, body)` | build response envelope |
| `url_encode(s)` | RFC 3986 percent-encoding |
| `url_decode(s)` | URL decode |
| `el_html_sanitize(html, allowlist_json)` | allowlist HTML sanitizer |
### HTTP Server
| Function | Description |
|----------|-------------|
| `http_serve(port, handler)` | start server; handler: `(method, path, body) -> String` |
| `http_serve_v2(port, handler)` | start server; handler: `(method, path, headers_map, body) -> String` |
| `http_set_handler(name)` | set handler by symbol name |
| `http_set_handler_v2(name)` | v2 variant |
### JSON
| Function | Description |
|----------|-------------|
| `json_get(json, key)` | substring lookup of `"key": value` |
| `json_parse(s)` | parse JSON string to List/Map |
| `json_stringify(v)` | serialize Any to JSON string |
| `json_get_string(j, key)` | typed extract: String |
| `json_get_int(j, key)` | typed extract: Int |
| `json_get_float(j, key)` | typed extract: Float |
| `json_get_bool(j, key)` | typed extract: Bool |
| `json_get_raw(j, key)` | extract nested object/array as JSON string |
| `json_set(j, key, value)` | update field, return new JSON string |
| `json_array_len(j)` | length of JSON array string |
| `json_array_get(j, index)` | element at index |
| `json_array_get_string(j, index)` | string element at index |
### Time (Epoch-Based)
| Function | Description |
|----------|-------------|
| `time_now()` | Unix epoch milliseconds |
| `time_now_utc()` | same, explicit UTC |
| `time_format(ts, fmt)` | format timestamp |
| `time_to_parts(ts)` | decompose to Map of fields |
| `time_from_parts(secs, ns, tz)` | construct timestamp |
| `time_add(ts, n, unit)` | add duration |
| `time_diff(ts1, ts2, unit)` | difference |
| `unix_timestamp()` | Unix seconds as Int |
| `sleep_secs(secs)` | sleep N seconds |
| `sleep_ms(ms)` | sleep N milliseconds |
### Time (First-Class Instant/Duration)
| Function | Description |
|----------|-------------|
| `now()` / `el_now_instant()` | current time as Instant (nanoseconds) |
| `unix_seconds(n)` | construct Instant from Unix seconds |
| `unix_millis(n)` | construct Instant from Unix milliseconds |
| `instant_from_iso8601(s)` | parse ISO 8601 string |
| `instant_to_unix_seconds(i)` | extract Unix seconds |
| `instant_to_unix_millis(i)` | extract Unix milliseconds |
| `instant_to_iso8601(i)` | format as ISO 8601 |
| `el_duration_from_nanos(ns)` | construct Duration from nanoseconds |
| `duration_seconds(n)` | Duration from seconds |
| `duration_millis(n)` | Duration from milliseconds |
| `duration_nanos(n)` | Duration from nanoseconds |
| `duration_to_seconds(d)` | extract seconds |
| `duration_to_millis(d)` | extract milliseconds |
| `duration_to_nanos(d)` | extract nanoseconds |
| `el_instant_add_dur(inst, dur)` | Instant + Duration |
| `el_instant_sub_dur(inst, dur)` | Instant - Duration |
| `el_instant_diff(a, b)` | Instant - Instant = Duration |
| `el_duration_add/sub/scale/div` | Duration arithmetic |
| `el_instant_lt/le/gt/ge/eq/ne` | Instant comparison |
| `el_duration_lt/le/gt/ge/eq/ne` | Duration comparison |
| `el_sleep_duration(dur)` | sleep for a Duration |
| `ttl_cache_set(key, value)` | store with TTL |
| `ttl_cache_get(key, max_age)` | retrieve if within max_age |
| `ttl_cache_age(key)` | age of cached value as Duration |
### Calendar System
| Function | Description |
|----------|-------------|
| `zone(id)` | IANA zone or fixed offset |
| `zone_utc()` / `zone_local()` | UTC and local zone |
| `zone_offset(hours, minutes)` | fixed offset zone |
| `earth_calendar(z)` | Gregorian calendar in zone |
| `earth_calendar_default()` | system default |
| `mars_calendar()` / `cycle_calendar(period)` | non-Earth calendars |
| `no_cycle_calendar()` / `relative_calendar(epoch)` | abstract calendars |
| `now_in(cal)` | current time as CalendarTime |
| `in_calendar(inst, cal)` | project Instant into Calendar |
| `cal_format(ct, pattern)` | format CalendarTime |
| `cal_to_instant(ct)` | extract underlying Instant |
| `cal_cycle_phase(ct)` / `cal_in(ct, cal)` | calendar ops |
| `local_date(y, m, d)` | construct LocalDate |
| `local_time(h, m, s, ns)` | construct LocalTime |
| `local_datetime(date, time)` | construct LocalDateTime |
| `zoned(date, time, cal)` | zoned datetime |
| `local_date_year/month/day` | LocalDate accessors |
| `local_time_hour/minute/second/nanos` | LocalTime accessors |
| `el_local_date_add_dur` / `el_local_time_add_dur` | date/time arithmetic |
| `el_local_date_lt` / `el_local_date_eq` | date comparison |
| `rhythm_*` | recurrence patterns (cycle_start, weekday, weekly_at, next_after, matches, …) |
### Process / Execution
| Function | Description |
|----------|-------------|
| `args()` | command-line arguments as `[String]` (excludes argv[0]) |
| `env(key)` | read environment variable; `""` if unset |
| `exit(code)` | exit process with code |
| `exit_program(code)` | alias for `exit` |
| `getpid_now()` | current process ID |
| `exec_command(cmd)` | run shell command; return exit code |
| `exec_capture(cmd)` | run shell command; capture and return stdout |
| `uuid_new()` / `uuid_v4()` | generate UUID v4 |
| `native_int_to_str(n)` | format integer (alias, used in compiler source) |
| `native_string_chars(s)` | split string into `[String]` of single characters |
### Crypto
| Function | Description |
|----------|-------------|
| `sha256_hex(input)` | SHA-256, hex output |
| `sha256_bytes(input)` | SHA-256, raw bytes |
| `hmac_sha256_hex(key, msg)` | HMAC-SHA-256, hex |
| `hmac_sha256_bytes(key, msg)` | HMAC-SHA-256, raw bytes |
| `base64_encode(input)` / `base64_decode(input)` | standard base64 |
| `base64url_encode(input)` / `base64url_decode(input)` | URL-safe base64 |
| `sha3_256_hex(input)` | SHA3-256 (Keccak) |
| `pq_keygen_signature()` | Dilithium-3 key pair |
| `pq_sign(sk_hex, msg)` / `pq_verify(pk_hex, msg, sig_hex)` | PQ signatures |
| `pq_kem_keygen()` / `pq_kem_encaps(pk)` / `pq_kem_decaps(sk, ct)` | Kyber-768 KEM |
| `pq_hybrid_keygen()` / `pq_hybrid_handshake(remote_pub)` | X25519 + Kyber hybrid |
| `aead_encrypt(key_hex, plaintext)` | AES-256-GCM encrypt |
| `aead_decrypt(key_hex, nonce_hex, ct_hex)` | AES-256-GCM decrypt |
### DHARMA Network (CGI programs only)
| Function | Description |
|----------|-------------|
| `el_cgi_init(name, dharma_id, principal, network, engram)` | initialize CGI identity (called by generated `main()`) |
| `dharma_connect(cgi_id)` | open channel to peer |
| `dharma_send(channel, content)` | send message; blocks for response |
| `dharma_activate(query)` | spreading activation across DHARMA network |
| `dharma_emit(event_type, payload)` | emit network event (@manager only) |
| `dharma_field(event_type)` | wait for event (@manager only) |
| `dharma_strengthen(cgi_id, weight)` | Hebbian potentiation |
| `dharma_relationship(cgi_id)` | current relationship weight |
| `dharma_peers()` | all connected peers sorted by weight |
### Engram Knowledge Graph
| Function | Description |
|----------|-------------|
| `engram_node(content, type, salience)` | create node; returns ID |
| `engram_node_full(content, type, label, salience, importance, confidence, tier, tags)` | full node creation |
| `engram_node_layered(…, layer_id)` | create node in specific layer |
| `engram_get_node(id)` | retrieve node by ID |
| `engram_strengthen(node_id)` | Hebbian potentiation |
| `engram_forget(node_id)` | delete node and edges |
| `engram_node_count()` | total node count |
| `engram_edge_count()` | total edge count |
| `engram_search(query, limit)` | full-text search |
| `engram_scan_nodes(limit, offset)` | paginated node scan |
| `engram_connect(from, to, weight, relation)` | create directed edge |
| `engram_edge_between(from, to)` | get edge |
| `engram_neighbors(node_id)` | BFS neighbors |
| `engram_neighbors_filtered(node_id, max_depth, direction)` | filtered BFS |
| `engram_activate(query, depth)` | spreading activation |
| `engram_save(path)` / `engram_load(path)` | snapshot to/from disk |
| `engram_add_layer(name, priority, suppressible, transparent, injectable)` | add consciousness layer |
| `engram_remove_layer(layer_id)` / `engram_list_layers()` | layer management |
| `engram_*_json` variants | JSON-string versions of search/scan/activate |
| `engram_compile_layered_json(intent, depth)` | prompt-ready context block |
### LLM (Anthropic API)
| Function | Description |
|----------|-------------|
| `llm_call(model, prompt)` | single-turn call |
| `llm_call_system(model, system, user)` | call with system prompt |
| `llm_call_agentic(model, system, user, tools)` | agentic call with tools (CGI only) |
| `llm_vision(model, system, prompt, image)` | vision call |
| `llm_models()` | list available models |
| `llm_register_tool(name, handler_fn_name)` | register tool handler (CGI only) |
### Observability
| Function | Description |
|----------|-------------|
| `emit_log(level, msg, fields_json)` | emit OTLP log |
| `emit_metric(name, value, tags_json)` | emit OTLP metric |
| `trace_span_start(name)` | start trace span |
| `trace_span_end(span_handle)` | end trace span |
| `emit_event(name, duration_ms)` | emit event |
---
## 4. How to Re-Bootstrap from Zero
This section assumes the bootstrap binary is gone. Everything else (source files, runtime) is intact.
### What You Need to Implement
A minimal El compiler has three parts: lexer, parser, codegen. Each can be written in any language. The goal is to compile `elc-cli.el` into a working `elc` binary, after which El is self-hosting again.
### Step 1: Write a Minimal Lexer
The lexer must produce a list of `{ "kind": String, "value": String }` maps (or equivalent structures). Required token kinds: `Int`, `Float`, `Str`, `Bool`, `Ident`, `Eof`, and all keywords and operators listed in section 2.1.
The minimal subset needed to compile the compiler itself:
- Keywords: `let`, `fn`, `return`, `if`, `else`, `while`, `for`, `in`, `import`, `from`, `true`, `false`, `extern`
- Literals: `Int`, `Str`, `Bool`, `Ident`
- Operators: `=`, `==`, `!=`, `!`, `<`, `>`, `<=`, `>=`, `&&`, `||`, `+`, `-`, `*`, `/`, `->`, `=>`, `:`, `,`, `.`, `(`, `)`, `{`, `}`, `[`, `]`, `@`, `?`
- Special: `Eof`
The lexer in `lexer.el` walks a char array using `native_list_get` to avoid O(n²) string slicing. A Python implementation can use a simple index into a string. Escapes to handle: `\"`, `\n`, `\t`, `\r`, `\\`.
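As a starting point, the minimal subset above fits in a short index-based Python lexer. This is a sketch under assumptions, not a port of `lexer.el`: it decodes escapes into the `Str` value, uses `""` as the `Eof` value, rejects unknown characters instead of discarding them, and omits every keyword and operator outside the minimal subset.

```python
import string

KEYWORDS = {
    "let": "Let", "fn": "Fn", "return": "Return", "if": "If", "else": "Else",
    "while": "While", "for": "For", "in": "In", "import": "Import",
    "from": "From", "extern": "Extern", "true": "Bool", "false": "Bool",
}
TWO_CHAR = {"==": "EqEq", "!=": "NotEq", "<=": "LtEq", ">=": "GtEq",
            "&&": "And", "||": "Or", "->": "Arrow", "=>": "FatArrow"}
ONE_CHAR = {"=": "Eq", "!": "Not", "<": "Lt", ">": "Gt", "+": "Plus",
            "-": "Minus", "*": "Star", "/": "Slash", ":": "Colon",
            ",": "Comma", ".": "Dot", "(": "LParen", ")": "RParen",
            "{": "LBrace", "}": "RBrace", "[": "LBracket", "]": "RBracket",
            "@": "At", "?": "QuestionMark"}
ESCAPES = {'"': '"', "n": "\n", "t": "\t", "r": "\r", "\\": "\\"}
DIGITS = string.digits
IDENT_START = string.ascii_letters + "_"
IDENT_REST = IDENT_START + DIGITS


def lex(src: str) -> list:
    """Turn El source text into a list of {"kind", "value"} token maps."""
    tokens, i, n = [], 0, len(src)
    while i < n:
        c = src[i]
        if c.isspace():
            i += 1
        elif src.startswith("//", i):            # comment runs to end of line
            while i < n and src[i] != "\n":
                i += 1
        elif c in DIGITS:
            j = i
            while j < n and src[j] in DIGITS:
                j += 1
            kind = "Int"
            # A Float needs digits on both sides of the dot, so `30.seconds`
            # lexes as Int, Dot, Ident.
            if j + 1 < n and src[j] == "." and src[j + 1] in DIGITS:
                j += 1
                while j < n and src[j] in DIGITS:
                    j += 1
                kind = "Float"
            tokens.append({"kind": kind, "value": src[i:j]})
            i = j
        elif c in IDENT_START:
            j = i
            while j < n and src[j] in IDENT_REST:
                j += 1
            word = src[i:j]
            tokens.append({"kind": KEYWORDS.get(word, "Ident"), "value": word})
            i = j
        elif c == '"':
            i += 1
            chars = []
            while i < n and src[i] != '"':
                if src[i] == "\\" and i + 1 < n:
                    chars.append(ESCAPES.get(src[i + 1], src[i + 1]))
                    i += 2
                else:
                    chars.append(src[i])
                    i += 1
            if i >= n:
                raise SyntaxError("unterminated string literal")
            i += 1                                # skip the closing quote
            tokens.append({"kind": "Str", "value": "".join(chars)})
        elif src[i:i + 2] in TWO_CHAR:
            tokens.append({"kind": TWO_CHAR[src[i:i + 2]], "value": src[i:i + 2]})
            i += 2
        elif c in ONE_CHAR:
            tokens.append({"kind": ONE_CHAR[c], "value": c})
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r} at offset {i}")
    tokens.append({"kind": "Eof", "value": ""})
    return tokens
```

Two-character operators are tried before one-character ones so that `==` is never split into two `Eq` tokens.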
### Step 2: Write a Minimal Parser
The parser is a standard recursive descent parser. It produces AST maps as described in section 2.2.
The minimal statement forms needed to compile the compiler:
- `let name [: Type] = expr`
- `fn name(params) [-> Type] { body }`
- `extern fn name(params) [-> Type]`
- `return expr`
- `while cond { body }`
- `for item in list { body }`
- `if cond { body } [else [if] { body }]`
- `import "path"`
- `from module import { … }`
- `@decorator stmt`
- `name = expr` (bare assignment)
- bare expression statement
The minimal expression forms:
- Integer, float, string, bool literals
- Identifier
- Binary operations with the precedence table from section 2.2
- Unary `!` and `-`
- Function call: `f(a, b, …)`
- Method call: `obj.method(args)` (parsed as Call with Field func)
- Field access: `obj.field`
- Index access: `obj[i]`
- Array literal: `[e1, e2, …]`
- Map literal: `{ "key": value, … }`
- `if` as expression
- `match` expression
- Postfix `?` (can be a no-op)
- Duration literal: `N.unit`
The `__no_block_expr` guard (section 2.4) is important: without it, `if a || b { ... }` will incorrectly parse `{` as a Map literal.
### Step 3: Write a Minimal Codegen
The codegen emits C11 source. Required output structure:
```c
#include <stdint.h>
#include <stdlib.h>
#include "el_runtime.h"
// Forward declarations for all non-main functions
el_val_t fn_name(el_val_t p1, el_val_t p2);
...
// File-scope let bindings (if any)
el_val_t GLOBAL_NAME;
// Function bodies
el_val_t fn_name(el_val_t p1, el_val_t p2) {
...
return 0;
}
// Entry point
int main(int _argc, char** _argv) {
el_runtime_init_args(_argc, _argv);
...
return 0;
}
```
Critical codegen rules:
1. **All values are `el_val_t`**. Every parameter and local variable is `el_val_t`, and so is every return type, except that a function with `ret_type == "Void"` is emitted with a `void` return type.
2. **Let-rebinding**: track declared names per C scope. Emit `el_val_t name = val;` on first occurrence; emit `name = val;` on subsequent occurrences of the same name in the same scope.
3. **`+` dispatch**: if either operand is a string literal → `el_str_concat(a, b)`. If both are provably integers → `(a + b)`. Default fallback → `el_str_concat`.
4. **`==` dispatch**: if both operands are integer literals or provably Int → `(a == b)`. If either operand is a string, or an identifier not known to be Int → `str_eq(a, b)`.
5. **String literals**: wrap in `EL_STR("…")` and escape: `\"` → `\\\"`, `\n` → `\\n`, `\t` → `\\t`, `\\` → `\\\\`.
6. **Map literals**: `el_map_new(N, "k1", v1, "k2", v2, …)`. Empty map: `el_map_new(0)`.
7. **Array literals**: `el_list_new(N, e1, e2, …)`. Empty: `el_list_empty()`.
8. **Index access**: string-literal index → `el_get_field(obj, EL_STR("key"))`. Integer index → `el_list_get(obj, idx)`.
9. **Field access** `obj.field` → `el_get_field(obj, EL_STR("field"))`.
10. **Method call** `obj.method(args)` → `method(obj, args)`.
11. **`for item in list`** → emit:
```c
{ el_val_t _el_lst = <list>; el_val_t _el_len = el_list_len(_el_lst);
for (el_val_t _el_i = 0; _el_i < _el_len; _el_i++) {
el_val_t item = el_list_get(_el_lst, _el_i);
<body>
}
}
```
12. **`match`** → GCC/Clang statement expression with `goto`:
```c
({ el_val_t _s = <subject>; el_val_t _r = 0;
if (_s == 42) { _r = <arm_body>; goto _done; }
if (str_eq(_s, EL_STR("str"))) { _r = <arm_body>; goto _done; }
{ _r = <wildcard_body>; goto _done; }
_done:; _r; })
```
13. **`if` as expression** → similarly wrapped in a GCC/Clang statement expression.
14. **Implicit return**: if the last statement in a function body is a bare `Expr` (not `If` or `For`), emit it as `return <expr>;` instead of `<expr>;`.
15. **Float literals**: emit as `el_from_float(<value>)`.
16. **Bool literals**: `true` → `1`, `false` → `0`.
17. **`fn main()`**: do not emit as a regular `el_val_t` function. Instead, fold its body into C's `int main()` after any top-level statements.
18. **`extern fn`**: emit only a forward declaration (no body).
19. **Forward declarations**: scan for all `FnDef` nodes before emitting bodies. This enables mutual recursion.
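Several of the expression rules above (5 through 10, 15, and 16) can be sketched in one small function. This is an illustration only: the AST node shapes (`"expr"`, `"value"`, `"items"`, `"pairs"`, `"obj"`, `"field"`, `"func"`, `"args"`) are assumptions standing in for the shapes defined in section 2.2, and the fallback for a non-literal index is a guess.

```python
def c_escape(s):
    """Escape a string value for use inside a C string literal (rule 5)."""
    return (s.replace("\\", "\\\\")  # backslash first, so later escapes survive
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\t", "\\t"))


def gen_expr(node):
    """Return C source text for one expression node (assumed node shapes)."""
    kind = node["expr"]
    if kind == "Int":
        return str(node["value"])
    if kind == "Bool":                                   # rule 16
        return "1" if node["value"] else "0"
    if kind == "Float":                                  # rule 15
        return f"el_from_float({node['value']})"
    if kind == "Str":                                    # rule 5
        return f'EL_STR("{c_escape(node["value"])}")'
    if kind == "Ident":
        return node["name"]
    if kind == "Array":                                  # rule 7
        items = [gen_expr(e) for e in node["items"]]
        if not items:
            return "el_list_empty()"
        return f"el_list_new({len(items)}, {', '.join(items)})"
    if kind == "Map":                                    # rule 6
        if not node["pairs"]:
            return "el_map_new(0)"
        parts = []
        for key, value in node["pairs"]:
            parts += [f'"{c_escape(key)}"', gen_expr(value)]
        return f"el_map_new({len(node['pairs'])}, {', '.join(parts)})"
    if kind == "Field":                                  # rule 9
        return f'el_get_field({gen_expr(node["obj"])}, EL_STR("{node["field"]}"))'
    if kind == "Index":                                  # rule 8
        obj, idx = gen_expr(node["obj"]), node["index"]
        if idx["expr"] == "Str":
            return f"el_get_field({obj}, {gen_expr(idx)})"
        # Assumption: anything that is not a string literal is a list index.
        return f"el_list_get({obj}, {gen_expr(idx)})"
    if kind == "Call":
        func = node["func"]
        args = [gen_expr(a) for a in node["args"]]
        if func["expr"] == "Field":                      # rule 10
            args = [gen_expr(func["obj"])] + args
            return f'{func["field"]}({", ".join(args)})'
        return f'{func["name"]}({", ".join(args)})'
    raise ValueError(f"unsupported expression kind: {kind}")
```

Binary operators, `match`, and `if` expressions are left out; they need the dispatch and statement-expression handling described in rules 3, 4, 12, and 13.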
### Step 4: Compile the El Compiler
Using your minimal implementation, compile `elc-cli.el` (which imports the entire compiler chain):
```bash
# Your minimal compiler
python3 minimal_elc.py elc-cli.el > elc-new.c
# Build with the runtime
# Libraries are listed after the sources so link-order-sensitive linkers resolve them
cc -std=c11 -I el-compiler/runtime \
   -o elc-new elc-new.c el-compiler/runtime/el_runtime.c -lcurl -lpthread
```
### Step 5: Verify Self-Hosting
```bash
# Compile elc-cli.el with the new compiler
./elc-new elc-cli.el elc-v2.c
cc -std=c11 -I el-compiler/runtime \
   -o elc-v2 elc-v2.c el-compiler/runtime/el_runtime.c -lcurl -lpthread
# Compile again with the second-generation compiler
./elc-v2 elc-cli.el elc-v3.c
# The outputs should be identical
diff elc-v2.c elc-v3.c
```
A clean diff confirms you have a stable fixed point: the compiler reproduces itself exactly.
### Step 6: Replace the Bootstrap Binary
```bash
cp elc-v2 dist/platform/elc
```
You are bootstrapped.
### Minimal El Subset for the Compiler Itself
The El compiler source (`lexer.el`, `parser.el`, `codegen.el`, `compiler.el`) uses:
- `fn`, `let`, `while`, `if`/`else`, `return`, `for`/`in`, `import`
- `extern fn` (for `.elh` headers)
- `String`, `Int`, `Bool`, `Void`, `Any`, `Map<String, Any>`, `[String]`, `[Map<String, Any>]`
- Map literals `{ "key": val }`
- Array literals `[...]` (and `native_list_empty()`)
- List operations: `native_list_empty()`, `native_list_append()`, `native_list_get()`, `native_list_len()`, `native_list_clone()`
- String operations: `str_join()`, `str_eq()`, `str_contains()`, `str_starts_with()`, `str_slice()`, `str_trim()`, `str_split()`, `str_index_of()`, `str_len()`, `str_to_int()`, `native_string_chars()`, `native_int_to_str()`
- `state_get()`, `state_set()`
- `println()`, `fs_read()`, `fs_write()`, `exit()`
- `el_release()` (ARC cleanup)
The compiler does not use: HTTP, engram, dharma, LLM, crypto, UUID, float arithmetic.
---
## 5. The Long-Term Solution: elvm
### Why a VM Makes Bootstrapping More Auditable
The current bootstrap chain relies on trusting a binary that cannot be audited by inspection alone: nothing proves that `dist/platform/elc` was built from the source in the repo. This is the classic "trusting trust" problem (Ken Thompson, 1984). A virtual machine breaks the chain:
- `elc` targets `elvm` bytecode (instead of C)
- `elvm` is a minimal interpreter hand-written in ~500 lines of C
- The hand-written C is small enough to audit completely
- Anyone can compile `elvm.c` with any C compiler
- From there: `elvm` interprets `elc.elvm` → `elc` compiles El → `cc` builds native binaries
The benefit: the trusted base shrinks from "a Mach-O binary" to "500 lines of straightforward C code that anyone can read in an afternoon."
### The elvm Design
A minimal elvm needs:
- A stack or register machine (stack is simpler)
- Instructions: push, pop, add, sub, mul, div, cmp, jump, call, return, load, store
- A string table (El strings are mostly literals)
- A heap for ElList and ElMap
- An FFI table mapping El runtime builtins to C functions
The El compiler would gain a `--target=elvm` flag in `compile_dispatch()`. Codegen would emit bytecode instead of C text. The runtime interface stays the same — builtins map to FFI slots by name.
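To give a feel for how small such an interpreter can be, here is a toy stack machine in Python. It is not the planned elvm (which would be written in C and does not exist yet): the `(opcode, operand)` encoding, the `jz` conditional jump, and the shape of the `call` operand are all hypothetical, and `call` here only dispatches into a builtin table by name rather than managing call frames.

```python
import operator

ARITH = {"add": operator.add, "sub": operator.sub,
         "mul": operator.mul, "div": operator.floordiv}


def run(program, builtins=None):
    """Interpret a list of (opcode, operand) pairs on a stack machine."""
    builtins = builtins or {}
    stack, slots, pc = [], {}, 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "pop":
            stack.pop()
        elif op in ARITH:
            right, left = stack.pop(), stack.pop()
            stack.append(ARITH[op](left, right))
        elif op == "cmp":                 # pushes -1, 0, or 1
            right, left = stack.pop(), stack.pop()
            stack.append((left > right) - (left < right))
        elif op == "load":
            stack.append(slots[arg])
        elif op == "store":
            slots[arg] = stack.pop()
        elif op == "jump":
            pc = arg
        elif op == "jz":                  # hypothetical: jump if top is zero
            if stack.pop() == 0:
                pc = arg
        elif op == "call":                # FFI slot looked up by name
            name, argc = arg
            args = [stack.pop() for _ in range(argc)][::-1]
            stack.append(builtins[name](*args))
        elif op == "return":
            return stack.pop() if stack else None
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return None


# Sums 4 + 3 + 2 + 1 with a loop over two named slots.
SUM_TO_FOUR = [
    ("push", 0), ("store", "acc"),
    ("push", 4), ("store", "i"),
    ("load", "i"), ("jz", 15),
    ("load", "acc"), ("load", "i"), ("add", None), ("store", "acc"),
    ("load", "i"), ("push", 1), ("sub", None), ("store", "i"),
    ("jump", 4),
    ("load", "acc"), ("return", None),
]
```

Resolving builtins by name in a table mirrors the idea that the runtime interface stays the same and only the dispatch mechanism changes.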
This is the planned path. It does not exist yet.
---
## 6. Compiler Source Map
| File | Role | Lines |
|------|------|-------|
| `elc-cli.el` | Entry point; imports compiler.el | 7 |
| `el-compiler/src/compiler.el` | Pipeline wiring: lex → parse → codegen. Import resolution, `--emit-header`, `fn main()`. Defines `compile()`, `compile_js()`, `compile_dispatch()`, `resolve_imports()` | 298 |
| `el-compiler/src/lexer.el` | Tokenizer. `lex(source)` → token list. Char helpers, keyword lookup, scan_digits, scan_ident, scan_string, strip_code_comments | 747 |
| `el-compiler/src/parser.el` | Recursive descent parser. `parse(tokens)` → AST. All statement and expression forms | 1071 |
| `el-compiler/src/codegen.el` | C code emitter. `codegen(stmts, source)` → (streams to stdout). Expression codegen, statement codegen, function codegen, type tracking, capability enforcement, temporal type dispatch | 2721 |
| `el-compiler/src/codegen-js.el` | JavaScript backend. `codegen_js(stmts, source)` → JS source | ~500 |
| `el-compiler/runtime/el_runtime.h` | Full runtime API declaration | 755 |
| `el-compiler/runtime/el_runtime.c` | Full runtime implementation | large |
| `el-compiler/runtime/el_runtime.js` | JS runtime | — |
| `elb.el` | Build coordinator. Reads `manifest.el`, walks import graph, compiles modules, links binary. Follows a `.NET`-style incremental build model | 367 |
| `elc-combined.el` | Pre-merged single-file bootstrap edition (for early bootstrap iterations) | large |
| `spec/language.md` | Language specification v1.2.0 | — |
| `dist/platform/elc` | Current bootstrap binary (Mach-O arm64) | — |
---
## 7. Key Decisions and Gotchas
### `target` is a Reserved Keyword
`target` is lexed as the `Target` token kind. It cannot be used as a variable or parameter name anywhere in El source. If you write `fn compile(target: String)`, the parameter name will be tokenized as `Target`, which the parser does not recognize as an `Ident` in parameter position.
**Workaround:** use `tgt`, `dest`, `backend`, or any other name. The compiler source uses `tgt` specifically for this reason. This comes up whenever writing code that handles compilation targets.
### `let x = x + 1` is Let-Rebinding, Not Mutation
El has no mutable variables. `let count = count + 1` re-introduces `count` into the current scope, shadowing the previous binding. At the C level, the codegen tracks declared names and emits plain assignment for subsequent bindings of the same name:
- First `let count = 0` → `el_val_t count = 0;`
- Second `let count = count + 1` → `count = count + 1;`
This means you cannot have two different values named `count` in the same C scope — the second binding overwrites the first. This is by design. Scoped shadowing works correctly because each block (if body, while body, for body) gets its own copy of the `declared` list.
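A re-bootstrap codegen can reproduce this with a per-scope list of declared names. The sketch below uses hypothetical helper names (`gen_let`, `child_scope`) and assumes the copy handed to a nested block starts with the enclosing scope's names, so that a rebinding inside a loop body assigns to the outer C variable.

```python
def gen_let(name, value_c, declared):
    """Emit the C statement for `let name = <value>` in the current scope.

    `declared` is the list of names already declared in this C scope;
    it is updated in place on a first declaration.
    """
    if name in declared:
        return f"{name} = {value_c};"        # rebinding: plain assignment
    declared.append(name)
    return f"el_val_t {name} = {value_c};"   # first binding: declaration


def child_scope(declared):
    """Give a nested block (if/while/for body) its own copy of the list."""
    return list(declared)
```

Names first introduced inside the nested block are appended only to the copy, so they do not leak back into the enclosing scope.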
### Arena is Inactive in CLI Mode
The runtime includes an arena allocator designed for long-running server processes. In CLI mode (`elc`, `elb`) the arena is not activated. Memory is managed by ARC (reference counting via `el_retain`/`el_release`). The compiler source explicitly calls `el_release(tokens)` after parsing and `el_release(stmt)` after codegen to prevent memory exhaustion on large source files.
If you are implementing a new runtime or embedding El, be aware that the ARC model expects callers to release values they are done with.
### The `extern fn` / `.elh` Separate Compilation Model
`elb` (the build coordinator) supports separate compilation. When a module changes:
1. `elc --emit-header module.el module.c` compiles the module and writes `module.elh`
2. `module.elh` contains `extern fn` declarations for all public functions
3. Other modules that import `module.el` use the `.elh` header instead of re-parsing the source
The `resolve_imports` function in `compiler.el` checks for a `.elh` file before recursively inlining the `.el` source. If the header exists, it is used (and the `.el` is marked as seen to prevent double-inclusion).
This is important for bootstrap: if you have pre-compiled headers lying around from a broken build, they may shadow updated source. Delete `.elh` files (or use `elb --clean`) when debugging unexpected compilation behavior.
### Import Resolution: Depth-First with Deduplication
`resolve_imports` in `compiler.el`:
1. Walks imports depth-first (dependencies before dependents)
2. Uses `state_set("__elc_imp__:" + path, "1")` to deduplicate: each file is included exactly once
3. Builds the combined source string by concatenating import bodies ahead of the entry file's body
4. If a `.elh` header exists for an import, uses that instead of recursing into the `.el`
The result is one large string that gets passed through `lex` → `parse` → `codegen` as a single unit. The codegen emits forward declarations for all functions before any body, so declaration order within the combined source does not matter.
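The four steps can be sketched as follows. This is an approximation rather than the `compiler.el` code: a Python dict stands in for the filesystem, a set stands in for the `state_set` keys, only the `import "path"` form is recognized, import paths are assumed to end in `.el`, and no relative-path resolution is attempted.

```python
import re

# Matches a whole-line `import "path"` statement.
IMPORT_RE = re.compile(r'^[ \t]*import[ \t]+"([^"]+)"[ \t]*$', re.MULTILINE)


def resolve_imports(path, files, seen=None):
    """Return combined source for `path`: dependencies first, each file once.

    `files` maps path -> source text and stands in for the filesystem.
    """
    seen = set() if seen is None else seen
    seen.add(path)
    source = files[path]
    parts = []
    for dep in IMPORT_RE.findall(source):
        if dep in seen:
            continue                      # already included elsewhere
        header = dep[:-len(".el")] + ".elh"
        if header in files:
            seen.add(dep)                 # header shadows the .el source
            parts.append(files[header])
        else:
            parts.append(resolve_imports(dep, files, seen))
    parts.append(IMPORT_RE.sub("", source))   # this file's own body
    return "\n".join(p.strip() for p in parts if p.strip())
```

The header branch also shows the failure mode to watch for: a stale `.elh` wins over the `.el` file whether or not the source has changed since.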
### `+` Operator Dispatch is Heuristic
El's `+` operator serves double duty: integer addition and string concatenation. The codegen dispatches based on static analysis of the AST:
- If either operand is a `Str` literal → `el_str_concat`
- If both operands are provably `Int` (via `is_int_expr`) → `(a + b)`
- Otherwise, including any `Call` or `Ident` operand that is not provably `Int` → `el_str_concat` (conservative fallback)
The `is_int_expr` predicate recurses through the AST: literal `Int`, names in `__int_names` (from `: Int` annotations), known Int-returning builtins, and arithmetic BinOps over Int operands all count as "provably Int."
If you write `let result = some_int_var + 1` and `some_int_var` is not annotated `: Int`, the codegen may emit `el_str_concat` instead of integer addition. Fix by adding `: Int` to the variable declaration.
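A minimal sketch of this dispatch is shown below. The node shapes are assumed, the `INT_BUILTINS` set is an illustrative guess at the "known Int-returning builtins" rather than the real list, and `atom` only renders literals and identifiers.

```python
# Assumption: a guess at which builtins return Int; the real list
# lives in codegen.el.
INT_BUILTINS = {"str_len", "str_to_int", "native_list_len"}


def is_int_expr(node, int_names):
    """True when the expression is provably Int under the rules above."""
    kind = node["expr"]
    if kind == "Int":
        return True
    if kind == "Ident":
        return node["name"] in int_names
    if kind == "Call":
        return node["func"].get("name") in INT_BUILTINS
    if kind == "BinOp" and node["op"] in ("+", "-", "*", "/"):
        return (is_int_expr(node["left"], int_names)
                and is_int_expr(node["right"], int_names))
    return False


def atom(node):
    """Render a literal or identifier operand as C text."""
    if node["expr"] == "Str":
        return 'EL_STR("%s")' % node["value"]
    return str(node.get("name", node.get("value")))


def gen_add(left, right, int_names):
    """Choose between integer addition and string concatenation for `+`."""
    a, b = atom(left), atom(right)
    if left["expr"] == "Str" or right["expr"] == "Str":
        return f"el_str_concat({a}, {b})"
    if is_int_expr(left, int_names) and is_int_expr(right, int_names):
        return f"({a} + {b})"
    return f"el_str_concat({a}, {b})"     # conservative fallback
```

With an empty `int_names`, `x + 1` falls through to `el_str_concat`; once `x` is recorded as Int, the same expression becomes `(x + 1)`.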
### `==` Operator Dispatch is Also Heuristic
Similarly, `==` dispatches between `str_eq(a, b)` (string comparison) and `(a == b)` (integer comparison) based on operand types. The codegen tracks Int-typed names in `__int_names`. Two `Ident` operands where both are known Int-typed use `==`; all other Ident-Ident comparisons use `str_eq`.
This means comparing two integer variables that were not annotated `: Int` can silently produce `str_eq` on what are actually integer values — and `str_eq` treats them as `const char*` pointers, producing incorrect results or segfaults.
**Rule:** always annotate variables `: Int` when they will participate in `==` comparisons or `+` arithmetic.
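The effect of the annotation can be seen in a small sketch that first collects annotated names and then dispatches `==`. The `Let` statement shape (`"stmt"`, `"name"`, `"type"`) is an assumption, the helper names are hypothetical, and only `Int` literal and `Ident` operands are covered.

```python
def collect_int_names(stmts):
    """Gather names declared with an explicit `: Int` annotation."""
    return {s["name"] for s in stmts
            if s.get("stmt") == "Let" and s.get("type") == "Int"}


def is_known_int(node, int_names):
    """Int literal, or an identifier recorded as Int-typed."""
    return node["expr"] == "Int" or (
        node["expr"] == "Ident" and node["name"] in int_names)


def gen_eq(left, right, int_names):
    """Pick `==` only when both sides are known Int; otherwise str_eq."""
    a = str(left.get("name", left.get("value")))
    b = str(right.get("name", right.get("value")))
    if is_known_int(left, int_names) and is_known_int(right, int_names):
        return f"({a} == {b})"
    return f"str_eq({a}, {b})"
```

An un-annotated `j` compared against an annotated `i` still yields `str_eq(i, j)`, which is the silent failure described above.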
### Capability Kind Enforcement
The codegen classifies programs into three capability tiers based on top-level declarations:
- `cgi` block present → full capability (all primitives allowed)
- `service` block present → restricted (no `llm_call_agentic`, `llm_register_tool`, `dharma_emit`, `dharma_field`)
- Neither → `utility` (no DHARMA, no LLM)
Violations are collected during codegen and emitted as `#error` directives at the bottom of the generated C. The downstream `cc` step then fails with a clear message naming the forbidden call.
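A rough sketch of the tiering and the `#error` emission follows. The exact set of primitives forbidden in `utility` programs is not listed here, so the sketch assumes a name-prefix check (`dharma_`, `llm_`); the wording of the `#error` message is also hypothetical.

```python
SERVICE_FORBIDDEN = {"llm_call_agentic", "llm_register_tool",
                     "dharma_emit", "dharma_field"}


def capability_tier(top_level_kinds):
    """Classify a program from the kinds of its top-level declarations."""
    if "cgi" in top_level_kinds:
        return "cgi"
    if "service" in top_level_kinds:
        return "service"
    return "utility"


def is_forbidden(tier, call_name):
    if tier == "cgi":
        return False
    if tier == "service":
        return call_name in SERVICE_FORBIDDEN
    # Assumption: utility programs reject every dharma_* and llm_* call.
    return call_name.startswith(("dharma_", "llm_"))


def violation_directives(tier, called_names):
    """Return the #error lines to append to the generated C."""
    return [f'#error "{name} is not allowed in a {tier} program"'
            for name in called_names if is_forbidden(tier, name)]
```

Appending the directives to otherwise valid C means the El compiler itself always finishes, and the failure surfaces in the `cc` step.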
### The `__no_block_expr` Parse Guard
When parsing the condition of `if`, `while`, `for`, and `match`, the parser sets `state_set("__no_block_expr", "1")`. This prevents `parse_primary` from treating a `{` as the start of a Map literal — instead it returns `{ "expr": "Nil" }` and the caller sees the `{` and treats it as the block delimiter.
Without this guard, `if a || b { ... }` would recurse into `parse_expr` for `b`, hit `{`, try to parse it as a Map literal, fail to find string keys, loop in error-recovery mode, and hang.
### Codegen Streams Output via `println`
The codegen does not build the output as a string — it calls `println()` for each line as it is emitted. The `compile()` / `compile_js()` / `codegen()` functions return `""`. Output goes to stdout.
This design avoids O(n²) string concatenation for large programs. It also means you cannot capture the compiler's output in a variable within El itself — you must redirect stdout at the OS level (`elc source.el > output.c`).
When writing to a file, `elc` detects the output path argument, redirects C's `stdout` to the file (via `freopen` in the runtime), and the `println` calls go there instead.
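A bootstrap compiler written in another language can mirror the same shape: print each line as it is produced, return an empty string, and redirect standard output when an output path is given. The sketch below is a loose Python analogue (using `contextlib.redirect_stdout` where the runtime uses `freopen`); the function names and the pre-rendered `stmts` input are hypothetical.

```python
import contextlib


def codegen(stmts):
    """Print one C line per statement; return "" as the El codegen does."""
    print('#include "el_runtime.h"')
    for stmt in stmts:
        print(stmt)   # a real codegen would lower each AST node here
    return ""


def compile_to(stmts, out_path=None):
    """Stream to stdout, or to `out_path` when one is given."""
    if out_path is None:
        return codegen(stmts)
    with open(out_path, "w") as out, contextlib.redirect_stdout(out):
        return codegen(stmts)
```

Because nothing is accumulated in memory, the caller sees the output only through the stream it was redirected to.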