El Language Bootstrap Guide
This document is the authoritative guide for reconstructing the El compiler toolchain from scratch. If the bootstrap binary at dist/platform/elc is ever lost, this document is the path back.
1. The Bootstrap Chain (Current State)
The Trust Chain
El is a self-hosting language. The compiler is written in El. This creates a circular dependency: you need an El compiler to compile the El compiler. The chain is resolved by a seed binary:
dist/platform/elc (Mach-O arm64 native binary)
↓
compiles elc-cli.el
↓
new self-hosted elc binary
↓
compiles itself again (identity check)
↓
stable self-hosted compiler
The binary at dist/platform/elc is a Mach-O 64-bit arm64 executable. The elc.preselfhost and elc.legacy files in the same directory are older snapshots kept as fallback checkpoints.
The key property: every binary in dist/platform/ was produced by compiling the El source in el-compiler/src/ using a previous version of that same binary. The chain is auditable: the source is the ground truth, not the binary.
The Self-Hosting Pipeline
elc-cli.el
imports → el-compiler/src/compiler.el
imports → el-compiler/src/lexer.el
imports → el-compiler/src/parser.el
imports → el-compiler/src/codegen.el
imports → el-compiler/src/codegen-js.el
Import resolution is textual. compiler.el recursively inlines all imported .el files before lex/parse. The result is one large unified source string that the compiler then processes in a single pass.
elc-combined.el in the repo root is a pre-merged single-file edition used during early bootstrap iterations.
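The textual import resolution described above can be sketched in a few lines of Python. This is an illustration only, not the logic in compiler.el: it assumes imports appear as `import "path"` on a line of their own, that the quoted path can be passed unchanged to a reader function, and that a file already inlined is skipped on later imports. The names `inline_imports` and `read` are hypothetical.

```python
import re
from typing import Callable, Optional, Set

# Assumed import form: a line consisting only of  import "path.el"
IMPORT_LINE = re.compile(r'^\s*import\s+"([^"]+)"\s*$')


def inline_imports(path: str, read: Callable[[str], str],
                   seen: Optional[Set[str]] = None) -> str:
    """Return the source at `path` with each import line replaced by the
    recursively inlined text of the imported file."""
    if seen is None:
        seen = set()
    if path in seen:  # assumption: each file is inlined at most once
        return ""
    seen.add(path)

    out = []
    for line in read(path).splitlines():
        match = IMPORT_LINE.match(line)
        if match:
            # Splice the imported file in place of the import line.
            out.append(inline_imports(match.group(1), read, seen))
        else:
            out.append(line)
    return "\n".join(out)
```

Passing `read` as a callable keeps the sketch testable: a dict's `__getitem__` works for in-memory sources, and a small wrapper around `open(path).read()` works for files on disk.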
What the Bootstrap Binary Actually Is
The dist/platform/elc binary is a compiled El program that was produced by running an earlier version of itself on elc-cli.el. It is not a Rust binary. The elc.legacy and elc.preselfhost checkpoints suggest the chain has been continuously self-hosting and re-stamped. The original genesis compiler (referenced in the language spec as a "Rust genesis compiler") was used to produce the first self-hosted binary; that Rust binary is not present in this repo.
To rebuild the current binary from source using the current binary:
cd /path/to/el
./dist/platform/elc elc-cli.el elc-new.c
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
-o dist/platform/elc-new \
elc-new.c el-compiler/runtime/el_runtime.c
Verify self-hosting by using elc-new to recompile itself and diffing the outputs.
2. The Language
2.1 Lexical Structure
El source is UTF-8. File extension .el. Comments are single-line only: // to end of line.
Token representation: every token is a map { "kind": String, "value": String }.
Keywords — from keyword_kind() in lexer.el:
| Keyword | Token Kind | Notes |
|---|---|---|
| `let` | `Let` | variable binding |
| `fn` | `Fn` | function definition |
| `type` | `Type` | struct definition |
| `enum` | `Enum` | enum definition |
| `match` | `Match` | pattern match |
| `return` | `Return` | function return |
| `if` | `If` | conditional |
| `else` | `Else` | |
| `for` | `For` | iteration |
| `in` | `In` | used in `for x in list` |
| `while` | `While` | loop |
| `import` | `Import` | module import |
| `from` | `From` | `from mod import { Name }` |
| `as` | `As` | (reserved, no parse form) |
| `with` | `With` | (reserved) |
| `sealed` | `Sealed` | (reserved) |
| `activate` | `Activate` | (reserved) |
| `where` | `Where` | (reserved) |
| `test` | `Test` | (reserved) |
| `seed` | `Seed` | (reserved) |
| `assert` | `Assert` | (reserved) |
| `protocol` | `Protocol` | (reserved) |
| `impl` | `Impl` | (reserved) |
| `retry` | `Retry` | reserved / soft keyword in expr position |
| `times` | `Times` | reserved / soft keyword |
| `fallback` | `Fallback` | reserved / soft keyword |
| `reason` | `Reason` | reserved / soft keyword |
| `parallel` | `Parallel` | reserved / soft keyword |
| `trace` | `Trace` | reserved / soft keyword |
| `requires` | `Requires` | reserved / soft keyword |
| `deploy` | `Deploy` | reserved / soft keyword |
| `to` | `To` | reserved / soft keyword |
| `via` | `Via` | reserved / soft keyword |
| `target` | `Target` | RESERVED: cannot be used as an identifier |
| `true` | `Bool` | literal value `true` |
| `false` | `Bool` | literal value `false` |
| `cgi` | `Cgi` | CGI identity block |
| `service` | `Service` | service declaration block |
| `manager` | `Manager` | VBD role decorator / soft keyword |
| `engine` | `Engine` | VBD role decorator / soft keyword |
| `accessor` | `Accessor` | VBD role decorator / soft keyword |
| `vessel` | `Vessel` | soft keyword |
| `extern` | `Extern` | `extern fn` forward declaration |
Soft keywords (to, via, deploy, reason, times, fallback, retry, parallel, trace, requires, where, as, with, manager, engine, accessor, vessel): these have dedicated token kinds, but the parser re-interprets them as Ident nodes when they appear in expression position (e.g., as parameter names or local variable names). target is the exception: it has its own token kind too, but it stays fully reserved and cannot be used as an identifier (see section 2.4).
All token kinds:
| Kind | Pattern |
|---|---|
| `Int` | `[0-9]+` |
| `Float` | `[0-9]+ '.' [0-9]+` |
| `Str` | `"…"` with `\"`, `\n`, `\t`, `\r`, `\\` escapes |
| `Bool` | `true` or `false` |
| `Ident` | `[a-zA-Z_][a-zA-Z0-9_]*` (not a keyword) |
| keyword tokens | one per keyword above |
| `Eq` | `=` |
| `EqEq` | `==` |
| `NotEq` | `!=` |
| `Not` | `!` |
| `Lt` / `LtEq` / `Gt` / `GtEq` | `<` `<=` `>` `>=` |
| `And` | `&&` (single `&` is consumed and discarded) |
| `Or` | `\|\|` |
| `Pipe` | `\|` |
| `PipeOp` | `\|>` |
| `Plus` / `Minus` / `Star` / `Slash` | `+` `-` `*` `/` |
| `Percent` | `%` |
| `Arrow` | `->` |
| `FatArrow` | `=>` |
| `Colon` / `ColonColon` | `:` `::` |
| `LParen` / `RParen` | `(` `)` |
| `LBrace` / `RBrace` | `{` `}` |
| `LBracket` / `RBracket` | `[` `]` |
| `Comma` / `Dot` / `Semicolon` | `,` `.` `;` |
| `At` | `@` |
| `QuestionMark` | `?` |
| `Eof` | end-of-input sentinel |
String comment stripping: the lexer contains a special heuristic for string literals that embed JavaScript or CSS (looks_like_code). If a string contains <script, <style, or function + ;, the lexer strips // and /* */ comments from the string value before producing the Str token. This is a compile-time content sanitization pass.
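A rough Python model of this heuristic is shown below. It is a sketch under stated assumptions: "function + ;" is read as "contains both `function` and `;`", and comments are removed with two naive regular expressions. The helper name `strip_embedded_comments` is hypothetical, and the real lexer may treat edge cases differently. Note that this naive version would also cut a URL such as `https://…` at the `//`.

```python
import re

_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"//[^\n]*")


def looks_like_code(s: str) -> bool:
    """Heuristic from the text: embedded <script>, <style>, or JS-like source."""
    return "<script" in s or "<style" in s or ("function" in s and ";" in s)


def strip_embedded_comments(s: str) -> str:
    """Remove /* */ and // comments, but only from strings that look like code."""
    if not looks_like_code(s):
        return s
    without_blocks = _BLOCK_COMMENT.sub("", s)
    return _LINE_COMMENT.sub("", without_blocks)
```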
2.2 AST Node Types
Every AST node is a Map<String, Any>. The "expr" or "stmt" key names the node type.
Expression nodes:
| expr value | Fields | Meaning |
|---|---|---|
| `Int` | `value: String` | integer literal |
| `Float` | `value: String` | float literal |
| `Str` | `value: String` | string literal |
| `Bool` | `value: String` | `"true"` or `"false"` |
| `Nil` | (none) | null / missing |
| `Ident` | `name: String` | identifier reference |
| `BinOp` | `op: String, left, right` | binary operation |
| `Not` | `inner` | unary `!` |
| `Neg` | `inner` | unary `-` |
| `Call` | `func, args: [expr]` | function call |
| `Field` | `object, field: String` | `obj.field` |
| `Index` | `object, index` | `obj[idx]` |
| `Array` | `elems: [expr]` | `[e1, e2, …]` |
| `Map` | `pairs: [{ key: String, value: expr }]` | `{ "k": v, … }` |
| `If` | `cond, then: [stmt], else: [stmt], has_else: Bool` | conditional expression |
| `For` | `item: String, list, body: [stmt]` | for-in expression |
| `Match` | `subject, arms: [{ pattern, body }]` | pattern match |
| `DurationLit` | `count: String, unit: String` | `30.seconds`, `1.hour` |
| `Try` | `inner` | postfix `?` (no-op passthrough today) |
Binary operators (op field values): Plus, Minus, Star, Slash, EqEq, NotEq, Lt, Gt, LtEq, GtEq, And, Or.
Operator precedence (higher = tighter binding):
| Level | Operators |
|---|---|
| 6 | Star, Slash |
| 5 | Plus, Minus |
| 4 | Lt, Gt, LtEq, GtEq |
| 3 | EqEq, NotEq |
| 2 | And |
| 1 | Or |
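The precedence table maps directly onto a precedence-climbing loop. The sketch below is a minimal illustration, not the parser in parser.el: it assumes all binary operators are left-associative, handles only `Int`, `Ident` and parenthesised primaries, and expects a token list that ends with an `Eof` token. The function names are hypothetical.

```python
PRECEDENCE = {
    "Star": 6, "Slash": 6,
    "Plus": 5, "Minus": 5,
    "Lt": 4, "Gt": 4, "LtEq": 4, "GtEq": 4,
    "EqEq": 3, "NotEq": 3,
    "And": 2,
    "Or": 1,
}


def parse_primary(tokens, pos):
    tok = tokens[pos]
    if tok["kind"] == "Int":
        return {"expr": "Int", "value": tok["value"]}, pos + 1
    if tok["kind"] == "Ident":
        return {"expr": "Ident", "name": tok["value"]}, pos + 1
    if tok["kind"] == "LParen":
        inner, pos = parse_expr(tokens, pos + 1)
        if tokens[pos]["kind"] != "RParen":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    raise SyntaxError("unexpected token " + tok["kind"])


def parse_expr(tokens, pos=0, min_prec=1):
    """Parse a binary expression whose operators all bind at least as
    tightly as min_prec. Returns (ast, next_position)."""
    left, pos = parse_primary(tokens, pos)
    while True:
        kind = tokens[pos]["kind"]
        prec = PRECEDENCE.get(kind)
        if prec is None or prec < min_prec:
            return left, pos
        # prec + 1 makes operators of equal precedence left-associative.
        right, pos = parse_expr(tokens, pos + 1, prec + 1)
        left = {"expr": "BinOp", "op": kind, "left": left, "right": right}
```

For the tokens of `1 + 2 * 3`, this yields a `Plus` node whose right child is the `Star` node, as the table requires.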
Pattern nodes (used inside Match arms):
| pattern value | Fields | Meaning |
|---|---|---|
| `Wildcard` | (none) | `_` (always matches) |
| `Binding` | `name: String` | binds subject to name |
| `LitInt` | `value: String` | integer literal pattern |
| `LitStr` | `value: String` | string literal pattern |
| `LitBool` | `value: String` | boolean literal pattern |
Statement nodes:
| stmt value | Fields | Meaning |
|---|---|---|
| `Let` | `name: String, value: expr, type: String` | variable binding |
| `Assign` | `name: String, value: expr` | bare reassignment `name = expr` |
| `Return` | `value: expr` | return statement |
| `While` | `cond: expr, body: [stmt]` | while loop |
| `For` | `item: String, list: expr, body: [stmt]` | for-in loop |
| `FnDef` | `name: String, params: [param], body: [stmt], ret_type: String, decorator?: String` | function definition |
| `ExternFn` | `name: String, params: [param], ret_type: String` | forward declaration |
| `TypeDef` | `name: String, fields: [{ name: String }]` | struct type definition |
| `EnumDef` | `name: String, variants: [{ name: String }]` | enum definition |
| `Import` | `path: String` | `import "file.el"` or `from mod import { … }` |
| `CgiBlock` | `name, dharma_id, principal, network, engram, has_*: Bool` | CGI identity declaration |
| `ServiceBlock` | `name, sponsor, domain` | service declaration |
| `Expr` | `value: expr` | bare expression statement |
Param nodes: { "name": String, "type": String } where type is the leading identifier of the type annotation (e.g., "Int", "String", "Map") or "" if unannotated.
2.3 The Type System
Type annotations are parsed and stored but not type-checked at compile time. They serve as documentation and as hints to the codegen for arithmetic dispatch.
Built-in types:
| Type | C representation | Notes |
|---|---|---|
| `String` | `const char*` cast to `el_val_t` | via `EL_STR()` macro |
| `Int` | `int64_t` | direct |
| `Bool` | `int64_t` | 0 = false, nonzero = true |
| `Float` | `int64_t` | bit-cast double via `el_from_float()` |
| `Void` | `void` | functions returning nothing |
| `Any` | `void*` cast to `el_val_t` | generic containers |
| `[T]` | `el_val_t` | pointer to `ElList` struct |
| `Map<K,V>` | `el_val_t` | pointer to `ElMap` struct |
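The Float row is the least obvious: the 64 bits of a double are stored unchanged in an `int64_t` slot. The Python model below shows what such a bit-cast does. It illustrates the representation only and is not the runtime's C code. `el_from_float` is the name used in this guide, while `el_to_float` is a hypothetical inverse added for the round trip.

```python
import struct


def el_from_float(x: float) -> int:
    """Reinterpret the 8 bytes of an IEEE-754 double as a signed 64-bit int."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


def el_to_float(v: int) -> float:
    """Hypothetical inverse: reinterpret a signed 64-bit int as a double."""
    return struct.unpack("<d", struct.pack("<q", v))[0]
```

Because the stored integer is a bit pattern rather than a number, plain integer `+` or `<` on two Float values gives meaningless results. This is why the codegen needs type hints to dispatch arithmetic.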
Temporal types (first-class in codegen):
| Type | Representation | Notes |
|---|---|---|
| `Instant` | nanoseconds since Unix epoch as `int64_t` | `now()` returns this |
| `Duration` | signed nanoseconds as `int64_t` | `30.seconds` = 30 * 1000000000 |
| `Calendar` | pointer to heap-allocated struct | `earth_calendar(zone)` |
| `CalendarTime` | pointer to heap-allocated struct | `now_in(cal)` |
| `LocalDate` | pointer to heap-allocated struct | `local_date(y, m, d)` |
| `LocalTime` | nanoseconds since midnight, direct `int64_t` | `local_time(h, m, s, ns)` |
| `Zone` | pointer to heap-allocated struct | `zone("America/New_York")` |
| `Rhythm` | pointer to heap-allocated struct | recurrence pattern |
The codegen tracks type-annotated variable names in per-function process state (__int_names, __instant_names, __duration_names, etc.) to dispatch arithmetic and comparisons through the correct runtime wrappers. Type-mismatched operations (e.g., Instant + Instant) are emitted as #error directives.
Duration postfix literals: 30.seconds, 1.hour, 500.millis, 30.nanos are parsed as DurationLit AST nodes and compiled to el_duration_from_nanos(count * multiplier). The multipliers:
| Unit | Nanoseconds |
|---|---|
| `nano` / `nanos` | 1 |
| `milli` / `millis` / `millisecond` / `milliseconds` | 1,000,000 |
| `second` / `seconds` | 1,000,000,000 |
| `minute` / `minutes` | 60,000,000,000 |
| `hour` / `hours` | 3,600,000,000,000 |
| `day` / `days` | 86,400,000,000,000 |
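A replacement codegen can carry this table as a lookup. The sketch below assumes the multiplication is folded at compile time into a single integer constant. The guide only says the result is `el_duration_from_nanos(count * multiplier)`, so emitting the product as a C expression would be equally valid. `emit_duration_lit` is a hypothetical helper name.

```python
SECOND = 1_000_000_000

UNIT_NANOS = {
    "nano": 1, "nanos": 1,
    "milli": 1_000_000, "millis": 1_000_000,
    "millisecond": 1_000_000, "milliseconds": 1_000_000,
    "second": SECOND, "seconds": SECOND,
    "minute": 60 * SECOND, "minutes": 60 * SECOND,
    "hour": 3_600 * SECOND, "hours": 3_600 * SECOND,
    "day": 86_400 * SECOND, "days": 86_400 * SECOND,
}


def emit_duration_lit(node: dict) -> str:
    """Compile a DurationLit AST node to a C expression string."""
    count = int(node["count"])       # count is stored as a String in the AST
    nanos = count * UNIT_NANOS[node["unit"]]
    return "el_duration_from_nanos(" + str(nanos) + ")"
```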
2.4 Key Language Semantics
Implicit return. The final expression in a function body becomes the return value if it is not a control-flow construct (If, For). The codegen's transform_implicit_return rewrites the last Expr statement into a Return statement before emitting.
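As a sketch, the rewrite amounts to inspecting the last statement of the body. The Python below models it on the AST maps from section 2.2. It borrows the name `transform_implicit_return` from the text but is an illustration, not the actual codegen source.

```python
def transform_implicit_return(body: list) -> list:
    """Return a copy of `body` whose trailing bare expression statement,
    if any, has been rewritten into a Return statement."""
    if not body:
        return body
    last = body[-1]
    if last.get("stmt") != "Expr":
        return body
    # If and For are control-flow constructs and are left untouched.
    if last["value"].get("expr") in ("If", "For"):
        return body
    return body[:-1] + [{"stmt": "Return", "value": last["value"]}]
```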
Let-rebinding, not mutation. El uses let for both initial binding and rebinding:
let count = 0
let count = count + 1 // NOT mutation — creates a new binding in the same scope
The codegen tracks declared names per C scope. When count is already in declared, it emits count = count + 1; (plain assignment). When it is new, it emits el_val_t count = 0;. This means El does not have mutable variables in the traditional sense — every let is a potential redeclaration. The practical effect is that shadowing and in-place update use identical syntax.
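In a replacement codegen this reduces to one set lookup per `let`. A minimal sketch follows, assuming the caller keeps one `declared` set per C scope and has already compiled the right-hand side to a C expression string. `emit_let` is a hypothetical helper name.

```python
def emit_let(name: str, value_c: str, declared: set) -> str:
    """Emit a C declaration the first time `name` is bound in this scope,
    and a plain assignment on every later `let` of the same name."""
    if name in declared:
        return name + " = " + value_c + ";"
    declared.add(name)
    return "el_val_t " + name + " = " + value_c + ";"
```

With an empty set, the two `let count` lines from the example above compile to `el_val_t count = 0;` followed by `count = count + 1;`.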
Bare reassignment. The parser also handles name = expr (without let) when an Ident is immediately followed by Eq. This emits a plain C assignment.
target is reserved. The word target is lexed as the Target token kind — it cannot be used as a variable or parameter name. Use tgt or another name instead. This is a live gotcha in compiler.el itself, which uses tgt for exactly this reason.
__no_block_expr guard. The parser uses process state key __no_block_expr to suppress Map-literal parsing when parsing the condition of if, while, for, and match. This prevents a stray { (the start of the then-block) from being parsed as a Map literal.
Arena memory model. The runtime includes an arena allocator that is activated in server/long-running contexts. In CLI mode (elc, elb) the arena is inactive. Memory is managed via ARC (reference counting): el_retain() and el_release() on Lists and Maps. Strings and ints are not refcounted — the retain/release functions are safe no-ops on non-tagged values.
3. The Runtime API
All runtime functions are declared in el-compiler/runtime/el_runtime.h. Every compiled El program links against el-compiler/runtime/el_runtime.c.
All values are el_val_t (int64_t). Strings are pointers cast through int64_t using EL_STR(s) / EL_CSTR(v) macros.
Canonical compile command:
cc -std=c11 -I el-compiler/runtime -lcurl -lpthread \
-o <out> <prog>.c el-compiler/runtime/el_runtime.c
I/O
| Function | Signature | Description |
|---|---|---|
| `println` | `(s) -> Void` | print string + newline to stdout |
| `print` | `(s) -> Void` | print string without newline |
| `readline` | `() -> String` | read one line from stdin |
String Operations
| Function | Signature | Description |
|---|---|---|
| `el_str_concat` | `(a, b) -> String` | concatenate two strings |
| `str_concat` | `(a, b) -> String` | alias for `el_str_concat` |
| `str_eq` | `(a, b) -> Bool` | string equality comparison |
| `str_starts_with` | `(s, prefix) -> Bool` | prefix test |
| `str_ends_with` | `(s, suffix) -> Bool` | suffix test |
| `str_contains` | `(s, sub) -> Bool` | substring test |
| `str_len` | `(s) -> Int` | byte length |
| `str_slice` | `(s, start, end) -> String` | substring (byte offsets) |
| `str_replace` | `(s, from, to) -> String` | replace all occurrences |
| `str_to_upper` / `str_upper` | `(s) -> String` | uppercase |
| `str_to_lower` / `str_lower` | `(s) -> String` | lowercase |
| `str_trim` | `(s) -> String` | strip leading/trailing whitespace |
| `str_lstrip` / `str_rstrip` | `(s) -> String` | one-sided strip |
| `str_index_of` | `(s, sub) -> Int` | position of substring; -1 if absent |
| `str_last_index_of` | `(s, sub) -> Int` | last position |
| `str_index_of_all` | `(s, sub) -> [Int]` | all byte offsets (non-overlapping) |
| `str_find_chars` | `(s, any_of) -> Int` | first index of any char in set |
| `str_split` | `(s, sep) -> [String]` | split on separator |
| `str_split_lines` | `(s) -> [String]` | split on newlines |
| `str_split_chars` | `(s) -> [String]` | split into individual characters |
| `str_split_n` | `(s, sep, n) -> [String]` | split at most n times |
| `str_join` | `(list, sep) -> String` | join list with separator |
| `str_char_at` | `(s, i) -> String` | character at byte index |
| `str_char_code` | `(s, i) -> Int` | Unicode code point at index |
| `str_pad_left` | `(s, width, pad) -> String` | left-pad to width |
| `str_pad_right` | `(s, width, pad) -> String` | right-pad to width |
| `str_format` | `(fmt, data) -> String` | `{key}` interpolation |
| `str_repeat` | `(s, n) -> String` | repeat string n times |
| `str_reverse` | `(s) -> String` | reverse by codepoint |
| `str_strip_prefix` | `(s, prefix) -> String` | remove prefix if present |
| `str_strip_suffix` | `(s, suffix) -> String` | remove suffix if present |
| `str_strip_chars` | `(s, chars) -> String` | strip characters from both ends |
| `str_count` | `(s, sub) -> Int` | count non-overlapping occurrences |
| `str_count_chars` | `(s) -> Int` | codepoint count |
| `str_count_bytes` | `(s) -> Int` | alias for `str_len` |
| `str_count_lines` | `(s) -> Int` | line count |
| `str_count_words` | `(s) -> Int` | word count |
| `str_count_letters` | `(s) -> Int` | ASCII letter count |
| `str_count_digits` | `(s) -> Int` | ASCII digit count |
| `is_letter` / `is_digit` / `is_alphanumeric` | `(s) -> Bool` | ASCII char classification |
| `is_whitespace` / `is_punctuation` | `(s) -> Bool` | |
| `is_uppercase` / `is_lowercase` | `(s) -> Bool` | |
| `int_to_str` | `(n) -> String` | format integer |
| `str_to_int` | `(s) -> Int` | parse integer |
| `str_to_float` | `(s) -> Float` | parse float |
| `parse_int` | `(s, default) -> Int` | parse with fallback |
| `bool_to_str` | `(b) -> String` | format bool |
Integer/Float Math
| Function | Description |
|---|---|
| `el_abs(n)` | absolute value |
| `el_max(a, b)` | maximum |
| `el_min(a, b)` | minimum |
| `float_to_str(f)` | format float as string |
| `int_to_float(n)` | widen Int to Float |
| `float_to_int(f)` | truncate Float to Int |
| `format_float(f, decimals)` | format with N decimal places |
| `decimal_round(f, decimals)` | round to N decimals |
| `math_sqrt(f)` | square root |
| `math_log(f)` / `math_ln(f)` | logarithms |
| `math_sin(f)` / `math_cos(f)` / `math_pi()` | trigonometry |
List Operations
| Function | Description |
|---|---|
| `el_list_empty()` | create empty list |
| `el_list_new(count, …)` | create list from N values (varargs) |
| `el_list_len(list)` | length |
| `el_list_get(list, i)` | element at index; 0 on out-of-bounds |
| `el_list_append(list, e)` | append; returns updated list |
| `el_list_clone(list)` | shallow copy |
| `list_push(list, e)` | alias for `el_list_append` |
| `list_push_front(list, e)` | prepend |
| `list_join(list, sep)` | join to string |
| `list_range(start, end)` | integer range [start, end) |
| `native_list_empty()` | alias for `el_list_empty` (used in compiler source) |
| `native_list_append(l, v)` | alias for `el_list_append` |
| `native_list_get(l, idx)` | alias for `el_list_get` |
| `native_list_len(l)` | alias for `el_list_len` |
| `native_list_clone(l)` | alias for `el_list_clone` |
| `append(l, e)` | method-call alias: `list.append(e)` |
| `len(l)` | method-call alias: `list.len()` |
| `get(l, i)` | method-call alias: `list.get(i)` |
Map Operations
| Function | Description |
|---|---|
| `el_map_new(count, …)` | create map from key/value pairs (varargs) |
| `el_map_get(map, key)` | get value by key |
| `el_map_set(map, key, value)` | set key; returns map |
| `el_get_field(map, key)` | alias; emitted for `.field` access |
| `map_get(map, key)` | method-call alias |
| `map_set(map, key, value)` | method-call alias |
ARC (Reference Counting)
| Function | Description |
|---|---|
| `el_retain(v)` | increment refcount; no-op for non-heap values |
| `el_release(v)` | decrement refcount; free when zero |
In-Process State
| Function | Description |
|---|---|
| `state_set(key, value)` | store in process-global key/value table |
| `state_get(key)` | retrieve; `""` if absent |
| `state_del(key)` | delete key |
| `state_keys()` | all keys as `[String]` |
Filesystem
| Function | Description |
|---|---|
| `fs_read(path)` | read file to string; `""` on error |
| `fs_write(path, content)` | write string; returns 1 on success |
| `fs_write_bytes(path, bytes, length)` | write raw bytes of known length |
| `fs_list(path)` | list directory entries |
| `fs_exists(path)` | check if path exists |
| `fs_mkdir(path)` | `mkdir -p` |
HTTP Client
| Function | Description |
|---|---|
| `http_get(url)` | GET; returns body string |
| `http_post(url, body)` | POST; returns body string |
| `http_post_json(url, json_body)` | POST with `Content-Type: application/json` |
| `http_get_with_headers(url, headers_map)` | GET with custom headers |
| `http_post_with_headers(url, body, headers_map)` | POST with custom headers |
| `http_post_form_auth(url, form_body, auth_header)` | POST with auth |
| `http_delete(url)` | DELETE |
| `http_get_to_file(url, headers_map, output_path)` | stream response to file |
| `http_post_to_file(url, body, headers_map, output_path)` | stream POST response to file |
| `http_response(status, headers_json, body)` | build response envelope |
| `url_encode(s)` | RFC 3986 percent-encoding |
| `url_decode(s)` | URL decode |
| `el_html_sanitize(html, allowlist_json)` | allowlist HTML sanitizer |
HTTP Server
| Function | Description |
|---|---|
| `http_serve(port, handler)` | start server; handler: `(method, path, body) -> String` |
| `http_serve_v2(port, handler)` | start server; handler: `(method, path, headers_map, body) -> String` |
| `http_set_handler(name)` | set handler by symbol name |
| `http_set_handler_v2(name)` | v2 variant |
JSON
| Function | Description |
|---|---|
| `json_get(json, key)` | substring lookup of `"key": value` |
| `json_parse(s)` | parse JSON string to List/Map |
| `json_stringify(v)` | serialize Any to JSON string |
| `json_get_string(j, key)` | typed extract: String |
| `json_get_int(j, key)` | typed extract: Int |
| `json_get_float(j, key)` | typed extract: Float |
| `json_get_bool(j, key)` | typed extract: Bool |
| `json_get_raw(j, key)` | extract nested object/array as JSON string |
| `json_set(j, key, value)` | update field, return new JSON string |
| `json_array_len(j)` | length of JSON array string |
| `json_array_get(j, index)` | element at index |
| `json_array_get_string(j, index)` | string element at index |
Time (Epoch-Based)
| Function | Description |
|---|---|
| `time_now()` | Unix epoch milliseconds |
| `time_now_utc()` | same, explicit UTC |
| `time_format(ts, fmt)` | format timestamp |
| `time_to_parts(ts)` | decompose to Map of fields |
| `time_from_parts(secs, ns, tz)` | construct timestamp |
| `time_add(ts, n, unit)` | add duration |
| `time_diff(ts1, ts2, unit)` | difference |
| `unix_timestamp()` | Unix seconds as Int |
| `sleep_secs(secs)` | sleep N seconds |
| `sleep_ms(ms)` | sleep N milliseconds |
Time (First-Class Instant/Duration)
| Function | Description |
|---|---|
| `now()` / `el_now_instant()` | current time as Instant (nanoseconds) |
| `unix_seconds(n)` | construct Instant from Unix seconds |
| `unix_millis(n)` | construct Instant from Unix milliseconds |
| `instant_from_iso8601(s)` | parse ISO 8601 string |
| `instant_to_unix_seconds(i)` | extract Unix seconds |
| `instant_to_unix_millis(i)` | extract Unix milliseconds |
| `instant_to_iso8601(i)` | format as ISO 8601 |
| `el_duration_from_nanos(ns)` | construct Duration from nanoseconds |
| `duration_seconds(n)` | Duration from seconds |
| `duration_millis(n)` | Duration from milliseconds |
| `duration_nanos(n)` | Duration from nanoseconds |
| `duration_to_seconds(d)` | extract seconds |
| `duration_to_millis(d)` | extract milliseconds |
| `duration_to_nanos(d)` | extract nanoseconds |
| `el_instant_add_dur(inst, dur)` | Instant + Duration |
| `el_instant_sub_dur(inst, dur)` | Instant - Duration |
| `el_instant_diff(a, b)` | Instant - Instant = Duration |
| `el_duration_add/sub/scale/div` | Duration arithmetic |
| `el_instant_lt/le/gt/ge/eq/ne` | Instant comparison |
| `el_duration_lt/le/gt/ge/eq/ne` | Duration comparison |
| `el_sleep_duration(dur)` | sleep for a Duration |
| `ttl_cache_set(key, value)` | store with TTL |
| `ttl_cache_get(key, max_age)` | retrieve if within max_age |
| `ttl_cache_age(key)` | age of cached value as Duration |
Calendar System
| Function | Description |
|---|---|
| `zone(id)` | IANA zone or fixed offset |
| `zone_utc()` / `zone_local()` | UTC and local zone |
| `zone_offset(hours, minutes)` | fixed offset zone |
| `earth_calendar(z)` | Gregorian calendar in zone |
| `earth_calendar_default()` | system default |
| `mars_calendar()` / `cycle_calendar(period)` | non-Earth calendars |
| `no_cycle_calendar()` / `relative_calendar(epoch)` | abstract calendars |
| `now_in(cal)` | current time as CalendarTime |
| `in_calendar(inst, cal)` | project Instant into Calendar |
| `cal_format(ct, pattern)` | format CalendarTime |
| `cal_to_instant(ct)` | extract underlying Instant |
| `cal_cycle_phase(ct)` / `cal_in(ct, cal)` | calendar ops |
| `local_date(y, m, d)` | construct LocalDate |
| `local_time(h, m, s, ns)` | construct LocalTime |
| `local_datetime(date, time)` | construct LocalDateTime |
| `zoned(date, time, cal)` | zoned datetime |
| `local_date_year/month/day` | LocalDate accessors |
| `local_time_hour/minute/second/nanos` | LocalTime accessors |
| `el_local_date_add_dur` / `el_local_time_add_dur` | date/time arithmetic |
| `el_local_date_lt` / `el_local_date_eq` | date comparison |
| `rhythm_*` | recurrence patterns (`cycle_start`, `weekday`, `weekly_at`, `next_after`, `matches`, …) |
Process / Execution
| Function | Description |
|---|---|
| `args()` | command-line arguments as `[String]` (excludes `argv[0]`) |
| `env(key)` | read environment variable; `""` if unset |
| `exit(code)` | exit process with code |
| `exit_program(code)` | alias for `exit` |
| `getpid_now()` | current process ID |
| `exec_command(cmd)` | run shell command; return exit code |
| `exec_capture(cmd)` | run shell command; capture and return stdout |
| `uuid_new()` / `uuid_v4()` | generate UUID v4 |
| `native_int_to_str(n)` | format integer (alias, used in compiler source) |
| `native_string_chars(s)` | split string into `[String]` of single characters |
Crypto
| Function | Description |
|---|---|
| `sha256_hex(input)` | SHA-256, hex output |
| `sha256_bytes(input)` | SHA-256, raw bytes |
| `hmac_sha256_hex(key, msg)` | HMAC-SHA-256, hex |
| `hmac_sha256_bytes(key, msg)` | HMAC-SHA-256, raw bytes |
| `base64_encode(input)` / `base64_decode(input)` | standard base64 |
| `base64url_encode(input)` / `base64url_decode(input)` | URL-safe base64 |
| `sha3_256_hex(input)` | SHA3-256 (Keccak) |
| `pq_keygen_signature()` | Dilithium-3 key pair |
| `pq_sign(sk_hex, msg)` / `pq_verify(pk_hex, msg, sig_hex)` | PQ signatures |
| `pq_kem_keygen()` / `pq_kem_encaps(pk)` / `pq_kem_decaps(sk, ct)` | Kyber-768 KEM |
| `pq_hybrid_keygen()` / `pq_hybrid_handshake(remote_pub)` | X25519 + Kyber hybrid |
| `aead_encrypt(key_hex, plaintext)` | AES-256-GCM encrypt |
| `aead_decrypt(key_hex, nonce_hex, ct_hex)` | AES-256-GCM decrypt |
DHARMA Network (CGI programs only)
| Function | Description |
|---|---|
| `el_cgi_init(name, dharma_id, principal, network, engram)` | initialize CGI identity (called by generated `main()`) |
| `dharma_connect(cgi_id)` | open channel to peer |
| `dharma_send(channel, content)` | send message; blocks for response |
| `dharma_activate(query)` | spreading activation across DHARMA network |
| `dharma_emit(event_type, payload)` | emit network event (`@manager` only) |
| `dharma_field(event_type)` | wait for event (`@manager` only) |
| `dharma_strengthen(cgi_id, weight)` | Hebbian potentiation |
| `dharma_relationship(cgi_id)` | current relationship weight |
| `dharma_peers()` | all connected peers sorted by weight |
Engram Knowledge Graph
| Function | Description |
|---|---|
| `engram_node(content, type, salience)` | create node; returns ID |
| `engram_node_full(content, type, label, salience, importance, confidence, tier, tags)` | full node creation |
| `engram_node_layered(…, layer_id)` | create node in specific layer |
| `engram_get_node(id)` | retrieve node by ID |
| `engram_strengthen(node_id)` | Hebbian potentiation |
| `engram_forget(node_id)` | delete node and edges |
| `engram_node_count()` | total node count |
| `engram_edge_count()` | total edge count |
| `engram_search(query, limit)` | full-text search |
| `engram_scan_nodes(limit, offset)` | paginated node scan |
| `engram_connect(from, to, weight, relation)` | create directed edge |
| `engram_edge_between(from, to)` | get edge |
| `engram_neighbors(node_id)` | BFS neighbors |
| `engram_neighbors_filtered(node_id, max_depth, direction)` | filtered BFS |
| `engram_activate(query, depth)` | spreading activation |
| `engram_save(path)` / `engram_load(path)` | snapshot to/from disk |
| `engram_add_layer(name, priority, suppressible, transparent, injectable)` | add consciousness layer |
| `engram_remove_layer(layer_id)` / `engram_list_layers()` | layer management |
| `engram_*_json` variants | JSON-string versions of search/scan/activate |
| `engram_compile_layered_json(intent, depth)` | prompt-ready context block |
LLM (Anthropic API)
| Function | Description |
|---|---|
| `llm_call(model, prompt)` | single-turn call |
| `llm_call_system(model, system, user)` | call with system prompt |
| `llm_call_agentic(model, system, user, tools)` | agentic call with tools (CGI only) |
| `llm_vision(model, system, prompt, image)` | vision call |
| `llm_models()` | list available models |
| `llm_register_tool(name, handler_fn_name)` | register tool handler (CGI only) |
Observability
| Function | Description |
|---|---|
| `emit_log(level, msg, fields_json)` | emit OTLP log |
| `emit_metric(name, value, tags_json)` | emit OTLP metric |
| `trace_span_start(name)` | start trace span |
| `trace_span_end(span_handle)` | end trace span |
| `emit_event(name, duration_ms)` | emit event |
4. How to Re-Bootstrap from Zero
This section assumes the bootstrap binary is gone. Everything else (source files, runtime) is intact.
What You Need to Implement
A minimal El compiler has three parts: lexer, parser, codegen. Each can be written in any language. The goal is to compile elc-cli.el into a working elc binary, after which El is self-hosting again.
Step 1: Write a Minimal Lexer
The lexer must produce a list of { "kind": String, "value": String } maps (or equivalent structures). Required token kinds: Int, Float, Str, Bool, Ident, Eof, and all keywords and operators listed in section 2.1.
The minimal subset needed to compile the compiler itself:
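- Keywords: `let`, `fn`, `return`, `if`, `else`, `while`, `for`, `in`, `import`, `from`, `true`, `false`, `extern`
- Literals: `Int`, `Str`, `Bool`, `Ident`
- Operators: `=`, `==`, `!=`, `!`, `<`, `>`, `<=`, `>=`, `&&`, `||`, `+`, `-`, `*`, `/`, `->`, `=>`, `:`, `,`, `.`, `(`, `)`, `{`, `}`, `[`, `]`, `@`, `?`
- Special: `Eof`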
The lexer in lexer.el walks a char array using native_list_get to avoid O(n²) string slicing. A Python implementation can use a simple index into a string. Escapes to handle: \", \n, \t, \r, \\.
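A compact Python lexer for this minimal subset might look like the sketch below. Several details are assumptions rather than facts about lexer.el: the `Str` token value holds the decoded characters, a duration literal such as `30.seconds` is lexed as `Int`, `Dot`, `Ident`, and unknown characters raise an error. The helper names are hypothetical. Token kinds follow the tables in section 2.1.

```python
import string

KEYWORDS = {
    "let": "Let", "fn": "Fn", "return": "Return", "if": "If", "else": "Else",
    "while": "While", "for": "For", "in": "In", "import": "Import",
    "from": "From", "extern": "Extern", "true": "Bool", "false": "Bool",
}
TWO_CHAR = {"==": "EqEq", "!=": "NotEq", "<=": "LtEq", ">=": "GtEq",
            "&&": "And", "||": "Or", "->": "Arrow", "=>": "FatArrow"}
ONE_CHAR = {"=": "Eq", "!": "Not", "<": "Lt", ">": "Gt", "+": "Plus",
            "-": "Minus", "*": "Star", "/": "Slash", ":": "Colon",
            ",": "Comma", ".": "Dot", "(": "LParen", ")": "RParen",
            "{": "LBrace", "}": "RBrace", "[": "LBracket", "]": "RBracket",
            "@": "At", "?": "QuestionMark"}
ESCAPES = {'"': '"', "n": "\n", "t": "\t", "r": "\r", "\\": "\\"}
DIGITS = set(string.digits)
IDENT_START = set(string.ascii_letters + "_")
IDENT_REST = IDENT_START | DIGITS


def tok(kind, value):
    return {"kind": kind, "value": value}


def lex(src):
    tokens, i, n = [], 0, len(src)
    while i < n:
        c = src[i]
        if c in " \t\r\n":
            i += 1
        elif src.startswith("//", i):            # comment to end of line
            while i < n and src[i] != "\n":
                i += 1
        elif c in DIGITS:
            j = i
            while j < n and src[j] in DIGITS:
                j += 1
            kind = "Int"
            # Float only if a digit follows the dot, so 30.seconds stays Int.
            if j + 1 < n and src[j] == "." and src[j + 1] in DIGITS:
                j += 1
                while j < n and src[j] in DIGITS:
                    j += 1
                kind = "Float"
            tokens.append(tok(kind, src[i:j]))
            i = j
        elif c in IDENT_START:
            j = i
            while j < n and src[j] in IDENT_REST:
                j += 1
            word = src[i:j]
            tokens.append(tok(KEYWORDS.get(word, "Ident"), word))
            i = j
        elif c == '"':
            i += 1
            buf = []
            while i < n and src[i] != '"':
                if src[i] == "\\" and i + 1 < n:
                    buf.append(ESCAPES.get(src[i + 1], src[i + 1]))
                    i += 2
                else:
                    buf.append(src[i])
                    i += 1
            if i >= n:
                raise SyntaxError("unterminated string literal")
            i += 1                                # closing quote
            tokens.append(tok("Str", "".join(buf)))
        elif src[i:i + 2] in TWO_CHAR:
            tokens.append(tok(TWO_CHAR[src[i:i + 2]], src[i:i + 2]))
            i += 2
        elif c in ONE_CHAR:
            tokens.append(tok(ONE_CHAR[c], c))
            i += 1
        elif c == "&":                            # single & is discarded
            i += 1
        else:
            raise SyntaxError("unexpected character %r at offset %d" % (c, i))
    tokens.append(tok("Eof", ""))
    return tokens
```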
Step 2: Write a Minimal Parser
The parser is a standard recursive descent parser. It produces AST maps as described in section 2.2.
The minimal statement forms needed to compile the compiler:
- `let name [: Type] = expr`
- `fn name(params) [-> Type] { body }`
- `extern fn name(params) [-> Type]`
- `return expr`
- `while cond { body }`
- `for item in list { body }`
- `if cond { body } [else [if] { body }]`
- `import "path"`
- `from module import { … }`
- `@decorator stmt`
- `name = expr` (bare assignment)
- bare expression statement
The minimal expression forms:
- Integer, float, string, bool literals
- Identifier
- Binary operations with the precedence table from section 2.2
- Unary `!` and `-`
- Function call: `f(a, b, …)`
- Method call: `obj.method(args)` (parsed as Call with Field func)
- Field access: `obj.field`
- Index access: `obj[i]`
- Array literal: `[e1, e2, …]`
- Map literal: `{ "key": value, … }`
- `if` as expression
- `match` expression
- Postfix `?` (can be a no-op)
- Duration literal: `N.unit`
The __no_block_expr guard (section 2.4) is important: without it, if a || b { ... } will incorrectly parse { as a Map literal.
Step 3: Write a Minimal Codegen
The codegen emits C11 source. Required output structure:
#include <stdint.h>
#include <stdlib.h>
#include "el_runtime.h"
// Forward declarations for all non-main functions
el_val_t fn_name(el_val_t p1, el_val_t p2);
...
// File-scope let bindings (if any)
el_val_t GLOBAL_NAME;
// Function bodies
el_val_t fn_name(el_val_t p1, el_val_t p2) {
...
return 0;
}
// Entry point
int main(int _argc, char** _argv) {
el_runtime_init_args(_argc, _argv);
...
return 0;
}
Critical codegen rules:
- All values are `el_val_t`. Every parameter, local variable, and return type is `el_val_t` unless the function has `ret_type == "Void"` (use `void`).
- Let-rebinding: track declared names per C scope. Emit `el_val_t name = val;` on first occurrence; emit `name = val;` on subsequent occurrences of the same name in the same scope.
- `+` dispatch: if either operand is a string literal → `el_str_concat(a, b)`. If both are provably integers → `(a + b)`. Default fallback → `el_str_concat`.
- `==` dispatch: if either operand is a string or identifier → `str_eq(a, b)`. If both are integer literals or provably Int → `(a == b)`.
- String literals: wrap in `EL_STR("…")` and escape: `\"` → `\\\"`, `\n` → `\\n`, `\t` → `\\t`, `\\` → `\\\\`.
- Map literals: `el_map_new(N, "k1", v1, "k2", v2, …)`. Empty map: `el_map_new(0)`.
- Array literals: `el_list_new(N, e1, e2, …)`. Empty: `el_list_empty()`.
- Index access: string-literal index → `el_get_field(obj, EL_STR("key"))`. Integer index → `el_list_get(obj, idx)`.
- Field access `obj.field` → `el_get_field(obj, EL_STR("field"))`.
- Method call `obj.method(args)` → `method(obj, args)`.
- `for item in list` → emit:

      {
          el_val_t _el_lst = <list>;
          el_val_t _el_len = el_list_len(_el_lst);
          for (el_val_t _el_i = 0; _el_i < _el_len; _el_i++) {
              el_val_t item = el_list_get(_el_lst, _el_i);
              <body>
          }
      }

- `match` → GCC/Clang statement expression with `goto`:

      ({
          el_val_t _s = <subject>;
          el_val_t _r = 0;
          if (_s == 42) { _r = <arm_body>; goto _done; }
          if (str_eq(_s, EL_STR("str"))) { _r = <arm_body>; goto _done; }
          { _r = <wildcard_body>; goto _done; }
          _done:;
          _r;
      })

- `if` as expression → similarly wrapped in a GCC/Clang statement expression.
- Implicit return: if the last statement in a function body is a bare `Expr` (not `If` or `For`), emit it as `return <expr>;` instead of `<expr>;`.
- Float literals: emit as `el_from_float(<value>)`.
- Bool literals: `true` → `1`, `false` → `0`.
- `fn main()`: do not emit as a regular `el_val_t` function. Instead, fold its body into C's `int main()` after any top-level statements.
- `extern fn`: emit only a forward declaration (no body).
- Forward declarations: scan for all `FnDef` nodes before emitting bodies. This enables mutual recursion.
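Several of these rules fit in one small expression emitter. The Python sketch below covers literals, `+` and `==` dispatch, arrays, maps and field access only. It rests on a few assumptions: the `Str` node value holds decoded characters, `N` in `el_map_new` is the number of pairs, "provably Int" means an `Int` literal or an identifier recorded in `int_names`, and `==` between operands that are neither strings nor identifiers falls back to a plain C comparison. It also escapes `\r`, which the rule list does not mention. The helper names are hypothetical.

```python
def c_escape(s):
    """Escape decoded string contents for use inside a C string literal."""
    return (s.replace("\\", "\\\\")
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\t", "\\t")
             .replace("\r", "\\r"))   # \r is an addition, not in the rule list


def provably_int(node, int_names):
    if node["expr"] == "Int":
        return True
    return node["expr"] == "Ident" and node["name"] in int_names


def stringish(node, int_names):
    """String literal, or an identifier not known to be an Int."""
    if node["expr"] == "Str":
        return True
    return node["expr"] == "Ident" and node["name"] not in int_names


def emit_expr(node, int_names=frozenset()):
    kind = node["expr"]
    if kind == "Int":
        return node["value"]
    if kind == "Bool":
        return "1" if node["value"] == "true" else "0"
    if kind == "Float":
        return "el_from_float(" + node["value"] + ")"
    if kind == "Str":
        return 'EL_STR("' + c_escape(node["value"]) + '")'
    if kind == "Ident":
        return node["name"]
    if kind == "Array":
        elems = [emit_expr(e, int_names) for e in node["elems"]]
        if not elems:
            return "el_list_empty()"
        return "el_list_new(" + ", ".join([str(len(elems))] + elems) + ")"
    if kind == "Map":
        args = [str(len(node["pairs"]))]       # assumed: N = number of pairs
        for pair in node["pairs"]:
            args.append('"' + c_escape(pair["key"]) + '"')
            args.append(emit_expr(pair["value"], int_names))
        return "el_map_new(" + ", ".join(args) + ")"
    if kind == "Field":
        obj = emit_expr(node["object"], int_names)
        return "el_get_field(" + obj + ', EL_STR("' + node["field"] + '"))'
    if kind == "BinOp":
        left, right = node["left"], node["right"]
        l, r = emit_expr(left, int_names), emit_expr(right, int_names)
        if node["op"] == "Plus":
            if provably_int(left, int_names) and provably_int(right, int_names):
                return "(" + l + " + " + r + ")"
            return "el_str_concat(" + l + ", " + r + ")"   # default fallback
        if node["op"] == "EqEq":
            if stringish(left, int_names) or stringish(right, int_names):
                return "str_eq(" + l + ", " + r + ")"
            return "(" + l + " == " + r + ")"
    raise NotImplementedError(kind)
```

Because of the string-concat fallback for `+`, an unannotated integer variable will be concatenated rather than added. A bootstrap codegen therefore has to record Int-annotated names before emitting arithmetic on them.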
Step 4: Compile the El Compiler
Using your minimal implementation, compile elc-cli.el (which imports the entire compiler chain):
# Your minimal compiler
python3 minimal_elc.py elc-cli.el > elc-new.c
# Build with the runtime
cc -std=c11 -I el-compiler/runtime \
-o elc-new elc-new.c el-compiler/runtime/el_runtime.c -lcurl -lpthread
Step 5: Verify Self-Hosting
# Compile elc-cli.el with the new compiler
./elc-new elc-cli.el elc-v2.c
cc -std=c11 -I el-compiler/runtime \
-o elc-v2 elc-v2.c el-compiler/runtime/el_runtime.c -lcurl -lpthread
# Compile again with the second-generation compiler
./elc-v2 elc-cli.el elc-v3.c
# The outputs should be identical
diff elc-v2.c elc-v3.c
A clean diff confirms you have a stable fixed point: the compiler reproduces itself exactly.
Step 6: Replace the Bootstrap Binary
cp elc-v2 dist/platform/elc
You are bootstrapped.
Minimal El Subset for the Compiler Itself
The El compiler source (lexer.el, parser.el, codegen.el, compiler.el) uses:
- fn, let, while, if/else, return, for/in, import
- extern fn (for .elh headers)
- String, Int, Bool, Void, Any, Map<String, Any>, [String], [Map<String, Any>]
- Map literals { "key": val }
- Array literals [...] (and native_list_empty())
- List operations: native_list_empty(), native_list_append(), native_list_get(), native_list_len(), native_list_clone()
- String operations: str_join(), str_eq(), str_contains(), str_starts_with(), str_slice(), str_trim(), str_split(), str_index_of(), str_len(), str_to_int(), native_string_chars(), native_int_to_str()
- state_get(), state_set()
- println(), fs_read(), fs_write(), exit()
- el_release() (ARC cleanup)
The compiler does not use: HTTP, engram, dharma, LLM, crypto, UUID, float arithmetic.
5. The Long-Term Solution: elvm
Why a VM Makes Bootstrapping More Auditable
The current bootstrap chain relies on trusting a binary whose correspondence to the source cannot be verified by inspection alone. This is the classic "trusting trust" problem (Ken Thompson, 1984). A virtual machine breaks the chain:
- elc targets elvm bytecode (instead of C)
- elvm is a minimal interpreter hand-written in ~500 lines of C
- The hand-written C is small enough to audit completely
- Anyone can compile elvm.c with any C compiler
- From there: elvm interprets elc.elvm → elc compiles El → cc builds native binaries
The benefit: the trusted base shrinks from "a Mach-O binary" to "500 lines of straightforward C code that anyone can read in an afternoon."
The elvm Design
A minimal elvm needs:
- A stack or register machine (stack is simpler)
- Instructions: push, pop, add, sub, mul, div, cmp, jump, call, return, load, store
- A string table (El strings are mostly literals)
- A heap for ElList and ElMap
- An FFI table mapping El runtime builtins to C functions
The El compiler would gain a --target=elvm flag in compile_dispatch(). Codegen would emit bytecode instead of C text. The runtime interface stays the same — builtins map to FFI slots by name.
This is the planned path. It does not exist yet.
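To make the instruction list concrete, here is a toy stack machine covering exactly the twelve instructions named above. It is an illustration only: it is written in Python for brevity (the planned elvm is C), the (opcode, operand) tuple encoding is an assumption, and the string table, heap, and FFI table are left out.

```python
def run(program):
    """Run (opcode, operand) pairs on a toy stack machine; return the final stack."""
    stack = []   # operand stack
    slots = {}   # named storage used by load/store
    calls = []   # return addresses pushed by call
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "pop":
            stack.pop()
        elif op in ("add", "sub", "mul", "div", "cmp"):
            b = stack.pop()
            a = stack.pop()
            if op == "add":
                stack.append(a + b)
            elif op == "sub":
                stack.append(a - b)
            elif op == "mul":
                stack.append(a * b)
            elif op == "div":
                stack.append(a // b)  # floor division; C truncates toward zero
            else:
                stack.append((a > b) - (a < b))  # cmp yields -1, 0, or 1
        elif op == "load":
            stack.append(slots[arg])
        elif op == "store":
            slots[arg] = stack.pop()
        elif op == "jump":
            pc = arg
        elif op == "call":
            calls.append(pc)  # pc already points at the next instruction
            pc = arg
        elif op == "return":
            if not calls:
                break  # a return with no caller halts the machine
            pc = calls.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


program = [
    ("push", 2), ("push", 3),
    ("call", 5),       # call the routine at index 5
    ("store", "r"),
    ("jump", 7),       # skip over the routine body
    ("add", None),     # routine: add the two values on the stack
    ("return", None),
    ("load", "r"), ("push", 4), ("mul", None),
]
print(run(program))  # prints [20]
```

A real elvm would also need some form of conditional branch, which the list above presumably folds into cmp and jump; this sketch does not guess at that encoding.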
6. Compiler Source Map
| File | Role | Lines |
|---|---|---|
| elc-cli.el | Entry point; imports compiler.el | 7 |
| el-compiler/src/compiler.el | Pipeline wiring: lex → parse → codegen. Import resolution, --emit-header, fn main(). Defines compile(), compile_js(), compile_dispatch(), resolve_imports() | 298 |
| el-compiler/src/lexer.el | Tokenizer. lex(source) → token list. Char helpers, keyword lookup, scan_digits, scan_ident, scan_string, strip_code_comments | 747 |
| el-compiler/src/parser.el | Recursive descent parser. parse(tokens) → AST. All statement and expression forms | 1071 |
| el-compiler/src/codegen.el | C code emitter. codegen(stmts, source) → (streams to stdout). Expression codegen, statement codegen, function codegen, type tracking, capability enforcement, temporal type dispatch | 2721 |
| el-compiler/src/codegen-js.el | JavaScript backend. codegen_js(stmts, source) → JS source | ~500 |
| el-compiler/runtime/el_runtime.h | Full runtime API declaration | 755 |
| el-compiler/runtime/el_runtime.c | Full runtime implementation | large |
| el-compiler/runtime/el_runtime.js | JS runtime | n/a |
| elb.el | Build coordinator. Reads manifest.el, walks import graph, compiles modules, links binary. The .NET-style incremental build model | 367 |
| elc-combined.el | Pre-merged single-file bootstrap edition (for early bootstrap iterations) | large |
| spec/language.md | Language specification v1.2.0 | n/a |
| dist/platform/elc | Current bootstrap binary (Mach-O arm64) | n/a |
7. Key Decisions and Gotchas
target is a Reserved Keyword
target is lexed as the Target token kind. It cannot be used as a variable or parameter name anywhere in El source. If you write fn compile(target: String), the parameter name will be tokenized as Target, which the parser does not recognize as an Ident in parameter position.
Workaround: use tgt, dest, backend, or any other name. The compiler source uses tgt specifically for this reason. This comes up whenever writing code that handles compilation targets.
let x = x + 1 is Let-Rebinding, Not Mutation
El has no mutable variables. let count = count + 1 re-introduces count into the current scope, shadowing the previous binding. At the C level, the codegen tracks declared names and emits plain assignment for subsequent bindings of the same name:
- First let count = 0 → el_val_t count = 0;
- Second let count = count + 1 → count = count + 1;
This means you cannot have two different values named count in the same C scope — the second binding overwrites the first. This is by design. Scoped shadowing works correctly because each block (if body, while body, for body) gets its own copy of the declared list.
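A minimal sketch of this bookkeeping, in Python as it might appear in a reimplementation. The function names emit_let and child_scope are hypothetical, and a Python set stands in for the compiler's declared list.

```python
def emit_let(name, value_c, declared):
    """Return the C statement for `let name = value`, recording the name in `declared`."""
    if name in declared:
        # Rebinding: the C variable already exists, so emit plain assignment.
        return f"{name} = {value_c};"
    declared.add(name)
    return f"el_val_t {name} = {value_c};"


def child_scope(declared):
    """Give a nested block (if/while/for body) its own copy of the declared names."""
    return set(declared)


declared = set()
print(emit_let("count", "0", declared))          # el_val_t count = 0;
print(emit_let("count", "count + 1", declared))  # count = count + 1;
```

Because the child scope starts as a copy, a rebinding of count inside a while body still emits plain assignment (which is what makes loop counters work), while names first introduced inside the block do not leak into the parent's set.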
Arena is Inactive in CLI Mode
The runtime includes an arena allocator designed for long-running server processes. In CLI mode (elc, elb) the arena is not activated. Memory is managed by ARC (reference counting via el_retain/el_release). The compiler source explicitly calls el_release(tokens) after parsing and el_release(stmt) after codegen to prevent memory exhaustion on large source files.
If you are implementing a new runtime or embedding El, be aware that the ARC model expects callers to release values they are done with.
The extern fn / .elh Separate Compilation Model
elb (the build coordinator) supports separate compilation. When a module changes:
- elc --emit-header module.el module.c compiles the module and writes module.elh
- module.elh contains extern fn declarations for all public functions
- Other modules that import module.el use the .elh header instead of re-parsing the source
The resolve_imports function in compiler.el checks for a .elh file before recursively inlining the .el source. If the header exists, it is used (and the .el is marked as seen to prevent double-inclusion).
This is important for bootstrap: if you have pre-compiled headers lying around from a broken build, they may shadow updated source. Delete .elh files (or use elb --clean) when debugging unexpected compilation behavior.
Import Resolution: Depth-First with Deduplication
resolve_imports in compiler.el:
- Walks imports depth-first (dependencies before dependents)
- Uses state_set("__elc_imp__:" + path, "1") to deduplicate: each file is included exactly once
- Builds the combined source string by concatenating import bodies ahead of the entry file's body
- If a .elh header exists for an import, uses that instead of recursing into the .el
The result is one large string that gets passed through lex → parse → codegen as a single unit. The codegen emits forward declarations for all functions before any body, so declaration order within the combined source does not matter.
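The algorithm can be sketched in a few lines of Python. This is not the code in compiler.el: the import "path" line syntax is an assumption made for the example, a dict of path → text stands in for the filesystem, and a Python set replaces the state_set keys.

```python
def resolve_imports(path, files, seen=None):
    """Return the combined source for `path`, with dependencies placed first."""
    if seen is None:
        seen = set()
    if path in seen:
        return ""  # already included once; contribute nothing
    seen.add(path)
    parts = []  # resolved dependency text, in depth-first order
    body = []   # this file's own non-import lines
    for line in files[path].splitlines():
        stripped = line.strip()
        # Assumed import syntax: import "some/file.el"
        if stripped.startswith('import "') and stripped.endswith('"'):
            dep = stripped[len('import "'):-1]
            header = dep[:-3] + ".elh" if dep.endswith(".el") else None
            if header in files and dep not in seen:
                # Prefer the header and mark the .el as seen to avoid double inclusion.
                seen.add(dep)
                parts.append(files[header])
            else:
                parts.append(resolve_imports(dep, files, seen))
        else:
            body.append(line)
    parts.append("\n".join(body))
    return "\n".join(p for p in parts if p)
```

Running it on a diamond-shaped import graph shows the deduplication: the shared dependency appears once, ahead of everything that needs it. Adding a stale .elh entry to the dict reproduces the shadowing problem described in the previous section.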
+ Operator Dispatch is Heuristic
El's + operator serves double duty: integer addition and string concatenation. The codegen dispatches based on static analysis of the AST:
- If either operand is a Str literal → el_str_concat
- If both operands are provably Int (via is_int_expr) → (a + b)
- If either operand is a Call or Ident → el_str_concat (conservative fallback)
The is_int_expr predicate recurses through the AST: literal Int, names in __int_names (from : Int annotations), known Int-returning builtins, and arithmetic BinOps over Int operands all count as "provably Int."
If you write let result = some_int_var + 1 and some_int_var is not annotated : Int, the codegen may emit el_str_concat instead of integer addition. Fix by adding : Int to the variable declaration.
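The dispatch can be sketched as follows. This is a simplified Python model, not the code in codegen.el: the node keys other than "expr" are hypothetical, int_names stands in for __int_names, and INT_BUILTINS is an assumed subset of the Int-returning builtins.

```python
# Assumed subset of Int-returning builtins; the real list lives in codegen.el.
INT_BUILTINS = {"str_len", "str_to_int", "native_list_len"}


def is_int_expr(node, int_names):
    """Return True when `node` is provably Int under the rules described above."""
    kind = node["expr"]
    if kind == "Int":
        return True
    if kind == "Ident":
        return node["name"] in int_names
    if kind == "Call":
        return node["name"] in INT_BUILTINS
    if kind == "BinOp" and node["op"] in ("+", "-", "*", "/"):
        return (is_int_expr(node["left"], int_names)
                and is_int_expr(node["right"], int_names))
    return False


def emit_add(left, right, left_c, right_c, int_names):
    """Choose between integer addition and el_str_concat for `left + right`."""
    if left["expr"] == "Str" or right["expr"] == "Str":
        return f"el_str_concat({left_c}, {right_c})"
    if is_int_expr(left, int_names) and is_int_expr(right, int_names):
        return f"({left_c} + {right_c})"
    return f"el_str_concat({left_c}, {right_c})"  # conservative fallback


n = {"expr": "Ident", "name": "some_int_var"}
one = {"expr": "Int", "value": 1}
print(emit_add(n, one, "some_int_var", "1", {"some_int_var"}))  # (some_int_var + 1)
print(emit_add(n, one, "some_int_var", "1", set()))  # el_str_concat(some_int_var, 1)
```

The second call shows the failure mode: the same expression compiles to string concatenation as soon as the name is missing from the Int-typed set.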
== Operator Dispatch is Also Heuristic
Similarly, == dispatches between str_eq(a, b) (string comparison) and (a == b) (integer comparison) based on operand types. The codegen tracks Int-typed names in __int_names. Two Ident operands where both are known Int-typed use ==; all other Ident-Ident comparisons use str_eq.
This means comparing two integer variables that were not annotated : Int can silently produce str_eq on what are actually integer values — and str_eq treats them as const char* pointers, producing incorrect results or segfaults.
Rule: always annotate variables : Int when they will participate in == comparisons or + arithmetic.
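The effect of the rule is easy to see in a small model of the == dispatch. This Python sketch is a simplification under stated assumptions: the node shape is hypothetical, int_names stands in for __int_names, and anything that is not an Int literal or a known Int-typed name falls through to str_eq.

```python
def emit_eq(left, right, left_c, right_c, int_names):
    """Choose between (a == b) and str_eq(a, b) for `left == right`."""
    def known_int(node):
        if node["expr"] == "Int":
            return True
        return node["expr"] == "Ident" and node["name"] in int_names

    if known_int(left) and known_int(right):
        return f"({left_c} == {right_c})"
    # Unannotated identifiers land here, even if they hold integers at runtime.
    return f"str_eq({left_c}, {right_c})"


a = {"expr": "Ident", "name": "a"}
b = {"expr": "Ident", "name": "b"}
print(emit_eq(a, b, "a", "b", {"a", "b"}))  # (a == b)
print(emit_eq(a, b, "a", "b", {"a"}))       # str_eq(a, b)
```

Annotating only one of the two variables is not enough: both names must be known Int-typed before the integer comparison is chosen.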
Capability Kind Enforcement
The codegen classifies programs into three capability tiers based on top-level declarations:
- cgi block present → full capability (all primitives allowed)
- service block present → restricted (no llm_call_agentic, llm_register_tool, dharma_emit, dharma_field)
- Neither → utility (no DHARMA, no LLM)
Violations are collected during codegen and emitted as #error directives at the bottom of the generated C. The downstream cc step then fails with a clear message naming the forbidden call.
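A reimplementation could model the tiers roughly like this. The sketch makes several assumptions that the source does not confirm: cgi takes precedence when both blocks are present, the utility tier is approximated by a prefix test on llm_ and dharma_, and the #error message wording is invented for the example.

```python
# The four calls named above as forbidden in the service tier.
SERVICE_FORBIDDEN = {"llm_call_agentic", "llm_register_tool", "dharma_emit", "dharma_field"}


def capability_tier(top_level_kinds):
    """Classify a program from the kinds of its top-level declarations."""
    if "cgi" in top_level_kinds:
        return "cgi"  # assumption: cgi wins if both blocks are present
    if "service" in top_level_kinds:
        return "service"
    return "utility"


def is_forbidden(tier, call_name):
    """Return True when `call_name` is not allowed in `tier`."""
    if tier == "cgi":
        return False
    if tier == "service":
        return call_name in SERVICE_FORBIDDEN
    # utility: approximated here as "no LLM, no DHARMA" via name prefixes
    return call_name.startswith(("llm_", "dharma_"))


def error_directives(tier, calls):
    """Collect violations and render them as #error lines for the end of the C output."""
    violations = []
    for name in calls:
        if is_forbidden(tier, name) and name not in violations:
            violations.append(name)
    return [f'#error "{name} is not allowed in a {tier} program"' for name in violations]
```

Deferring the directives to the end of the output keeps codegen single-pass: violations are only known once every call site has been visited, and cc reports them all in one run.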
The __no_block_expr Parse Guard
When parsing the condition of if, while, for, and match, the parser calls state_set("__no_block_expr", "1"). This prevents parse_primary from treating a { as the start of a Map literal. Instead, parse_primary returns { "expr": "Nil" }, and the caller sees the { and treats it as the block delimiter.
Without this guard, if a || b { ... } would recurse into parse_expr for b, hit {, try to parse it as a Map literal, fail to find string keys, loop in error-recovery mode, and hang.
Codegen Streams Output via println
The codegen does not build the output as a string — it calls println() for each line as it is emitted. The compile() / compile_js() / codegen() functions return "". Output goes to stdout.
This design avoids O(n²) string concatenation for large programs. It also means you cannot capture the compiler's output in a variable within El itself — you must redirect stdout at the OS level (elc source.el > output.c).
When writing to a file, elc detects the output path argument, redirects C's stdout to the file (via freopen in the runtime), and the println calls go there instead.