51 Commits

Author SHA1 Message Date
Will Anderson 53f2df500d runtime: add dharma-required functions to el_runtime.c and runtime/*.el
Add the following functions that dharma registry calls but were missing
from the El runtime:

el_runtime.c (consumed by the old build system via released SDK):
  - list_len, list_get — aliases for el_list_len/el_list_get (handlers.el)
  - json_array_push — append pre-encoded element to JSON array string
  - now_millis, unix_timestamp_ms, time_now_ms — ms-since-epoch aliases
  - log_info, log_warn — structured stderr log helpers
  - config — reads config from environment (alias for getenv)
  - http_patch — HTTP PATCH with Content-Type: application/json
  - http_post_engram — HTTP POST with optional X-API-Key header
  - http_get_engram — HTTP GET with optional X-API-Key header
  - str_to_bytes — encode string as JSON byte array [72,101,...]
  - bytes_to_str — decode JSON byte array back to string
  - hash_sha256 — SHA-256 hex digest using built-in sha256 impl

runtime/*.el (consumed by the new build system):
  - http.el: http_patch, http_post_engram, http_get_engram
  - time.el: now_millis, unix_timestamp_ms, time_now_ms
  - env.el: config, log_info, log_warn, list_len, list_get
  - json.el: json_array_push, bytes_to_str
  - string.el: str_to_bytes, hash_sha256 (via __sha256_hex seed)

el_seed.h / el_seed.c:
  - __sha256_hex primitive with self-contained SHA-256 implementation
2026-05-04 19:07:08 -05:00
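The str_to_bytes / bytes_to_str pair described in 53f2df500d can be sketched in plain C as follows. This is only an illustration of the "[72,101,...]" encoding named in the message; the function names, the use of plain char* instead of el_val_t, and the malloc-based ownership are assumptions, not the runtime's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for str_to_bytes: encodes each byte of `s` as a
 * decimal number in a JSON array, e.g. "Hi" -> "[72,105]".
 * The caller frees the result. */
char *sketch_str_to_bytes(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s);
    /* Worst case per byte: 3 digits + comma; plus two brackets and NUL. */
    char *out = malloc(n * 4 + 3);
    if (!out) return NULL;
    size_t pos = 0;
    out[pos++] = '[';
    for (size_t i = 0; i < n; i++) {
        if (i > 0) out[pos++] = ',';
        pos += (size_t)sprintf(out + pos, "%u", (unsigned)(unsigned char)s[i]);
    }
    out[pos++] = ']';
    out[pos] = '\0';
    return out;
}

/* Hypothetical stand-in for bytes_to_str: decodes "[72,105]" back to "Hi".
 * Assumes well-formed input; each value is truncated to one byte. */
char *sketch_bytes_to_str(const char *json) {
    assert(json != NULL);
    char *out = malloc(strlen(json) + 1); /* output is never longer than input */
    if (!out) return NULL;
    size_t pos = 0;
    const char *p = json;
    while (*p) {
        if (*p >= '0' && *p <= '9') {
            char *end;
            long v = strtol(p, &end, 10);
            out[pos++] = (char)(unsigned char)v;
            p = end;
        } else {
            p++; /* skip brackets, commas, whitespace */
        }
    }
    out[pos] = '\0';
    return out;
}
```

Round-tripping a string through both functions should return the original bytes, which is the property the two helpers are paired for.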
Will Anderson 245eb2898e runtime: declare __thread_create and __thread_join in header for C99 compliance 2026-05-03 18:00:47 -05:00
Will Anderson 6ede9e4379 runtime: restore el_runtime.c as build shim; fix el_seed.h self-contained types
el_runtime.c was deleted prematurely — elb still resolves the runtime at build time
via a hardcoded relative path, and the elc code generator still emits
#include "el_runtime.h" in generated C.

Restoring el_runtime.c + el_runtime.h as the working build runtime until the
compiler is updated to emit #include "el_seed.h" and link against el_seed.c
directly.

el_seed.h: remove #include "el_runtime.h" that broke after el_runtime.h deletion;
add inline el_val_t typedef + macros + float cast helpers so el_seed.h is fully
self-contained.
2026-05-03 17:35:14 -05:00
Will Anderson 4ae42ee7db runtime: native SSE streaming — http_sse_open/send/close
Add Server-Sent Events support to the El runtime. El v2 handlers can now
hold HTTP connections open and push events in real time.

New builtins in el_seed.c:
  __http_conn_fd()          — retrieve raw fd from thread-local set by worker
  __http_sse_open(fd)       — send SSE headers (text/event-stream), keep-alive
  __http_sse_send(fd, data) — write "data: <data>\n\n" frame
  __http_sse_close(fd)      — close the connection fd

http_worker_v2 in legacy/el_runtime.c now:
  - stashes the fd via el_seed_set_http_conn_fd() before calling the handler
  - detects the "__sse__" sentinel return value to skip http_send_response
    and skip close(fd) — SSE handler took ownership of the fd
  - clears the thread-local after the handler returns

El wrappers added to runtime/http.el:
  http_conn_fd() http_sse_open(fd) http_sse_send(fd, data)
  http_sse_close(fd) http_sse_sentinel()
2026-05-03 17:15:37 -05:00
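The "data: <data>\n\n" frame that 4ae42ee7db attributes to __http_sse_send can be illustrated with a small formatter. This is a sketch under assumptions: it formats into a caller-supplied buffer rather than writing to a file descriptor, the name sketch_sse_format is hypothetical, and it assumes the payload contains no embedded newlines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats one Server-Sent Events frame, "data: <data>\n\n", into buf.
 * Returns the frame length in bytes, or -1 if buf is too small.
 * Assumption: `data` has no embedded newlines (multi-line payloads
 * would need one "data:" line per payload line). */
int sketch_sse_format(char *buf, size_t cap, const char *data) {
    assert(buf != NULL && data != NULL);
    int n = snprintf(buf, cap, "data: %s\n\n", data);
    if (n < 0 || (size_t)n >= cap) return -1;
    return n;
}
```

A real sender would follow this with a write loop on the connection fd; that part is omitted because the runtime's write helper is not shown in the log.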
Will Anderson 3e5130e98d remove el_runtime.c — runtime is 100% native El
el_runtime.c and el_runtime.h removed from the active runtime directory
(archived copies remain in el-compiler/runtime/legacy/).
tools/lsp/build.sh removed as it depended on el_runtime.c directly.
AGENTS.md updated to reflect el_seed.c as the sole C dependency.
2026-05-03 17:10:04 -05:00
Will Anderson beb4e436e1 archive el_runtime.c — native El runtime complete, seed is self-contained
el_seed.c now defines el_request_start/el_request_end directly (delegating
to its own seed arena) rather than declaring them as externs from el_runtime.c.
Header comment updated to reflect self-contained build.
2026-05-03 17:08:49 -05:00
Will Anderson 0c9154551f merge tools/lsp (full) — 184 completions, hover, go-to-def, live diagnostics, enhanced VSCode extension 2026-05-03 16:01:11 -05:00
Will Anderson 9aa0c49d0c add full El LSP — completions, hover, go-to-def, diagnostics, VSCode extension 2026-05-03 15:59:42 -05:00
Will Anderson c45744d8ca merge runtime/seed — el_seed.c minimal C OS boundary (968 lines) 2026-05-03 15:52:21 -05:00
Will Anderson 2e778ca664 merge compiler/string-interp — string interpolation via lexer desugaring 2026-05-03 15:52:21 -05:00
Will Anderson 641227a7d3 merge runtime/channels — MPMC buffered channels, channel_pipeline, channel_fan_out 2026-05-03 15:52:21 -05:00
Will Anderson ce9a2caff4 add string interpolation to El ("hello ${name}")
Lexer gains scan_interp_string which replaces scan_string in the main
lex loop. When no ${ is found it behaves identically to before (single
Str token). When interpolations are present it emits a flat token
sequence — Str, Plus, (expr tokens), Plus, Str, … — that the existing
parse_binop / cg_expr BinOp-Plus-string path assembles into nested
el_str_concat calls with zero parser or codegen changes.

Key design choices:
- scan_interp_brace tracks { depth so fn(a, b) inside ${} is safe
- inner expr tokens are wrapped in ( ) so operators like + in ${n+1}
  do not associate with the surrounding concat Plus tokens
- \$ escapes to a literal dollar sign; bare $ not before { passes through
- empty ${} emits an empty string segment
2026-05-03 15:50:23 -05:00
Will Anderson d1af4b0f8b add channels to El — buffered MPMC channel with send/recv/close
Introduces Go-style channels as El's mid-flight communication primitive,
completing the threading model: threads can now not only spawn/join but
also communicate while running.

Part 1 — seed layer (el_runtime.c / el_runtime.h):
- Add __thread_create/__thread_join/__mutex_new/__mutex_lock/__mutex_unlock
  as C seed primitives (dlsym-based thread dispatch, pthread mutex table)
- Add __channel_new/__channel_send/__channel_recv/__channel_try_recv/__channel_close
  as MPMC channel seed primitives backed by mutex + condvar + circular buffer
- Bounded channels (cap > 0): circular buffer, sender blocks when full
- Unbounded channels (cap == 0): dynamic array, grows on demand, never blocks
- channel_close wakes all blocked receivers/senders; recv drains then returns ""

Part 2 — El API (runtime/channel.el):
- channel_new/send/recv/try_recv/close — thin wrappers over seed layer
- channel_pipeline — spawn N worker threads reading from in_ch, applying
  fn_name, writing to out_ch; workers exit on "" sentinel from close
- channel_drain — collect all messages from a closed channel into [String]
- channel_fan_out — send a [String] list into a channel then close it

Part 3 — codegen.el:
- Register all 10 seed builtins (__thread_* + __channel_*) in builtin_arity
  so the arity checker validates call sites at compile time
2026-05-03 15:50:13 -05:00
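The bounded-channel behaviour described in d1af4b0f8b (mutex + condvar + circular buffer, sender blocks when full, recv drains after close and then returns "") can be sketched as below. The struct layout, the fixed capacity, the sketch_ names, and the choice to reject sends on a closed channel with -1 are all assumptions for illustration; the real seed primitives operate on el_val_t handles.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CAP 4 /* hypothetical fixed capacity */

typedef struct {
    const char *buf[SKETCH_CAP]; /* circular buffer of borrowed strings */
    size_t head, count;
    int closed;
    pthread_mutex_t mu;
    pthread_cond_t not_empty, not_full;
} SketchChan;

void sketch_chan_init(SketchChan *c) {
    c->head = 0;
    c->count = 0;
    c->closed = 0;
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->not_empty, NULL);
    pthread_cond_init(&c->not_full, NULL);
}

/* Blocks while the buffer is full. Returns 0 on success, -1 if closed. */
int sketch_chan_send(SketchChan *c, const char *msg) {
    pthread_mutex_lock(&c->mu);
    while (c->count == SKETCH_CAP && !c->closed)
        pthread_cond_wait(&c->not_full, &c->mu);
    if (c->closed) {
        pthread_mutex_unlock(&c->mu);
        return -1;
    }
    c->buf[(c->head + c->count) % SKETCH_CAP] = msg;
    c->count++;
    pthread_cond_signal(&c->not_empty);
    pthread_mutex_unlock(&c->mu);
    return 0;
}

/* Blocks while the channel is empty and open. After close, drains any
 * buffered messages and then returns "". */
const char *sketch_chan_recv(SketchChan *c) {
    pthread_mutex_lock(&c->mu);
    while (c->count == 0 && !c->closed)
        pthread_cond_wait(&c->not_empty, &c->mu);
    const char *msg = "";
    if (c->count > 0) {
        msg = c->buf[c->head];
        c->head = (c->head + 1) % SKETCH_CAP;
        c->count--;
        pthread_cond_signal(&c->not_full);
    }
    pthread_mutex_unlock(&c->mu);
    return msg;
}

/* Marks the channel closed and wakes every blocked sender and receiver. */
void sketch_chan_close(SketchChan *c) {
    pthread_mutex_lock(&c->mu);
    c->closed = 1;
    pthread_cond_broadcast(&c->not_empty);
    pthread_cond_broadcast(&c->not_full);
    pthread_mutex_unlock(&c->mu);
}
```

Using "" as the end-of-stream value matches the sentinel the commit describes for channel_pipeline workers, with the usual caveat that a legitimately empty message cannot be distinguished from close.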
Will Anderson cfcedff7f4 implement match statement codegen in El
Add cg_match_stmt() to lower match-as-statement to proper C if/else if/else
chains. Previously, match in statement position fell through to cg_expr() which
emitted a GCC statement-expression — fine for expression arms but wrong for the
statement form. Match is now dispatched using the same pattern as If and For in the
Expr handler of cg_stmt().

Pattern dispatch mirrors cg_match (expression form):
  LitStr  -> str_eq(subj, EL_STR("..."))
  LitInt  -> subj == N
  LitBool -> subj == 1 / 0
  Binding -> else { el_val_t name = subj; body; }
  Wildcard -> else { body; }

Subject is evaluated once into a scoped temporary to avoid double evaluation.
2026-05-03 15:47:50 -05:00
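For cfcedff7f4, the shape of the lowered C can be illustrated with a hand-written function: a match over a string subject with two literal arms and a wildcard. This is not compiler output. sketch_str_eq stands in for the runtime's str_eq, plain char* stands in for el_val_t, and the arm bodies are invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the runtime's str_eq (the real one takes el_val_t). */
static int sketch_str_eq(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}

/* Hand-written C in the shape of the described lowering:
 * LitStr arms become an if / else if chain, the wildcard becomes else. */
int sketch_dispatch(const char *cmd) {
    assert(cmd != NULL);
    int result;
    {
        const char *subj = cmd; /* subject evaluated once into a scoped temporary */
        if (sketch_str_eq(subj, "start")) {
            result = 1;
        } else if (sketch_str_eq(subj, "stop")) {
            result = 2;
        } else { /* wildcard arm */
            result = 0;
        }
    }
    return result;
}
```

The inner block scopes the temporary so that two match statements in the same function do not collide on the name.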
Will Anderson eab483ed4f add el_seed.c — minimal C OS boundary for El runtime migration
Introduces el_seed.c / el_seed.h as the clean OS-boundary layer for new-generation
El programs. All public symbols use the __ prefix convention; el_val_t (int64_t) is
the universal value type throughout.

Key additions over el_runtime.c:
- __thread_create / __thread_join: pthreads + dlsym(RTLD_DEFAULT) parallelism
  foundation. Static ElThread table (64 slots); worker resolves El fn symbols at
  runtime, stores result string for join to return.
- __mutex_new / __mutex_lock / __mutex_unlock: pooled pthread_mutex_t handles
- __http_do: unified curl call with JSON headers string (vs ElMap) and explicit
  timeout_ms parameter
- __fs_list_raw: returns newline-separated filename string (not ElList)
- __str_char_at: returns Int byte value (not single-char String)
- __args_json: CLI args as JSON array string; seeded by el_seed_init_args()

JSON, state, engram, HTML/URL, serve — thin __ wrappers over el_runtime.c.
Private seed arena (parallel to el_runtime.c arena) for standalone use.
2026-05-03 15:45:52 -05:00
Will Anderson f271f9d9d8 add for-range loops to El (for i in 0..n)
Adds `for i in start..end` (exclusive) and `for i in start..=end`
(inclusive) range loop syntax. Existing `for item in list` iteration
is preserved; the parser branches on DotDot/DotDotEq presence after
the start expression. Lexer adds DotDot and DotDotEq tokens with
longer-match-first priority. Codegen emits a C `for` loop with the
loop variable scoped to the statement; inclusive uses `<=`, exclusive `<`.
2026-05-03 15:44:58 -05:00
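The two loop forms from f271f9d9d8 differ only in the comparison operator of the emitted C `for`. A minimal sketch, assuming the loop variable is an int64_t (consistent with el_val_t being int64_t elsewhere in this log) and using hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Shape of the C loop for the exclusive form `for i in start..end`. */
int64_t sketch_sum_exclusive(int64_t start, int64_t end) {
    int64_t total = 0;
    for (int64_t i = start; i < end; i++) { /* exclusive: `<` */
        total += i;
    }
    return total;
}

/* Shape of the C loop for the inclusive form `for i in start..=end`. */
int64_t sketch_sum_inclusive(int64_t start, int64_t end) {
    int64_t total = 0;
    for (int64_t i = start; i <= end; i++) { /* inclusive: `<=` */
        total += i;
    }
    return total;
}
```

Declaring `i` in the for-initializer is what scopes the loop variable to the statement. Note that this naive inclusive form misbehaves if `end` is the largest int64_t value; whether the compiler guards against that is not stated in the commit.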
Will Anderson 49a8a1c24b add % modulo operator to El lexer, parser, codegen
Lexer and parser already had Percent token and precedence on the
compiler/string-interp branch. This commit adds the missing is_int_expr
case for Percent so that modulo expressions over Int operands are
correctly typed as Int (enabling arithmetic dispatch rather than
falling through to string concat or untyped paths).

binop_to_c already mapped Percent -> % at HEAD; only is_int_expr
needed the Percent arm.
2026-05-03 15:44:19 -05:00
Will Anderson 252ad04c96 add break and continue statements to El 2026-05-03 15:43:20 -05:00
Will Anderson 5b6915ec9e merge runtime/engram-build — engram wrappers, manifest, seed arity table 2026-05-03 15:37:42 -05:00
Will Anderson 33af4ed09e add runtime/engram.el, manifest.el; register seed builtins in codegen arity table
- runtime/engram.el: thin El wrappers over all __engram_* and __generate
  seed primitives (16 functions), matching the el_seed.c API exactly
- runtime/manifest.el: build manifest documenting module load order and
  the cat+compile+cc command for runtime builds
- el-compiler/src/codegen.el: add 77 __-prefix seed primitive entries to
  builtin_arity, covering str, fs, http, thread, exec, env, time, uuid,
  math, state, html, json, and engram seeds
2026-05-03 15:37:05 -05:00
Will Anderson f9cfe43f05 preserve original el_runtime.c/h in legacy/ for reference 2026-05-03 15:31:35 -05:00
Will Anderson f97354e96b add exec() and exec_bg() builtins to El runtime
- exec(cmd) -> String: runs shell command, captures stdout, 30s timeout
- exec_bg(cmd) -> String: forks command in background, returns PID string
- add both to codegen arity table (builtin_arity)
- rebuild elc with updated arity table (self-hosting, identity-verified)
- update release snapshot at releases/v1.0.0-20260501/
2026-05-03 02:57:53 -05:00
Will Anderson e180baf776 fix looks_like_string for empty strings and UTF-8, add cross-module includes in codegen 2026-05-03 00:27:20 -05:00
Will Anderson 3d71db4958 Fix O(n²) string construction in codegen-js, lexer, parser, elb
Replace accumulate-by-concatenation loops with native_list_append + str_join.
Eliminates quadratic memory growth when processing large source files.
This is the v2 compiler state — what produced /tmp/elc-v2.
2026-05-02 22:35:49 -05:00
Will Anderson a084feb812 Add separate compilation: extern fn, --emit-header, elb build coordinator 2026-05-02 21:10:44 -05:00
Will Anderson 64e870c207 add El SDK CI/CD pipeline and install script
- .gitea/workflows/sdk-release.yaml: build elc from bootstrap, run tests,
  publish latest release, dispatch el-sdk-updated to downstream repos
- install.sh: one-command El SDK install from Gitea release
2026-05-02 17:45:56 -05:00
Will Anderson beddf9acc2 fix: restore self-host fixed point after calendar type additions
elc-combined.el had drifted from el-compiler/src/ across three separate
commits that never synced the bundled flat file:

1. 13948f5 - fold fn main() body into C int main() + _argc/_argv rename
   (codegen.el updated, elc-combined.el not updated)
2. 742bd0b - bare reassignment Assign AST node
   (parser.el + codegen.el updated, elc-combined.el not updated)
3. ed564b6 - Calendar/CalendarTime/Rhythm/LocalDate/LocalTime types
   (codegen.el updated, elc-combined.el not updated)

The drift meant that the elc binary (which embeds the correct logic) could
compile test programs correctly, but a fresh self-host pass using gen2 (built
from the stale elc-combined.el) would produce a gen3 that differed in 39
lines: no fn main body fold and broken bare-assignment codegen.

Fix: regenerate elc-combined.el as a flat concatenation of the current
lexer.el + parser.el + codegen.el + codegen-js.el + compiler.el source
files. Self-host fixed point verified: gen2 == gen3 byte-identical at
6450 lines.

Also rebuild dist/platform/elc and dist/platform/elc.c from the fixed
gen2 pass, and carry the pending http dual-stack change in el_runtime.c.

All tests pass: time (6/6), calendar (10/10), text (8/8), html_sanitizer (29/29).
2026-05-02 14:14:52 -05:00
Will Anderson 3a83b6eb80 add text-processing primitives to el runtime
24 new functions covering counting (str_count, str_count_chars,
str_count_bytes, str_count_lines, str_count_words, str_count_letters,
str_count_digits), finding (str_index_of_all, str_last_index_of,
str_find_chars), transforming (str_repeat, str_reverse,
str_strip_prefix/suffix/chars, str_lstrip, str_rstrip), character
classification (is_letter, is_digit, is_alphanumeric, is_whitespace,
is_punctuation, is_uppercase, is_lowercase), and splitting/joining
(str_split_lines, str_split_chars, str_split_n, str_join).

Phase 1 is byte-level + ASCII character classes. Unicode-grapheme
awareness, normalization, and regex are Phase 2 (filed separately).

Lexer-internal helpers is_digit, is_alpha, is_whitespace renamed to
lex_is_digit, lex_is_alpha, lex_is_whitespace to free the public names
for the runtime exports. The El compiler's lexer.el and the bundled
elc-combined.el both updated.

Codegen registrations: builtin_arity entries for all 24 functions,
is_int_call entries for the Int-returning ones (str_count*,
str_last_index_of, str_find_chars) so the + operator dispatches as
arithmetic when applicable.

Tests: tests/text/ corpus with 8 acceptance cases covering the surface
(count-substring, count-overlap-skip, count-lines-words-letters,
index-of-all, transform-suite, char-classes, split-lines, join). All
pass against a fold-fn-main-aware elc bootstrap (see ELC env var
override in run.sh).

Self-host fixed point: elc-combined.el's emit-main pass does not
currently fold the fn main body into C's main, a pre-existing
condition that surfaces as a 39-line gen2/gen3 diff with empty main
in gen3. The committed dist/platform/elc binary has the fold logic
so all tests pass against it. Filing the elc-combined fold-fn-main
fix separately. This commit does not introduce new self-host drift.
2026-05-02 13:37:30 -05:00
Will Anderson ed564b6dda add Calendar + CalendarTime + Rhythm + LocalDate/Time as first-class
Phase 1.5 of time-system. Calendar is pluggable: EarthCalendar
(IANA zones, DST, Gregorian) is the default; MarsCalendar,
CycleCalendar(period), NoCycleCalendar handle non-Earth cases.

Rhythm abstracts recurrence from clock units - rhythm_cycle_phase(0.5)
means "midpoint of cycle" whether the cycle is 24 hours on Earth or
30 hours on a station or 300 years on a long-cycle world.

Phase 1 (Instant + Duration) unchanged. EarthCalendar(zone_local())
is the user-facing default; users who don't need non-Earth
calendars never see the abstraction.

Self-host fixed point holds at 6339 lines.
Snapshot tagged at dist/platform/elc.20260502-1321-self-host.

Phase 2 (scheduling primitives every/after/at) lands next, now with
Calendar-aware grounding instead of Earth-time hardcoded.

Backlog: bl-297f66d8 (supersedes bl-b29b3e60)
2026-05-02 13:21:43 -05:00
Will Anderson af480f6266 add el_html_sanitize allowlist runtime primitive
Replaces the need for product-level denylist sanitizers. Small
state-machine parser; tag-and-attribute allowlist passed as JSON;
URL scheme validation on href/src attrs (http, https, mailto,
fragment, relative); whole-subtree drop for script/style/iframe/
object/embed/form (plus rarer media containers). No comment-
wrapping (was fragile to comment-injection bypass via a literal
--> inside an attacker-supplied attribute value).

Also picks up the codegen and parser changes for first-class
Instant/Duration types (postfix-literal time values, typed binop
dispatch) that were sitting in tree alongside this work.

Test corpus at tests/html_sanitizer/ covers the live attacker
probes (script, iframe, form, javascript:, about:, data:, img
onerror, onclick) plus structural attacks (comment-injection
bypass, tab-in-scheme bypass, encoded payloads, malformed input,
empty input, plain text). 29 cases, all green.

Self-host fixed point holds at 5720 lines via the canonical
el-compiler/src/compiler.el entry. Snapshot tagged at
dist/platform/elc.20260502-1249-self-host.

Backlog: bl-dc55ae07
2026-05-02 12:49:41 -05:00
Will Anderson 2e9d3247a6 runtime: actually rename str_format param to fmt
Previous commit 6d89728 had a misleading message - the rename
itself never landed (Edit-without-Read failure cascaded silently
in the parent shell). 6d89728 incidentally captured 810 lines of
in-flight work from concurrent runtime agents and shipped it under
the wrong message; the in-flight agents will land their final
verified state on top.

This commit is just the actual rename: str_format(template, data)
to str_format(fmt, data). C++ keyword conflict resolved.
2026-05-02 12:47:49 -05:00
Will Anderson 6d897289a3 runtime: rename str_format param 'template' to 'fmt'
template is a reserved keyword in C++; though not in C, it blocks
this header from ever being included from C++ code. Match printf-
family convention with fmt instead.

The deeper question of whether string-template substitution is the
right abstraction for our substrate is filed separately as backlog.
2026-05-02 12:45:48 -05:00
Will Anderson 742bd0b4f9 fix: three foundation/el root-cause bugs (no more bandaids)
1. Parser+codegen: bare reassignment `x = expr` inside an if-body
   was compiling to three orphan expressions with no store. Now
   emits a real assignment.

2. Runtime json_get: dot-path segments that are all digits now
   correctly traverse array indices. `json_get(s, "0.field")` works.

3. Runtime HTTP writer: response bodies starting with
   `{"__status__":<int>,...}` now set the HTTP status header to
   that value and strip the marker from the served body. Existing
   404/401/503 paths in product code now produce real status codes
   instead of HTTP 200 with the status hidden in the body.

Self-host fixed point holds: gen2 == gen3 byte-identical.
Snapshot tagged at dist/platform/elc.20260502-1231-self-host.

Backlog: bl-c121edda
2026-05-02 12:32:23 -05:00
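Item 3 of 742bd0b4f9 (the `{"__status__":<int>,...}` marker) can be sketched as a prefix check that extracts the status and rebuilds the body without the marker. The function name, the 100..599 range check, the 200 default, and the exact way the marker is stripped are assumptions; the commit only states that the status header is set and the marker removed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* If `body` starts with {"__status__":<int>, returns that status and writes
 * the body without the marker into `out`. Otherwise copies the body through
 * unchanged and returns 200 (assumed default). */
int sketch_extract_status(const char *body, char *out, size_t cap) {
    static const char marker[] = "{\"__status__\":";
    const size_t mlen = sizeof marker - 1;
    assert(body != NULL && out != NULL && cap > 0);

    if (strncmp(body, marker, mlen) == 0) {
        char *end;
        long status = strtol(body + mlen, &end, 10);
        if (end != body + mlen && status >= 100 && status <= 599) {
            if (*end == ',') { /* more fields follow: keep them */
                snprintf(out, cap, "{%s", end + 1);
                return (int)status;
            }
            if (*end == '}') { /* marker was the only field */
                snprintf(out, cap, "{}%s", end + 1);
                return (int)status;
            }
        }
    }
    snprintf(out, cap, "%s", body);
    return 200;
}
```

Only a marker at the very start of the body is recognized, which matches the "response bodies starting with" wording in the commit.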
Will Anderson 990ce72539 lexer: strip JS/CSS comments from code-bearing string literals at compile time
scan_string() is the right gate for this: every El source that embeds JS
or CSS does so as a quoted string literal, and the lexer is the single
chokepoint every backend reads. Strip there and the // line comments
and /* */ block comments never reach the parser, codegen, or the served
HTML.

looks_like_code is intentionally narrow:
  - contains "<script" or "<style" (the embedded-asset case), or
  - contains "function" AND ";" (a JS body without an opening tag)
Plain prose with stray // sequences passes through verbatim.

strip_code_comments tracks JS string state (single, double, backtick)
and never strips inside one. Backslash escapes inside JS strings consume
the next char verbatim. URL guard: when the char before / is ':', emit
the / literally and advance one — preserves https:// inside string
literals. Block-comment scan walks until the matching '*/' pair.

elc-cli.el is now a one-line `import "el-compiler/src/compiler.el"`
shim. Top-level `let _argv = args()` was clashing with C int main()'s
`char** _argv` parameter once compiler.el's fn main() body got folded
into C main. compiler.el owns the CLI entry point now.

Self-host fixed point reached: gen2 == gen3 byte-identical.
Tagged dist/platform/elc.20260502-1104-self-host alongside dist/platform/elc.
2026-05-02 11:14:18 -05:00
Will Anderson d527fa6065 elc: regenerate dist artifacts and ship parser.el state alongside self-host land 2026-05-02 01:30:31 -05:00
Will Anderson 13948f57a6 self-host: fold fn main() body into C int main(); rename C params
The El compiler self-host has been broken since `fn main()` landed in
compiler.el. Both bootstrap.py and codegen.el skipped emitting an
`el_val_t main()` (correct - it would collide with C's int main),
but neither folded the body anywhere. The C int main() got just
runtime init + return, so any El program that put its work inside
`fn main()` produced a binary that did nothing.

Fix in two places (bootstrap.py and codegen.el, kept symmetric):

  1. Capture the body of `fn main()` during the FnDef pass.
  2. Emit `int main(int _argc, char** _argv)` so El programs can
     declare their own local `argv` / `argc` (compiler.el itself
     does this) without colliding.
  3. After top-level statements, fold the captured fn main body
     into C main alongside them, then return 0.

Self-host fixed point reached: gen 2 and gen 3 of compiler.el's
output are byte-identical (md5 5b4eca2a...). The new elc compiles
products/web/src/main.el natively now - 24 imports resolved, 1,173
lines of C, every imported function (page_open, nav, pricing,
checkout_page, account_page, founding_badge…) emits its forward
decl + body without a concat preprocessor in sight.

Backup of the prior self-hosted binary is at
dist/platform/elc.preselfhost in case we need to fall back.
2026-05-02 01:30:04 -05:00
Will Anderson 276c0e5997 runtime: engram_scan_nodes_by_type_json() filters at the engine
Added a typed scan function: walks the live nodes once, skips
transparent layers, keeps only entries whose node_type matches the
filter, sorts the survivors by salience, paginates. Header forward
decl in el_runtime.h so callers can find it.

Empty / NULL filter falls through to engram_scan_nodes_json so the
existing GET /api/nodes contract is preserved exactly.

This is what every list-X tool in the MCP wrapper has been wanting:
listProcesses returning only Process nodes, not all of them, without
the wrapper having to fetch + filter client-side.
2026-05-02 01:25:10 -05:00
Will Anderson 62f4d56a62 runtime: HEAD method dispatches as GET, body suppressed in response
Per RFC 9110 §9.3.2, HEAD must mirror GET headers + Content-Length
without sending a body. Existing http_worker / http_worker_v2 dropped
HEAD straight to the El handler, which had no idea what to do and
returned the catch-all 404 envelope. Link checkers and SEO bots saw
the 404 and reported the site as broken.

Fix layer is in the runtime, not the El handler:

  * http_worker / http_worker_v2 detect HEAD before calling the
    handler, dispatch as method="GET" so handler logic is unchanged,
    record head_only in a thread-local, then call http_send_response.
  * http_send_response reads the thread-local and skips the
    final http_send_all of the body. Status line + headers +
    Content-Length still go out in full.

Verified locally on engram /health: HEAD returns
  HTTP/1.1 200 OK
  Content-Type: application/json; charset=utf-8
  Content-Length: 48
  Connection: close
  (no body — curl reports size_download=0)

compiler.el: rename `target` → `tgt` in main(); the lexer reserves
`target` as a keyword, and the let-binding position requires Ident.
The naming convention was already followed elsewhere in the file
(compile_dispatch's parameter is tgt for exactly this reason); main
was an outlier that the existing Rust-genesis-built elc happened to
parse but bootstrap.py refused, blocking self-host.
2026-05-02 01:15:11 -05:00
Will Anderson a2b9984127 elc/bootstrap: resolve imports textually (recursive, dedup, strict)
Both bootstrap.py and compiler.el now inline every imported .el file
into a single source string before lex/parse, depth-first with set
deduplication keyed on absolute path. Two forms supported:

  import "path/to/file.el"            (quoted relative path)
  from <module> import { ... }        (bare module → <module>.el)

Strict regex matching prevents false positives like CSS keyframes
("from { opacity: 0 }") embedded in El string literals - the prior
naive str.startswith pulled '{' out as a module name and tried to
load src/{.el.

This kills the bash concat preprocessor that web/build-local.sh
needed. A web full build is now just:

  python3 bootstrap.py src/main.el > dist/main.c
  cc -O2 ... -o dist/neuron-web dist/main.c dist/web_stubs.c \
      foundation/el/el-compiler/runtime/el_runtime.c \
      -lcurl -lpthread -lssl -lcrypto

Verified end-to-end: bootstrap.py produces 1,151 lines of C from
src/main.el's 24 imports, cc links a 667 KB binary.
2026-05-02 01:10:56 -05:00
Will Anderson 86b3ad070d compiler+runtime: codegen fixes for empty literal, == int idents, m.field; runtime body-loss fix and Linux feature macros
Three codegen bugs surfaced repeatedly across the parallel port-to-El
agents and were patched here:

1. Empty array literal '[]' was emitting el_list_new(0, ) — trailing
   comma in a varargs call, fails the C parse. Special-cased: n==0
   returns 'el_list_empty()' directly.

2. '==' between two identifiers both tracked in __int_names (typed
   Int via 'let x: Int = ...') was miscompiling to str_eq. With the
   tagged-pointer Int-as-int64 representation, str_eq strcmp's what
   are integer values dressed as char* and segfaults on the first
   non-printable byte. Added the int-name lookup, mirroring the
   dispatch already present for '+' between Int idents. NotEq got
   the same treatment.

3. 'm.field' codegen was passing the raw const char* field name to
   el_get_field, which expects el_val_t. C compiler warned about int
   conversion; runtime read garbage at the address. Wrapped in
   EL_STR(...) so the field name lands as a proper el_val_t.

Runtime additions in the same pass:

  - el_runtime.c http_read_request: the loop's boundary check was
    'line_end >= hdr_end' which broke before processing the LAST
    header line — its trailing \r\n IS hdr_end. Real curl clients
    put Content-Length last, so POST bodies were silently arriving
    as length 0. Changed to '> hdr_end' so the last line is processed.
    soma-server agent surfaced this during smoke testing.

  - _GNU_SOURCE feature macro: clock_gettime/CLOCK_REALTIME, strcasecmp,
    and the dlfcn extensions (RTLD_DEFAULT) all gated behind it on
    glibc/Debian. macOS is permissive without; the landing Docker
    build needed these for linux/amd64. Adds <strings.h> for
    strcasecmp.

  - Refactored slot semantics in el_runtime.c (already in tree from
    the morning ARC commit): magic-tagged ElHeader at offset 0,
    ElList/ElMap with separate elems/keys/values payload allocations,
    el_list_append and el_map_set mutate-in-place when refcount<=1
    and copy-on-write when shared.

Self-host fixpoint reached at v3: elc → elc.c → cc → elc binary →
elc.c reproduced byte-for-byte. dist/platform/elc and dist/platform/elc.c
updated. The codegen.el and elc-combined.el changes are mirror-edits;
both flow through the bootstrap chain to keep self-hosting clean.
2026-04-30 18:14:57 -05:00
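The http_read_request boundary bug from 86b3ad070d is easy to reproduce in isolation. The sketch below is not the runtime's code: the function name, the `buggy` switch, and the hand-rolled case-insensitive prefix check are assumptions made so both behaviours can be compared. It follows the commit's description that hdr_end is where the last header line's own "\r\n" begins.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive "does s start with prefix" check. */
static int sketch_has_prefix_ci(const char *s, const char *prefix) {
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Scans the header lines of `req` for Content-Length.
 * hdr_end points at the "\r\n\r\n" terminator, so the last header line's
 * own "\r\n" begins exactly at hdr_end.
 * buggy != 0 reproduces the old `line_end >= hdr_end` check.
 * Returns the parsed length, or -1 if the header was not seen. */
long sketch_content_length(const char *req, int buggy) {
    assert(req != NULL);
    const char *hdr_end = strstr(req, "\r\n\r\n");
    if (!hdr_end) return -1;
    /* Skip the request line; non-NULL because hdr_end was found. */
    const char *line = strstr(req, "\r\n") + 2;
    while (line < hdr_end) {
        const char *line_end = strstr(line, "\r\n");
        if (buggy ? (line_end >= hdr_end) : (line_end > hdr_end))
            break; /* old check exits before the last header line */
        if (sketch_has_prefix_ci(line, "content-length:"))
            return strtol(line + 15, NULL, 10);
        line = line_end + 2;
    }
    return -1;
}
```

With Content-Length as the final header, the `>=` variant never inspects that line, which is how a POST body could arrive with length 0.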
Will Anderson 23bbc99e43 runtime: ARC scaffolding + indirection so el_list_append amortizes O(1)
The compiler used to OOM at ~8.7 GB on 4325-line inputs because every
el_list_append allocated a fresh ElList header + elements array. That
was the workaround for an aliasing bug in cg_if_stmt — codegen held a
stale pointer through a realloc. Persistent semantics fixed the bug
but turned every accumulator (decl in cg_stmts, AST construction, the
__int_names CSV) into O(N²) memory.

Real fix in two coordinated parts:

1. Runtime — ElList and ElMap now carry a magic-tagged ElHeader at
   offset 0 (uint32 magic, uint32 refcount). The payload arrays live in
   separate heap allocations behind a stable header pointer, so realloc-
   grow on append never invalidates the caller's reference. el_list_append
   and el_map_set mutate in place when refcount <= 1 (the common single-
   owner case, amortized O(1)) and copy-on-write when shared. Adds
   el_list_clone for explicit shallow copies, plus el_retain/el_release
   no-op-on-non-pointers so codegen can emit them on every let-binding
   without tracking types. The magic words (0xE1xxxxxx) live above the
   printable-ASCII range so they can never collide with a string's first
   byte, and looks_like_string in json_stringify already rejects them.

2. Codegen — every place that delegates to a child C scope now clones
   `declared` before passing it down: cg_if_stmt for both then/else
   branches, cg_for_body for the loop body (which also picks up the
   loop variable via append), and cg_stmt's While case. Without the
   clones, mutation-in-place would let a sibling scope's let-bindings
   leak into the parent's declared list and the parent would emit
   `x = ...` against an undeclared name. The clones are cheap shallow
   copies of a list of strings.

Result on the landing-combined.el (4325 lines): 8.7 GB → 3.5 GB peak,
0.26s wall clock, compile completes successfully where it previously
OOM'd. Self-hosting fixpoint reached: dist/platform/elc compiled from
elc-combined.el reproduces dist/platform/elc.c byte-for-byte on a
second pass through itself.

Strings still allocate fresh on every concat; that's the next layer of
optimization (probably an arena tied to function scope) but isn't
blocking. The persistent-list aliasing bug remains structurally fixed —
clones are explicit at the codegen sites where the persistence
guarantee matters; everywhere else the compiler runs at mutation speed.
2026-04-30 15:05:02 -05:00
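The runtime half of 23bbc99e43 (header at offset 0, payload in a separate allocation, in-place append for a single owner, copy-on-write when shared) can be sketched as follows. All names, the concrete magic value, the growth factor, and the refcount hand-off on copy are assumptions; the commit specifies only the overall layout and the refcount <= 1 rule.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_LIST_MAGIC 0xE1000001u /* hypothetical value in the 0xE1xxxxxx range */

typedef struct {
    uint32_t magic;
    uint32_t refcount;
} SketchHeader;

typedef struct {
    SketchHeader hdr; /* at offset 0 */
    int64_t *elems;   /* payload lives in its own heap allocation */
    size_t len, cap;
} SketchList;

SketchList *sketch_list_new(void) {
    SketchList *l = calloc(1, sizeof *l);
    if (!l) return NULL;
    l->hdr.magic = SKETCH_LIST_MAGIC;
    l->hdr.refcount = 1;
    return l;
}

/* Explicit shallow copy with a fresh header and refcount 1. */
SketchList *sketch_list_clone(const SketchList *src) {
    SketchList *l = sketch_list_new();
    if (!l) return NULL;
    if (src->len > 0) {
        l->elems = malloc(src->len * sizeof *l->elems);
        if (!l->elems) { free(l); return NULL; }
        memcpy(l->elems, src->elems, src->len * sizeof *l->elems);
    }
    l->len = l->cap = src->len;
    return l;
}

/* Appends v and returns the list that holds the result.
 * Single owner (refcount <= 1): mutate in place, amortized O(1).
 * Shared: clone first, so other holders never observe the change. */
SketchList *sketch_list_append(SketchList *l, int64_t v) {
    assert(l != NULL && l->hdr.magic == SKETCH_LIST_MAGIC);
    if (l->hdr.refcount > 1) {
        SketchList *copy = sketch_list_clone(l);
        if (!copy) return NULL;
        l->hdr.refcount--; /* assumed: caller's reference moves to the copy */
        l = copy;
    }
    if (l->len == l->cap) {
        size_t ncap = l->cap ? l->cap * 2 : 4;
        int64_t *grown = realloc(l->elems, ncap * sizeof *grown);
        if (!grown) return NULL;
        l->elems = grown; /* only the payload moves; `l` itself is stable */
        l->cap = ncap;
    }
    l->elems[l->len++] = v;
    return l;
}
```

Because realloc only ever touches `elems`, a caller holding the list pointer across an append cannot end up with a stale reference, which is the aliasing problem the commit describes.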
Will Anderson 5adc05aa48 compiler: capability-kind enforcement (cgi / service / utility)
Capability becomes a compile-time structural property, not a runtime
convention. A program's top-level block determines what runtime
primitives it may call; the codegen rejects forbidden calls with
#error directives so cc fails with a clear message.

Three kinds:
  cgi      — full self-formation. All primitives.
  service  — bounded. Cannot call self-formation primitives:
             llm_call_agentic, llm_register_tool, dharma_emit,
             dharma_field. Single-turn LLM calls allowed.
  utility  — default (no top-level block). No DHARMA, no LLM.
             Pure compute + I/O.

Deep claim: the binary either CAN or CANNOT do a thing. There is no
runtime check, no opt-in, no override. A weather service compiled
with `service { ... }` is structurally incapable of becoming Neuron.
Sponsors of services know exactly what they're vouching for.

Implementation
- Lexer: `service` keyword.
- Parser: parse_service_block parallels parse_cgi_block. Produces
  ServiceBlock AST with name/sponsor/domain.
- Codegen entry: scans top-level for cgi/service blocks, sets
  __program_kind state ("cgi" / "service" / "utility"). Rejects
  programs declaring both kinds.
- cg_expr Call: cap_check_call(fn_name) per emission. Records
  violations in __cap_violations CSV. emit_cap_violations() writes
  one #error per violation at end of generated C.
- Helpers: is_self_formation_call, is_dharma_call, is_llm_call.

Tests verified:
  cgi + llm_call_agentic        → compiles ✓
  service + llm_call_agentic    → cc fails with capability violation
                                  for 'service' on 'llm_call_agentic'
  service + llm_call (1-turn)   → compiles ✓
  utility + dharma_send         → cc fails with capability violation
                                  for 'utility' on 'dharma_send'
  utility + http/json/state     → compiles + runs ✓ ("got: world")
  cgi + dharma_emit (manager)   → compiles ✓ (VBD also enforced)
  cgi + dharma_emit (engine)    → cc fails with VBD violation

Three-stage closure: stage1.c == stage2.c (byte-identical).
Engram rebuilt against new compiler — daemon on :8742 healthy,
{"node_count":0,"edge_count":0}.

A bug found and fixed during testing: cap_record_violation had
`csv = ","` (bare assignment, not valid in El) instead of
`let csv = ","`. Without the let, the leading comma never made
it into the accumulator, which shifted the kind extraction by one
character, so "service" appeared as "ervice" in error messages.
Pattern fixed; this confirms once more that El requires
`let X = ...` for all rebindings (codegen converts to assignment
when X is already declared).
2026-04-30 14:18:17 -05:00
Will Anderson 12d5e7777e runtime + compiler: dharma, match, cgi blocks, VBD, agentic LLM
Two parallel agent sweeps closing the remaining structural gaps.

== Compiler completions ==

- match codegen: lowers Match into GCC/Clang statement-expression
  ({ ... }). Patterns: Wildcard, Binding, LitInt (==), LitStr
  (str_eq), LitBool. Per-match unique label via state counter.
  Verified: classify(0)→"zero", classify(1)→"one", classify(7)→"other".

- cgi block parsing: `cgi "name" { dharma_id, principal, network,
  engram }` → CgiBlock AST node → el_cgi_init() emitted as the first
  call in main() after el_runtime_init_args. Multiple cgi blocks per
  program emit a #error directive. Missing optional fields → EL_NULL.

- VBD compile-time enforcement: parser attaches `decorator: <name>`
  to FnDef. Codegen recursively walks fn bodies (Call/BinOp/Not/Neg/
  Field/Index/Try/Array/Map/If/For/Match plus Let/Return/Expr/While/
  For). If a non-@manager function calls dharma_emit or dharma_field,
  emit `#error "VBD violation: ... fn '<name>'"` before the function
  body. Verified: @engine fn calling dharma_emit → cc fails with the
  message. @manager fn calling dharma_emit → compiles clean.
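
For the match lowering described in the first bullet, the generated C might look roughly like the following. This is a hand-written approximation, not actual elc output: the El surface syntax in the comment, the `EL_STR` definition, and the names `match_0_result` and `match_end_0` are all assumptions. It relies on the GCC/Clang statement-expression extension, so it will not build with a strictly conforming ISO C compiler.

```c
#include <stdint.h>

typedef int64_t el_val_t;                      /* universal value type of the C backend */
#define EL_STR(s) ((el_val_t)(uintptr_t)(s))   /* assumed definition of the string wrapper */

/* Approximate lowering of a three-arm match on n:
 *   0 => "zero", 1 => "one", _ => "other"
 * The counter suffix (_0) stands for the per-match unique label. */
el_val_t classify(el_val_t n) {
    return ({
        el_val_t match_0_result;
        if (n == 0) { match_0_result = EL_STR("zero"); goto match_end_0; }  /* LitInt arm */
        if (n == 1) { match_0_result = EL_STR("one");  goto match_end_0; }  /* LitInt arm */
        match_0_result = EL_STR("other");                                   /* Wildcard arm */
    match_end_0: ;
        match_0_result;                        /* value of the statement expression */
    });
}
```

Because the whole match is a single expression, it can appear anywhere a value is expected, for example as the right-hand side of a let binding.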

Three-stage closure: stage1.c == stage2.c == stage3.c (2791 lines
each, byte-identical). dist/platform/elc rebuilt at 165 KB; .prev5
preserved.

== Runtime completions ==

- Real dharma_* primitives, no more stubs. Channel registry,
  request/response over HTTP, network-wide spreading activation,
  fire-and-forget event emission, blocking dharma_field with
  pthread_cond_timedwait (30s default), Hebbian relationship
  weights stored as Engram edges between dharma:self and
  dharma:peer:<id>, sorted-by-weight peer list. URL/ID arrays
  snapshotted before network I/O so mutexes never block on socket.

- New public C contract: el_runtime_dharma_event_arrive(type, payload,
  source) — application HTTP handler calls this when /dharma/event
  arrives, runtime broadcasts on _dharma_event_cv. Keeps the HTTP
  server generic; events flow through the application's router.

- llm_call_agentic real multi-turn loop. Tool registry (mutex-
  protected, dlsym-resolved, mirroring http_set_handler). Loop:
  build request with tools+messages → POST → dispatch on stop_reason.
  end_turn → return text. max_tokens → text + "[truncated]". tool_use
  → walk content[], call registered handler per block, build
  tool_result message, append to conversation, loop. Iteration cap
  10. Tools not registered return {"error":"tool not registered: X"}
  with is_error: true.

- New builtin: llm_register_tool(name, handler_fn_name).
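
The tool_use branch of the loop above depends on a name-to-handler lookup. A minimal sketch of that lookup follows, assuming a fixed-size table and a hypothetical handler signature. The real registry is mutex-protected and resolves handlers through dlsym; both are left out here, and `register_tool`, `dispatch_tool`, and `echo_tool` are illustrative names.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOOLS 32

/* Hypothetical handler signature: JSON input in, JSON result out. */
typedef const char *(*tool_fn)(const char *input_json);

typedef struct { const char *name; tool_fn fn; } tool_entry;

static tool_entry tools[MAX_TOOLS];
static int tool_count = 0;

/* Registers a handler; returns 0 on success, -1 if the table is full. */
int register_tool(const char *name, tool_fn fn) {
    if (tool_count >= MAX_TOOLS) return -1;
    tools[tool_count].name = name;
    tools[tool_count].fn = fn;
    tool_count++;
    return 0;
}

/* Handles one tool_use block. Writes the result text into out and sets
 * *is_error when no handler is registered, using the error shape quoted
 * in the commit message. */
void dispatch_tool(const char *name, const char *input_json,
                   char *out, size_t out_len, int *is_error) {
    for (int i = 0; i < tool_count; i++) {
        if (strcmp(tools[i].name, name) == 0) {
            snprintf(out, out_len, "%s", tools[i].fn(input_json));
            *is_error = 0;
            return;
        }
    }
    snprintf(out, out_len, "{\"error\":\"tool not registered: %s\"}", name);
    *is_error = 1;
}

/* Example handler used only for illustration: returns its input unchanged. */
const char *echo_tool(const char *input_json) { return input_json; }
```

Returning the error as an ordinary tool result, instead of aborting the loop, lets the model see the failure and choose a different action on the next turn.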

Compile clean: cc -std=c11 -Wall -Wextra -c → zero warnings, zero
errors. Smoke test exercises every new dharma_* primitive +
llm_register_tool round-trip.

Runtime grew 3309→4079 lines (.c, ~155 KB), 312→342 lines (.h).

== Integration ==

Engram rebuilt against the new runtime: 130 KB binary, daemon
swapped on :8742 cleanly, /health and /api/stats both returning
correctly under launchd. No regressions.

== Status of "planned" items in language.md ==

- match codegen → IMPLEMENTED
- cgi block parsing → IMPLEMENTED
- VBD enforcement → IMPLEMENTED
- % operator → IMPLEMENTED (earlier today)
- vessel keyword → lexed (codegen uses package compatible)
- activate construct → still planned (low priority; engram_activate
  builtin covers the use case for now)
- sealed block → still planned
- dharma_emit fanout parallelization → potential future work, current
  serial behavior matches spec
2026-04-30 14:06:19 -05:00
Will Anderson 0fa9e749e1 runtime: engram_*_json accessors, http_set_handler dlsym, codegen int-call
Three changes that turned the runtime into something Engram-the-server
can actually run on top of.

1. engram_*_json accessors. The runtime's engram_get_node/search/scan/
   neighbors/activate return ElList/ElMap; passing those through
   json_stringify hit the type-erasure wall (an ElList* has no header
   that distinguishes it from a string pointer). Added pre-serialized
   sibling builtins:

     engram_get_node_json(id)         -> JSON object
     engram_search_json(query, limit) -> JSON array of node objects
     engram_scan_nodes_json(limit, offset)
     engram_neighbors_json(node_id, max_depth, direction)
     engram_activate_json(query, depth)
     engram_stats_json()

   Each walks the typed C structures and serializes directly, reusing
   the existing engram_emit_node_json / engram_emit_edge_json helpers
   from the snapshot path.

2. http_set_handler now falls back to dlsym(RTLD_DEFAULT, name) when
   the named handler isn't already in the C-level registry. El programs
   that define `fn handle_request(method, path, body) -> String` can
   register themselves just by calling http_set_handler("handle_request").
   No C glue required. Verified live on a real El server.

3. Codegen: extended int-typed dispatch on `+` to handle Calls. New
   helper is_int_call recognizes a known-int-returning builtin set:
   str_len, str_index_of, str_to_int, str_char_code, native_list_len,
   el_list_len, len, json_get_int, json_array_len, engram_node_count,
   engram_edge_count, time_now, time_now_utc, time_diff, time_add,
   time_from_parts, el_abs/max/min, float_to_int. With this,
   `pos + str_len(needle)` compiles to integer arithmetic instead of
   string concat. The earlier limitation noted in the previous commit
   (Ident + Call returning Int) is now closed.

Also: el_to_float / el_from_float moved to el_runtime.h as static
inlines so generated programs can use them. This removes the unused
duplicate inline definitions from the .c file.

Closure verified: stage1 vs stage2 byte-identical against the new
runtime. dist/platform/elc rebuilt; .prev4 preserved.

Engram server (engram/src/server.el) end-to-end:
  POST /api/nodes ×3 → 3 UUIDs returned
  POST /api/edges ×2 → linkage made
  GET /api/stats → {"node_count":3,"edge_count":2}
  GET /api/search?q=spreading&limit=5 → 1 hit, full node JSON
  POST /api/activate {"query":"Hebbian","depth":3}
    → seed node @ hop 0, strength 0.8
    → 1-hop neighbor @ strength 0.392 (= 0.8 × 0.7 weight × 0.7 decay)
  GET /api/neighbors/<id>?depth=2 → {node, edge, hops} triple
  POST /api/save → {"ok":true,"path":"..."}
  Server stays alive across all routes.

Snapshot save/load on restart still TODO — server starts with 0 nodes
even when a snapshot exists; investigation pending.
2026-04-30 13:44:41 -05:00
Will Anderson 6bdd4a4ba9 runtime: http_set_handler self-registers via dlsym
El programs that define `fn handle_request(method, path, body) -> String`
can now use http_serve directly without C-level glue. http_set_handler
falls back to dlsym(RTLD_DEFAULT, name) when the named handler isn't
already in the registry, picks up the El-compiled symbol, and registers
it transparently.

Closes the gap that made http_serve unusable from pure El. Verified
with a real El server on :17890 — POST /hello with body returned
{"method":"POST","path":"/hello","echo":"test body"} via curl.

dist/platform/elc rebuilt; .prev3 preserved.
2026-04-30 13:30:55 -05:00
Will Anderson 951b8d574b runtime: HTTP, in-process graph store, LLM, fs_list
Batches 2/3/4 of the runtime extension. The runtime grew from 1620
to 3112 lines (.c) and 247 to 286 lines (.h) — adding 27 new or
real-implementation builtins and replacing every batch-1 stub.

Batch 2 — HTTP / fs (8 builtins)
- http_get, http_post: replaced stubs with real libcurl client.
  Network errors return JSON {"error":"..."} so callers can detect them.
- http_post_json: sets Content-Type: application/json.
- http_get_with_headers, http_post_with_headers: ElMap → headers.
- http_post_form_auth: form-urlencoded + Authorization header
  (Stripe-style API calls).
- http_serve: replaced stub with real POSIX-socket server, threaded,
  capped at 64 concurrent connections. Auto-detects content type
  (HTML / JSON / plain). Handler dispatch via named registry.
- fs_list: directory listing via opendir/readdir.

Batch 3 — In-process graph store (14 builtins)
- engram_node, engram_node_full: create node, returns UUID.
- engram_get_node, engram_forget, engram_node_count.
- engram_strengthen: Hebbian potentiation (+0.05, clamp 1.0,
  bumps last_activated).
- engram_search, engram_scan_nodes: text search, paginated scan.
- engram_connect, engram_edge_between, engram_neighbors,
  engram_neighbors_filtered, engram_edge_count.
- engram_activate: real spreading-activation algorithm.
  BFS to depth, max-activation merge across paths, decay 0.7/hop,
  multiplied by node confidence, filtered by epistemic_confidence
  ≥ 0.2 (refresh threshold), sorted desc.
- engram_save, engram_load: JSON snapshot persistence.
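
The engram_activate entry above (BFS to depth, max-activation merge, 0.7 decay per hop) can be sketched as follows. This is a simplified illustration under stated assumptions, not the runtime's code: the edge struct and function name are hypothetical, node confidence is treated as 1.0, the 0.2 threshold is applied while spreading instead of as a final filter, and the result is not sorted.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical directed edge between node indices. */
typedef struct { int from, to; double weight; } edge;

/* Spreads activation from `seed` for up to `depth` hops. On return, act[i]
 * holds the strongest activation that reached node i, or 0.0 if the node
 * was unreached or every path into it fell below `threshold`. */
void spread_activation(const edge *edges, size_t n_edges, int n_nodes,
                       int seed, double seed_strength, int depth,
                       double decay, double threshold, double *act) {
    int frontier[MAX_NODES], next[MAX_NODES];
    int n_front = 0;

    assert(n_nodes > 0 && n_nodes <= MAX_NODES);
    assert(seed >= 0 && seed < n_nodes);

    for (int i = 0; i < n_nodes; i++) act[i] = 0.0;
    act[seed] = seed_strength;
    frontier[n_front++] = seed;

    for (int hop = 0; hop < depth && n_front > 0; hop++) {
        double snap[MAX_NODES];          /* activations as of the start of this hop */
        char queued[MAX_NODES] = {0};
        int n_next = 0;
        for (int i = 0; i < n_nodes; i++) snap[i] = act[i];

        for (int f = 0; f < n_front; f++) {
            int u = frontier[f];
            for (size_t e = 0; e < n_edges; e++) {
                if (edges[e].from != u) continue;
                int v = edges[e].to;
                double a = snap[u] * edges[e].weight * decay;
                /* Max-merge: keep only the strongest path into v. */
                if (a >= threshold && a > act[v]) {
                    act[v] = a;
                    if (!queued[v]) { queued[v] = 1; next[n_next++] = v; }
                }
            }
        }
        for (int i = 0; i < n_next; i++) frontier[i] = next[i];
        n_front = n_next;
    }
}
```

With a seed strength of 0.8 and a single edge of weight 0.7, the neighbor receives 0.8 × 0.7 × 0.7 = 0.392. A second hop over another 0.7 edge would give about 0.192, which falls below the 0.2 threshold and is dropped.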

Batch 4 — LLM (5 builtins)
- llm_call, llm_call_system: Anthropic /v1/messages via libcurl.
  ANTHROPIC_API_KEY from env. Default model claude-sonnet-4-5.
- llm_vision: adds image content block. URL / base64 / file path
  detected by prefix.
- llm_models: returns the available model list.
- llm_call_agentic: stubbed with TODO (single-turn fallback to
  llm_call_system); full tool_use loop is the next iteration.

Codegen fix: emit Float literals as `el_from_float(<v>)`. Without
the wrapper, C implicit conversion truncates 0.8 to 0 when passed to
a builtin expecting el_val_t. Float helpers moved to el_runtime.h
so generated programs can call them.
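
The truncation described above is easy to reproduce. The sketch below assumes el_val_t is int64_t (as stated in the C-backend commit) and assumes the helpers reinterpret the double's bits with memcpy; the real el_runtime.h may implement them differently. `builtin_takes_float` is a hypothetical stand-in for any builtin with a float parameter.

```c
#include <stdint.h>
#include <string.h>

typedef int64_t el_val_t;

_Static_assert(sizeof(double) == sizeof(el_val_t), "double must fit in el_val_t");

/* Assumed implementation: carry the double's 64 bits inside an el_val_t. */
static inline el_val_t el_from_float(double d) {
    el_val_t v;
    memcpy(&v, &d, sizeof v);
    return v;
}

static inline double el_to_float(el_val_t v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Stand-in for a builtin that expects a boxed float argument. */
double builtin_takes_float(el_val_t boxed) {
    return el_to_float(boxed);
}
```

Passing a bare `0.8` to `builtin_takes_float` converts the double to the integer 0 before the call, so the builtin unboxes 0.0. Wrapping the literal in `el_from_float(0.8)` keeps the bit pattern intact.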

Compile-time
- cc -std=c11 -Wall -Wextra -c el_runtime.c → no errors, no warnings.
- Link requires -lcurl -lpthread (documented in header comment).

Verified end-to-end
- engram_node × 2, engram_connect, engram_activate("Hebbian", 2)
  returns 2 activated nodes with correct epistemic confidence.
- http_get("https://httpbin.org/get") returns 259-byte JSON live.
- Self-host closure: stage1 vs stage2 byte-identical against the
  new runtime.
- engram_save → engram_load round-trip preserves graph.

dist/platform/elc rebuilt against the new runtime (147 KB, up from
94 KB due to libcurl link). .prev2 preserves the prior binary.
2026-04-30 13:29:31 -05:00
Will Anderson 2eddaf1fe6 codegen: type-driven dispatch for + between Int idents
Closes the known limitation from the self-host commit: `fn add(a:Int,
b:Int) { a + b }` now compiles to integer addition, not string concat.
Previously the codegen heuristic guessed string concat whenever both
operands were Idents with no literal anchor.

Mechanism
- parser captures the leading type identifier from `let x: T = ...`
  bindings (new "type" field on Let) and from function parameter
  annotations (new "type" field on each param).
- codegen maintains a per-function int-name set in process state via
  state_set("__int_names", csv). cg_fn seeds it from typed parameters;
  cg_stmt extends it from typed `let` bindings and from `let x = <Int
  literal>` (literal inference).
- BinOp Plus: when both sides are Idents and both names are in the
  int-name set, emit arithmetic; otherwise the existing literal-anchor
  heuristic applies, with string concat as the fallback.
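
The decision for `Ident + Ident` can be sketched in C as below. The compiler itself is written in El, so this only illustrates the logic. Storing the set as a comma-delimited string with leading and trailing commas is an assumption, and `int_names_contains`, `lower_ident_plus`, and the enum are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* Membership test against a comma-delimited set such as ",a,b,total,".
 * Assumption: the set carries a leading and trailing comma, so searching
 * for ",name," cannot match part of a longer identifier. */
bool int_names_contains(const char *csv, const char *name) {
    char needle[128];
    size_t n = strlen(name);
    if (n + 3 > sizeof needle) return false;   /* name too long for the buffer */
    needle[0] = ',';
    memcpy(needle + 1, name, n);
    needle[n + 1] = ',';
    needle[n + 2] = '\0';
    return strstr(csv, needle) != NULL;
}

typedef enum { EMIT_INT_ADD, EMIT_STR_CONCAT } plus_lowering;

/* Lowering choice for `lhs + rhs` when both operands are identifiers. */
plus_lowering lower_ident_plus(const char *int_names,
                               const char *lhs, const char *rhs) {
    if (int_names_contains(int_names, lhs) && int_names_contains(int_names, rhs))
        return EMIT_INT_ADD;
    return EMIT_STR_CONCAT;   /* the fallback named in the commit */
}
```

For `fn add(a:Int, b:Int)` the set would be seeded with both parameters, so `a + b` selects integer addition; if either name is missing from the set, the sketch falls back to concatenation.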

This is the first compiler change made entirely through the self-
hosting workflow — no Python bootstrap. Edit el source, run existing
elc on elc-combined.el, cc the output, test. Closure holds at the
new binary.

Tests
- add(40, 2) → 42
- count_to(10) → 45 (let i: Int / let total: Int rebinding)
- Regression suite (tiny/implret/whiletest/lextest) unchanged.

dist/platform/elc updated; .prev preserved.
2026-04-30 13:13:38 -05:00
Will Anderson 5c05ce9b99 self-host the el compiler
Today's milestone: dist/platform/elc compiles its own source and
reproduces itself byte-for-byte (stage1 == stage2 == stage3 verified).
The compiler is now a real binary in the world.

What landed
- Spec rewrite (language.md) to truth — every feature marked
  implemented / planned / not-in-this-language with no fiction.
- C runtime extension: 51 new builtins. JSON parser + accessors,
  time, UUID, env, in-process state K/V, float formatting + math,
  string ops (index_of, split, char_at, char_code, pad_left/right,
  format), list ops (push, push_front, join, range), bool_to_str.
  Runtime grew 631 → 1611 lines, header 171 → 247.
- Codegen fix: transform_implicit_return lifts a function's bare
  trailing expression into an explicit return. Without it, lex(),
  parse(), and every other implicit-return function returned 0/nil
  and the whole pipeline produced empty C output.
- Codegen fix: index expressions dispatch on AST kind. obj["literal"]
  → el_get_field (map), arr[i] → el_list_get (list). Same Index node
  in the parser, two different runtime calls.
- Codegen fix: skip emitting fn main() (collides with C main()) and
  honor parsed return-type annotations so Void functions don't get
  return-wrapped (return println(x) is a C type error).
- Parser: capture return-type identifier from -> Ret annotations.
- Lexer: + vessel keyword, + % operator, + \r escape.
- Runtime fix: el_list_append now allocates a fresh list rather than
  realloc'ing the input. When realloc moved the block, caller pointers
  were left dangling, which inserted garbage values into declared lists
  and caused strcmp segfaults. Persistent allocation eliminates the
  whole class of use-after-free at modest memory cost.
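
The el_list_append fix can be illustrated with a minimal persistent append. The list layout and the name `list_append_fresh` are hypothetical; the runtime's actual list type may differ.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int64_t el_val_t;

/* Hypothetical list layout used only for this sketch. */
typedef struct {
    size_t len;
    el_val_t *items;
} el_list;

/* Returns a new list holding the old items plus `v`. The input list is
 * never resized or freed, so existing pointers to it stay valid.
 * Returns NULL on allocation failure. */
el_list *list_append_fresh(const el_list *in, el_val_t v) {
    el_list *out = malloc(sizeof *out);
    if (!out) return NULL;
    out->len = in->len + 1;
    out->items = malloc(out->len * sizeof *out->items);
    if (!out->items) { free(out); return NULL; }
    if (in->len > 0) memcpy(out->items, in->items, in->len * sizeof *in->items);
    out->items[in->len] = v;
    return out;
}
```

Each append copies the existing items, which is the memory cost the commit mentions. In exchange, no caller can be left holding a pointer into a block that realloc has moved.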

Bootstrap path
- One-shot Python helper translated elc-combined.el to C and
  produced stage1. Helper is disposable; not committed.
- stage1 compiles elc-combined.el → stage2.c which cc compiles to
  stage2; stage2 compiles elc-combined.el → stage3.c. stage2.c and
  stage3.c are byte-identical. Closure proven.
- New elc installed at dist/platform/elc; old broken binary
  preserved as dist/platform/elc.legacy.
- dist/platform/elc.c is the canonical generated source.
- elvm and the bytecode pipeline are no longer on the critical path.

Known gap
- The `+` operator's heuristic dispatch still picks string concat
  when both operands are Idents with no literal anchor. Self-hosting
  works because the compiler source is careful, but `fn add(a:Int,
  b:Int) { a + b }` will not do arithmetic until codegen reads the
  parsed type annotations to dispatch. Fix is wiring; not done here.

Tested
- tiny / lextest / whiletest / map+field / array build all run.
- cgi-studio (1037 lines real El) compiles to C cleanly. Link fails
  only because runtime is missing fs_list, json_encode, llm_*; those
  are scheduled batches.
- Three-stage closure (stage1 vs stage2 vs stage3) byte-identical.
2026-04-30 13:10:29 -05:00
Will Anderson ede087eb04 codegen: emit C instead of bytecode — El is now natively compiled
Rewrites codegen.el to produce C source instead of JSON bytecode,
eliminating the ELVM interpreter as a runtime dependency.

- All El values use el_val_t (int64_t) as the universal type; integers
  are stored directly, strings/pointers via uintptr_t cast
- String literals wrapped with EL_STR(), arithmetic works natively
- fn declarations become C functions returning el_val_t
- let bindings become el_val_t local variables
- if/else, while, for all map to native C control flow
- String + String uses el_str_concat(); numeric + uses C +
- strip_outer_parens() prevents double-paren warnings in if/while
- compiler.el updated to describe C output and correct CLI usage
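
A small sketch of the value representation described above. The typedef follows the commit text; the macro bodies, the `EL_CSTR` inverse, and this simplified `el_str_concat` are assumptions about what el_runtime.h and el_runtime.c might contain.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int64_t el_val_t;                            /* one type for every El value */

/* Assumed macro bodies: pointers travel through uintptr_t. */
#define EL_STR(s)  ((el_val_t)(uintptr_t)(s))
#define EL_CSTR(v) ((const char *)(uintptr_t)(v))   /* hypothetical inverse */

/* Simplified concat: allocates a new string and never frees it.
 * Returns 0 on allocation failure. */
el_val_t el_str_concat(el_val_t a, el_val_t b) {
    size_t la = strlen(EL_CSTR(a)), lb = strlen(EL_CSTR(b));
    char *out = malloc(la + lb + 1);
    if (!out) return 0;
    memcpy(out, EL_CSTR(a), la);
    memcpy(out + la, EL_CSTR(b), lb + 1);            /* copies the terminator too */
    return EL_STR(out);
}

/* Roughly what an integer `a + b` becomes: plain C addition on el_val_t. */
el_val_t add(el_val_t a, el_val_t b) { return a + b; }
```

Because an el_val_t carries no tag, nothing at runtime distinguishes an integer from a string pointer, so the code generator has to pick `+` or `el_str_concat` at compile time.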

Adds el-compiler/runtime/ with:
- el_runtime.h: declares all builtins using el_val_t
- el_runtime.c: implements I/O, strings, math, list, map, fs, JSON;
  HTTP builtins are stubs (return empty string) pending libcurl

Compile El programs with:
  cc -I<runtime-dir> -o hello hello.c el_runtime.c
2026-04-29 22:33:27 -05:00
Will Anderson 4f3543b068 Archive Rust bootstrap — El compiler is now self-hosting 2026-04-29 22:21:31 -05:00