feat/native-testing
16 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1fd7cd5545 |
Add HTML template codegen and runtime for C backend
C codegen (codegen.el):
- cg_html_template: emits a GCC/Clang statement-expression that
builds the HTML string via el_str_concat chains
- cg_html_element_str / cg_html_parts / cg_html_attrs_str: recursive
element and attribute emitters
- cg_html_each: {#each} compiles to a C for-loop with el_list_get
- __html_counter state tracks unique accumulator variable names
- Handles both 'static' (raw string) and 'dynamic' (expr node) attrs
matching the parser's attribute kind convention
Runtime (el_runtime.c / el_runtime.h):
- html_escape(s): escapes & < > " ' for safe interpolation
- html_raw(s): identity function for raw() bypass
- Both use the existing html_buf_t infrastructure from el_html_sanitize
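The escaping rule named for html_escape(s) can be sketched in plain C. This is a minimal illustration only: `html_escape_sketch` is a hypothetical name, it allocates with malloc instead of the html_buf_t infrastructure (which this log does not show), and the choice of `&#39;` for the single quote is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the runtime's html_escape(s). Returns a
 * heap-allocated string the caller must free, or NULL on allocation failure. */
char *html_escape_sketch(const char *s) {
    assert(s != NULL);
    /* Worst case: every byte becomes a 6-byte entity such as "&quot;". */
    size_t n = strlen(s);
    char *out = malloc(n * 6 + 1);
    if (out == NULL) return NULL;
    char *p = out;
    for (; *s != '\0'; s++) {
        const char *rep = NULL;
        switch (*s) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;   /* assumed entity for ' */
        default:   break;
        }
        if (rep != NULL) {
            size_t len = strlen(rep);
            memcpy(p, rep, len);
            p += len;
        } else {
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}
```

An html_raw(s) counterpart would simply return its argument unchanged, which is why it works as the bypass for raw().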
|
||
|
|
7b60d94b8a |
add --minify and --obfuscate flags to elc JS pipeline
Adds two post-processing flags that produce production-ready browser JS in a single elc invocation, replacing extract-js.py in the web product pipeline:
elc --target=js --bundle --minify source.el > output.min.js
elc --target=js --bundle --obfuscate source.el > output.obf.js
--minify shells out to terser (passes=2, no drop_console, drop_debugger).
--obfuscate shells out to javascript-obfuscator with the same options as the old extract-js.py script. --obfuscate implies --minify.
Tool discovery: checks ./node_modules/.bin/, ../node_modules/.bin/ (monorepo), then falls back to npx.
Both flags require --target=js; passing either without it exits 1 with a clear error.
Both tools receive a reserved-names list of globals referenced from HTML onclick= attributes (neuronDemoToggle, signInWith, NEURON_CFG, etc.) so they are not mangled.
Implementation adds stdout_to_file(path)/stdout_restore() builtins to the C runtime so codegen's println-streamed output can be captured to a temp file before being piped through the external tools. Temp files use /tmp/elc-<pid>-<timestamp>.js naming and are cleaned up on success and failure.
Rebuilds dist/platform/elc and dist/platform/elc.c. Self-hosting verified. |
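One common way to build a stdout_to_file(path)/stdout_restore() pair on POSIX is to swap file descriptor 1 with dup/dup2. The sketch below assumes that approach; the log does not say how the runtime actually implements it, and the `_sketch` names and 0/-1 return convention are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Saved copy of the original stdout descriptor; -1 means "not redirected". */
static int saved_stdout_fd = -1;

/* Hypothetical equivalent of stdout_to_file(path): returns 0 on success. */
int stdout_to_file_sketch(const char *path) {
    assert(path != NULL);
    if (saved_stdout_fd != -1) return -1;      /* already redirected */
    fflush(stdout);                            /* drain buffered output first */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    saved_stdout_fd = dup(STDOUT_FILENO);
    if (saved_stdout_fd < 0 || dup2(fd, STDOUT_FILENO) < 0) {
        if (saved_stdout_fd >= 0) close(saved_stdout_fd);
        saved_stdout_fd = -1;
        close(fd);
        return -1;
    }
    close(fd);                                 /* fd 1 now refers to the file */
    return 0;
}

/* Hypothetical equivalent of stdout_restore(): returns 0 on success. */
int stdout_restore_sketch(void) {
    if (saved_stdout_fd == -1) return -1;      /* nothing to restore */
    fflush(stdout);                            /* push pending output to the file */
    int rc = dup2(saved_stdout_fd, STDOUT_FILENO);
    close(saved_stdout_fd);
    saved_stdout_fd = -1;
    return rc < 0 ? -1 : 0;
}
```

Flushing before each descriptor swap matters here: output still sitting in the stdio buffer would otherwise land on the wrong side of the redirect.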
||
|
|
f97354e96b |
add exec() and exec_bg() builtins to El runtime
- exec(cmd) -> String: runs shell command, captures stdout, 30s timeout
- exec_bg(cmd) -> String: forks command in background, returns PID string
- add both to codegen arity table (builtin_arity)
- rebuild elc with updated arity table (self-hosting, identity-verified)
- update release snapshot at releases/v1.0.0-20260501/ |
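The capture half of exec(cmd) can be sketched with POSIX popen. This is an assumption about the mechanism, not the runtime's code: `exec_capture_sketch` is a hypothetical name, and the 30s timeout described above is deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for exec(cmd): runs cmd through the shell and
 * returns its stdout as a heap-allocated string (caller frees), or NULL
 * on failure. No timeout is enforced in this sketch. */
char *exec_capture_sketch(const char *cmd) {
    assert(cmd != NULL);
    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL) return NULL;

    size_t cap = 256, len = 0;
    char *out = malloc(cap);
    if (out == NULL) { pclose(pipe); return NULL; }

    char chunk[256];
    size_t got;
    while ((got = fread(chunk, 1, sizeof chunk, pipe)) > 0) {
        if (len + got + 1 > cap) {             /* keep room for the terminator */
            while (len + got + 1 > cap) cap *= 2;
            char *grown = realloc(out, cap);
            if (grown == NULL) { free(out); pclose(pipe); return NULL; }
            out = grown;
        }
        memcpy(out + len, chunk, got);
        len += got;
    }
    pclose(pipe);
    out[len] = '\0';
    return out;
}
```

A timeout would need something beyond popen, for example fork/exec with a pipe and a poll loop, since popen offers no way to interrupt a blocked read.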
||
|
|
a084feb812 | Add separate compilation: extern fn, --emit-header, elb build coordinator | ||
|
|
3a83b6eb80 |
add text-processing primitives to el runtime
24 new functions covering:
- counting (str_count, str_count_chars, str_count_bytes, str_count_lines, str_count_words, str_count_letters, str_count_digits)
- finding (str_index_of_all, str_last_index_of, str_find_chars)
- transforming (str_repeat, str_reverse, str_strip_prefix/suffix/chars, str_lstrip, str_rstrip)
- character classification (is_letter, is_digit, is_alphanumeric, is_whitespace, is_punctuation, is_uppercase, is_lowercase)
- splitting/joining (str_split_lines, str_split_chars, str_split_n, str_join)
Phase 1 is byte-level + ASCII character classes. Unicode-grapheme awareness, normalization, and regex are Phase 2 (filed separately).
Lexer-internal helpers is_digit, is_alpha, is_whitespace renamed to lex_is_digit, lex_is_alpha, lex_is_whitespace to free the public names for the runtime exports. The El compiler's lexer.el and the bundled elc-combined.el both updated.
Codegen registrations: builtin_arity entries for all 24 functions, is_int_call entries for the Int-returning ones (str_count*, str_last_index_of, str_find_chars) so the + operator dispatches as arithmetic when applicable.
Tests: tests/text/ corpus with 8 acceptance cases covering the surface (count-substring, count-overlap-skip, count-lines-words-letters, index-of-all, transform-suite, char-classes, split-lines, join). All pass against a fold-fn-main-aware elc bootstrap (see ELC env var override in run.sh).
Self-host fixed point: elc-combined.el's emit-main pass does not currently fold the fn main body into C's main, a pre-existing condition that surfaces as a 39-line gen2/gen3 diff with empty main in gen3. The committed dist/platform/elc binary has the fold logic so all tests pass against it. Filing the elc-combined fold-fn-main fix separately. This commit does not introduce new self-host drift. |
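A byte-level substring count in the spirit of str_count might look like the sketch below. It assumes the count-overlap-skip test case means overlapping matches are not counted, and that an empty needle yields zero; neither detail is confirmed by the log, and `str_count_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical byte-level counterpart of str_count(haystack, needle):
 * counts non-overlapping occurrences, scanning left to right. */
long str_count_sketch(const char *haystack, const char *needle) {
    assert(haystack != NULL && needle != NULL);
    size_t nlen = strlen(needle);
    if (nlen == 0) return 0;   /* assumption: an empty needle counts as zero */
    long count = 0;
    const char *p = haystack;
    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += nlen;             /* skip past the match so overlaps are not counted */
    }
    return count;
}
```

Under this rule "aa" occurs twice in "aaaa", not three times, because the scan resumes after each match.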
||
|
|
ed564b6dda |
add Calendar + CalendarTime + Rhythm + LocalDate/Time as first-class
Phase 1.5 of time-system. Calendar is pluggable: EarthCalendar (IANA zones, DST, Gregorian) is the default; MarsCalendar, CycleCalendar(period), NoCycleCalendar handle non-Earth cases.
Rhythm abstracts recurrence from clock units - rhythm_cycle_phase(0.5) means "midpoint of cycle" whether the cycle is 24 hours on Earth or 30 hours on a station or 300 years on a long-cycle world.
Phase 1 (Instant + Duration) unchanged. EarthCalendar(zone_local()) is the user-facing default; nobody who doesn't care about non-Earth calendars sees the abstraction.
Self-host fixed point holds at 6339 lines. Snapshot tagged at dist/platform/elc.20260502-1321-self-host.
Phase 2 (scheduling primitives every/after/at) lands next, now with Calendar-aware grounding instead of Earth-time hardcoded.
Backlog: bl-297f66d8 (supersedes bl-b29b3e60) |
||
|
|
2e9d3247a6 |
runtime: actually rename str_format param to fmt
Previous commit |
||
|
|
6d897289a3 |
runtime: rename str_format param 'template' to 'fmt'
template is a reserved keyword in C++; though not in C, it blocks this header from ever being included from C++ code. Match printf-family convention with fmt instead.
The deeper question of whether string-template substitution is the right abstraction for our substrate is filed separately as backlog. |
||
|
|
276c0e5997 |
runtime: engram_scan_nodes_by_type_json() filters at the engine
Added a typed scan function: walks the live nodes once, skips transparent layers, keeps only entries whose node_type matches the filter, sorts the survivors by salience, paginates. Header forward decl in el_runtime.h so callers can find it. Empty / NULL filter falls through to engram_scan_nodes_json so the existing GET /api/nodes contract is preserved exactly. This is what every list-X tool in the MCP wrapper has been wanting: listProcesses returning only Process nodes, not all of them, without the wrapper having to fetch + filter client-side. |
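The filter, sort, and paginate sequence described above can be sketched over a plain array. The `node_sketch_t` record, its field names, and the function signature are all hypothetical; the engine's real structures and JSON output are not shown in this log, and the empty/NULL-filter fallthrough is not modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node record standing in for the engine's real struct. */
typedef struct {
    const char *id;
    const char *node_type;
    double salience;
    int transparent;   /* nonzero: belongs to a transparent layer, skipped */
} node_sketch_t;

/* qsort comparator: higher salience first. */
static int by_salience_desc(const void *a, const void *b) {
    const node_sketch_t *x = *(const node_sketch_t *const *)a;
    const node_sketch_t *y = *(const node_sketch_t *const *)b;
    return (x->salience < y->salience) - (x->salience > y->salience);
}

/* Fills `page` with up to `limit` matching nodes starting at `offset` in
 * salience order; returns how many were written (0 on allocation failure). */
size_t scan_by_type_sketch(const node_sketch_t *nodes, size_t n,
                           const char *type, size_t limit, size_t offset,
                           const node_sketch_t **page) {
    assert(nodes != NULL || n == 0);
    assert(type != NULL && page != NULL);
    const node_sketch_t **kept = malloc((n ? n : 1) * sizeof *kept);
    if (kept == NULL) return 0;
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {            /* single pass over live nodes */
        if (nodes[i].transparent) continue;
        if (strcmp(nodes[i].node_type, type) != 0) continue;
        kept[k++] = &nodes[i];
    }
    qsort(kept, k, sizeof *kept, by_salience_desc);
    size_t written = 0;
    for (size_t i = offset; i < k && written < limit; i++)
        page[written++] = kept[i];
    free(kept);
    return written;
}
```

Sorting only the survivors keeps the cost proportional to the matching set, which is the point of filtering at the engine instead of in the wrapper.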
||
|
|
86b3ad070d |
compiler+runtime: codegen fixes for empty literal, == int idents, m.field; runtime body-loss fix and Linux feature macros
Three codegen bugs surfaced repeatedly across the parallel port-to-El
agents and were patched here:
1. Empty array literal '[]' was emitting el_list_new(0, ) — trailing
comma in a varargs call, fails the C parse. Special-cased: n==0
returns 'el_list_empty()' directly.
2. '==' between two identifiers both tracked in __int_names (typed
Int via 'let x: Int = ...') was miscompiling to str_eq. With the
tagged-pointer Int-as-int64 representation, str_eq strcmp's what
are integer values dressed as char* and segfaults on the first
non-printable byte. Added the int-name lookup, mirroring the
dispatch already present for '+' between Int idents. NotEq got
the same treatment.
3. 'm.field' codegen was passing the raw const char* field name to
el_get_field, which expects el_val_t. C compiler warned about int
conversion; runtime read garbage at the address. Wrapped in
EL_STR(...) so the field name lands as a proper el_val_t.
Runtime additions in the same pass:
- el_runtime.c http_read_request: the loop's boundary check was
'line_end >= hdr_end' which broke before processing the LAST
header line — its trailing \r\n IS hdr_end. Real curl clients
put Content-Length last, so POST bodies were silently arriving
as length 0. Changed to '> hdr_end' so the last line is processed.
soma-server agent surfaced this during smoke testing.
- _GNU_SOURCE feature macro: clock_gettime/CLOCK_REALTIME, strcasecmp,
and the dlfcn extensions (RTLD_DEFAULT) all gated behind it on
glibc/Debian. macOS is permissive without; the landing Docker
build needed these for linux/amd64. Adds <strings.h> for
strcasecmp.
- Refactored slot semantics in el_runtime.c (already in tree from
the morning ARC commit): magic-tagged ElHeader at offset 0,
ElList/ElMap with separate elems/keys/values payload allocations,
el_list_append and el_map_set mutate-in-place when refcount<=1
and copy-on-write when shared.
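The http_read_request boundary fix above comes down to one comparison. The sketch below is a hypothetical reduction of that loop (the real function is not shown in this log): `hdr_end` is taken to point at the CRLF CRLF that closes the header block, so the last header line's own CRLF starts exactly at `hdr_end`.

```c
#define _GNU_SOURCE   /* for strncasecmp on glibc, per the feature-macro note */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical reduction of the header loop in http_read_request.
 * Returns the Content-Length value, 0 if the header is absent, or -1
 * if the request has no complete header block. */
long content_length_sketch(const char *req) {
    assert(req != NULL);
    const char *hdr_end = strstr(req, "\r\n\r\n");
    if (hdr_end == NULL) return -1;
    /* Cannot be NULL: hdr_end itself is a CRLF in req. */
    const char *line = strstr(req, "\r\n");
    line += 2;                                   /* first header line */
    long length = 0;
    while (line < hdr_end) {
        const char *line_end = strstr(line, "\r\n");
        /* The last header's CRLF starts exactly at hdr_end, so the check
         * must be '>' here; '>=' would skip that final line. */
        if (line_end == NULL || line_end > hdr_end) break;
        if (strncasecmp(line, "Content-Length:", 15) == 0)
            length = strtol(line + 15, NULL, 10);
        line = line_end + 2;
    }
    return length;
}
```

With `>=` in place of `>`, a request whose final header is Content-Length would report length 0, which matches the silently empty POST bodies described above.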
Self-host fixpoint reached at v3: elc → elc.c → cc → elc binary →
elc.c reproduced byte-for-byte. dist/platform/elc and dist/platform/elc.c
updated. The codegen.el and elc-combined.el changes are mirror-edits;
both flow through the bootstrap chain to keep self-hosting clean.
|
||
|
|
23bbc99e43 |
runtime: ARC scaffolding + indirection so el_list_append amortizes O(1)
The compiler used to OOM at ~8.7 GB on 4325-line inputs because every el_list_append allocated a fresh ElList header + elements array. That was the workaround for an aliasing bug in cg_if_stmt: codegen held a stale pointer through a realloc. Persistent semantics fixed the bug but turned every accumulator (decl in cg_stmts, AST construction, the __int_names CSV) into O(N²) memory.
Real fix in two coordinated parts:
1. Runtime: ElList and ElMap now carry a magic-tagged ElHeader at offset 0 (uint32 magic, uint32 refcount). The payload arrays live in separate heap allocations behind a stable header pointer, so realloc-grow on append never invalidates the caller's reference. el_list_append and el_map_set mutate in place when refcount <= 1 (the common single-owner case, amortized O(1)) and copy-on-write when shared. Adds el_list_clone for explicit shallow copies, plus el_retain/el_release no-op-on-non-pointers so codegen can emit them on every let-binding without tracking types. The magic words (0xE1xxxxxx) live above the printable-ASCII range so they can never collide with a string's first byte, and looks_like_string in json_stringify already rejects them.
2. Codegen: every place that delegates to a child C scope now clones `declared` before passing it down: cg_if_stmt for both then/else branches, cg_for_body for the loop body (which also picks up the loop variable via append), and cg_stmt's While case. Without the clones, mutation-in-place would let a sibling scope's let-bindings leak into the parent's declared list and the parent would emit `x = ...` against an undeclared name. The clones are cheap shallow copies of a list of strings.
Result on the landing-combined.el (4325 lines): 8.7 GB → 3.5 GB peak, 0.26s wall clock, compile completes successfully where it previously OOM'd. Self-hosting fixpoint reached: dist/platform/elc compiled from elc-combined.el reproduces dist/platform/elc.c byte-for-byte on a second pass through itself.
Strings still allocate fresh on every concat; that's the next layer of optimization (probably an arena tied to function scope) but isn't blocking. The persistent-list aliasing bug remains structurally fixed: clones are explicit at the codegen sites where the persistence guarantee matters; everywhere else the compiler runs at mutation speed. |
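The header-plus-indirection layout can be sketched as follows. Everything here is illustrative: the type and function names, the specific magic value, the int64_t element type, and the choice to move the caller's reference to the copy on copy-on-write are assumptions, not the runtime's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LIST_MAGIC_SKETCH 0xE1000001u   /* hypothetical value in the 0xE1xxxxxx range */

typedef struct { uint32_t magic; uint32_t refcount; } header_sketch_t;

typedef struct {
    header_sketch_t hdr;   /* at offset 0 */
    size_t len, cap;
    int64_t *elems;        /* separate allocation: may move on growth */
} list_sketch_t;

list_sketch_t *list_new_sketch(void) {
    list_sketch_t *l = calloc(1, sizeof *l);
    if (l == NULL) return NULL;
    l->hdr.magic = LIST_MAGIC_SKETCH;
    l->hdr.refcount = 1;
    return l;
}

/* Shallow copy with its own payload array and a fresh refcount of 1. */
list_sketch_t *list_clone_sketch(const list_sketch_t *src) {
    assert(src != NULL && src->hdr.magic == LIST_MAGIC_SKETCH);
    list_sketch_t *l = list_new_sketch();
    if (l == NULL) return NULL;
    size_t cap = src->len + 1;            /* room for one append without regrowth */
    l->elems = malloc(cap * sizeof *l->elems);
    if (l->elems == NULL) { free(l); return NULL; }
    if (src->len > 0) memcpy(l->elems, src->elems, src->len * sizeof *l->elems);
    l->len = src->len;
    l->cap = cap;
    return l;
}

/* Appends in place for a single owner; clones first when shared.
 * Returns the list holding the new element, or NULL on allocation failure. */
list_sketch_t *list_append_sketch(list_sketch_t *l, int64_t v) {
    assert(l != NULL && l->hdr.magic == LIST_MAGIC_SKETCH);
    if (l->hdr.refcount > 1) {            /* shared: copy-on-write */
        list_sketch_t *copy = list_clone_sketch(l);
        if (copy == NULL) return NULL;
        l->hdr.refcount--;                /* caller's reference moves to the copy */
        l = copy;
    }
    if (l->len == l->cap) {               /* doubling gives amortized O(1) */
        size_t cap = l->cap ? l->cap * 2 : 4;
        int64_t *grown = realloc(l->elems, cap * sizeof *grown);
        if (grown == NULL) return NULL;
        l->elems = grown;                 /* the header pointer l is unchanged */
        l->cap = cap;
    }
    l->elems[l->len++] = v;
    return l;
}
```

Because only `elems` is ever reallocated, a caller holding the `list_sketch_t *` never sees its pointer go stale, which is the property the aliasing bug in cg_if_stmt lacked.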
||
|
|
12d5e7777e |
runtime + compiler: dharma, match, cgi blocks, VBD, agentic LLM
Two parallel agent sweeps closing the remaining structural gaps.
== Compiler completions ==
- match codegen: lowers Match into GCC/Clang statement-expression
({ ... }). Patterns: Wildcard, Binding, LitInt (==), LitStr
(str_eq), LitBool. Per-match unique label via state counter.
Verified: classify(0)→"zero", classify(1)→"one", classify(7)→"other".
- cgi block parsing: `cgi "name" { dharma_id, principal, network,
engram }` → CgiBlock AST node → el_cgi_init() emitted as the first
call in main() after el_runtime_init_args. Multiple cgi blocks per
program emit a #error directive. Missing optional fields → EL_NULL.
- VBD compile-time enforcement: parser attaches `decorator: <name>`
to FnDef. Codegen recursively walks fn bodies (Call/BinOp/Not/Neg/
Field/Index/Try/Array/Map/If/For/Match plus Let/Return/Expr/While/
For). If a non-@manager function calls dharma_emit or dharma_field,
emit `#error "VBD violation: ... fn '<name>'"` before the function
body. Verified: @engine fn calling dharma_emit → cc fails with the
message. @manager fn calling dharma_emit → compiles clean.
Three-stage closure: stage1.c == stage2.c == stage3.c (2791 lines
each, byte-identical). dist/platform/elc rebuilt at 165 KB; .prev5
preserved.
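For the match lowering described above, the emitted C might have roughly the following shape. This is a guess at the output, not a copy of it: the `m0_` prefix stands in for whatever names the state counter produces, and the code relies on the GCC/Clang statement-expression extension, so it is not portable ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shape of the C that match codegen might emit for a
 * three-arm match on an Int: two LitInt patterns and a Wildcard. */
const char *classify_sketch(int64_t n) {
    return ({
        const char *m0_result;              /* accumulator, suffixed per match */
        int64_t m0_subject = n;             /* subject evaluated exactly once */
        if (m0_subject == 0) { m0_result = "zero"; goto m0_end; }
        if (m0_subject == 1) { m0_result = "one";  goto m0_end; }
        m0_result = "other";                /* wildcard arm */
    m0_end: ;
        m0_result;                          /* value of the whole expression */
    });
}
```

The per-match suffix matters because C labels have function scope: two matches in one function with the same end label would fail to compile.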
== Runtime completions ==
- Real dharma_* primitives, no more stubs. Channel registry,
request/response over HTTP, network-wide spreading activation,
fire-and-forget event emission, blocking dharma_field with
pthread_cond_timedwait (30s default), Hebbian relationship
weights stored as Engram edges between dharma:self and
dharma:peer:<id>, sorted-by-weight peer list. URL/ID arrays
snapshotted before network I/O so mutexes never block on socket.
- New public C contract: el_runtime_dharma_event_arrive(type, payload,
source) — application HTTP handler calls this when /dharma/event
arrives, runtime broadcasts on _dharma_event_cv. Keeps the HTTP
server generic; events flow through the application's router.
- llm_call_agentic real multi-turn loop. Tool registry (mutex-
protected, dlsym-resolved, mirroring http_set_handler). Loop:
build request with tools+messages → POST → dispatch on stop_reason.
end_turn → return text. max_tokens → text + "[truncated]". tool_use
→ walk content[], call registered handler per block, build
tool_result message, append to conversation, loop. Iteration cap
10. Tools not registered return {"error":"tool not registered: X"}
with is_error: true.
- New builtin: llm_register_tool(name, handler_fn_name).
Compile clean: cc -std=c11 -Wall -Wextra -c → zero warnings, zero
errors. Smoke test exercises every new dharma_* primitive +
llm_register_tool round-trip.
Runtime grew 3309→4079 lines (.c, ~155 KB), 312→342 lines (.h).
== Integration ==
Engram rebuilt against the new runtime: 130 KB binary, daemon
swapped on :8742 cleanly, /health and /api/stats both returning
correctly under launchd. No regressions.
== Status of "planned" items in language.md ==
- match codegen → IMPLEMENTED
- cgi block parsing → IMPLEMENTED
- VBD enforcement → IMPLEMENTED
- % operator → IMPLEMENTED (earlier today)
- vessel keyword → lexed (codegen uses package compatible)
- activate construct → still planned (low priority; engram_activate
builtin covers the use case for now)
- sealed block → still planned
- dharma_emit fanout parallelization → potential future work, current
serial behavior matches spec
|
||
|
|
0fa9e749e1 |
runtime: engram_*_json accessors, http_set_handler dlsym, codegen int-call
Three changes that turned the runtime into something Engram-the-server
can actually run on top of.
1. engram_*_json accessors. The runtime's engram_get_node/search/scan/
neighbors/activate return ElList/ElMap; passing those through
json_stringify hit the type-erasure wall (an ElList* has no header
that distinguishes it from a string pointer). Added pre-serialized
sibling builtins:
engram_get_node_json(id) -> JSON object
engram_search_json(query, limit) -> JSON array of node objects
engram_scan_nodes_json(limit, offset)
engram_neighbors_json(node_id, max_depth, direction)
engram_activate_json(query, depth)
engram_stats_json()
Each walks the typed C structures and serializes directly, reusing
the existing engram_emit_node_json / engram_emit_edge_json helpers
from the snapshot path.
2. http_set_handler now falls back to dlsym(RTLD_DEFAULT, name) when
the named handler isn't already in the C-level registry. El programs
that define `fn handle_request(method, path, body) -> String` can
register themselves just by calling http_set_handler("handle_request").
No C glue required. Verified live on a real El server.
3. Codegen: extended int-typed dispatch on `+` to handle Calls. New
helper is_int_call recognizes a known-int-returning builtin set:
str_len, str_index_of, str_to_int, str_char_code, native_list_len,
el_list_len, len, json_get_int, json_array_len, engram_node_count,
engram_edge_count, time_now, time_now_utc, time_diff, time_add,
time_from_parts, el_abs/max/min, float_to_int. With this,
`pos + str_len(needle)` compiles to integer arithmetic instead of
string concat. The earlier limitation noted in the previous commit
(Ident + Call returning Int) is now closed.
Also: el_to_float / el_from_float moved to el_runtime.h as static
inlines so generated programs can use them. This removes the unused
inline definitions that were duplicated in the .c file.
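The is_int_call check from item 3 is a set-membership test. The real helper lives in codegen.el and is written in El; the C rendering below only illustrates the idea, lists a subset of the builtins named above, and uses hypothetical function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A subset of the Int-returning builtins listed in the commit message. */
static const char *const INT_BUILTINS_SKETCH[] = {
    "str_len", "str_index_of", "str_to_int", "str_char_code",
    "el_list_len", "len", "json_get_int", "json_array_len",
    "time_now", "time_diff", "float_to_int",
};

/* Hypothetical C rendering of is_int_call. */
bool is_int_call_sketch(const char *callee) {
    assert(callee != NULL);
    size_t n = sizeof INT_BUILTINS_SKETCH / sizeof INT_BUILTINS_SKETCH[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(callee, INT_BUILTINS_SKETCH[i]) == 0) return true;
    return false;
}

/* Picks the lowering for `lhs + call(...)`: C addition when both sides
 * are known Ints, otherwise the string-concat runtime call. */
const char *plus_lowering_sketch(bool lhs_is_int, const char *rhs_callee) {
    return (lhs_is_int && is_int_call_sketch(rhs_callee)) ? "+" : "el_str_concat";
}
```

With this table, an expression such as `pos + str_len(needle)` resolves to integer addition when `pos` is a tracked Int.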
Closure verified: stage1 vs stage2 byte-identical against the new
runtime. dist/platform/elc rebuilt; .prev4 preserved.
Engram server (engram/src/server.el) end-to-end:
POST /api/nodes ×3 → 3 UUIDs returned
POST /api/edges ×2 → linkage made
GET /api/stats → {"node_count":3,"edge_count":2}
GET /api/search?q=spreading&limit=5 → 1 hit, full node JSON
POST /api/activate {"query":"Hebbian","depth":3}
→ seed node @ hop 0, strength 0.8
→ 1-hop neighbor @ strength 0.392 (= 0.8 × 0.7 weight × 0.7 decay)
GET /api/neighbors/<id>?depth=2 → {node, edge, hops} triple
POST /api/save → {"ok":true,"path":"..."}
Server stays alive across all routes.
Snapshot save/load on restart still TODO — server starts with 0 nodes
even when a snapshot exists; investigation pending.
|
||
|
|
951b8d574b |
runtime: HTTP, in-process graph store, LLM, fs_list
Batches 2/3/4 of the runtime extension. The runtime grew from 1620
to 3112 lines (.c) and 247 to 286 lines (.h) — adding 27 new or
real-implementation builtins and replacing every batch-1 stub.
Batch 2 — HTTP / fs (8 builtins)
- http_get, http_post: replaced stubs with real libcurl client.
Network errors return JSON {"error":"..."} so callers can detect.
- http_post_json: sets Content-Type: application/json.
- http_get_with_headers, http_post_with_headers: ElMap → headers.
- http_post_form_auth: form-urlencoded + Authorization header
(Stripe-style API calls).
- http_serve: replaced stub with real POSIX-socket server, threaded,
capped at 64 concurrent connections. Auto-detects content type
(HTML / JSON / plain). Handler dispatch via named registry.
- fs_list: directory listing via opendir/readdir.
Batch 3 — In-process graph store (14 builtins)
- engram_node, engram_node_full: create node, returns UUID.
- engram_get_node, engram_forget, engram_node_count.
- engram_strengthen: Hebbian potentiation (+0.05, clamp 1.0,
bumps last_activated).
- engram_search, engram_scan_nodes: text search, paginated scan.
- engram_connect, engram_edge_between, engram_neighbors,
engram_neighbors_filtered, engram_edge_count.
- engram_activate: real spreading-activation algorithm.
BFS to depth, max-activation merge across paths, decay 0.7/hop,
multiplied by node confidence, filtered by epistemic_confidence
≥ 0.2 (refresh threshold), sorted desc.
- engram_save, engram_load: JSON snapshot persistence.
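The spreading-activation step listed for engram_activate can be sketched on a tiny fixed graph. This is a simplified model under stated assumptions: a hypothetical adjacency matrix replaces the real node and edge store, a neighbor's strength is taken to be parent strength × edge weight × 0.7, and the node-confidence multiplier, the 0.2 threshold, and the final sort are omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define N_NODES_SKETCH 4        /* hypothetical fixed graph size */
#define DECAY_PER_HOP 0.7       /* decay per hop, from the commit message */

/* weight[u][v] > 0 means an edge u -> v with that weight (assumed layout).
 * Writes each node's activation into strength[]; unreached nodes get 0. */
void activate_sketch(double weight[N_NODES_SKETCH][N_NODES_SKETCH],
                     int seed, double seed_strength, int max_depth,
                     double strength[N_NODES_SKETCH]) {
    assert(seed >= 0 && seed < N_NODES_SKETCH && max_depth >= 0);
    int frontier[N_NODES_SKETCH], next[N_NODES_SKETCH];
    int n_frontier = 0;
    for (int i = 0; i < N_NODES_SKETCH; i++) strength[i] = 0.0;
    strength[seed] = seed_strength;
    frontier[n_frontier++] = seed;

    for (int depth = 0; depth < max_depth && n_frontier > 0; depth++) {
        bool queued[N_NODES_SKETCH] = { false };
        int n_next = 0;
        for (int f = 0; f < n_frontier; f++) {
            int u = frontier[f];
            for (int v = 0; v < N_NODES_SKETCH; v++) {
                if (weight[u][v] <= 0.0) continue;
                double s = strength[u] * weight[u][v] * DECAY_PER_HOP;
                if (s > strength[v]) {      /* max-activation merge across paths */
                    strength[v] = s;
                    if (!queued[v]) { queued[v] = true; next[n_next++] = v; }
                }
            }
        }
        for (int i = 0; i < n_next; i++) frontier[i] = next[i];
        n_frontier = n_next;
    }
}
```

Keeping the maximum instead of summing means a node reached by several paths reports its strongest route, so extra weak paths never inflate the score.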
Batch 4 — LLM (5 builtins)
- llm_call, llm_call_system: Anthropic /v1/messages via libcurl.
ANTHROPIC_API_KEY from env. Default model claude-sonnet-4-5.
- llm_vision: adds image content block. URL / base64 / file path
detected by prefix.
- llm_models: returns the available model list.
- llm_call_agentic: stubbed with TODO (single-turn fallback to
llm_call_system); full tool_use loop is the next iteration.
Codegen fix: emit Float literals as `el_from_float(<v>)`. Without
the wrapper, C implicit conversion truncates 0.8 to 0 when passed to
a builtin expecting el_val_t. Float helpers moved to el_runtime.h
so generated programs can call them.
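The truncation problem behind the Float-literal fix is easy to reproduce. The sketch below assumes the helpers carry the double's bit pattern inside the 64-bit slot; the log names el_from_float and el_to_float but does not show their bodies, so the `_sketch` functions are stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int64_t el_val_sketch_t;   /* stands in for el_val_t (int64_t per the log) */

_Static_assert(sizeof(double) == sizeof(el_val_sketch_t),
               "bit-copy below needs a 64-bit double");

/* Assumed implementation: copy the double's bits into the 64-bit slot. */
el_val_sketch_t from_float_sketch(double d) {
    el_val_sketch_t v;
    memcpy(&v, &d, sizeof v);
    return v;
}

/* Inverse: reinterpret the slot's bits as a double. */
double to_float_sketch(el_val_sketch_t v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}
```

Passing a bare `0.8` where an int64_t parameter is expected converts by value and yields 0, while the bit copy round-trips exactly.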
Compile-time
- cc -std=c11 -Wall -Wextra -c el_runtime.c → no errors, no warnings.
- Link requires -lcurl -lpthread (documented in header comment).
Verified end-to-end
- engram_node × 2, engram_connect, engram_activate("Hebbian", 2)
returns 2 activated nodes with correct epistemic confidence.
- http_get("https://httpbin.org/get") returns 259-byte JSON live.
- Self-host closure: stage1 vs stage2 byte-identical against the
new runtime.
- engram_save → engram_load round-trip preserves graph.
dist/platform/elc rebuilt against the new runtime (147 KB, up from
94 KB due to libcurl link). .prev2 preserves the prior binary.
|
||
|
|
5c05ce9b99 |
self-host the el compiler
Today's milestone: dist/platform/elc compiles itself byte-for-byte to
itself (stage1 == stage2 == stage3 verified). The compiler is now a
real binary in the world.
What landed
- Spec rewrite (language.md) to truth — every feature marked
implemented / planned / not-in-this-language with no fiction.
- C runtime extension: 51 new builtins. JSON parser + accessors,
time, UUID, env, in-process state K/V, float formatting + math,
string ops (index_of, split, char_at, char_code, pad_left/right,
format), list ops (push, push_front, join, range), bool_to_str.
Runtime grew 631 → 1611 lines, header 171 → 247.
- Codegen fix: transform_implicit_return lifts a function's bare
trailing expression into an explicit return. Without it, lex(),
parse(), and every other implicit-return function returned 0/nil
and the whole pipeline produced empty C output.
- Codegen fix: index expressions dispatch on AST kind. obj["literal"]
→ el_get_field (map), arr[i] → el_list_get (list). Same Index node
in the parser, two different runtime calls.
- Codegen fix: skip emitting fn main() (collides with C main()) and
honor parsed return-type annotations so Void functions don't get
return-wrapped (return println(x) is a C type error).
- Parser: capture return-type identifier from -> Ret annotations.
- Lexer: + vessel keyword, + % operator, + \r escape.
- Runtime fix: el_list_append now allocates a fresh list rather than
realloc'ing the input. Realloc moved blocks made caller pointers
dangle, which was inserting garbage values into declared lists and
causing strcmp segfaults. Persistent allocation eliminates the
whole class of use-after-free at modest memory cost.
Bootstrap path
- One-shot Python helper translated elc-combined.el to C and
produced stage1. Helper is disposable; not committed.
- stage1 compiles elc-combined.el → stage2.c which cc compiles to
stage2; stage2 compiles elc-combined.el → stage3.c. stage2.c and
stage3.c are byte-identical. Closure proven.
- New elc installed at dist/platform/elc; old broken binary
preserved as dist/platform/elc.legacy.
- dist/platform/elc.c is the canonical generated source.
- elvm and the bytecode pipeline are no longer on the critical path.
Known gap
- The `+` operator's heuristic dispatch still picks string concat
when both operands are Idents with no literal anchor. Self-hosting
works because the compiler source is careful, but `fn add(a:Int,
b:Int) { a + b }` will not do arithmetic until codegen reads the
parsed type annotations to dispatch. Fix is wiring; not done here.
Tested
- tiny / lextest / whiletest / map+field / array build all run.
- cgi-studio (1037 lines real El) compiles to C cleanly. Link fails
only because runtime is missing fs_list, json_encode, llm_*; those
are scheduled batches.
- Three-stage closure (stage1 vs stage2 vs stage3) byte-identical.
|
||
|
|
ede087eb04 |
codegen: emit C instead of bytecode — El is now natively compiled
Rewrites codegen.el to produce C source instead of JSON bytecode, eliminating the ELVM interpreter as a runtime dependency.
- All El values use el_val_t (int64_t) as the universal type; integers are stored directly, strings/pointers via uintptr_t cast
- String literals wrapped with EL_STR(), arithmetic works natively
- fn declarations become C functions returning el_val_t
- let bindings become el_val_t local variables
- if/else, while, for all map to native C control flow
- String + String uses el_str_concat(); numeric + uses C +
- strip_outer_parens() prevents double-paren warnings in if/while
- compiler.el updated to describe C output and correct CLI usage
Adds el-compiler/runtime/ with:
- el_runtime.h: declares all builtins using el_val_t
- el_runtime.c: implements I/O, strings, math, list, map, fs, JSON; HTTP builtins are stubs (return empty string) pending libcurl
Compile El programs with:
cc -I<runtime-dir> -o hello hello.c el_runtime.c |
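The universal-value representation can be illustrated in a few lines. The macro bodies are assumptions based on the "uintptr_t cast" wording above; the real EL_STR definition is not shown in this log, and the `_SKETCH` names and the inverse macro are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int64_t val_sketch_t;   /* el_val_t is int64_t per the commit */

/* Assumed shape of EL_STR: carry a pointer inside the 64-bit slot. */
#define STR_SKETCH(s)     ((val_sketch_t)(uintptr_t)(s))
/* Hypothetical inverse, for reading the string back out. */
#define AS_CSTR_SKETCH(v) ((const char *)(uintptr_t)(v))

/* Integers need no wrapping: numeric + is plain C addition. */
val_sketch_t add_sketch(val_sketch_t a, val_sketch_t b) {
    return a + b;
}

/* Strings travel as pointers in the same slot and are unwrapped on use. */
size_t str_len_sketch(val_sketch_t s) {
    assert(s != 0);
    return strlen(AS_CSTR_SKETCH(s));
}
```

The slot carries no type tag, so nothing at run time distinguishes an integer from a pointer; the compiler has to decide statically whether `+` means addition or el_str_concat().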