el/lang/spec/codegen-js.md
2026-05-05 01:38:51 -05:00


# El JavaScript Backend (codegen-js)
**Status:** Phase 5 complete, with roughly 90% language coverage. Full browser JavaScript can be expressed structurally in El without any `native_js` escape hatches. Additions since Phase 4: anonymous function literals (lambda syntax), the try/catch statement, extern fn declarations, direct JS method call syntax on Any-typed values, Promise helpers, Object/Array utilities, and URL import declarations. Proof: `examples/browser-auth.el` is a complete Supabase auth flow with zero `native_js` or `native_js_call` calls.
**Authoritative files**
| File | Role |
|---|---|
| `el-compiler/src/codegen-js.el` | El → JS code generator (mirrors `codegen.el`) |
| `el-compiler/runtime/el_runtime.js` | Browser/Node runtime that compiled programs link against |
| `el-compiler/src/compiler.el` | Adds `compile_js()` and `--target=js` CLI dispatch |
| `spec/codegen-js.md` | This document |
---
## 1. Why a JS backend exists
El compiles to C today. C is the right substrate for the agent runtime, the DHARMA daemon, and Engram. But three first-class consumers of El need to **run in a browser**, where C is not an option:
1. **`el-ui/runtime/`** — the activation-based frontend framework written in JS. The long-term plan is to author components and the runtime itself in El and compile them down to JS.
2. **`cgi-studio`** — the web app for cultivating CGIs. Today it is hand-written JS. Once the JS backend is mature, the studio's UI logic can be authored in El and share types/identifier names with the CGI it cultivates.
3. **Marketplace plugin UIs** — third parties writing browser-side El that runs untrusted in a sandbox. They need a JS target.
A secondary motivation: **El-on-Node**. CLI tooling, build scripts, and tests benefit from a tight `el → js → node` cycle without a `cc` step.
---
## 2. Type representation strategy
The C backend pretends every value is `int64_t`. That is a deliberate runtime trick to avoid dynamic dispatch in generated C. JavaScript already has tagged dynamic values, so the JS backend is **simpler**: every El value is a native JS value, and the tag of `el_val_t` collapses into the JS type system.
| El type | C representation | JS representation |
|---|---|---|
| `Int` | `int64_t` (direct) | `number` (with `Number.isSafeInteger` caveat — see §6) |
| `Float` | `int64_t` bit-cast of `double` via `el_from_float` | `number` (no bit-cast — JS number IS a double) |
| `Bool` | `int64_t`, 0 = false, nonzero = true | `boolean` |
| `String` | `(int64_t)(uintptr_t)cstring` | `string` |
| `Void` | C `void` | `undefined` |
| `[T]` (List) | `el_val_t` pointer to refcounted struct | `Array<any>` |
| `Map<K,V>` | `el_val_t` pointer to refcounted struct | plain object `{[key]: any}` |
| `EL_NULL` (`0`) | `(el_val_t)0` | `null` |
| Any | `el_val_t` | `any` (no compile-time check) |
**Key consequences:**
- `+` on two strings is JS `+` (string concat) — no `el_str_concat()` runtime call needed for the common case. The runtime DOES export `el_str_concat` for the cases where codegen does not know the types.
- `==` on strings is `===` — not `str_eq()`. Same disambiguation logic as the C backend (look at left/right kind, fall back to `str_eq` for identifiers without int annotation).
- `Map` access `m["foo"]` compiles to JS `m["foo"]` (no `el_get_field`). For `Field` access (`m.foo`) we emit `m["foo"]` so it works on plain objects regardless of prototype shape.
- List access `arr[i]` is JS `arr[i]`, with no bounds checking. The C backend segfaults on a bad index; in JS an out-of-range index silently returns `undefined` (not `null`). A bounds-checked `el_list_get` wrapper could be added later for safe access.
- `EL_NULL` becomes JS `null`, not `undefined`. The runtime checks for `=== null` consistently. This avoids the JS undefined/null fork and matches El's single null value.
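
A raw JS index past the end of an array yields `undefined` rather than `null`, which is where the list-access and `EL_NULL` bullets meet. The following is a minimal sketch of how a safe accessor could normalize that case. The name `safe_list_get` is hypothetical and is not claimed to match the behavior of the real `el_list_get` in `el_runtime.js`.

```javascript
// Hypothetical helper, not part of el_runtime.js as documented.
// Normalizes out-of-range and missing values to null so that
// El's single null value is preserved.
function safe_list_get(list, index) {
  if (!Array.isArray(list)) return null;
  if (!Number.isInteger(index) || index < 0 || index >= list.length) return null;
  const value = list[index];
  return value === undefined ? null : value;
}

const xs = [10, 20, 30];
console.log(xs[5]);                 // raw JS access: undefined
console.log(safe_list_get(xs, 5));  // normalized: null
console.log(safe_list_get(xs, 1));  // 20
```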
---
## 3. Builtin runtime layer (`el_runtime.js`)
Same function names as `el_runtime.c` wherever possible, so codegen-js can emit the same call sites. The runtime is a single ES module that exposes every builtin as a named export AND attaches them to a `globalThis.__el` namespace (so generated code can do either `import * as el from './el_runtime.js'` or assume globals).
**The codegen-js generated output uses the global-namespace style:** every emitted file starts with `import './el_runtime.js'` (which side-effects the globals) so call sites stay flat — `println(x)` not `el.println(x)`. This matches the C backend's flat call surface and keeps the generated code grep-compatible across targets.
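
As a rough illustration of the dual exposure described above, a runtime could publish each builtin both under `globalThis.__el` and as a flat global. This is a sketch under assumptions: only `println` and `str_len` are names taken from this document, the `install` function is hypothetical, and the `export` statements of the real ES module are omitted so the snippet runs as a plain script.

```javascript
// Sketch only: the real el_runtime.js is an ES module with named exports.
const builtins = {
  println(x) { console.log(String(x)); },
  str_len(s) { return s.length; },
};

// Hypothetical installer: publish each builtin on the __el namespace
// and as a flat global so generated call sites need no prefix.
function install(target) {
  target.__el = target.__el || {};
  for (const [name, fn] of Object.entries(builtins)) {
    target.__el[name] = fn;
    target[name] = fn;
  }
}

install(globalThis);

// A generated call site can now stay flat, with no `el.` prefix.
println(str_len("hello"));
```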
### Implemented (~90 builtins)
| Category | Functions |
|---|---|
| I/O | `println`, `print` |
| String | `el_str_concat`, `str_concat`, `str_eq`, `str_starts_with`, `str_ends_with`, `str_len`, `int_to_str`, `str_to_int`, `str_slice`, `str_contains`, `str_replace`, `str_to_upper`, `str_to_lower`, `str_trim`, `str_index_of`, `str_split`, `str_char_at`, `str_char_code`, `str_lower`, `str_upper`, `str_pad_left`, `str_pad_right` |
| Math | `el_abs`, `el_max`, `el_min`, `math_sqrt`, `math_log`, `math_ln`, `math_sin`, `math_cos`, `math_pi` |
| Float | `float_to_str`, `int_to_float`, `float_to_int`, `format_float`, `decimal_round`, `str_to_float` |
| List | `el_list_new`, `el_list_len`, `el_list_get`, `el_list_append`, `el_list_empty`, `el_list_clone`, `list_push`, `list_push_front`, `list_join`, `list_range` |
| Map | `el_map_new`, `el_get_field`, `el_map_get`, `el_map_set` |
| HTTP | `http_get`, `http_post`, `http_post_json`, `http_get_with_headers`, `http_post_with_headers` (via `fetch()`, return `Promise<string>`) |
| FS | `fs_read`, `fs_write`, `fs_list` (Node-only) |
| JSON | `json_parse`, `json_stringify`, `json_get`, `json_get_string`, `json_get_int`, `json_get_float`, `json_get_bool`, `json_get_raw`, `json_set`, `json_array_len` |
| Time | `time_now`, `time_now_utc`, `sleep_secs` (Node), `sleep_ms` |
| Bool | `bool_to_str` |
| Process | `exit_program` (Node `process.exit`) |
| Refcount | `el_retain`, `el_release` (no-ops) |
| Method shortforms | `append`, `len`, `get`, `map_get`, `map_set` |
| Native VM aliases | `native_list_get`, `native_list_len`, `native_list_append`, `native_list_empty`, `native_list_clone`, `native_string_chars`, `native_int_to_str` |
| `args` / `env` / `state_*` | Process args, environment, in-memory state |
| UUID | `uuid_v4`, `uuid_new` |
| DOM bridge | `dom_get_element`, `dom_get_value`, `dom_set_value`, `dom_get_text`, `dom_set_text`, `dom_set_prop`, `dom_get_prop`, `dom_set_style`, `dom_add_class`, `dom_remove_class`, `dom_show`, `dom_hide`, `dom_listen`, `dom_query`, `dom_query_all`, `dom_create`, `dom_append`, `dom_remove`, `dom_is_null` (browser-only) |
| DOM extended | `dom_set_attr`, `dom_get_attr`, `dom_remove_attr`, `dom_set_html`, `dom_get_html`, `dom_get_parent`, `dom_contains_class`, `dom_get_checked`, `dom_set_checked` (browser-only) |
| Timers | `set_timeout(ms, cb)`, `set_interval(ms, cb) -> Int`, `clear_interval(handle)` |
| Local storage | `local_storage_get`, `local_storage_set`, `local_storage_remove` (browser-only) |
| Window | `window_location`, `window_redirect`, `window_on_load`, `window_set`, `window_get` |
| Debug | `console_log` |
| Promise helpers (Phase 5) | `promise_then(p, cb)`, `promise_catch(p, cb)`, `promise_resolve(val)`, `promise_reject(msg)` |
| Object / Array (Phase 5) | `object_assign(t, s)`, `object_keys(obj)`, `object_values(obj)`, `json_deep_clone(obj)`, `array_from(iterable)`, `type_of(val)`, `instanceof_check(val, name)` |
| native_js escape hatch | `native_js(code)` — eval; `native_js_call(obj, method, args)` — method call. Use only when no structural alternative exists |
### Stubbed (throw at runtime)
Every function in this list compiles successfully but throws `Error("not supported in JS target — needs server-side delegation: <name>")` when called. This is a **runtime** error, not a compile error, so it doesn't block compilation of code that has dead-code paths through these functions.
- All `dharma_*` (membership in DHARMA network requires the daemon)
- All `engram_*` (needs the embedded SQLite + activation engine — could be reimplemented in JS later)
- All `llm_*` (CORS + API key handling — must go through a server-side proxy)
- `http_serve` (browsers don't host servers; Node could, but that's a separate runtime mode)
- `el_cgi_init` (CGI identity is a server-side concept)
- Crypto: `sha256_*`, `hmac_sha256_*`, `base64*` (deferred — can use `crypto.subtle` later)
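
A minimal sketch of how such stubs could be produced, assuming a hypothetical `make_stub` factory (the actual runtime may define each stub by hand). The message text follows the format quoted above; `llm_complete` is an illustrative name in the `llm_*` family, not a confirmed builtin.

```javascript
// Hypothetical factory: returns a function that throws only when called.
// \u2014 is the dash character used in the documented message.
function make_stub(name) {
  return function stub() {
    throw new Error(
      "not supported in JS target \u2014 needs server-side delegation: " + name
    );
  };
}

// Illustrative name only; defining the stub does not throw.
const llm_complete = make_stub("llm_complete");

// A dead-code path through the stub is harmless at runtime.
function maybe_call(use_llm) {
  if (use_llm) return llm_complete("hi");
  return "skipped";
}

console.log(maybe_call(false)); // skipped
try {
  maybe_call(true);
} catch (err) {
  console.log(err.message);
}
```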
### Browser-side specific behavior
When running in a browser:
- `println` / `print` map to `console.log` (no stdout in browsers)
- `http_get` / `http_post` use `fetch()` (CORS applies)
- `fs_*` throws (browsers have no fs)
- `args()` returns `[]`
- `env(k)` throws (or could read from a global config object — TBD)
When running in Node:
- `println` / `print` map to `console.log` and `process.stdout.write`
- `fs_*` use `node:fs/promises` (sync versions for the simple cases)
- `args()` returns `process.argv.slice(2)`
- `env(k)` returns `process.env[k] ?? null`
The runtime auto-detects via `typeof window === 'undefined'`.
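
The detection and the two environment-dependent builtins could be sketched as follows. The `typeof window` check and the Node return values come from the lists above; the constant name `IS_NODE` and the browser error message are assumptions.

```javascript
// Same check the document names: no `window` global means Node.
const IS_NODE = typeof window === "undefined";

function args() {
  // Browser: always an empty list. Node: everything after the script path.
  return IS_NODE ? process.argv.slice(2) : [];
}

function env(key) {
  // Hypothetical message; the document only says env(k) throws in browsers.
  if (!IS_NODE) throw new Error("env is not available in the browser");
  return process.env[key] ?? null;
}

console.log(Array.isArray(args()));
```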
---
## 4. Tradeoffs vs the C backend
| Concern | C backend | JS backend |
|---|---|---|
| **Static types** | El's `Int` becomes `int64_t`, real arithmetic | El's `Int` becomes `number` — loses precision past 2^53 |
| **Linking model** | Static link against `el_runtime.c` + libcurl + libpthread | ES module import of `el_runtime.js` |
| **Dynamic dispatch** | `dlsym` for `http_set_handler` / `llm_register_tool` (requires `-rdynamic`) | JS function value lookup via `globalThis[name]` — no compiler flag |
| **Tool registry** | dlsym walks symbol table; tool fns must be top-level C symbols | Tool fns live as exports of the generated module; trivially callable |
| **Memory model** | Refcounted lists/maps with `el_retain`/`el_release` to avoid leaks | JS GC handles all of it; `el_retain`/`el_release` are no-ops |
| **`+` overload** | Has to dispatch in codegen between `el_str_concat` and integer `+` because at C level both are `int64_t` | JS `+` is already overloaded: `"a" + "b"` → `"ab"`, `1 + 2` → `3`. Codegen still preserves the existing dispatch for safety, but the runtime fallback is correct |
| **Concurrency** | `pthread`-backed `http_serve` | Single-threaded event loop; `http_serve` not supported in this target |
| **HTTP client** | libcurl, blocking, returns body string | `fetch()` is async — see §5 |
| **CGI identity** | `el_cgi_init` runs at start of `main()` | Not supported; UI code is not a CGI principal |
| **DHARMA / LLM** | Native, blocking, libcurl-backed | Not supported — all such calls throw and the program is expected to delegate to a server-side El daemon via plain HTTP |
| **Compile speed** | El → C → cc → binary (cc is the slow step) | El → JS → done. Faster iteration |
| **Output size** | Static binary ~2 MB | Source `.js` + ~10 KB runtime |
---
## 5. The async problem
`fetch()` is async. The C backend's `http_get(url)` is synchronous and returns the body string directly. El source was written assuming sync. Three options:
1. **Pretend it's sync from El's POV; use synchronous XHR (browser) or `child_process.execSync('curl ...')` (Node).** Bad: synchronous XHR is deprecated and freezes the main thread while the request is in flight; `execSync` is a hack.
2. **Make every `http_*` builtin in the JS runtime return a `Promise`, and rewrite codegen-js to insert `await` everywhere.** This requires turning every El function that transitively calls a network builtin into an `async fn` in JS. Doable, but invasive.
3. **Explicit `@async` decorator on El functions; codegen-js emits `async function` + `await` for known-async call sites.** This is the approach implemented.
**Decision:** option 3, with an explicit opt-in decorator. `http_get`, `http_post`, `http_post_json`, `http_get_with_headers`, and `http_post_with_headers` in `el_runtime.js` return `Promise<string>`. `codegen-js.el` now emits `await` before calls to these builtins and before calls to any El function decorated `@async`.
### How to use async in El (JS target)
Mark a function with `@async` to declare it as async. Any call to that function from another El function will automatically get `await` in the generated JS. The caller must also be `@async` (or call only non-async code) for the pattern to compose correctly, because JS only allows `await` inside an `async` function.
```el
@async
fn fetch_user(id: String) -> String {
http_get("https://api.example.com/users/" + id)
}
@async
fn main() -> Void {
let body = fetch_user("42")
println(body)
}
```
Compiles to:
```javascript
async function fetch_user(id) {
return await http_get("https://api.example.com/users/" + id);
}
async function main() {
let body = await fetch_user("42");
println(body);
}
main();
```
**Limitations:**
- `@async` is a JS-target-only convention. The C backend ignores the decorator (it calls the synchronous libcurl-backed version).
- Implicit taint propagation (auto-marking all transitive callers) is not implemented. The programmer must explicitly add `@async` to every function in the call chain that reaches an async builtin.
- Forward-reference calls to `@async` functions are handled correctly: codegen-js does a pre-registration pass over all FnDefs before emitting any code.
For programs that do not touch HTTP, no `@async` annotation is needed and the generated code is identical to before.
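
The pre-registration pass mentioned in the limitations can be modeled in a few lines. This is a simplified sketch, not the logic in `codegen-js.el`: the AST shape (`name`, `is_async`, `calls`) and the function `emit_program` are hypothetical, and call arguments are dropped for brevity.

```javascript
// Builtins that return a Promise, per the decision above.
const ASYNC_BUILTINS = new Set([
  "http_get", "http_post", "http_post_json",
  "http_get_with_headers", "http_post_with_headers",
]);

// Hypothetical AST: each fn has a name, an is_async flag,
// and the list of callee names that appear in its body.
function emit_program(fn_defs) {
  // Pass 1: register every @async fn before emitting anything,
  // so forward references are already known to be async.
  const async_names = new Set(ASYNC_BUILTINS);
  for (const fn of fn_defs) {
    if (fn.is_async) async_names.add(fn.name);
  }
  // Pass 2: emit, prefixing known-async calls with `await`.
  return fn_defs.map((fn) => {
    const body = fn.calls
      .map((callee) => `  ${async_names.has(callee) ? "await " : ""}${callee}();`)
      .join("\n");
    const head = fn.is_async ? "async function" : "function";
    return `${head} ${fn.name}() {\n${body}\n}`;
  }).join("\n");
}

// main appears before fetch_user, so its call is a forward reference.
const out = emit_program([
  { name: "main", is_async: true, calls: ["fetch_user", "println"] },
  { name: "fetch_user", is_async: true, calls: ["http_get"] },
]);
console.log(out);
```

Without the first pass, `main` would be emitted before `fetch_user` had been seen, and the `await` would be missing from the forward call.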
---
## 6. Number precision
JS `number` is IEEE 754 double — only 53 bits of integer precision. El `Int` is `int64_t` and the runtime sometimes uses the full 64 bits (e.g. `time_now_utc` returns nanoseconds-since-epoch, which exceeds 2^53 in practice).
**Decision for this scaffold:** accept the precision loss. Document it. UI code does not use 64-bit timestamps. If/when a use case demands it, `time_now_utc` can return a `BigInt` and we can introduce a `BigInt` sub-mode. That's a follow-up.
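
The precision boundary is easy to observe directly. The snippet below uses only standard JS; the nanosecond timestamp is an illustrative value, not output from `time_now_utc`.

```javascript
const limit = 2 ** 53;                          // 9007199254740992
console.log(Number.isSafeInteger(limit - 1));   // true
console.log(Number.isSafeInteger(limit));       // false
console.log(limit + 1 === limit);               // true: the +1 is lost

// A nanosecond timestamp (illustrative value) is far past the safe range.
const ns = 1777963131000000000n;                // BigInt literal keeps every digit
console.log(Number.isSafeInteger(Number(ns)));  // false
console.log(BigInt.asIntN(64, ns) === ns);      // true: fits in a signed 64-bit int
```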
---
## 7. Language features — JS target coverage
### Fully supported
| Feature | Notes |
|---|---|
| `cgi {}` block | Compiled to a no-op + comment (UI code is not a CGI) |
| `service {}` block | Compiled to a no-op + comment |
| `match` expressions | LitInt/LitStr/LitBool/Wildcard/Binding/Variant via IIFE if/else chain |
| `type` (struct) defs | Skipped; structs are plain JS objects. `t["field"]` works |
| `enum` defs | Skipped; enum values are strings or ints |
| `?` postfix (nil-prop) | `obj?.field` emits `(obj)?.["field"] ?? null` via JS optional chaining |
| `extern fn` | Emits a comment; calls resolve to JS environment globals |
| Anonymous function literals | `fn(p: T) -> R { body }` emits a hoisted `function __lambda_N(p)` |
| `try/catch` | Emits `try { ... } catch (name) { ... }` directly |
| URL imports | `import "https://..."` emits ES module import (or comment in bundle mode) |
| Method call on `Any` | `obj.method(args)` emits `obj.method(args)` for non-El-shortform methods |
| Field access on `Any` | `obj.field` emits `obj["field"]` (bracket notation, works on prototype chains) |
| `@async` decorator | `async function` + `await` at call sites for async builtins and `@async` fns |
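
To make two of the emit shapes in the table concrete, here is a hand-written approximation of what an IIFE if/else chain for `match` and the nil-propagating field access could look like. It is not actual compiler output; the function names and the El source in the comments are hypothetical.

```javascript
// Hypothetical El source: a match on an Int with two literal arms and a wildcard.
function describe(code) {
  // The IIFE lets the if/else chain be used in expression position.
  return (() => {
    if (code === 200) { return "ok"; }
    else if (code === 404) { return "missing"; }
    else { return "other"; }
  })();
}

// Nil-propagating field access, using the emit shape from the table.
function name_of(user) {
  return (user)?.["name"] ?? null;
}

console.log(describe(404));            // missing
console.log(name_of({ name: "Ada" })); // Ada
console.log(name_of(null));            // null
```

The trailing `?? null` also converts a missing field (`undefined`) to `null`, which keeps the single-null convention from §2.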
### Not supported (stub throws or no-op)
| Feature | Status | Notes |
|---|---|---|
| All `dharma_*` | Stub throws | Requires server-side daemon |
| All `engram_*` | Stub throws | Could be ported to IndexedDB later |
| All `llm_*` | Stub throws | Route through server |
| `http_serve` | Stub throws | Browsers cannot host servers |
| `el_cgi_init` | No-op | CGI identity is server-side |
| Capability enforcement | Not enforced | Runtime stubs throw; compile-time check is a follow-up |
| VBD role check | Not enforced | Same |
| Float bit-cast | Not needed | JS number is already a double |
| Crypto primitives | Stub throws | Add via `crypto.subtle` later |
| `state_*` | In-memory only | Resets on page reload |
| `args()` | Node-only | Browser returns `[]` |
| `fs_*` | Node-only | Browser throws |
---
## 7a. Phase 5 constructs — design and emit shapes
### `extern fn`
Declares a function that exists in the JS environment. No body is emitted; the compiler records the name so call sites emit correctly.
```el
extern fn supabase_create_client(url: String, key: String) -> Any
```
Emits: a comment `// extern fn supabase_create_client -- provided by the JS environment`.
Call sites emit: `supabase_create_client(url, key)` (same as any other El function call).
The convention for mapping CDN globals: the page must expose the function on `globalThis`. For Supabase, the CDN bundle exposes `supabase.createClient`; a thin adapter assigns `globalThis.supabase_create_client = supabase.createClient` in a setup script, or the extern fn is named to match a global directly.
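
The adapter described above could be sketched like this. The `supabase` object here is a stand-in defined locally so the snippet runs on its own; in a real page it would come from the CDN bundle, and the URL and key are placeholders.

```javascript
// Stand-in for the object a CDN bundle would expose (assumption for this sketch).
const supabase = {
  createClient(url, key) { return { url, key }; },
};

// Adapter: publish the El-visible name on globalThis.
globalThis.supabase_create_client = supabase.createClient;

// What a compiled call site would then do: a plain flat call.
const client = supabase_create_client("https://example.invalid", "anon-key");
console.log(client.url);
```

If a library method relied on `this`, the adapter would need `supabase.createClient.bind(supabase)` instead of a bare property read.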
### Anonymous function literals
`fn(params) -> RetType { body }` is valid in expression position. Emitted as a hoisted function declaration with a generated name.
```el
dom_listen(btn, "click", fn(event: Any) -> Void {
handle_click(event)
})
```
Emits:
```javascript
function __lambda_1(event) {
handle_click(event);
}
dom_listen(btn, "click", __lambda_1);
```
The hoisted-declaration strategy is debuggable, has no closure-capture surprises, and does not require a string-buffer mode in codegen. The generated name appears in stack traces.
### `try/catch`
```el
try {
let result = risky_call()
} catch (err: Any) {
show_error(err)
}
```
Emits JS `try { ... } catch (err) { ... }` directly. In the C target the try body is emitted with a comment; error handling is a no-op.
### Method call on `Any`-typed values
When a method call's method name is not a known El runtime shortform (`append`, `len`, `get`, `map_get`, `map_set`), the call emits as a direct JS method invocation:
```el
let client: Any = get_client()
let resp = client.auth.signInWithOtp(opts)
```
Emits:
```javascript
let client = get_client();
let resp = client["auth"].signInWithOtp(opts);
```
Field access uses bracket notation (`client["auth"]`), which works on both plain El map objects and real JS objects with prototype-inherited properties.
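
A small demonstration of that claim, using a hypothetical `Client` class whose `auth` property lives on the prototype as a getter. The return values are made up for illustration.

```javascript
// `auth` is a getter on the prototype, not an own property of the instance.
class Client {
  get auth() {
    return { signInWithOtp: (opts) => "otp:" + opts.email };
  }
}

const real = new Client();
const plain = { auth: { signInWithOtp: (opts) => "plain:" + opts.email } };

// The same emitted shape works for both kinds of object.
for (const c of [real, plain]) {
  console.log(c["auth"].signInWithOtp({ email: "a@example.com" }));
}

console.log(Object.keys(real).includes("auth")); // false: inherited, not own
```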
### URL imports
```el
import "https://cdn.jsdelivr.net/npm/@supabase/supabase-js@2/dist/umd/supabase.js"
```
In module mode: `import "https://...";` at the top of the generated file.
In bundle/IIFE mode: `// external: https://...` comment.
El source imports (`.el` files) are excluded -- they were already inlined by `resolve_imports`.
---
## 8. CLI dispatch — `--target=js`
The compiler entry point `compiler.el` adds a `compile_js(source: String) -> String` alongside the existing `compile()`. The CLI behavior:
```
elc <source.el> <output> # default — emit C
elc --target=c <source.el> <out> # explicit — emit C
elc --target=js <source.el> <out> # emit JS
elc --target=js source.el # write JS to stdout (no out path)
```
The argv parser scans for a `--target=<lang>` token; remaining positional args are `<source>` and optional `<out>`. The dispatch logic stays in El: a `compile_dispatch(target, source) -> String` switch.
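
The scan can be modeled as below. The real logic is written in El inside `compiler.el`; this JS version is only an illustration, `parse_args` is a hypothetical name, and other flags such as `--bundle` are ignored here.

```javascript
// Hypothetical model of the argv scan described above.
function parse_args(argv) {
  let target = "c"; // default target, per the usage listing
  const positional = [];
  for (const arg of argv) {
    if (arg.startsWith("--target=")) {
      target = arg.slice("--target=".length);
    } else {
      positional.push(arg);
    }
  }
  // First positional is the source; the output path is optional.
  const [source, out = null] = positional;
  return { target, source, out };
}

console.log(parse_args(["--target=js", "source.el"]));
console.log(parse_args(["source.el", "out.c"]));
```

An `out` of `null` corresponds to the "write to stdout" case in the usage listing.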
---
## 8a. Production output — `--minify` and `--obfuscate`
Two post-processing flags produce production-ready browser JS in a single compiler invocation, replacing any external post-processing scripts.
### Usage
```
elc --target=js --bundle --minify source.el > output.min.js
elc --target=js --bundle --obfuscate source.el > output.obf.js
elc --target=js --bundle --minify --obfuscate source.el > output.final.js
```
Both flags require `--target=js`. Passing either without `--target=js` prints an error and exits with code 1.
`--obfuscate` implies `--minify` — obfuscating unminified code produces no benefit and only increases output size.
### Pipeline order
```
generate JS -> (if --bundle, wrap in IIFE) -> (if --minify, run terser) -> (if --obfuscate, run javascript-obfuscator) -> output
```
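
The flag rules and the pipeline order combine into a small planning function. This is a sketch with assumed names (`plan_pipeline`, the step labels, and the error text); it only computes the step list and does not invoke any tool.

```javascript
// Hypothetical planner: turns CLI flags into an ordered list of steps.
function plan_pipeline(flags) {
  if ((flags.minify || flags.obfuscate) && flags.target !== "js") {
    // The real compiler prints an error and exits with code 1.
    throw new Error("--minify and --obfuscate require --target=js");
  }
  const steps = ["generate"];
  if (flags.bundle) steps.push("wrap-iife");
  // --obfuscate implies --minify, so terser runs in both cases.
  if (flags.minify || flags.obfuscate) steps.push("terser");
  if (flags.obfuscate) steps.push("javascript-obfuscator");
  steps.push("output");
  return steps;
}

console.log(plan_pipeline({ target: "js", bundle: true, obfuscate: true }).join(" -> "));
```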
### Tool discovery
The compiler looks for each tool in this order:
1. `<src_dir>/node_modules/.bin/<tool>` — local install next to source file
2. `<src_dir>/../node_modules/.bin/<tool>` — one level up (monorepo layout)
3. `npx --yes <tool>` — fall back to npx (uses globally cached package or downloads on first use)
If no path resolves and npx is not on `PATH`, the compiler prints a clear error and exits non-zero:
```
el-compiler: error: terser not found. Run 'npm install terser' in your project directory.
el-compiler: error: javascript-obfuscator not found. Run 'npm install javascript-obfuscator' in your project directory.
```
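
The lookup order can be expressed as a pure function. In this sketch the filesystem check (`exists`) and the npx availability flag (`has_npx`) are injected as parameters so the example needs no imports; both parameters and the name `resolve_tool` are assumptions.

```javascript
// Hypothetical resolver following the three-step order above.
function resolve_tool(tool, src_dir, exists, has_npx) {
  const candidates = [
    `${src_dir}/node_modules/.bin/${tool}`,     // 1. next to the source file
    `${src_dir}/../node_modules/.bin/${tool}`,  // 2. one level up (monorepo)
  ];
  for (const path of candidates) {
    if (exists(path)) return path;
  }
  if (has_npx) return `npx --yes ${tool}`;      // 3. npx fallback
  throw new Error(
    `el-compiler: error: ${tool} not found. Run 'npm install ${tool}' in your project directory.`
  );
}

// Fake filesystem: only the monorepo-level install exists.
const fake_fs = new Set(["/proj/app/../node_modules/.bin/terser"]);
console.log(resolve_tool("terser", "/proj/app", (p) => fake_fs.has(p), true));
```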
### Minification (terser)
Command issued internally:
```
terser <tmpfile> --compress passes=2,drop_console=false,drop_debugger=true \
--mangle 'reserved=[<reserved>]' --output <tmpfile.min>
```
### Obfuscation (javascript-obfuscator)
Command issued internally (runs after minification):
```
javascript-obfuscator <input> --output <output> \
  --compact true \
  --simplify true \
  --string-array true \
  --string-array-encoding base64 \
  --string-array-threshold 0.75 \
  --identifier-names-generator hexadecimal \
  --rename-globals false \
  --self-defending false \
  --reserved-names <reserved>
```
### Reserved names
These identifiers are protected from renaming by both tools. They are referenced directly from HTML `onclick=` attributes and other global-scope callsites:
```
neuronDemoToggle, neuronDemoSend, neuronDemoReset,
signInWith, signInWithEmail, signUpWithEmail, sendMagicLink,
signOut, resetPassword, sendResetEmail, updatePassword,
showSignIn, showSignUp, hideReset,
setSort, addFamilyMember, removeFamilyMember, copyForPlatform, entHeadcountChange,
NEURON_CFG
```
### Temp files
The compiler uses `/tmp/elc-<pid>-<timestamp>.js` naming for temp files. All temp files are cleaned up on both success and failure paths.
### Implementation notes
- The compiler adds `stdout_to_file(path)` / `stdout_restore()` builtins to the C runtime (`el_runtime.c`) to capture codegen output (which is streamed via `println`) into a temp file before passing it to the external tools.
- `--minify` and `--obfuscate` error messages are printed after stdout is restored, so they always reach the terminal regardless of output redirection.
---
## 9. The path to compiling el-ui/runtime through this backend
This is the real-world test. `el-ui/runtime/src/` is currently 5 hand-written `.js` files. The path to authoring them in El:
1. **Phase 1 — Hello-world.** DONE.
2. **Phase 2 — Language coverage.** DONE. `match`, struct/enum field access, `?`-propagation, `for`-over-list, complete operators.
3. **Phase 3 — DOM bridge.** DONE. Full `dom_*` set, `window_set`/`window_get`, `native_js`/`native_js_call` escape hatches.
4. **Phase 4 — Production output.** DONE. `--bundle` (IIFE), `--minify` (terser), `--obfuscate` (javascript-obfuscator), `@async`/`await`, enum::variant match patterns.
5. **Phase 5 — Full JS expression coverage.** DONE. This is the phase documented in this revision.
- `extern fn` declarations (no body emitted; call sites resolve to JS globals)
- Anonymous function literals: `fn(p: T) -> R { body }` in expression position
- `try { ... } catch (name: T) { ... }` statement
- Method call on `Any`-typed values: `client.auth.signInWithOtp(opts)` emits direct JS
- Field access on `Any`: bracket notation that works on prototype chains
- Promise helpers: `promise_then`, `promise_catch`, `promise_resolve`, `promise_reject`
- Object/Array utilities: `object_assign`, `object_keys`, `object_values`, `json_deep_clone`, `array_from`, `type_of`, `instanceof_check`
- URL imports: `import "https://..."` emits ES module import
- **Proof**: `examples/browser-auth.el` -- complete Supabase auth flow with zero `native_js` or `native_js_call`
6. **Phase 6 — Port `el-ui/runtime/`.** Translate the 5 JS files to El, compile to JS, swap in. Run el-ui's existing tests. The language is now expressive enough for this.
7. **Phase 7 — Port cgi-studio UI.** Larger surface area; same pattern.
8. **Phase 8 — Marketplace plugins.** Open the door for third-party UI El.
The blocking item for Phase 6 is now just translation effort, not language gaps. Phase 5 removed the last structural barriers.
---
## 10. Test
```bash
echo 'fn main() -> Void { println("hello from el-js") }' > /tmp/hello.el
elc --target=js /tmp/hello.el > /tmp/hello.js
node /tmp/hello.js
# → hello from el-js
```
This should pass after the bootstrap rebuild. See §11.
---
## 11. Bootstrap status
Adding `--target=js` to `compile()` requires regenerating the shipped `elc` binary at `dist/platform/elc`. The rebuild path is:
1. Existing `elc` binary compiles updated `elc-combined.el` (which now includes `codegen-js.el` and the `--target=js` dispatch) → `elc.c`.
2. `cc` compiles `elc.c` → new `elc` binary.
3. New `elc` binary supports `--target=js`.
All four scaffold files (listed under **Authoritative files**) are checked in. The bootstrap rebuild happens as a follow-up step, gated on review of this design doc.