perf: 81% RSS reduction in elc compiler #4

Closed
will.anderson wants to merge 0 commits from optimize/compiler-perf into main

Summary

Seven rounds of swarm optimization on the El compiler, applied as a clean branch from `add-linux-binaries` (eb52be4).

  • Flat stride-2 token list — lexer emits `[kind0, val0, kind1, val1, ...]` instead of `[{kind, val}, ...]`, eliminating per-token ElMap allocation (~112B × N tokens)
  • str_char_code hot loop — character classification via Int codes in lexer, no strdup per character
  • Batch c_escape — `str_slice` clean ASCII runs instead of per-byte `str_char_at`; only special bytes go through the append path
  • Systematic el_release() — eagerly frees intermediate parse result maps throughout parser.el after all fields extracted; containers freed as soon as consumed
  • Streaming codegen pipeline — `codegen_streaming()` parses one declaration at a time, emits C, discards AST; peak memory is O(one function) instead of O(whole program)
  • Per-function and per-statement arena scoping — `el_arena_push/pop` around each compile unit and each cg_stmt(); intermediate codegen strings freed at boundaries
  • HAVE_CURL guard — curl-dependent HTTP/LLM/OTLP/Dharma functions behind `#ifdef HAVE_CURL`; elc CLI links without -lcurl, eliminating libcurl SSL/TLS init overhead (~27MB saved)
  • HTML codegen parts-list — O(n) instead of O(n²) string growth for `cg_html_parts`, `cg_html_attrs_str`, `cg_html_element_str`
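
The arena scoping item above can be pictured with a small bump allocator. This is only a sketch under assumptions: the PR names `el_arena_push/pop` but does not show their signatures, so `arena_t`, `arena_alloc`, `emit_stmt`, and the fixed capacity below are hypothetical stand-ins, not the El runtime's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity bump arena; not the El runtime's actual type. */
#define ARENA_CAP 4096
#define ARENA_MAX_DEPTH 16

typedef struct {
    unsigned char buf[ARENA_CAP];
    size_t used;                    /* bytes currently allocated */
    size_t marks[ARENA_MAX_DEPTH];  /* saved `used` values, one per open scope */
    int depth;
} arena_t;

void arena_init(arena_t *a) { a->used = 0; a->depth = 0; }

/* Open a scope: remember the current high-water mark. */
void arena_push(arena_t *a) {
    assert(a->depth < ARENA_MAX_DEPTH);
    a->marks[a->depth++] = a->used;
}

/* Close a scope: everything allocated since the matching push is released. */
void arena_pop(arena_t *a) {
    assert(a->depth > 0);
    a->used = a->marks[--a->depth];
}

/* Bump-allocate n bytes (rounded up to 8), or return NULL when the arena is full. */
void *arena_alloc(arena_t *a, size_t n) {
    size_t aligned = (n + 7u) & ~(size_t)7u;
    if (aligned > ARENA_CAP - a->used) return NULL;
    void *p = a->buf + a->used;
    a->used += aligned;
    return p;
}

/* Stand-in for emitting one statement: scratch strings live only inside the scope. */
size_t emit_stmt(arena_t *a, const char *src) {
    arena_push(a);
    size_t n = strlen(src);
    char *scratch = arena_alloc(a, n + 2);
    size_t out = 0;
    if (scratch) {
        memcpy(scratch, src, n);
        scratch[n] = ';';
        scratch[n + 1] = '\0';
        out = n + 1;   /* length that would be written to the output */
    }
    arena_pop(a);      /* scratch memory is reclaimed at the statement boundary */
    return out;
}
```

Because every `emit_stmt` call pops back to its own mark, arena usage after a statement equals usage before it, which is the property that keeps peak memory bounded by the largest single unit rather than the whole program.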

Results

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| RSS (web/src/main.el) | 33.4MB | 6.5MB | -80.5% |
| Binary size | 452KB | 318KB | -29.7% |
| Self-host | PASS | PASS | |

Test plan

  • Self-host: gen2.c == gen3.c (verified during development: PASS)
  • Benchmark: RSS < 7MB on web/src/main.el (measured: 6.5MB)
  • Compile time: ≤ 250ms (measured: ~210ms)
  • Bootstrap binary updated in lang/dist/platform/elc
  • Build with -DHAVE_CURL for production deployments needing HTTP/LLM
will.anderson added 6 commits 2026-05-06 01:40:02 +00:00
Replace str_char_at (returns strdup String) with str_char_code (returns Int)
in the main lex() while loop and scan_digits/scan_ident helpers.

For a 400KB combined source, str_char_at was allocating ~400K x 16B = 6.4MB
of transient 2-byte strings for the ch variable alone. str_char_code returns
an integer directly — zero allocation.

Add Int-based helpers: is_digit_code, is_alpha_code, is_ws_code,
is_alnum_or_underscore_code. Rewrite lex() operator dispatch using char
code constants (e.g. '/'=47, '"'=34, '='=61).

Result on main.el: 17.1MB -> 15.4MB peak RSS (-10%).
Self-hosting: PASS.
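
The commit above describes classifying characters by integer code instead of allocating a one-character string per position. A rough C rendering of that idea follows. The helper names come from the commit message, but their bodies, the `char_code_at` stand-in for `str_char_code`, and the exact whitespace set are assumptions, since the PR does not show the El source.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte at index i as an int, or -1 past the end.
   Stand-in for str_char_code; the real runtime signature is not shown in the PR. */
int char_code_at(const char *s, size_t len, size_t i) {
    return i < len ? (int)(unsigned char)s[i] : -1;
}

/* Classification on plain ints: no allocation, no string comparison. */
int is_digit_code(int c) { return c >= 48 && c <= 57; }   /* '0'..'9' */
int is_alpha_code(int c) { return (c >= 65 && c <= 90) || (c >= 97 && c <= 122); }
int is_ws_code(int c) { return c == 32 || c == 9 || c == 10 || c == 13; }
int is_alnum_or_underscore_code(int c) {
    return is_alpha_code(c) || is_digit_code(c) || c == 95;  /* 95 is '_' */
}

/* Scan an identifier starting at `start`; returns the index one past its end. */
size_t scan_ident(const char *s, size_t len, size_t start) {
    assert(start <= len);
    size_t i = start;
    while (is_alnum_or_underscore_code(char_code_at(s, len, i))) i++;
    return i;
}
```

The end-of-input sentinel (-1) fails every classifier, so the scan loop needs no separate bounds check.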
Combines two orthogonal optimizations:
1. Flat token list (from beta): lex() returns [Any] with alternating kind/value
   pairs instead of [Map], eliminating one ElMap per token (~3 mallocs each).
   Parser updated: tok_kind(t,i) = t[2*i], tok_value(t,i) = t[2*i+1].

2. Char code dispatch (from alpha): lex() hot loop uses str_char_code -> Int
   instead of str_char_at -> strdup String for all character classification.
   Eliminates ~400K x 16B = 6.4MB of temporary string allocations.

scan_digits and scan_ident also updated to use str_char_code.

Result on main.el: 17.1MB -> 14.4MB peak RSS (-16%).
Self-hosting: PASS.
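
The stride-2 layout from item 1 above can be sketched in C as one growable array in which token `i` occupies slots `2*i` and `2*i+1`. The accessor names `tok_kind` and `tok_value` come from the commit message. The `tok_list` struct, `tok_push`, `tok_free`, and the use of string pointers for both slots are illustrative assumptions, not the compiler's actual representation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flat token list: slots[2*i] is the kind, slots[2*i+1] the value. */
typedef struct {
    const char **slots;
    size_t count;   /* number of tokens, so 2 * count slots are in use */
    size_t cap;     /* capacity in tokens */
} tok_list;

/* Append one token; returns 1 on success, 0 if the array could not grow. */
int tok_push(tok_list *t, const char *kind, const char *value) {
    if (t->count == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 8;
        const char **p = realloc(t->slots, 2 * ncap * sizeof *p);
        if (!p) return 0;
        t->slots = p;
        t->cap = ncap;
    }
    t->slots[2 * t->count] = kind;
    t->slots[2 * t->count + 1] = value;
    t->count++;
    return 1;
}

/* Stride-2 accessors: no per-token object, only index arithmetic. */
const char *tok_kind(const tok_list *t, size_t i) {
    assert(i < t->count);
    return t->slots[2 * i];
}

const char *tok_value(const tok_list *t, size_t i) {
    assert(i < t->count);
    return t->slots[2 * i + 1];
}

void tok_free(tok_list *t) {
    free(t->slots);
    t->slots = NULL;
    t->count = t->cap = 0;
}
```

The whole list costs one amortized allocation instead of one map (plus its internals) per token.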
Combines two orthogonal optimizations:
1. c_escape batching (from alpha): ASCII runs emitted as str_slice segments instead
   of one str_char_at string per byte. O(N) allocs → O(K) where K = special chars.

2. scan_interp_string batching (from beta): char dispatch via str_char_code (Int)
   + clean_start tracking to flush plain runs as str_slice. Eliminates per-char
   string allocations in the string-literal scanning hot path.

Result on web/src/main.el: 14.5MB -> 13.4MB peak RSS (-7.6%).
Self-hosting: PASS.
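
The `clean_start` technique described above might look like the following in C: track where the current run of ordinary bytes began, and copy the whole run in one operation whenever a special byte (or the end of input) is reached. This is a sketch only. The function name `c_escape_batched`, the caller-supplied output buffer, and the set of bytes treated as special are assumptions, not the compiler's actual `c_escape`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Escape `src` as the body of a C string literal into `out` (capacity `cap`).
   Returns the length written, or (size_t)-1 if `out` is too small. */
size_t c_escape_batched(const char *src, size_t len, char *out, size_t cap) {
    size_t o = 0, clean_start = 0;
    assert(src != NULL && out != NULL);
    for (size_t i = 0; i <= len; i++) {
        const char *esc = NULL;
        if (i < len) {
            switch (src[i]) {
                case '"':  esc = "\\\""; break;
                case '\\': esc = "\\\\"; break;
                case '\n': esc = "\\n";  break;
                case '\t': esc = "\\t";  break;
                default:   continue;   /* clean byte: extend the current run */
            }
        }
        /* Flush the clean run [clean_start, i) with a single copy. */
        size_t run = i - clean_start;
        if (run > cap - o) return (size_t)-1;
        memcpy(out + o, src + clean_start, run);
        o += run;
        if (esc) {
            size_t n = strlen(esc);
            if (n > cap - o) return (size_t)-1;
            memcpy(out + o, esc, n);
            o += n;
        }
        clean_start = i + 1;
    }
    if (o >= cap) return (size_t)-1;   /* no room for the terminator */
    out[o] = '\0';
    return o;
}
```

The number of copy operations scales with the number of special bytes K rather than with the input length N, matching the O(N) to O(K) claim in the commit message.
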
- Flat token list: lexer emits [kind0, val0, kind1, val1, ...] instead of [{kind,val}, ...]
  Eliminates per-token ElMap allocation (~112B × N tokens)
- str_char_code hot loop: char classification via Int codes, no strdup per char
- Batch c_escape: str_slice clean runs instead of char-at per byte
- Parser updated to use tok_at/tok_kind/tok_value stride-2 accessors

Chain of optimizations from swarm rounds 4-7:
- Flat stride-2 token list: eliminate per-token Map allocation (~112B each × N tokens)
- Systematic el_release() in parser.el: eagerly free intermediate parse result maps
- Per-function and per-statement arena scoping in codegen_streaming()
- Streaming codegen pipeline: parse one fn at a time, emit C, discard AST
- HAVE_CURL guard: elc CLI binary drops libcurl, eliminating SSL/TLS init overhead
- HTML codegen parts-list: O(n) instead of O(n²) string growth for nested templates
- Batch c_escape: str_slice clean runs instead of char-at per byte

Result: 33.4MB → 6.5MB RSS on web/src/main.el (-81%). Self-host: PASS.
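
The parts-list change listed above replaces repeated string concatenation, which re-copies the accumulated prefix on every append, with collecting fragments and joining them once. A minimal C sketch of that pattern follows. `parts_list`, `parts_add`, `parts_join`, and `parts_free` are hypothetical names for illustration and are not the `cg_html_*` functions themselves.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parts list: borrowed fragment pointers plus a running total length. */
typedef struct {
    const char **parts;
    size_t count, cap, total_len;
} parts_list;

/* Record one fragment; returns 1 on success, 0 if the array could not grow. */
int parts_add(parts_list *p, const char *s) {
    if (p->count == p->cap) {
        size_t ncap = p->cap ? p->cap * 2 : 8;
        const char **q = realloc(p->parts, ncap * sizeof *q);
        if (!q) return 0;
        p->parts = q;
        p->cap = ncap;
    }
    p->parts[p->count++] = s;
    p->total_len += strlen(s);
    return 1;
}

/* Join once: one allocation of the exact size, each byte copied a single time.
   The caller owns the returned buffer and must free() it. */
char *parts_join(const parts_list *p) {
    char *out = malloc(p->total_len + 1);
    if (!out) return NULL;
    size_t o = 0;
    for (size_t i = 0; i < p->count; i++) {
        size_t n = strlen(p->parts[i]);
        memcpy(out + o, p->parts[i], n);
        o += n;
    }
    assert(o == p->total_len);
    out[o] = '\0';
    return out;
}

void parts_free(parts_list *p) {
    free(p->parts);
    memset(p, 0, sizeof *p);
}
```

With naive concatenation, appending n fragments copies the growing prefix each time, which is quadratic in the output size. Here the total work is linear in the output size.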
will.anderson closed this pull request 2026-05-06 19:36:58 +00:00

Reference: neuron-technologies/el#4