Commit Graph

2 Commits

Author SHA1 Message Date
Will Anderson 990ce72539 lexer: strip JS/CSS comments from code-bearing string literals at compile time
scan_string() is the right gate for this: every El source that embeds JS
or CSS does so as a quoted string literal, and the lexer is the single
chokepoint every backend reads. Strip there and the // line comments
and /* */ block comments never reach the parser, codegen, or the served
HTML.

looks_like_code is intentionally narrow:
  - contains "<script" or "<style" (the embedded-asset case), or
  - contains "function" AND ";" (a JS body without an opening tag)
Plain prose with stray // sequences passes through verbatim.
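
A minimal Python sketch of the gate described above. The real
looks_like_code lives in the El lexer; this only restates the two rules
from this message, and the exact matching (case sensitivity, whole-word
checks) is an assumption.

```python
def looks_like_code(text: str) -> bool:
    """Return True only for strings that plausibly embed JS or CSS."""
    # Embedded-asset case: an opening <script or <style tag.
    if "<script" in text or "<style" in text:
        return True
    # Bare JS body: needs both the keyword and a statement terminator.
    return "function" in text and ";" in text
```

Under these rules a sentence such as "see // the notes" is not treated
as code, so its // sequence is left alone.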

strip_code_comments tracks JS string state (single, double, backtick)
and never strips inside one. Backslash escapes inside JS strings consume
the next char verbatim. URL guard: when the char before / is ':', emit
the / literally and advance one, which preserves https:// inside string
literals. Block-comment scan walks until the closing '*/'.
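
The scan above can be sketched in Python as follows. This is an
illustration of the described behavior, not the El implementation: the
choice to keep the newline after a // comment, and to drop everything to
end of input when a block comment is unterminated, are assumptions.

```python
def strip_code_comments(src: str) -> str:
    """Drop // and /* */ comments that sit outside JS string literals."""
    out = []
    i, n = 0, len(src)
    quote = None  # active JS string delimiter, or None when outside a string
    while i < n:
        ch = src[i]
        if quote is not None:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Backslash escape: copy the next char verbatim.
                out.append(src[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
            continue
        if ch in "'\"`":
            quote = ch
            out.append(ch)
            i += 1
            continue
        if ch == "/" and i + 1 < n:
            if i > 0 and src[i - 1] == ":":
                # URL guard: ':' before '/' means this is not a comment.
                out.append(ch)
                i += 1
                continue
            if src[i + 1] == "/":
                # Line comment: skip up to, but not including, the newline.
                while i < n and src[i] != "\n":
                    i += 1
                continue
            if src[i + 1] == "*":
                # Block comment: jump past the closing '*/'.
                end = src.find("*/", i + 2)
                i = n if end == -1 else end + 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)
```

One consequence of tracking quotes this simply: an apostrophe in plain
text (for example inside HTML prose) opens a "string" and suppresses
stripping until the next apostrophe.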

elc-cli.el is now a one-line `import "el-compiler/src/compiler.el"`
shim. Top-level `let _argv = args()` was clashing with C int main()'s
`char** _argv` parameter once compiler.el's fn main() body got folded
into C main. compiler.el owns the CLI entry point now.

Self-host fixed point reached: gen2 == gen3 byte-identical.
Tagged dist/platform/elc.20260502-1104-self-host alongside dist/platform/elc.
2026-05-02 11:14:18 -05:00
Will Anderson 5c05ce9b99 self-host the el compiler
Today's milestone: dist/platform/elc compiles its own source and
reproduces itself byte-for-byte (stage1 == stage2 == stage3 verified).
The compiler is now a real binary in the world.

What landed
- Spec (language.md) rewritten to match reality: every feature is marked
  implemented / planned / not-in-this-language, with no fiction.
- C runtime extension: 51 new builtins. JSON parser + accessors,
  time, UUID, env, in-process state K/V, float formatting + math,
  string ops (index_of, split, char_at, char_code, pad_left/right,
  format), list ops (push, push_front, join, range), bool_to_str.
  Runtime grew 631 → 1611 lines, header 171 → 247.
- Codegen fix: transform_implicit_return lifts a function's bare
  trailing expression into an explicit return. Without it, lex(),
  parse(), and every other implicit-return function returned 0/nil
  and the whole pipeline produced empty C output.
- Codegen fix: index expressions dispatch on AST kind. obj["literal"]
  → el_get_field (map), arr[i] → el_list_get (list). Same Index node
  in the parser, two different runtime calls.
- Codegen fix: skip emitting fn main() (collides with C main()) and
  honor parsed return-type annotations so Void functions don't get
  return-wrapped (return println(x) is a C type error).
- Parser: capture return-type identifier from -> Ret annotations.
- Lexer: + vessel keyword, + % operator, + \r escape.
- Runtime fix: el_list_append now allocates a fresh list rather than
  realloc'ing the input. When realloc moved the block, callers' pointers
  were left dangling, which inserted garbage values into declared lists
  and caused strcmp segfaults. Persistent allocation eliminates this
  whole class of use-after-free at modest memory cost.
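
For the implicit-return and Void items above, a rough Python sketch of
the lifting step. The dict-based node shapes ("ExprStmt", "Return",
"ret") are hypothetical stand-ins for the compiler's AST, chosen only
for illustration.

```python
def transform_implicit_return(fn: dict) -> dict:
    """Lift a bare trailing expression into an explicit return."""
    body = list(fn["body"])  # copy so the caller's AST is not mutated
    # Void functions must not be wrapped: `return println(x)` would be
    # a C type error in the generated code.
    if fn.get("ret") == "Void" or not body:
        return {**fn, "body": body}
    last = body[-1]
    if last["kind"] == "ExprStmt":
        body[-1] = {"kind": "Return", "value": last["expr"]}
    return {**fn, "body": body}
```

Only the final statement is considered; earlier expression statements
stay as they are.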

Bootstrap path
- One-shot Python helper translated elc-combined.el to C and
  produced stage1. Helper is disposable; not committed.
- stage1 compiles elc-combined.el → stage2.c which cc compiles to
  stage2; stage2 compiles elc-combined.el → stage3.c. stage2.c and
  stage3.c are byte-identical. Closure proven.
- New elc installed at dist/platform/elc; old broken binary
  preserved as dist/platform/elc.legacy.
- dist/platform/elc.c is the canonical generated source.
- elvm and the bytecode pipeline are no longer on the critical path.
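
The closure check amounts to a byte comparison of two generated files.
A small Python sketch, assuming stage2.c and stage3.c are on disk; the
helper names are made up here, and reporting the first differing offset
is an addition that is only useful when the stages diverge.

```python
from pathlib import Path
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def check_closure(stage2_c: str, stage3_c: str) -> Optional[int]:
    """Compare two generated C files; None means a fixed point."""
    return first_difference(Path(stage2_c).read_bytes(),
                            Path(stage3_c).read_bytes())
```

A result of None from check_closure("stage2.c", "stage3.c") corresponds
to the byte-identical result reported above.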

Known gap
- The `+` operator's heuristic dispatch still picks string concat
  when both operands are Idents with no literal anchor. Self-hosting
  works because the compiler source is careful, but `fn add(a:Int,
  b:Int) { a + b }` will not do arithmetic until codegen reads the
  parsed type annotations to dispatch. Fix is wiring; not done here.
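
A sketch of what annotation-aware dispatch could look like, in Python.
This is not code from this commit: the node shapes, the `types` map of
parameter annotations, and the "int_add" / "str_concat" result names are
all hypothetical. With an empty map it reproduces the gap described
above; with annotations it picks arithmetic.

```python
def plus_runtime_call(left: dict, right: dict, types: dict) -> str:
    """Pick the runtime operation for `left + right`."""
    def type_of(node):
        if node["kind"] == "IntLit":
            return "Int"
        if node["kind"] == "StrLit":
            return "String"
        if node["kind"] == "Ident":
            # Parsed annotation for this name, if codegen has one.
            return types.get(node["name"])
        return None

    known = {t for t in (type_of(left), type_of(right)) if t is not None}
    if "String" in known:
        return "str_concat"
    if "Int" in known:
        return "int_add"
    # No literal anchor and no annotation: falls back to concat.
    return "str_concat"
```

For `fn add(a:Int, b:Int) { a + b }`, passing {"a": "Int", "b": "Int"}
as `types` yields "int_add", while an empty map yields "str_concat".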

Tested
- tiny / lextest / whiletest / map+field / array build all run.
- cgi-studio (1037 lines of real El) compiles to C cleanly. Link fails
  only because runtime is missing fs_list, json_encode, llm_*; those
  are scheduled batches.
- Three-stage closure (stage1 vs stage2 vs stage3) byte-identical.
2026-04-30 13:10:29 -05:00