security: replace denylist sanitize_share_html with allowlist el_html_sanitize

A real attacker probed /api/share earlier today with <script>alert(1),
<iframe src=evil>, <img onerror>, <a href="javascript:...">, and a
<form action="/steal"> payload. Nothing executed, because the chat
bubble at /share/<id> renders the served HTML inside marked.js's
already-escaped output. Even so, the prior denylist sanitizer was
fragile:

  - It comment-wrapped dangerous tags ("<!--script>...-->"), but a
    literal "-->" inside an attacker-supplied attribute value closes
    that comment early and re-exposes the original payload.
  - It renamed on*= attributes to data-x-on*=, which left attack
    indicators visible in the served HTML.
  - It was a denylist; every new attack vector required a code change.
  - It didn't validate <a href> URL schemes properly.
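The comment-wrap failure can be reproduced with a hypothetical C reconstruction of the idea. This is not the removed sanitize_share_html, whose code is not shown here. comment_wrap and after_comment are illustrative names. The sketch turns "<script ...>...</script>" into "<!--script ...>...</script>-->". An HTML parser ends a comment at the first "-->", so an attribute value that contains "-->" ends the comment early, and everything after it is live markup again.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical comment-wrapping "neutraliser" (NOT the removed code):
   "<script ...>...</script>" becomes "<!--script ...>...</script>-->".
   Returns a malloc'd string, or NULL if the input does not start with '<'. */
char *comment_wrap(const char *tag_html) {
    if (tag_html[0] != '<') return NULL;
    size_t len = strlen(tag_html);
    char *out = malloc(len + 7);               /* "<!--" + (len - 1) + "-->" + NUL */
    if (!out) return NULL;
    memcpy(out, "<!--", 4);
    memcpy(out + 4, tag_html + 1, len - 1);    /* original text minus its '<' */
    memcpy(out + 3 + len, "-->", 4);           /* also copies the terminating NUL */
    return out;
}

/* What an HTML parser treats as live markup: everything after the FIRST "-->". */
const char *after_comment(const char *wrapped) {
    const char *end = strstr(wrapped, "-->");
    return end ? end + 3 : wrapped;
}
```

For the payload <script title="-->"><img src=x onerror=alert(1)></script>, after_comment(comment_wrap(payload)) starts with "><img src=x onerror=alert(1)>. The attacker's own "-->" closed the comment, so the img tag is live markup again.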

The replacement is a runtime-level state-machine allowlist parser
(foundation/el af480f6: el_html_sanitize). The product just specifies
the JSON allowlist of allowed tags + attributes; the runtime drops
everything else, validates href/src URL schemes (http/https/mailto/
fragment/relative only), and drops whole subtrees of script/style/
iframe/object/embed/form regardless of the allowlist.
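Below is a heavily simplified C sketch of such a state machine. It is not the el runtime's code, and sanitize_text_only and its helpers are hypothetical names. It assumes an empty allowlist, so every tag is removed. That leaves only the subtree drop, comment removal and text escaping visible. It does not track nesting of same-name elements. It treats embed as a void element, so only embed's start tag is dropped.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Elements dropped together with their content. embed is void (no close tag). */
typedef struct { const char *name; int is_void; } drop_rule;
static const drop_rule DROP_SUBTREE[] = {
    { "script", 0 }, { "style", 0 }, { "iframe", 0 },
    { "object", 0 }, { "embed", 1 }, { "form", 0 },
};

/* Case-insensitive match of s's first n bytes against lowercase name (length n).
   Stops at s's NUL, so it never reads past the end of the input. */
static int ieq_prefix(const char *s, const char *name, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != name[i]) return 0;
    return 1;
}

/* True if c may follow a tag name: "<form>", "<form ", "<form/". */
static int ends_name(char c) {
    return c == '>' || c == '/' || c == '\0' || isspace((unsigned char)c);
}

/* Index just past the '>' closing the tag at s[i] == '<', ignoring '>' inside
   quoted attribute values; returns len if the tag never closes. */
static size_t tag_end(const char *s, size_t i, size_t len) {
    char quote = 0;
    for (i++; i < len; i++) {
        if (quote) { if (s[i] == quote) quote = 0; }
        else if (s[i] == '"' || s[i] == '\'') quote = s[i];
        else if (s[i] == '>') return i + 1;
    }
    return len;
}

static const char *escape_for(char c) {
    switch (c) {
    case '&': return "&amp;";
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '"': return "&quot;";
    case '\'': return "&#39;";
    default: return NULL;
    }
}

/* Empty-allowlist sanitizer: escapes text, drops every tag and comment, and
   drops script/style/iframe/object/form subtrees. Caller frees the result. */
char *sanitize_text_only(const char *s) {
    size_t len = strlen(s), i = 0, o = 0;
    char *out = malloc(len * 6 + 1);              /* worst case: every byte -> "&quot;" */
    if (!out) return NULL;
    while (i < len) {
        char c = s[i], next = s[i + 1];
        if (c != '<' || !(isalpha((unsigned char)next) || next == '/' || next == '!')) {
            const char *esc = escape_for(c);       /* free text: escape it */
            if (esc) { size_t m = strlen(esc); memcpy(out + o, esc, m); o += m; }
            else out[o++] = c;
            i++;
            continue;
        }
        if (strncmp(s + i, "<!--", 4) == 0) {     /* comment: drop through "-->" */
            const char *close = strstr(s + i + 4, "-->");
            i = close ? (size_t)(close - s) + 3 : len;
            continue;
        }
        size_t end = tag_end(s, i, len);
        for (size_t k = 0; k < sizeof DROP_SUBTREE / sizeof DROP_SUBTREE[0]; k++) {
            const char *name = DROP_SUBTREE[k].name;
            size_t n = strlen(name);
            if (!ieq_prefix(s + i + 1, name, n) || !ends_name(s[i + 1 + n])) continue;
            if (!DROP_SUBTREE[k].is_void) {
                /* Skip to the first matching close tag (nesting not tracked). */
                size_t j = end;
                while (j < len && !(s[j] == '<' && s[j + 1] == '/' &&
                                    ieq_prefix(s + j + 2, name, n) && ends_name(s[j + 2 + n])))
                    j++;
                end = j < len ? tag_end(s, j, len) : len;
            }
            break;
        }
        i = end;                                   /* the tag itself is always dropped */
    }
    out[o] = '\0';
    return out;
}
```

For example, sanitize_text_only("Hi <script>alert(1)</script><b>there</b> & bye") returns "Hi there &amp; bye". In a real allowlist parser, the step that drops a tag would instead re-emit the tag when it and its attributes pass the allowlist and URL checks.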

Phase 4 of bl-dc55ae07: deletes sanitize_share_html (main.el) and
gal_sanitize_html (gallery.el) and replaces their 3 call sites with
el_html_sanitize(html, allowlist). Defines default_share_allowlist
in main.el and an identical gallery_share_allowlist in gallery.el.
The two bindings are kept separate because gallery.el is concatenated
before main.el at build time, so referencing main.el's binding from
gallery.el would be a forward reference.
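The JSON allowlist maps each tag to the attribute names that tag may carry. A minimal C sketch of the equivalent lookup follows. It uses the example entries from the el_html_sanitize header comment in the diff ({"p":[],"a":["href","title"],"strong":[]}), not the real contents of default_share_allowlist, which this commit does not show. The table and function names are hypothetical. Matching is ASCII case-insensitive, as the header comment specifies.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical in-memory form of {"p":[],"a":["href","title"],"strong":[]}:
   each tag maps to a NULL-terminated list of permitted attribute names. */
typedef struct {
    const char *tag;
    const char *const *attrs;
} allow_entry;

static const char *const no_attrs[] = { NULL };
static const char *const a_attrs[] = { "href", "title", NULL };

static const allow_entry example_allowlist[] = {
    { "p", no_attrs },
    { "a", a_attrs },
    { "strong", no_attrs },
};
#define N_ENTRIES (sizeof example_allowlist / sizeof example_allowlist[0])

/* ASCII case-insensitive string equality. */
static int ieq(const char *a, const char *b) {
    while (*a && *b && tolower((unsigned char)*a) == tolower((unsigned char)*b)) { a++; b++; }
    return *a == '\0' && *b == '\0';
}

/* 1 if the tag appears in the allowlist at all (it may still carry no attributes). */
int tag_allowed(const char *tag) {
    for (size_t i = 0; i < N_ENTRIES; i++)
        if (ieq(example_allowlist[i].tag, tag)) return 1;
    return 0;
}

/* 1 if attr is listed for tag; unknown tags allow nothing. */
int attr_allowed(const char *tag, const char *attr) {
    for (size_t i = 0; i < N_ENTRIES; i++) {
        if (!ieq(example_allowlist[i].tag, tag)) continue;
        for (const char *const *p = example_allowlist[i].attrs; *p; p++)
            if (ieq(*p, attr)) return 1;
        return 0;
    }
    return 0;
}
```

Because the table lists only what is permitted, a new attack vector such as an unfamiliar on* handler needs no code change. It simply is not listed, so it is dropped.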

Phase 5: migrations/20260502185500_backfill_resanitize_share_cards.sql
nulls answer_html for any share_cards row older than 1 hour. Applied
via the Supabase Management API; 0 rows in scope (the column was
added today and existing rows pre-date its first write).

Also fixes an orthogonal duplicate-symbol bug: unix_timestamp() was
defined in both dist/web_stubs.c and the runtime (the latter is a
recent runtime addition picked up by the runtime sync). Removed the
stub.

Backlog: bl-dc55ae07
Will Anderson
2026-05-02 12:56:33 -05:00
parent 4629796a75
commit 46f93fd6eb
6 changed files with 1144 additions and 127 deletions
+71 -1
@@ -199,6 +199,19 @@ el_val_t http_get_to_file(el_val_t url, el_val_t headers_map, el_val_t output_p
el_val_t url_encode(el_val_t s); /* RFC 3986 unreserved set */
el_val_t url_decode(el_val_t s); /* '+' → space, %XX → byte */
/* ── HTML allowlist sanitizer ────────────────────────────────────────────────
* el_html_sanitize(input_html, allowlist_json) — strict allowlist HTML
* cleaner. State-machine parser; tag/attribute names compared case-
* insensitively against the allowlist; `<a href>` / `<… src>` URL schemes
* validated (http, https, mailto, fragment-only, or relative); whole-
* subtree drop for script / style / iframe / object / embed / form; HTML-
* escapes free text outside dropped subtrees.
*
* The allowlist is JSON of the form
* {"p":[],"a":["href","title"],"strong":[],...}
* where each value is the array of attribute names allowed for that tag. */
el_val_t el_html_sanitize(el_val_t input_html, el_val_t allowlist_json);
/* ── Filesystem ──────────────────────────────────────────────────────────── */
el_val_t fs_read(el_val_t path);
@@ -246,6 +259,63 @@ el_val_t time_from_parts(el_val_t secs, el_val_t ns, el_val_t tz);
el_val_t time_add(el_val_t ts, el_val_t n, el_val_t unit);
el_val_t time_diff(el_val_t ts1, el_val_t ts2, el_val_t unit);
/* ── Instant + Duration: first-class temporal types ──────────────────────────
* Both types share the el_val_t (int64) slot. Instants are nanoseconds
* since the Unix epoch; Durations are signed nanoseconds. Type discipline
* is enforced at codegen-time: BinOps on names registered as Instant or
* Duration route through the typed wrappers below; mismatches like
* Instant+Instant become #error at the C compiler.
*
* Postfix literals — `30.seconds`, `1.hour`, `500.millis`, `30.nanos` — are
* recognised by the parser as DurationLit AST nodes and lowered to literal
* int64 nanoseconds at codegen time. The runtime never sees the units. */
el_val_t el_now_instant(void);
el_val_t now(void);
el_val_t unix_seconds(el_val_t n);
el_val_t unix_millis(el_val_t n);
el_val_t instant_from_iso8601(el_val_t s);
el_val_t el_duration_from_nanos(el_val_t ns);
el_val_t duration_seconds(el_val_t n);
el_val_t duration_millis(el_val_t n);
el_val_t duration_nanos(el_val_t n);
el_val_t el_instant_add_dur(el_val_t inst, el_val_t dur);
el_val_t el_instant_sub_dur(el_val_t inst, el_val_t dur);
el_val_t el_instant_diff(el_val_t a, el_val_t b);
el_val_t el_duration_add(el_val_t a, el_val_t b);
el_val_t el_duration_sub(el_val_t a, el_val_t b);
el_val_t el_duration_scale(el_val_t dur, el_val_t scalar);
el_val_t el_duration_div(el_val_t dur, el_val_t scalar);
el_val_t el_instant_lt(el_val_t a, el_val_t b);
el_val_t el_instant_le(el_val_t a, el_val_t b);
el_val_t el_instant_gt(el_val_t a, el_val_t b);
el_val_t el_instant_ge(el_val_t a, el_val_t b);
el_val_t el_instant_eq(el_val_t a, el_val_t b);
el_val_t el_instant_ne(el_val_t a, el_val_t b);
el_val_t el_duration_lt(el_val_t a, el_val_t b);
el_val_t el_duration_le(el_val_t a, el_val_t b);
el_val_t el_duration_gt(el_val_t a, el_val_t b);
el_val_t el_duration_ge(el_val_t a, el_val_t b);
el_val_t el_duration_eq(el_val_t a, el_val_t b);
el_val_t el_duration_ne(el_val_t a, el_val_t b);
el_val_t instant_to_unix_seconds(el_val_t i);
el_val_t instant_to_unix_millis(el_val_t i);
el_val_t instant_to_iso8601(el_val_t i);
el_val_t duration_to_seconds(el_val_t d);
el_val_t duration_to_millis(el_val_t d);
el_val_t duration_to_nanos(el_val_t d);
el_val_t el_sleep_duration(el_val_t dur);
el_val_t unix_timestamp(void);
el_val_t ttl_cache_set(el_val_t key, el_val_t value);
el_val_t ttl_cache_get(el_val_t key, el_val_t max_age);
el_val_t ttl_cache_age(el_val_t key);
/* ── UUID ────────────────────────────────────────────────────────────────── */
el_val_t uuid_new(void);
@@ -288,7 +358,7 @@ el_val_t str_char_at(el_val_t s, el_val_t i);
el_val_t str_char_code(el_val_t s, el_val_t i);
el_val_t str_pad_left(el_val_t s, el_val_t width, el_val_t pad);
el_val_t str_pad_right(el_val_t s, el_val_t width, el_val_t pad);
-el_val_t str_format(el_val_t template, el_val_t data);
+el_val_t str_format(el_val_t fmt, el_val_t data);
el_val_t str_lower(el_val_t s);
el_val_t str_upper(el_val_t s);
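The URL-scheme rule in the el_html_sanitize header comment (http, https, mailto, fragment-only, or relative) might look roughly like the C sketch below. It is an illustration under stated assumptions, not the runtime's implementation, and url_scheme_allowed is a hypothetical name. It assumes the attribute value has already been HTML-entity-decoded. Otherwise a raw value such as "&#106;avascript:..." would be misread as a fragment. It is deliberately conservative. Whatever precedes the first ':' that comes before any '/', '?' or '#' must be exactly an allowed scheme. Variants like " javascript:" or "java\tscript:" are therefore rejected rather than normalised the way a browser would normalise them.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if an (assumed already entity-decoded) href/src value is
   http(s), mailto, fragment-only, or relative; 0 otherwise. */
int url_scheme_allowed(const char *url) {
    /* A scheme, if any, is the text before the first ':' that precedes every
       '/', '?' and '#'. No such ':' means a relative or fragment URL. */
    size_t n = strcspn(url, ":/?#");
    if (url[n] != ':') return 1;

    static const char *const allowed[] = { "http", "https", "mailto" };
    for (size_t k = 0; k < sizeof allowed / sizeof allowed[0]; k++) {
        if (strlen(allowed[k]) != n) continue;
        size_t i = 0;
        while (i < n && tolower((unsigned char)url[i]) == allowed[k][i]) i++;
        if (i == n) return 1;                 /* case-insensitive exact match */
    }
    /* javascript:, data:, vbscript:, and whitespace/control-laden variants. */
    return 0;
}
```

With this rule, "/share/abc?x=1:2" and "#top" pass as relative or fragment URLs, "HTTP://example.com" passes, and "javascript:alert(1)" or "data:text/html,..." are rejected.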