SCO is a session-level compression protocol that directs the inference model itself to emit compact encoded output. The client decompresses in real time as tokens arrive, with no modification to the inference infrastructure.
"65–80% output token reduction. Zero latency overhead. Fully backward-compatible."
Watch the compressed token stream arrive on the left and the decompressed output materialize on the right. The compression ratio updates in real time as each token is processed.
SCO stacks four independent compression techniques. Each layer compounds the gains of the prior layers. Used together, they achieve 65–80% token reduction using only prompt engineering.
Instead of prose, the model emits pipe-delimited schema fields. For example, the record ACTION:called_api|RESULT:success_200|NEXT:validate is fully parseable and expands to a readable sentence on the client with zero streaming overhead. The schema is negotiated in the sco-init handshake.
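The record above can be parsed and expanded with a few lines of client code. The following is a minimal sketch, not the SCO reference implementation: the function names and the TEMPLATES table are hypothetical, and in practice the templates would presumably come from the schema negotiated in sco-init.

```python
def parse_fields(record: str) -> dict[str, str]:
    """Split 'KEY:value|KEY:value' into a dict, preserving field order."""
    fields = {}
    for part in record.split("|"):
        key, sep, value = part.partition(":")
        if not sep or not key:
            raise ValueError(f"malformed field: {part!r}")
        fields[key] = value
    return fields


# Hypothetical expansion templates (assumption: not part of the source schema).
TEMPLATES = {
    "ACTION": "Action: {}.",
    "RESULT": "Result: {}.",
    "NEXT": "Next step: {}.",
}


def expand_record(record: str) -> str:
    """Turn a pipe-delimited record into a readable sentence sequence."""
    parts = []
    for key, value in parse_fields(record).items():
        template = TEMPLATES.get(key, key + ": {}.")  # fall back to KEY: value
        parts.append(template.format(value.replace("_", " ")))
    return " ".join(parts)


sentence = expand_record("ACTION:called_api|RESULT:success_200|NEXT:validate")
```

Here sentence reads "Action: called api. Result: success 200. Next step: validate." Unknown keys fall back to a generic "KEY: value." rendering rather than failing.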
A pre-shared codebook maps single-token codes to common phrases. [fn] → function, [ret] → returns. Critically, each code must be verified as a single token in the target tokenizer — Unicode symbols silently fail this requirement.
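Client-side codebook expansion can be sketched as a simple lookup. This example assumes the bracketed form (for instance [fn]) is the literal code that appears in the stream, and it uses only the two codes named above; the function name and the lowercase-only code pattern are illustrative assumptions.

```python
import re

# The two codes mentioned in the text; a real codebook would arrive in sco-init.
CODEBOOK = {"[fn]": "function", "[ret]": "returns"}

# Assumption: codes are short lowercase names wrapped in square brackets.
CODE_RE = re.compile(r"\[[a-z]+\]")


def expand_codes(text: str, codebook: dict[str, str] = CODEBOOK) -> str:
    """Replace known codes with their phrases; leave unknown brackets as-is."""
    return CODE_RE.sub(lambda m: codebook.get(m.group(0), m.group(0)), text)


expanded = expand_codes("this [fn] [ret] a list")
```

Leaving unknown bracketed text untouched means ordinary brackets in model output are not corrupted.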
The model defines a label once using the syntax ↦LABEL: full text↤, then references it as [§LABEL] thereafter. Labels are scoped per-session and accumulate across a multi-step execution. Ideal for recurring proper nouns, system names, and long noun phrases.
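A sketch of a per-session label table follows. It assumes that a definition also renders its full text in place (so the first mention reads normally), and that label names are alphanumeric with underscores; neither detail is stated above. The class name is hypothetical, and for brevity this version works on whole chunks rather than character by character.

```python
import re

# Assumed label-name alphabet: letters, digits, underscore.
DEFINE_RE = re.compile(r"↦([A-Za-z0-9_]+):\s*(.*?)↤", re.DOTALL)
REF_RE = re.compile(r"\[§([A-Za-z0-9_]+)\]")


class LabelTable:
    """Session-scoped labels that accumulate across steps."""

    def __init__(self) -> None:
        self.labels: dict[str, str] = {}

    def expand(self, chunk: str) -> str:
        def define(match: re.Match) -> str:
            # Record the label and emit its full text at the definition site.
            self.labels[match.group(1)] = match.group(2)
            return match.group(2)

        def refer(match: re.Match) -> str:
            # Unknown labels are left untouched instead of raising.
            return self.labels.get(match.group(1), match.group(0))

        return REF_RE.sub(refer, DEFINE_RE.sub(define, chunk))


table = LabelTable()
first = table.expand("Deploy ↦SVC: the billing reconciliation service↤ to staging.")
second = table.expand("Restart [§SVC] and check [§SVC] logs.")
```

Because the table lives for the whole session, a label defined in one step stays available in every later step.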
The model emits [Δstep_id] to reference a prior step's complete output from the client's execution cache — inserting its full content without re-emitting a single token. The same reference doubles as a GC eviction back-pointer for the persistent context cache.
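The two roles of a step reference can be sketched as follows. The class and attribute names are hypothetical, and the back-pointer bookkeeping is only one possible reading of "GC eviction back-pointer": the example simply records which step referenced which, leaving the eviction policy itself unspecified.

```python
import re

STEP_REF_RE = re.compile(r"\[Δ([A-Za-z0-9_]+)\]")


class ExecutionCache:
    """Client-side cache of each step's fully decompressed output."""

    def __init__(self) -> None:
        self.outputs: dict[str, str] = {}
        # Assumed bookkeeping: target step id -> steps that referenced it.
        self.referenced_by: dict[str, list[str]] = {}

    def store(self, step_id: str, output: str) -> None:
        self.outputs[step_id] = output

    def expand(self, current_step: str, text: str) -> str:
        def resolve(match: re.Match) -> str:
            target = match.group(1)
            if target not in self.outputs:
                raise KeyError(f"unknown step reference: {target}")
            # Job 2: record the pointer for a later cache-management pass.
            self.referenced_by.setdefault(target, []).append(current_step)
            # Job 1: splice in the cached output verbatim.
            return self.outputs[target]

        return STEP_REF_RE.sub(resolve, text)


cache = ExecutionCache()
cache.store("step_1", "Fetched 42 rows from orders.")
summary = cache.expand("step_2", "Summary: [Δstep_1]")
```

After this call, summary contains the full text of step_1 and cache.referenced_by maps "step_1" to ["step_2"].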
Not all short strings are single tokens. Codebook codes must be verified against the actual tokenizer — Unicode symbols and many punctuation sequences silently expand to multiple tokens, negating the compression gain entirely.
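A candidate code is accepted only if the target tokenizer encodes it as exactly one token, that is, len(tokens) == 1. Codes that fail are rejected. The verified codebook is transmitted in the sco-init SSE event alongside its HMAC signature.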
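The verification and signing steps might look like the sketch below. Several parts are assumptions: the tokenizer is passed in as a plain encode callable (a toy stand-in is used here so the example runs without a model), the signature is HMAC-SHA256 over a canonical JSON serialization, and the function names are hypothetical. The text above does not specify the hash or the payload format.

```python
import hashlib
import hmac
import json
from typing import Callable, Sequence


def verify_codebook(
    candidates: dict[str, str],
    encode: Callable[[str], Sequence[int]],
) -> dict[str, str]:
    """Keep only codes that the tokenizer encodes as exactly one token."""
    return {
        code: phrase
        for code, phrase in candidates.items()
        if len(encode(code)) == 1
    }


def sign_codebook(codebook: dict[str, str], key: bytes) -> str:
    """HMAC-SHA256 over canonical JSON (assumed payload format)."""
    payload = json.dumps(codebook, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


# Toy tokenizer standing in for the real one: known strings are one token,
# everything else falls back to one token per UTF-8 byte.
TOY_VOCAB = {"[fn]": 1, "[ret]": 2}


def toy_encode(text: str) -> list[int]:
    if text in TOY_VOCAB:
        return [TOY_VOCAB[text]]
    return list(text.encode("utf-8"))


candidates = {"[fn]": "function", "[ret]": "returns", "→": "leads to"}
verified = verify_codebook(candidates, toy_encode)
signature = sign_codebook(verified, key=b"demo-session-key")
```

With the toy tokenizer, the arrow expands to three byte-level tokens and is rejected, which mirrors the Unicode failure mode described above. A receiver would recompute the signature and compare it with hmac.compare_digest.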
The client-side decompressor is a deterministic state machine. It processes the incoming stream character by character, resolving each SCO construct as soon as its closing delimiter arrives, with no lookahead and no buffering beyond the construct currently open.
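A compact sketch of such a state machine is shown below. It is illustrative rather than the actual SCO client: it works on decoded characters instead of raw bytes, holds only the currently open construct, and assumes that a label definition streams its body text through as it arrives. The class, state, and parameter names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    TEXT = auto()        # plain output, passed through unchanged
    BRACKET = auto()     # inside [ ... ]
    LABEL_NAME = auto()  # after ↦, reading the label name up to ':'
    LABEL_BODY = auto()  # after ':', reading the label text up to ↤


class Decompressor:
    def __init__(self, codebook: dict[str, str], step_cache: dict[str, str]):
        self.codebook = codebook
        self.step_cache = step_cache
        self.labels: dict[str, str] = {}
        self.state = State.TEXT
        self.buf = ""   # characters of the construct currently open
        self.name = ""  # label name while its body is being read

    def feed(self, chunk: str) -> str:
        """Consume one streamed chunk and return the text ready for display."""
        return "".join(self._step(ch) for ch in chunk)

    def _step(self, ch: str) -> str:
        if self.state is State.TEXT:
            if ch == "[":
                self.state, self.buf = State.BRACKET, ""
                return ""
            if ch == "↦":
                self.state, self.buf = State.LABEL_NAME, ""
                return ""
            return ch
        if self.state is State.BRACKET:
            if ch != "]":
                self.buf += ch
                return ""
            self.state = State.TEXT
            return self._resolve(self.buf)
        if self.state is State.LABEL_NAME:
            if ch != ":":
                self.buf += ch
                return ""
            self.name, self.buf, self.state = self.buf, "", State.LABEL_BODY
            return ""
        # LABEL_BODY: record the text and stream it through at the same time.
        if ch == "↤":
            self.labels[self.name] = self.buf
            self.state = State.TEXT
            return ""
        if not self.buf and ch == " ":
            return ""  # skip the space after the colon
        self.buf += ch
        return ch

    def _resolve(self, body: str) -> str:
        literal = "[" + body + "]"
        if body.startswith("§"):
            return self.labels.get(body[1:], literal)
        if body.startswith("Δ"):
            return self.step_cache.get(body[1:], literal)
        return self.codebook.get(literal, literal)


decoder = Decompressor({"[fn]": "function", "[ret]": "returns"}, {"step_1": "42 rows"})
chunks = ["The [f", "n] ↦SVC: billing ser", "vice↤ [ret] [Δstep_1]; ", "[§SVC] ok [x]"]
result = "".join(decoder.feed(chunk) for chunk in chunks)
```

The chunk boundaries deliberately split constructs in the middle to show that state carries over between calls to feed. Unrecognized bracketed text such as [x] is emitted unchanged.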
Token counts are measured on multi-step agentic execution traces. "Prompt-only" uses system-prompt directives alone. "Fine-tuned" uses a model specifically trained to emit SCO output, which gets closer to the theoretical maximum.
A single [Δstep_id] token does two jobs simultaneously: it triggers decompression expansion for the current response, and it records a GC eviction pointer for the persistent context cache.
[Δstep_id] token received.