Live on Azure — 95.1% pass@1 on HumanEval with 17× fewer tokens

100K tokens → 5K.
Same accuracy.

Eyelid compresses LLM context on CPU before it hits the model. Deterministic. Stateful. 17.5× compression. Zero quality loss.

How it works

One API call between the IDE and the model. No changes to your stack.

Deterministic Compression

Not a summarizer. Preserves verbatim code where it matters. CPU-only — no GPU dependency. 174ms pipeline, 99% cache hits.

Any IDE, Any Model

Works with Kiro, Cursor, Copilot, Windsurf. Model-agnostic.

Session Memory

Compression improves over time. The engine remembers the project context.

Measured results

Validated on 164 HumanEval problems with GPT-5.3-Codex-2. Full benchmark published.

95.1%

pass@1 on HumanEval

Identical to full context — with 17× fewer tokens

17.5×

Compression

50K → 2.3K tokens avg

<50ms

Engine Latency

CPU-only, no GPU

35.4%

Truncation pass@1

Naive approach fails

329

Tests Passing

22 properties verified

Built in RustLive on Azure164 HumanEval problems validated6.3M tokens saved in benchmark

Built for LLM providers

Eyelid deploys inside your infrastructure as a preprocessing layer. One API call. No model changes. No user-facing changes. 95.1% pass@1 on HumanEval — your users get identical answers with 17× less compute.

Faster Responses

20× fewer input tokens means dramatically lower prefill latency. Your P95 drops without touching the model.

Lower Cost

Same GPU fleet, 20× more requests per second. No new hardware, no new data centers, no 12-month wait.

Better Answers

Focused context beats truncated context. The model sees exactly what it needs — nothing more, nothing less.

100K tokens → 5K.Same accuracy.