How it works
One API call between the IDE and the model. No changes to your stack.
Deterministic Compression
Not a summarizer. Preserves verbatim code where it matters. CPU-only — no GPU dependency. 174ms pipeline, 99% cache hits.
Any IDE, Any Model
Works with Kiro, Cursor, Copilot, Windsurf. Model-agnostic.
Session Memory
Compression improves over time. The engine remembers the project context.
Measured results
Validated on 164 HumanEval problems with GPT-5.3-Codex-2. Full benchmark published.
Built for LLM providers
Eyelid deploys inside your infrastructure as a preprocessing layer. One API call. No model changes. No user-facing changes. 95.1% pass@1 on HumanEval — your users get identical answers with 17× less compute.
Faster Responses
20× fewer input tokens means dramatically lower prefill latency. Your P95 drops without touching the model.
Lower Cost
Same GPU fleet, 20× more requests per second. No new hardware, no new data centers, no 12-month wait.
Better Answers
Focused context beats truncated context. The model sees exactly what it needs — nothing more, nothing less.