Drop-in proxy for OpenAI and Anthropic. We strip the repeats before the model sees them. Same answers, smaller bill.
Point your OpenAI or Anthropic SDK at Prompt Crunch. Two lines of code.
Your prompts are optimized before they hit the provider. Same response back, fewer tokens billed.
Every request logged with before and after token counts in your dashboard.
import anthropic

client = anthropic.Anthropic(
    api_key="your-anthropic-key",
    base_url="https://api.promptcrunch.dev",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)
from openai import OpenAI

client = OpenAI(
    api_key="your-openai-key",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)
API keys pass straight through to the provider. Never stored, never logged.
Every request logged with before and after token counts. Verify everything.
Short conversations go through untouched. We only step in when there's waste to cut.
OpenAI and Anthropic. GPT-4o, GPT-5.x, Claude Sonnet, Opus. One proxy for all of them.
One header and the request goes through raw. You're always in control.
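With the OpenAI SDK, a per-request opt-out might look like the sketch below using the SDK's `extra_headers` parameter. The header name `X-PromptCrunch-Bypass` is a hypothetical placeholder; the docs above don't name the actual header.

```python
# Hypothetical bypass header name; the real one may differ.
BYPASS_HEADERS = {"X-PromptCrunch-Bypass": "true"}

# With the OpenAI SDK, per-request headers merge in via `extra_headers`:
#
#   client.chat.completions.create(
#       model="gpt-4o",
#       messages=[{"role": "user", "content": "hi"}],
#       extra_headers=BYPASS_HEADERS,  # this one request goes through raw
#   )
```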
Per-request logs. Token counts. Dollar amounts. See exactly where your money goes.
We're building a self-hosted version. Drop your email and we'll let you know when it's ready.
A small encrypted optimization state lives in memory for an hour so we don't reprocess each prompt from scratch. Flip on zero-retention and even that goes away. The state rides back to you with the response as a signed encrypted blob instead.
We cache a lightweight state object. Your prompts and responses never touch disk.
HMAC-signed, lives in RAM, auto-purged after one hour. Nothing persists.
Forwarded directly to the provider. Not stored, not logged, not read.
Token counts and dollar amounts for billing. Zero message content in our logs.
Our optimization runs on Anthropic's zero-retention API tier. Your data goes nowhere.
Flip one toggle in your dashboard. We hold nothing. The optimization state rides with your response as a signed encrypted blob.
Send your request. We optimize and forward it.
Get a state blob back. Encrypted, HMAC-signed, not your messages.
Send it back with your next prompt. We pick up where we left off. Nothing stored on our end.
Lose it? We reprocess from scratch. No data lost, ever.
If your question isn't here, email us.
Prompt caching saves you money when you replay the same prefix across requests. Prompt Crunch saves you money when the prefix grows over multiple turns of a conversation.
They're complementary, not competitors. You can use both. Caching kicks in for the first ~5 minutes of a hot session; we kick in once the conversation history outgrows what caching can hold.
No. Code blocks, JSON, config files, schemas, IDs, URLs, numbers, and any structured data are preserved verbatim. Only conversational filler, repeated context, and verbose explanations get optimized.
We benchmarked 40-prompt conversations on Opus 4.6, Sonnet 4.6, GPT-5.4-thinking, and other models, and saw no quality degradation.
Short conversations pass straight through with zero overhead. Longer conversations add a few hundred milliseconds for the optimization step before forwarding to your provider.
Net latency is usually lower, not higher: a 50% smaller prompt means the provider spends less time processing input tokens, which more than offsets our overhead on most requests.
If our optimization pipeline errors on a request, we fall back to forwarding your original messages untouched. Your application keeps working, and the response carries a _promptcrunch.status: "error" flag so you can audit the fallback.
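Auditing that flag in client code could look like this sketch; the exact response shape around `_promptcrunch.status` is an assumption.

```python
def optimization_failed(response: dict) -> bool:
    """True when the proxy fell back to forwarding the raw prompt."""
    return response.get("_promptcrunch", {}).get("status") == "error"


audit_log = []
resp = {"content": "...", "_promptcrunch": {"status": "error"}}
if optimization_failed(resp):
    audit_log.append(resp["_promptcrunch"])  # record it for later review
```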
If the proxy itself is unreachable, your SDK will throw a connection error like any other network blip.
Every API response carries the original-vs-billed token count as both a JSON field (_promptcrunch.tokens_saved) and an HTTP header (X-PromptCrunch-Saved). Your dashboard shows the breakdown per model, per day, in dollars.
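Reading the savings signal might look like the sketch below. The JSON field and header names come from the line above; the fallback logic and the per-token price are illustrative assumptions.

```python
def tokens_saved(body: dict, headers: dict) -> int:
    """Prefer the JSON field; fall back to the HTTP header, then to 0."""
    field = body.get("_promptcrunch", {}).get("tokens_saved")
    if field is not None:
        return int(field)
    return int(headers.get("X-PromptCrunch-Saved", 0))


saved = tokens_saved({"_promptcrunch": {"tokens_saved": 1200}}, {})
dollars = saved * 3.00 / 1_000_000  # at an assumed $3.00 per 1M input tokens
```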
If a request didn't save tokens, you don't pay anything. No savings, no charge.
Yes. Anything that lets you set a custom base_url on the OpenAI or Anthropic client works. Which is essentially everything.
Two lines of code change: swap the base URL and add the X-PromptCrunch-Key header. The rest of your stack is identical. Your provider API key still goes to your provider, your prompts behave the same way.
Your provider API key is forwarded straight through to OpenAI or Anthropic. We never store it. Not in logs, not in the database, not anywhere. Same goes for your messages: by default we hold only a small encrypted optimization state in memory for one hour, then it's gone.
Need stricter? Flip on zero-retention mode in your dashboard. We hold nothing. The optimization state rides with your response as a signed encrypted blob, and your client echoes it back on the next request. We're GDPR-compliant and built for teams handling regulated data.