Cut your API bill by up to 60%.

Drop-in proxy for OpenAI and Anthropic. We strip the repeats before the model sees them. Same answers, smaller bill.

$5 free credit · No card required · Two-line setup

70% fewer input tokens across a 40-prompt conversation.

01

Swap your base URL

Point your OpenAI or Anthropic SDK at Prompt Crunch. Two lines of code.

02

We optimize and forward

Your prompts are optimized before they hit the provider. Same response back, fewer tokens billed.

03

Watch your bill drop

Every request logged with before and after token counts in your dashboard.

Without Prompt Crunch (input tokens)

Prompt 1: 200 · Prompt 10: 8,000 · Prompt 20: 25,000 · Prompt 40: 50,000

With Prompt Crunch

Prompt 1: 200 · Prompt 10: 5,200 · Prompt 20: 8,200 · Prompt 40: 15,000
*Tested with Claude Opus 4.6 on conversational chat. Output tokens unchanged. Results vary by model and content type.

Tested live on Opus 4.6, Sonnet 4.6, GPT-5.4.

Claude Opus 4.6
70%
avg input savings
  • Peak input savings 87%
  • Input tokens saved 420,551
GPT-5.4-thinking
60%
avg input savings
  • Peak input savings 64%
  • Input tokens saved 1,385,731
Claude Sonnet 4.6
65%
avg input savings
  • Peak input savings 89%
  • Input tokens saved 418,663

Your savings, by spend.

Monthly API spend $3,000
Eligible spend (60% conversational) $1,800
Token reduction rate 40%
Gross savings $720
Prompt Crunch fee (50% of savings) $360
Your net monthly savings $360
Assumes 60% of traffic is multi-prompt conversations. Savings scale with conversation length.
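The example row is simple arithmetic. A minimal sketch, assuming $3,000 total monthly spend, 60% conversational traffic, a 40% reduction rate, and the 50/50 split from the pricing section:

```python
# Back-of-envelope savings calculator for the example above.
# All inputs are illustrative assumptions, not guarantees.
monthly_spend = 3000.00
eligible_share = 0.60   # portion of traffic that is multi-prompt conversations
reduction_rate = 0.40   # average input-token reduction on eligible traffic

eligible_spend = monthly_spend * eligible_share   # $1,800
gross_savings = eligible_spend * reduction_rate   # $720
fee = gross_savings / 2                           # 50/50 split: $360
net_savings = gross_savings - fee                 # $360

print(f"gross ${gross_savings:.0f}, fee ${fee:.0f}, net ${net_savings:.0f}")
```

Swap in your own spend and traffic mix to estimate your number.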

Two-line integration.

python / anthropic
import anthropic

client = anthropic.Anthropic(
    api_key="your-anthropic-key",
    base_url="https://api.promptcrunch.dev",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)
python / openai
from openai import OpenAI

client = OpenAI(
    api_key="your-openai-key",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)
Works with LangChain · LlamaIndex · Vercel AI SDK · Instructor · LiteLLM · Pydantic AI · Mastra · anything that lets you set a base URL

Built for production.

Your keys stay yours

API keys pass straight through to the provider. Never stored, never logged.

Full audit trail

Every request logged with before and after token counts. Verify everything.

Smart passthrough

Short conversations go through untouched. We only step in when there's waste to cut.

Works with your stack

OpenAI and Anthropic. GPT-4o, GPT-5.x, Claude Sonnet, Opus. One proxy for all of them.

Bypass anytime

One header and the request goes through raw. You're always in control.

Real-time dashboard

Per-request logs. Token counts. Dollar amounts. See exactly where your money goes.

Want to run it locally?

We're building a self-hosted version. Drop your email and we'll let you know when it's ready.


Your data is never saved or sold.

A small encrypted optimization state lives in memory for an hour so we don't reprocess each prompt from scratch. Flip on zero-retention and even that goes away. The state rides back to you with the response as a signed encrypted blob instead.

Optimization state only

We cache a lightweight state object. Your prompts and responses never touch disk.

Encrypted and ephemeral

HMAC-signed, lives in RAM, auto-purged after one hour. Nothing persists.

API keys pass through

Forwarded directly to the provider. Not stored, not logged, not read.

We log numbers, not words

Token counts and dollar amounts for billing. Zero message content in our logs.

No training, no selling, no sharing

Our optimization runs on Anthropic's zero-retention API tier. Your data goes nowhere.

Want zero retention?

Flip one toggle in your dashboard. We hold nothing. The optimization state rides with your response as a signed encrypted blob.

01

Send your request. We optimize and forward it.

02

Get a state blob back. Encrypted, HMAC-signed, not your messages.

03

Send it back with your next prompt. We pick up where we left off. Nothing stored on our end.

04

Lose it? We reprocess from scratch. No data lost, ever.
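From the client's side, the four steps above reduce to "capture the blob, echo it back." A minimal sketch; the header name X-PromptCrunch-State is an illustrative assumption, not a documented field:

```python
# Client-side zero-retention loop: thread the opaque, HMAC-signed state blob
# from each response into the next request. Header name is hypothetical.

def attach_state(headers: dict, state_blob: "str | None") -> dict:
    """Echo the state blob back on the next request, if we have one."""
    merged = dict(headers)
    if state_blob is not None:
        merged["X-PromptCrunch-State"] = state_blob
    return merged

def extract_state(response_headers: dict) -> "str | None":
    """Pull the new state blob off a response; None on the first turn."""
    return response_headers.get("X-PromptCrunch-State")

# Simulated two-turn conversation:
state = None
req1 = attach_state({"X-PromptCrunch-Key": "pc_live_..."}, state)  # turn 1: no blob yet
state = extract_state({"X-PromptCrunch-State": "v1.signed.blob"})  # blob from response 1
req2 = attach_state({"X-PromptCrunch-Key": "pc_live_..."}, state)  # turn 2: blob echoed back
```

If the blob is lost (step 04), the first call simply behaves like turn 1 again.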

No savings, no charge.

50/50
we split the savings
Half the savings are yours. Half cover the optimization.
If a request doesn't save tokens, it's free.
  • $5 free credit to start, no card required
  • Works with OpenAI and Anthropic models
  • Full per-request audit trail
  • Streaming, zero-retention, bypass: all included
Try it for free

Common questions.

If your question isn't here, email us.

How is this different from Anthropic's prompt caching?

Prompt caching saves you money when you replay the same prefix across requests. Prompt Crunch saves you money when the prefix grows over multiple turns of a conversation.

They're complementary, not competitors. You can use both. Caching kicks in for the first ~5 minutes of a hot session; we kick in once the conversation history outgrows what caching can hold.

Won't optimization break my conversation context?

No. Code blocks, JSON, config files, schemas, IDs, URLs, numbers, and any structured data are preserved verbatim. Only conversational filler, repeated context, and verbose explanations get optimized.

We benchmarked 40-prompt conversations on Opus 4.6, Sonnet 4.6, GPT-5.4-thinking, and others, and saw no quality degradation.

What's the latency overhead?

Short conversations pass straight through with zero overhead. Longer conversations add a few hundred milliseconds for the optimization step before forwarding to your provider.

Net latency is usually lower, not higher: a 50% smaller prompt means the provider spends less time processing input tokens, which more than offsets our overhead on most requests.

What happens if Prompt Crunch goes down?

If our optimization pipeline errors on a single request, we silently fall back to forwarding your original messages to the provider. Your application keeps working. The response carries a _promptcrunch.status: "error" flag so you can audit it.

If the proxy itself is unreachable, your SDK will throw a connection error like any other network blip.

How do I know you're actually saving me money?

Every API response carries the original-vs-billed token count as both a JSON field (_promptcrunch.tokens_saved) and an HTTP header (X-PromptCrunch-Saved). Your dashboard shows the breakdown per model, per day, in dollars.

If a request didn't save tokens, you don't pay anything. No savings, no charge.
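Checking the audit fields is one lookup. A minimal sketch that parses the _promptcrunch object named above; the surrounding response shape is an assumption for illustration:

```python
import json

# Sample proxied response body; only _promptcrunch.tokens_saved and
# _promptcrunch.status are documented above, the rest is illustrative.
raw = """{
  "id": "msg_123",
  "content": [{"type": "text", "text": "..."}],
  "_promptcrunch": {"status": "ok", "tokens_saved": 1742}
}"""

body = json.loads(raw)
meta = body.get("_promptcrunch", {})
saved = meta.get("tokens_saved", 0)

# The same number also arrives as the X-PromptCrunch-Saved HTTP header,
# so you can cross-check it without parsing the body.
print(f"tokens saved on this request: {saved}")
```

Summing that field over a billing period should match the dashboard's dollar breakdown.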

Does this work with LangChain, LlamaIndex, the Vercel AI SDK, etc.?

Yes. Anything that lets you set a custom base_url on the OpenAI or Anthropic client works. Which is essentially everything.

Two lines of code change: swap the base URL and add the X-PromptCrunch-Key header. The rest of your stack is identical. Your provider API key still goes to your provider, your prompts behave the same way.

What about GDPR, privacy, and my data?

Your provider API key is forwarded straight through to OpenAI or Anthropic. We never store it. Not in logs, not in the database, not anywhere. Same goes for your messages: by default we hold only a small encrypted optimization state in memory for one hour, then it's gone.

Need stricter? Flip on zero-retention mode in your dashboard. We hold nothing. The optimization state rides with your response as a signed encrypted blob, and your client echoes it back on the next request. We're GDPR-compliant and built for teams handling regulated data.

Start saving in two minutes.

$5 free credit. No card required.
Create your account