Cut your API bill by up to 60%.

Drop-in proxy for OpenAI and Anthropic. We strip the repeats before the model sees them. Same answers, smaller bill.

$5 free credit · No card required · Two-line setup

70% fewer input tokens across a 40-prompt conversation.

01

Swap your base URL

Point your OpenAI or Anthropic SDK at Prompt Crunch. Two lines of code.

02

We optimize and forward

Your prompts are optimized before they hit the provider. Same response back, fewer tokens billed.

03

Watch your bill drop

Every request logged with before and after token counts in your dashboard.

Without Prompt Crunch (input tokens)

Prompt 1: 200 · Prompt 10: 8,000 · Prompt 20: 25,000 · Prompt 40: 50,000

With Prompt Crunch

Prompt 1: 200 · Prompt 10: 5,200 · Prompt 20: 8,200 · Prompt 40: 15,000
*Tested with Claude Opus 4.6 on conversational chat. Output tokens unchanged. Results vary by model and content type.

Tested live on Opus 4.6, Sonnet 4.6, GPT-5.4.

Claude Opus 4.6
70%
avg input savings
  • Peak input savings 87%
  • Input tokens saved 420,551
GPT-5.4-thinking
60%
avg input savings
  • Peak input savings 64%
  • Input tokens saved 1,385,731
Claude Sonnet 4.6
65%
avg input savings
  • Peak input savings 89%
  • Input tokens saved 418,663

Your savings, by spend.

Monthly API spend $3,000
Eligible spend (60% conversational) $1,800
Token reduction rate 40%
Gross savings $720
Prompt Crunch fee (50% of savings) $360
Your net monthly savings $360
Assumes 60% of traffic is multi-prompt conversations. Savings scale with conversation length.
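The example row is simple arithmetic. A minimal sketch, assuming $3,000 total monthly spend, 60% conversational traffic, a 40% reduction rate, and the 50/50 split from the pricing section:

```python
# Back-of-envelope savings calculator for the example above.
# All inputs are illustrative assumptions, not guarantees.
monthly_spend = 3000.00
eligible_share = 0.60   # portion of traffic that is multi-prompt conversations
reduction_rate = 0.40   # average input-token reduction on eligible traffic

eligible_spend = monthly_spend * eligible_share   # $1,800
gross_savings = eligible_spend * reduction_rate   # $720
fee = gross_savings / 2                           # 50/50 split: $360
net_savings = gross_savings - fee                 # $360

print(f"gross ${gross_savings:.0f}, fee ${fee:.0f}, net ${net_savings:.0f}")
```

Swap in your own spend and traffic mix to estimate your number.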

Two-line integration.

python / anthropic
import anthropic

client = anthropic.Anthropic(
    api_key="your-anthropic-key",
    base_url="https://api.promptcrunch.dev",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)
python / openai
from openai import OpenAI

client = OpenAI(
    api_key="your-openai-key",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)
Works with LangChain · LlamaIndex · Vercel AI SDK · Instructor · LiteLLM · Pydantic AI · Mastra · anything that lets you set a base URL

Built for production.

Your keys stay yours

API keys pass straight through to the provider. Never stored, never logged.

Full audit trail

Every request logged with before and after token counts. Verify everything.

Smart passthrough

Short conversations go through untouched. We only step in when there's waste to cut.

Works with your stack

OpenAI and Anthropic. GPT-4o, GPT-5.x, Claude Sonnet, Opus. One proxy for all of them.

Bypass anytime

One header and the request goes through raw. You're always in control.

Real-time dashboard

Per-request logs. Token counts. Dollar amounts. See exactly where your money goes.

Want to run it locally?

We're building a self-hosted version. Drop your email and we'll let you know when it's ready.


Your data is never saved or sold.

A small encrypted optimization state lives in memory for an hour so we don't reprocess each prompt from scratch. Flip on zero-retention and even that goes away. The state rides back to you with the response as a signed encrypted blob instead.

Optimization state only

We cache a lightweight state object. Your prompts and responses never touch disk.

Encrypted and ephemeral

HMAC-signed, lives in RAM, auto-purged after one hour. Nothing persists.

API keys pass through

Forwarded directly to the provider. Not stored, not logged, not read.

We log numbers, not words

Token counts and dollar amounts for billing. Zero message content in our logs.

No training, no selling, no sharing

Our optimization runs on Anthropic's zero-retention API tier. Your data goes nowhere.

Want zero retention?

Flip one toggle in your dashboard. We hold nothing. The optimization state rides with your response as a signed encrypted blob.

01

Send your request. We optimize and forward it.

02

Get a state blob back. Encrypted, HMAC-signed, not your messages.

03

Send it back with your next prompt. We pick up where we left off. Nothing stored on our end.

04

Lose it? We reprocess from scratch. No data lost, ever.
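From the client's side, the four steps above reduce to "capture the blob, echo it back." A minimal sketch; the header name X-PromptCrunch-State is an illustrative assumption, not a documented field:

```python
# Client-side zero-retention loop: thread the opaque, HMAC-signed state blob
# from each response into the next request. Header name is hypothetical.

def attach_state(headers: dict, state_blob: "str | None") -> dict:
    """Echo the state blob back on the next request, if we have one."""
    merged = dict(headers)
    if state_blob is not None:
        merged["X-PromptCrunch-State"] = state_blob
    return merged

def extract_state(response_headers: dict) -> "str | None":
    """Pull the new state blob off a response; None on the first turn."""
    return response_headers.get("X-PromptCrunch-State")

# Simulated two-turn conversation:
state = None
req1 = attach_state({"X-PromptCrunch-Key": "pc_live_..."}, state)  # turn 1: no blob yet
state = extract_state({"X-PromptCrunch-State": "v1.signed.blob"})  # blob from response 1
req2 = attach_state({"X-PromptCrunch-Key": "pc_live_..."}, state)  # turn 2: blob echoed back
```

If the blob is lost (step 04), the first call simply behaves like turn 1 again.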

No savings, no charge.

50/50
we split the savings
Half the savings are yours. Half cover the optimization.
If a request doesn't save tokens, it's free.
  • $5 free credit to start, no card required
  • Works with OpenAI and Anthropic models
  • Full per-request audit trail
  • Streaming, zero-retention, bypass: all included
Try it for free

Common questions.

If your question isn't here, email us.

How is this different from Anthropic's prompt caching?

Prompt caching saves you money when you replay the same prefix across requests. Prompt Crunch saves you money when the prefix grows over multiple turns of a conversation.

They're complementary, not competitors. You can use both. Caching kicks in for the first ~5 minutes of a hot session; we kick in once the conversation history outgrows what caching can hold.

Won't optimization break my conversation context?

No. Code blocks, JSON, config files, schemas, IDs, URLs, numbers, and any structured data are preserved verbatim. Only conversational filler, repeated context, and verbose explanations get optimized.

We benchmarked 40-prompt conversations on Opus 4.6, Sonnet 4.6, GPT-5.4-thinking, and others, and saw no quality degradation.

What's the latency overhead?

Short conversations pass straight through with zero overhead. Longer conversations add a few hundred milliseconds for the optimization step before forwarding to your provider.

Net latency is usually lower, not higher: a 50% smaller prompt means the provider spends less time processing input tokens, which more than offsets our overhead on most requests.

What happens if Prompt Crunch goes down?

If our optimization pipeline errors on a single request, we silently fall back to forwarding your original messages to the provider. Your application keeps working. The response carries a _promptcrunch.status: "error" flag so you can audit it.

If the proxy itself is unreachable, your SDK will throw a connection error like any other network blip.

How do I know you're actually saving me money?

Every API response carries the original-vs-billed token count as both a JSON field (_promptcrunch.tokens_saved) and an HTTP header (X-PromptCrunch-Saved). Your dashboard shows the breakdown per model, per day, in dollars.

If a request didn't save tokens, you don't pay anything. No savings, no charge.
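Checking the audit fields is one lookup. A minimal sketch that parses the _promptcrunch object named above; the surrounding response shape is an assumption for illustration:

```python
import json

# Sample proxied response body; only _promptcrunch.tokens_saved and
# _promptcrunch.status are documented above, the rest is illustrative.
raw = """{
  "id": "msg_123",
  "content": [{"type": "text", "text": "..."}],
  "_promptcrunch": {"status": "ok", "tokens_saved": 1742}
}"""

body = json.loads(raw)
meta = body.get("_promptcrunch", {})
saved = meta.get("tokens_saved", 0)

# The same number also arrives as the X-PromptCrunch-Saved HTTP header,
# so you can cross-check it without parsing the body.
print(f"tokens saved on this request: {saved}")
```

Summing that field over a billing period should match the dashboard's dollar breakdown.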

Does this work with LangChain, LlamaIndex, the Vercel AI SDK, etc.?

Yes. Anything that lets you set a custom base_url on the OpenAI or Anthropic client works. Which is essentially everything.

Two lines of code change: swap the base URL and add the X-PromptCrunch-Key header. The rest of your stack is identical. Your provider API key still goes to your provider, your prompts behave the same way.

What about GDPR, privacy, and my data?

Your provider API key is forwarded straight through to OpenAI or Anthropic. We never store it. Not in logs, not in the database, not anywhere. Same goes for your messages: by default we hold only a small encrypted optimization state in memory for one hour, then it's gone.

Need stricter? Flip on zero-retention mode in your dashboard. We hold nothing. The optimization state rides with your response as a signed encrypted blob, and your client echoes it back on the next request. We're GDPR-compliant and built for teams handling regulated data.

Start saving in two minutes.

$5 free credit. No card required.
Create your account