API Documentation
Prompt Crunch is a drop-in proxy for the Anthropic and OpenAI APIs. Point your existing SDK at Prompt Crunch, add one header, and we optimize input tokens before they hit the provider. Same responses, fewer tokens billed.
Introduction
Prompt Crunch sits between your application and the LLM provider. Every request you send through us is optimized to remove redundant conversational history before being forwarded. You get back the exact same response your model would normally produce, billed for fewer input tokens.
The core product is API-compatible with both Anthropic and OpenAI. If you already have code that calls anthropic.messages.create() or openai.chat.completions.create(), you only need to change the base URL and add one header. That's it.
Quickstart
Three steps to start saving on your API bill:
- Sign up and get your Prompt Crunch API key (format: pc_live_...)
- Change your SDK's base_url to https://api.promptcrunch.dev
- Add the X-PromptCrunch-Key header to every request
import anthropic

client = anthropic.Anthropic(
    api_key="your-anthropic-key",
    base_url="https://api.promptcrunch.dev",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)

response = client.messages.create(
    model="claude-sonnet-4-5-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.content[0].text)
from openai import OpenAI

client = OpenAI(
    api_key="your-openai-key",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
# Anthropic Messages API
curl https://api.promptcrunch.dev/v1/messages \
-H "x-api-key: your-anthropic-key" \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{
"model": "claude-sonnet-4-5-20251001",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello!"}]
}'
Authentication
Every request to Prompt Crunch needs two API keys:
- X-PromptCrunch-Key: your Prompt Crunch key, format pc_live_<64 hex chars>. Find it in your dashboard. This authenticates you to Prompt Crunch.
- Your provider key: x-api-key (Anthropic) or Authorization (OpenAI). Forwarded to the provider as-is.
Base URL
Prompt Crunch exposes three endpoints that mirror Anthropic and OpenAI exactly. Point your SDK's base URL at whichever matches your provider:
# Anthropic SDK
base_url = "https://api.promptcrunch.dev"
# OpenAI SDK (note the /v1 suffix, mirrors openai.com)
base_url = "https://api.promptcrunch.dev/v1"
Anthropic Messages
Drop-in replacement for Anthropic's /v1/messages endpoint. Supports every field the Anthropic API supports, including model, messages, system, max_tokens, temperature, stream, tools, tool_choice, stop_sequences, top_p, top_k, metadata, and vision content blocks.
Headers
- x-api-key: your Anthropic API key (sk-ant-...).
- X-PromptCrunch-Key: your Prompt Crunch API key (pc_live_...).
- anthropic-version: for example 2023-06-01. Passed through to Anthropic.
Example request
curl https://api.promptcrunch.dev/v1/messages \
-H "x-api-key: sk-ant-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{
"model": "claude-sonnet-4-5-20251001",
"max_tokens": 2048,
"system": "You are a senior engineer.",
"messages": [
{"role": "user", "content": "Explain consistent hashing."}
]
}'
Example response
{
"id": "msg_01Wn7EE8WV4ehpNSXssYudKh",
"type": "message",
"role": "assistant",
"model": "claude-sonnet-4-5-20251001",
"content": [
{"type": "text", "text": "Consistent hashing is a..."}
],
"stop_reason": "end_turn",
"usage": {
"input_tokens": 489,
"output_tokens": 362
},
"_promptcrunch": {
"status": "passthrough",
"original_tokens": 506,
"tokens_saved": 0,
"savings_pct": 0,
"credit_remaining_usd": 13.47
}
}
OpenAI Chat Completions
Drop-in replacement for OpenAI's /v1/chat/completions. Supports all standard fields including model, messages, max_tokens, max_completion_tokens, temperature, tools, tool_choice, response_format, stream, and reasoning models like gpt-5.4-thinking and o3.
For reasoning models (gpt-5.4-thinking, o1, o3, etc.), always use max_completion_tokens with plenty of headroom (4k+). These models spend tokens on internal reasoning before producing visible output.
Headers
- Authorization: your OpenAI API key (Bearer sk-...).
- X-PromptCrunch-Key: your Prompt Crunch API key (pc_live_...).
- X-PromptCrunch-Bypass: optional; set to true to skip optimization for this request.
Example request
curl https://api.promptcrunch.dev/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is a Merkle tree?"}
],
"max_completion_tokens": 4096
}'
Example response
{
"id": "chatcmpl-AbCdEf...",
"object": "chat.completion",
"created": 1712345678,
"model": "gpt-5.4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "A Merkle tree is a..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 111,
"completion_tokens": 453,
"total_tokens": 564
},
"_promptcrunch": {
"status": "passthrough",
"original_tokens": 138,
"tokens_saved": 0,
"savings_pct": 0
}
}
OpenAI Responses API
Proxy for OpenAI's newer Responses API, used by gpt-5-pro, gpt-5.4-pro, and reasoning variants. Same authentication as Chat Completions. The Responses format uses an input field instead of messages.
Example request
curl https://api.promptcrunch.dev/v1/responses \
-H "Authorization: Bearer sk-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4-pro",
"input": "Explain how Raft consensus works."
}'
Streaming
Pass stream: true in your request body. Your existing SDK code doesn't need any modifications.
import anthropic

client = anthropic.Anthropic(
    api_key="sk-ant-...",
    base_url="https://api.promptcrunch.dev",
    default_headers={"X-PromptCrunch-Key": "pc_live_..."},
)

with client.messages.stream(
    model="claude-sonnet-4-6-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={"X-PromptCrunch-Key": "pc_live_..."},
)

stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
curl https://api.promptcrunch.dev/v1/messages \
-H "x-api-key: sk-ant-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-N \
-d '{
"model": "claude-sonnet-4-6-20251001",
"max_tokens": 1024,
"stream": true,
"messages": [{"role": "user", "content": "Tell me a story."}]
}'
Streaming responses do not include the _promptcrunch JSON field; read the x-promptcrunch-* response headers instead.
Response metadata
Every non-streaming response includes a _promptcrunch object appended to the JSON. This tells you what Prompt Crunch did with your request.
"_promptcrunch": {
"status": "optimized",
"original_tokens": 12796,
"tokens_saved": 8362,
"savings_pct": 65.3,
"prompt_score": {"score": 8, "reason": "Clear and specific"},
"credit_remaining_usd": 13.47
}
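The savings figures are related by simple arithmetic: savings_pct is tokens_saved divided by original_tokens. A minimal sketch of recomputing it from the metadata above (the savings_pct helper is illustrative, not part of any SDK):

```python
# _promptcrunch metadata, copied from the example response above.
meta = {
    "status": "optimized",
    "original_tokens": 12796,
    "tokens_saved": 8362,
    "savings_pct": 65.3,
}

def savings_pct(meta: dict) -> float:
    """Tokens saved as a percentage of the original prompt size."""
    original = meta["original_tokens"]
    if original == 0:
        return 0.0
    return round(100 * meta["tokens_saved"] / original, 1)

print(savings_pct(meta))  # -> 65.3
```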
- status: one of optimized (we reduced the token count), passthrough (no optimization applied), bypass (skipped via header), or error (optimization failed; your original messages were forwarded).
- prompt_score: a {score, reason} object rating prompt quality 1-10.
Response headers
Every response carries the same metadata on response headers, prefixed with x-promptcrunch-. Use these when you can't or don't want to parse the JSON body.
- x-promptcrunch-status: one of compacted, passthrough, bypass, error.
- x-promptcrunch-tokens-saved: tokens saved (original minus optimized).
Bypass optimization
Sometimes you want to skip optimization entirely: short prompts, A/B testing, or debugging. Pass the bypass header:
"X-PromptCrunch-Bypass": "true"
The request passes straight through to the provider with no optimization and no processing overhead. Bypassed requests are never billed.
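A common pattern is to bypass automatically for short prompts, where there is little to save. A sketch of that heuristic (the 4-characters-per-token estimate and 500-token threshold are arbitrary illustrative choices, not Prompt Crunch recommendations):

```python
def bypass_headers(messages: list[dict], threshold_tokens: int = 500) -> dict:
    """Return extra headers that skip optimization for small prompts.

    Token count is roughly estimated at 4 characters per token; both the
    estimate and the threshold are illustrative, not official guidance.
    """
    chars = sum(len(str(m.get("content", ""))) for m in messages)
    if chars / 4 < threshold_tokens:
        return {"X-PromptCrunch-Bypass": "true"}
    return {}

# Usage with the Anthropic SDK's per-request extra_headers:
#   client.messages.create(..., extra_headers=bypass_headers(messages))
print(bypass_headers([{"role": "user", "content": "Hello!"}]))
# -> {'X-PromptCrunch-Bypass': 'true'}
```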
Zero-retention mode
By default, Prompt Crunch holds a small encrypted optimization state in memory for up to one hour so repeat conversations don't reprocess from scratch. For teams handling regulated data, you can flip on zero-retention mode in your dashboard. When enabled:
- No conversation content is cached on our servers, not even encrypted
- Every request is independently optimized
- For incremental optimization across turns, you pass a client state blob yourself
Client state blob
In zero-retention mode (or any time you want stateless incremental optimization), you can echo back a compact, encrypted state blob between turns. The blob is HMAC-signed, gzipped, and contains no plaintext conversation data.
How it works
- Make your first request. The response includes _promptcrunch.state, a short opaque string.
- Store it client-side.
- On your next request, send it back via the X-PromptCrunch-State header.
- Prompt Crunch picks up where you left off.
# Turn 1: no state yet
resp = client.messages.create(
    model="claude-sonnet-4-6-20251001",
    max_tokens=1024,
    messages=conversation,
)
state = resp.model_extra.get("_promptcrunch", {}).get("state")

# Turn 2+: pass the state back
client = anthropic.Anthropic(
    api_key="sk-ant-...",
    base_url="https://api.promptcrunch.dev",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
        "X-PromptCrunch-State": state,
    },
)
resp = client.messages.create(...)
# Turn 1: no state yet
resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=conversation,
)
state = resp.model_extra.get("_promptcrunch", {}).get("state")

# Turn 2+: pass the state back
client = OpenAI(
    api_key="sk-...",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
        "X-PromptCrunch-State": state,
    },
)
resp = client.chat.completions.create(...)
# Turn 1: capture the state from the response
STATE=$(curl -s https://api.promptcrunch.dev/v1/messages \
-H "x-api-key: sk-ant-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "anthropic-version: 2023-06-01" \
-d '{"model":"claude-sonnet-4-6-20251001","max_tokens":1024,"messages":[...]}' \
| jq -r '._promptcrunch.state')
# Turn 2+: send it back via header
curl https://api.promptcrunch.dev/v1/messages \
-H "x-api-key: sk-ant-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "X-PromptCrunch-State: $STATE" \
-H "anthropic-version: 2023-06-01" \
-d '{"model":"claude-sonnet-4-6-20251001","max_tokens":1024,"messages":[...]}'
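In an application that juggles several conversations, you need one state blob per conversation. A tiny in-memory store along these lines (the StateStore name and methods are illustrative, not part of any SDK):

```python
class StateStore:
    """Keep the latest _promptcrunch.state per conversation id."""

    def __init__(self):
        self._states: dict[str, str] = {}

    def headers_for(self, conv_id: str, pc_key: str) -> dict:
        """Build per-request headers, adding the state header once we have one."""
        headers = {"X-PromptCrunch-Key": pc_key}
        state = self._states.get(conv_id)
        if state:
            headers["X-PromptCrunch-State"] = state
        return headers

    def update(self, conv_id: str, meta: dict) -> None:
        """Record the state from a response's _promptcrunch metadata."""
        if meta.get("state"):
            self._states[conv_id] = meta["state"]

store = StateStore()
print(store.headers_for("conv-1", "pc_live_..."))  # first turn: no state header
store.update("conv-1", {"state": "abc123"})
print(store.headers_for("conv-1", "pc_live_..."))  # later turns include it
```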
Get account
Returns your profile, trial credit balance, usage stats, and billing breakdown for the last 30 days. Requires X-PromptCrunch-Key header or active session cookie.
curl https://api.promptcrunch.dev/api/me \
-H "X-PromptCrunch-Key: pc_live_..."
Response
{
"user": {
"id": 42,
"email": "you@example.com",
"name": "Jane Doe",
"plan_status": "trial",
"trial_credit_usd": 5.00,
"trial_used_usd": 0.53,
"trial_credit_remaining_usd": 4.47,
"billing_rate": 0.5,
"zero_retention": false
},
"stats": {
"total_requests": 874,
"total_tokens_saved": 10800000,
"savings_percentage": 62.3
},
"billing": {
"your_savings_usd": 13.90,
"promptcrunch_fee_usd": 13.90,
"gross_savings_usd": 27.80
}
}
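The billing fields follow from billing_rate: Prompt Crunch's fee is a fraction of the gross provider savings, and the rest stays with you. A sketch of that split, assuming fee = gross savings × billing_rate (this matches the example response above, where a rate of 0.5 gives an even split; it is an inference, not an official billing formula):

```python
def split_savings(gross_savings_usd: float, billing_rate: float) -> dict:
    """Split gross provider savings into Prompt Crunch's fee and your net savings.

    Assumes fee = gross * billing_rate, consistent with the example account
    response; not an official billing formula.
    """
    fee = round(gross_savings_usd * billing_rate, 2)
    return {
        "gross_savings_usd": gross_savings_usd,
        "promptcrunch_fee_usd": fee,
        "your_savings_usd": round(gross_savings_usd - fee, 2),
    }

print(split_savings(27.80, 0.5))
# -> {'gross_savings_usd': 27.8, 'promptcrunch_fee_usd': 13.9, 'your_savings_usd': 13.9}
```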
Usage history
Returns your recent request history. Supports limit (max 500). Each entry includes model, token counts, optimization status, and timestamp.
Daily rollup
Supports days (max 365). Useful for dashboards and usage graphs.
Billing summary
Billing summary with dollar-denominated savings and fees for the period.
Warnings
Every proxy response carries a _promptcrunch.warnings array with any actionable signals about your account. Warnings are non-blocking: your request still succeeds, but we flag things you probably want to know about (like running low on credit).
Where warnings appear
- JSON body: _promptcrunch.warnings with the full list of code, message, and action URL
- Response headers: X-PromptCrunch-Warning (primary code) and X-PromptCrunch-Warning-Message (human-readable text)
- Dashboard: a persistent banner at the top of your dashboard
- Email: billing warnings are sent by email, rate-limited to once per 24 hours per type
Warning codes
Codes include credit_low and credit_exhausted. When optimization fails, the _promptcrunch.error field contains details.
Example response with a warning
{
"id": "msg_01...",
"type": "message",
"content": [{"type": "text", "text": "..."}],
"_promptcrunch": {
"status": "optimized",
"original_tokens": 12796,
"tokens_saved": 8362,
"savings_pct": 65.3,
"credit_remaining_usd": 0.47,
"warnings": [
{
"code": "credit_low",
"message": "Credit low: $0.47 remaining. Top up to keep optimization running.",
"severity": "warning",
"action_url": "https://promptcrunch.dev/my#billing"
}
]
}
}
Example client code
A quick pattern for handling warnings in your client:
def call_with_warnings(messages):
    resp = client.messages.create(model="...", max_tokens=1024, messages=messages)
    meta = resp.model_extra.get("_promptcrunch", {})
    for warning in meta.get("warnings", []):
        logger.warning(f"Prompt Crunch: {warning['code']} - {warning['message']}")
        if warning["code"] == "credit_exhausted":
            alert_ops_team("LLM proxy credit exhausted")
    return resp
If your client ignores _promptcrunch.warnings, nothing breaks. Warnings are additive signals, never errors.
Status codes
Prompt Crunch uses conventional HTTP status codes. Any status code returned by the upstream provider is passed back to you unchanged, along with the provider's original error body.
- 200: Success. Check _promptcrunch.status for optimization details.
- 400: Missing required fields (model, messages), or malformed request body.
- 401: Invalid or inactive X-PromptCrunch-Key, or missing provider auth header (x-api-key / Authorization).
Error handling
If our optimization pipeline fails for any reason, we never drop your request. We forward your original, uncompressed messages to the provider and return the response as normal, with _promptcrunch.status: "error". You'll miss the savings on that one request, but your application keeps working.
Errors from the upstream provider are passed back verbatim. If Anthropic returns a 429 rate-limit error, you'll see the exact Anthropic error body with HTTP 429.
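Because provider errors surface unchanged, your normal retry logic applies. A minimal exponential-backoff sketch (the with_backoff helper and RateLimitedError class are illustrative stand-ins; real SDKs raise their own typed exceptions, such as anthropic.RateLimitError for a 429):

```python
import time

class RateLimitedError(Exception):
    """Stand-in for an SDK rate-limit exception (a proxied HTTP 429)."""

def with_backoff(call_fn, max_retries: int = 3, base_delay: float = 1.0):
    """Retry call_fn on rate limits with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call_fn()
        except RateLimitedError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Usage: wrap your normal SDK call in a lambda.
#   resp = with_backoff(lambda: client.messages.create(...))
```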
Errors generated by Prompt Crunch itself (for example, an invalid pc_live_ key) return a simple detail body:
{
"detail": "Invalid or inactive Prompt Crunch API key"
}
Rate limits
Prompt Crunch applies its own per-user rate limits on top of whatever limits your provider enforces.
Need higher limits? Get in touch.
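Until you know your account's limits, a client-side throttle keeps you safely under them. A small token-bucket sketch (the 2-requests-per-minute figure in the demo is a placeholder, not a documented Prompt Crunch limit):

```python
import time

class TokenBucket:
    """Simple client-side rate limiter: allow `rate` requests per `per` seconds."""

    def __init__(self, rate: int = 60, per: float = 60.0):
        self.capacity = rate
        self.tokens = float(rate)
        self.refill_per_sec = rate / per
        self.updated = time.monotonic()

    def acquire(self) -> bool:
        """Take one token if available; returns False when rate-limited."""
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, per=60)
print([bucket.acquire() for _ in range(3)])  # third call is throttled
```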
Supported models
Prompt Crunch is model-agnostic. Any model accessible via the Anthropic or OpenAI APIs works. Below are the models we've benchmarked and explicitly tuned for:
Anthropic
- claude-opus-4-6-*: best quality, highest savings
- claude-sonnet-4-6-*: recommended for most workloads
- claude-haiku-4-5-*: fastest, lowest cost per token
OpenAI
- gpt-5.4, gpt-5.4-thinking, gpt-5.4-pro
- gpt-5.3, gpt-5.3-codex
- gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4
- o3, o4-mini (reasoning models)