API Documentation

Prompt Crunch is a drop-in proxy for the Anthropic and OpenAI APIs. Point your existing SDK at Prompt Crunch, add one header, and we optimize input tokens before they hit the provider. Same responses, fewer tokens billed.

Introduction

Prompt Crunch sits between your application and the LLM provider. Every request you send through us is optimized to remove redundant conversational history before being forwarded. You get back the exact same response your model would normally produce, billed for fewer input tokens.

The core product is API-compatible with both Anthropic and OpenAI. If you already have code that calls client.messages.create() or client.chat.completions.create(), you only need to change the base URL and add one header. That's it.

Quickstart

Three steps to start saving on your API bill:

  1. Sign up and get your Prompt Crunch API key (format: pc_live_...)
  2. Change your SDK's base_url to https://api.promptcrunch.dev
  3. Add the X-PromptCrunch-Key header to every request

# Python (Anthropic SDK)
import anthropic

client = anthropic.Anthropic(
    api_key="your-anthropic-key",
    base_url="https://api.promptcrunch.dev",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)

response = client.messages.create(
    model="claude-sonnet-4-5-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.content[0].text)

# Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="your-openai-key",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# curl (Anthropic Messages API)
curl https://api.promptcrunch.dev/v1/messages \
  -H "x-api-key: your-anthropic-key" \
  -H "X-PromptCrunch-Key: pc_live_..." \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5-20251001",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Authentication

Every request to Prompt Crunch needs two API keys:

X-PromptCrunch-Key
Required
Your Prompt Crunch API key. Format: pc_live_<64 hex chars>. Find it in your dashboard. This authenticates you to Prompt Crunch.
x-api-key (Anthropic)
Authorization (OpenAI)
Required
Your provider API key. We pass it straight through to Anthropic or OpenAI. Never stored, never logged.
Your provider keys stay yours. Prompt Crunch forwards them unchanged and never persists them in any form.

Base URL

Prompt Crunch exposes three endpoints that mirror Anthropic and OpenAI exactly. Point your SDK's base URL at whichever matches your provider:

# Anthropic SDK
base_url = "https://api.promptcrunch.dev"

# OpenAI SDK (note the /v1 suffix, mirrors openai.com)
base_url = "https://api.promptcrunch.dev/v1"

Anthropic Messages

POST /v1/messages

Drop-in replacement for Anthropic's /v1/messages endpoint. Supports every field the Anthropic API supports, including model, messages, system, max_tokens, temperature, stream, tools, tool_choice, stop_sequences, top_p, top_k, metadata, and vision content blocks.

Headers

x-api-key
Required
Your Anthropic API key (sk-ant-...).
X-PromptCrunch-Key
Required
Your Prompt Crunch API key.
anthropic-version
string
Anthropic API version. Defaults to 2023-06-01. Passed through to Anthropic.
anthropic-beta
string
Any Anthropic beta flags. Passed through unchanged.
X-PromptCrunch-Bypass
boolean
Set to true to skip optimization for this request. See Bypass.

Example request

curl https://api.promptcrunch.dev/v1/messages \
  -H "x-api-key: sk-ant-..." \
  -H "X-PromptCrunch-Key: pc_live_..." \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5-20251001",
    "max_tokens": 2048,
    "system": "You are a senior engineer.",
    "messages": [
      {"role": "user", "content": "Explain consistent hashing."}
    ]
  }'

Example response

{
  "id": "msg_01Wn7EE8WV4ehpNSXssYudKh",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-5-20251001",
  "content": [
    {"type": "text", "text": "Consistent hashing is a..."}
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 489,
    "output_tokens": 362
  },
  "_promptcrunch": {
    "status": "passthrough",
    "original_tokens": 506,
    "tokens_saved": 0,
    "savings_pct": 0,
    "credit_remaining_usd": 13.47
  }
}

OpenAI Chat Completions

POST /v1/chat/completions

Drop-in replacement for OpenAI's /v1/chat/completions. Supports all standard fields including model, messages, max_tokens, max_completion_tokens, temperature, tools, tool_choice, response_format, stream, and reasoning models like gpt-5.4-thinking and o3.

Reasoning models: For gpt-5.4-thinking, o1, o3, etc., always use max_completion_tokens with plenty of headroom (4k+). These models spend tokens on internal reasoning before producing visible output.

Headers

Authorization
Required
Bearer token with your OpenAI API key: Authorization: Bearer sk-...
X-PromptCrunch-Key
Required
Your Prompt Crunch API key.
X-PromptCrunch-Bypass
boolean
Set to true to skip optimization for this request.

Example request

curl https://api.promptcrunch.dev/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "X-PromptCrunch-Key: pc_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is a Merkle tree?"}
    ],
    "max_completion_tokens": 4096
  }'

Example response

{
  "id": "chatcmpl-AbCdEf...",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "gpt-5.4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A Merkle tree is a..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 111,
    "completion_tokens": 453,
    "total_tokens": 564
  },
  "_promptcrunch": {
    "status": "passthrough",
    "original_tokens": 138,
    "tokens_saved": 0,
    "savings_pct": 0
  }
}

OpenAI Responses API

POST /v1/responses

Proxy for OpenAI's newer Responses API, used by gpt-5-pro, gpt-5.4-pro, and reasoning variants. Same authentication as Chat Completions. The Responses format uses an input field instead of messages.

Example request

curl https://api.promptcrunch.dev/v1/responses \
  -H "Authorization: Bearer sk-..." \
  -H "X-PromptCrunch-Key: pc_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-pro",
    "input": "Explain how Raft consensus works."
  }'

Streaming

Pass stream: true in your request body. Your existing SDK code doesn't need any modifications.

# Python (Anthropic SDK)
import anthropic

client = anthropic.Anthropic(
    api_key="sk-ant-...",
    base_url="https://api.promptcrunch.dev",
    default_headers={"X-PromptCrunch-Key": "pc_live_..."},
)

with client.messages.stream(
    model="claude-sonnet-4-6-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

# Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={"X-PromptCrunch-Key": "pc_live_..."},
)

stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

# curl
curl https://api.promptcrunch.dev/v1/messages \
  -H "x-api-key: sk-ant-..." \
  -H "X-PromptCrunch-Key: pc_live_..." \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -N \
  -d '{
    "model": "claude-sonnet-4-6-20251001",
    "max_tokens": 1024,
    "stream": true,
    "messages": [{"role": "user", "content": "Tell me a story."}]
  }'

Note: When streaming, optimization metadata arrives in the response headers instead of the _promptcrunch JSON field.

Response metadata

Every non-streaming response includes a _promptcrunch object appended to the JSON. This tells you what Prompt Crunch did with your request.

"_promptcrunch": {
  "status": "optimized",
  "original_tokens": 12796,
  "tokens_saved": 8362,
  "savings_pct": 65.3,
  "prompt_score": {"score": 8, "reason": "Clear and specific"},
  "credit_remaining_usd": 13.47
}
status
string
One of: optimized (we reduced the token count), passthrough (no optimization applied), bypass (skipped via header), error (optimization failed, your original messages were forwarded).
original_tokens
integer
Estimated token count of your original request.
tokens_saved
integer
Number of input tokens saved by optimization on this request.
savings_pct
number
Savings as a percentage of original tokens (0-100).
prompt_score
object
Optional. If prompt scoring is enabled, returns a {score, reason} object rating prompt quality 1-10.
credit_remaining_usd
number
Your remaining trial/purchased credit balance in USD.
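As a sketch of how you might turn this metadata into a dollar figure, the helper below multiplies tokens_saved by your provider's input price. The $3-per-million-token price is a hypothetical example, not a Prompt Crunch value:

```python
def savings_usd(meta: dict, input_price_per_mtok: float) -> float:
    """Dollar value of the input tokens saved on one request.

    `meta` is the `_promptcrunch` object from a response body;
    `input_price_per_mtok` is your provider's input price per
    million tokens (example value only).
    """
    return meta.get("tokens_saved", 0) / 1_000_000 * input_price_per_mtok


meta = {"status": "optimized", "original_tokens": 12796,
        "tokens_saved": 8362, "savings_pct": 65.3}
print(f"${savings_usd(meta, 3.00):.4f} saved")  # → $0.0251 saved
```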

Response headers

Every response carries the same metadata on response headers, prefixed with x-promptcrunch-. Use these when you can't or don't want to parse the JSON body.

x-promptcrunch-status
string
One of: optimized, passthrough, bypass, error (same values as _promptcrunch.status).
x-promptcrunch-original-tokens
integer
Original input token count before optimization.
x-promptcrunch-optimized-tokens
integer
Input token count after optimization.
x-promptcrunch-saved
integer
Tokens saved on this request (original minus optimized).
x-promptcrunch-prompt-score
integer
Optional 1-10 prompt quality score if scoring is enabled.
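If you're reading these headers with a raw HTTP client, a small parser like the one below (our own sketch, not part of any SDK) collects them into a dict, converting integer-valued headers as it goes:

```python
def parse_crunch_headers(headers: dict) -> dict:
    """Collect x-promptcrunch-* response headers into a plain dict.

    Integer-valued headers (token counts, score) are converted to int;
    everything else is kept as a string.
    """
    prefix = "x-promptcrunch-"
    meta = {}
    for key, value in headers.items():
        name = key.lower()
        if name.startswith(prefix):
            field = name[len(prefix):].replace("-", "_")
            meta[field] = int(value) if value.isdigit() else value
    return meta


headers = {"x-promptcrunch-status": "passthrough",
           "x-promptcrunch-saved": "0",
           "content-type": "application/json"}
print(parse_crunch_headers(headers))  # → {'status': 'passthrough', 'saved': 0}
```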

Bypass optimization

Sometimes you want to skip optimization entirely: short prompts, A/B testing, or debugging. Pass the bypass header:

"X-PromptCrunch-Bypass": "true"

The request passes straight through to the provider with no optimization and no processing overhead. Bypassed requests are never billed.
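One way to apply the header selectively, sketched below: skip optimization for short prompts, where there's little to save. The 2,000-character threshold is an arbitrary illustration, not a Prompt Crunch recommendation; both the Anthropic and OpenAI Python SDKs accept per-request extra_headers:

```python
def bypass_headers(prompt_chars: int, threshold: int = 2000) -> dict:
    """Return the bypass header for prompts too short to be worth optimizing."""
    if prompt_chars < threshold:
        return {"X-PromptCrunch-Bypass": "true"}
    return {}


# Per-request usage with an SDK that supports extra_headers, e.g.:
# client.messages.create(..., extra_headers=bypass_headers(len(prompt)))
```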

Zero-retention mode

By default, Prompt Crunch holds a small encrypted optimization state in memory for up to one hour so repeat conversations don't reprocess from scratch. For teams handling regulated data, you can flip on zero-retention mode in your dashboard. When enabled:

  • No conversation content is cached on our servers, not even encrypted
  • Every request is independently optimized
  • For incremental optimization across turns, you pass a client state blob yourself

Zero retention is a per-user setting. Enable it once in the dashboard and it applies to every request made with your API key. No headers or code changes required.

Client state blob

In zero-retention mode (or any time you want stateless incremental optimization), you can echo back a compact, encrypted state blob between turns. The blob is HMAC-signed, gzipped, and contains no plaintext conversation data.

How it works

  1. Make your first request. The response includes _promptcrunch.state, a short opaque string.
  2. Store it client-side.
  3. On your next request, send it back via the X-PromptCrunch-State header.
  4. Prompt Crunch picks up where you left off.

# Anthropic SDK
# Turn 1: no state yet
resp = client.messages.create(
    model="claude-sonnet-4-6-20251001",
    max_tokens=1024,
    messages=conversation,
)
state = resp.model_extra.get("_promptcrunch", {}).get("state")

# Turn 2+: pass the state back
client = anthropic.Anthropic(
    api_key="sk-ant-...",
    base_url="https://api.promptcrunch.dev",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
        "X-PromptCrunch-State": state,
    },
)
resp = client.messages.create(...)

# OpenAI SDK
# Turn 1: no state yet
resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=conversation,
)
state = resp.model_extra.get("_promptcrunch", {}).get("state")

# Turn 2+: pass the state back
client = OpenAI(
    api_key="sk-...",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
        "X-PromptCrunch-State": state,
    },
)
resp = client.chat.completions.create(...)

# curl
# Turn 1: capture the state from the response
STATE=$(curl -s https://api.promptcrunch.dev/v1/messages \
  -H "x-api-key: sk-ant-..." \
  -H "X-PromptCrunch-Key: pc_live_..." \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-sonnet-4-6-20251001","max_tokens":1024,"messages":[...]}' \
  | jq -r '._promptcrunch.state')

# Turn 2+: send it back via header
curl https://api.promptcrunch.dev/v1/messages \
  -H "x-api-key: sk-ant-..." \
  -H "X-PromptCrunch-Key: pc_live_..." \
  -H "X-PromptCrunch-State: $STATE" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-sonnet-4-6-20251001","max_tokens":1024,"messages":[...]}'

Lose the blob? No problem. We reprocess from scratch on the next request. You won't lose any data, just miss the incremental speedup on that one call.
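The steps above fit naturally into a small client-side wrapper. The class below is our own sketch (the name and shape are not part of any SDK): it holds the blob between turns and builds the right headers for the next request:

```python
class CrunchState:
    """Carry the per-conversation state blob between requests."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.blob = None  # opaque string from _promptcrunch.state

    def headers(self) -> dict:
        """Headers for the next request; includes the blob once we have one."""
        h = {"X-PromptCrunch-Key": self.api_key}
        if self.blob:
            h["X-PromptCrunch-State"] = self.blob
        return h

    def update(self, body: dict) -> None:
        """Call with each response body to capture the latest blob."""
        self.blob = body.get("_promptcrunch", {}).get("state", self.blob)
```

Pass state.headers() as default_headers (or per-request extra_headers) and call state.update() on every response body.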

Get account (Auth required)

GET /api/me

Returns your profile, trial credit balance, usage stats, and billing breakdown for the last 30 days. Requires X-PromptCrunch-Key header or active session cookie.

curl https://api.promptcrunch.dev/api/me \
  -H "X-PromptCrunch-Key: pc_live_..."

Response

{
  "user": {
    "id": 42,
    "email": "you@example.com",
    "name": "Jane Doe",
    "plan_status": "trial",
    "trial_credit_usd": 5.00,
    "trial_used_usd": 0.53,
    "trial_credit_remaining_usd": 4.47,
    "billing_rate": 0.5,
    "zero_retention": false
  },
  "stats": {
    "total_requests": 874,
    "total_tokens_saved": 10800000,
    "savings_percentage": 62.3
  },
  "billing": {
    "your_savings_usd": 13.90,
    "promptcrunch_fee_usd": 13.90,
    "gross_savings_usd": 27.80
  }
}

Usage history (Auth required)

GET /api/usage?limit=50

Returns your recent request history. Supports limit (max 500). Each entry includes model, token counts, optimization status, and timestamp.

GET /api/usage/daily?days=30

Daily rollup. Supports days (max 365). Useful for dashboards and usage graphs.

GET /api/usage/billing?days=30

Billing summary with dollar-denominated savings and fees for the period.
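For scripting against these endpoints, a small URL builder helps keep query parameters tidy. This helper is our own convenience, not an official client:

```python
from urllib.parse import urlencode

BASE = "https://api.promptcrunch.dev/api"


def usage_url(path: str = "usage", **params) -> str:
    """Build a usage endpoint URL, e.g. usage_url("usage/daily", days=30)."""
    url = f"{BASE}/{path}"
    return f"{url}?{urlencode(params)}" if params else url


print(usage_url(limit=50))
# → https://api.promptcrunch.dev/api/usage?limit=50
```

Fetch any of these URLs with your HTTP client of choice, sending the X-PromptCrunch-Key header as shown above.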

Warnings

When there is an actionable signal about your account (like running low on credit), proxy responses include a _promptcrunch.warnings array. Warnings are non-blocking: your request still succeeds, but we flag things you probably want to know about.

Where warnings appear

  • JSON body: _promptcrunch.warnings with full list of code, message, and action URL
  • Response headers: X-PromptCrunch-Warning (primary code) and X-PromptCrunch-Warning-Message (human-readable text)
  • Dashboard: a persistent banner at the top of your dashboard
  • Email: billing warnings are sent by email, rate-limited to once per 24 hours per type

Warning codes

credit_low
warning
Your credit balance dropped below $1.00. Top up to keep optimization running.
credit_exhausted
warning
Your credit is at $0.00. Requests are still being forwarded to your provider, but without optimization. You're paying full price for every token until you top up.
auto_topup_failed
warning
Your last auto-top-up payment failed. Update your payment method to resume automatic billing.
optimization_failed
info
The optimization pipeline errored on this request, so your original messages were forwarded unchanged. Usually transient. The _promptcrunch.error field contains details.

Example response with a warning

{
  "id": "msg_01...",
  "type": "message",
  "content": [{"type": "text", "text": "..."}],
  "_promptcrunch": {
    "status": "optimized",
    "original_tokens": 12796,
    "tokens_saved": 8362,
    "savings_pct": 65.3,
    "credit_remaining_usd": 0.47,
    "warnings": [
      {
        "code": "credit_low",
        "message": "Credit low: $0.47 remaining. Top up to keep optimization running.",
        "severity": "warning",
        "action_url": "https://promptcrunch.dev/my#billing"
      }
    ]
  }
}

Example client code

A quick pattern for handling warnings in your client:

def call_with_warnings(messages):
    resp = client.messages.create(model="...", max_tokens=1024, messages=messages)
    meta = resp.model_extra.get("_promptcrunch", {})
    for warning in meta.get("warnings", []):
        logger.warning(f"Prompt Crunch: {warning['code']} - {warning['message']}")
        if warning["code"] == "credit_exhausted":
            alert_ops_team("LLM proxy credit exhausted")
    return resp

Backward compatible. If your client doesn't look at _promptcrunch.warnings, nothing breaks. Warnings are additive signals, never errors.

Status codes

Prompt Crunch uses conventional HTTP status codes. Any status code returned by the upstream provider is passed back to you unchanged, along with the provider's original error body.

200 OK
success
Request processed successfully. Check _promptcrunch.status for optimization details.
400 Bad Request
client
Invalid JSON, missing required fields (model, messages), or malformed request body.
401 Unauthorized
client
Missing or invalid X-PromptCrunch-Key, or missing provider auth header (x-api-key / Authorization).
403 Forbidden
client
Account inactive (email not verified) or trial credit exhausted without a payment method.
413 Payload Too Large
client
Request body exceeds 10 MB limit.
429 Too Many Requests
client
Rate limit exceeded. See Rate limits.
502 Bad Gateway
server
Upstream provider returned an error. The provider's error body is forwarded unchanged.
504 Gateway Timeout
server
Upstream provider took longer than our 300s timeout.
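Of the codes above, 429, 502, and 504 are the ones worth retrying. A backoff sketch, where the zero-argument `send` callable and its `status_code` attribute are stand-ins for whatever HTTP client you use:

```python
import time

RETRYABLE = {429, 502, 504}


def call_with_retries(send, max_attempts: int = 4, base_delay: float = 1.0):
    """Retry `send()` with exponential backoff on retryable statuses."""
    resp = send()
    for attempt in range(1, max_attempts):
        if resp.status_code not in RETRYABLE:
            return resp
        time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
        resp = send()
    return resp
```

Honoring a Retry-After header when the provider sends one is a common refinement, omitted here for brevity.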

Error handling

If our optimization pipeline fails for any reason, we never drop your request. We forward your original, uncompressed messages to the provider and return the response as normal, with _promptcrunch.status: "error". You'll miss the savings on that one request, but your application keeps working.

Errors from the upstream provider are passed back verbatim. If Anthropic returns a 429 rate-limit error, you'll see the exact Anthropic error body with HTTP 429.

Prompt Crunch's own errors (such as a 401 for a bad key) use a simple JSON body:

{
  "detail": "Invalid or inactive Prompt Crunch API key"
}

Rate limits

Prompt Crunch applies its own per-user rate limits on top of whatever limits your provider enforces:

Proxy requests
60/min
Authenticated requests per user per minute. Exceeding returns 429.
Key rotation
1/min
Per user. Prevents accidental repeated rotations.
Signup
3/hour
Per IP. Blocks basic abuse.

Need higher limits? Get in touch.

Supported models

Prompt Crunch is model-agnostic. Any model accessible via the Anthropic or OpenAI APIs works. Below are the models we've benchmarked and explicitly tuned for:

Anthropic

  • claude-opus-4-6-*: best quality, highest savings
  • claude-sonnet-4-6-*: recommended for most workloads
  • claude-haiku-4-5-*: fastest, lowest cost per token

OpenAI

  • gpt-5.4, gpt-5.4-thinking, gpt-5.4-pro
  • gpt-5.3, gpt-5.3-codex
  • gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4
  • o3, o4-mini (reasoning models)

Not on the list? Try it anyway. If the provider supports it, we proxy it.