API Documentation
Prompt Crunch is a drop-in proxy for the Anthropic and OpenAI APIs. Point your existing SDK at Prompt Crunch, add one header, and we optimize input tokens before they hit the provider. Same responses, fewer tokens billed.
Introduction
Prompt Crunch sits between your application and the LLM provider. Every request you send through us is optimized to remove redundant conversational history before being forwarded. You get back the exact same response your model would normally produce, billed for fewer input tokens.
The core product is API-compatible with both Anthropic and OpenAI. If you already have code that calls anthropic.messages.create() or openai.chat.completions.create(), you only need to change the base URL and add one header. That's it.
Quickstart
Three steps to start saving on your API bill:
- Sign up and get your Prompt Crunch API key (format: pc_live_...)
- Change your SDK's base_url to https://api.promptcrunch.dev
- Add the X-PromptCrunch-Key header to every request
import anthropic

client = anthropic.Anthropic(
    api_key="your-anthropic-key",
    base_url="https://api.promptcrunch.dev",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)

response = client.messages.create(
    model="claude-sonnet-4-5-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.content[0].text)
from openai import OpenAI

client = OpenAI(
    api_key="your-openai-key",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
    },
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
# Anthropic Messages API
curl https://api.promptcrunch.dev/v1/messages \
-H "x-api-key: your-anthropic-key" \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{
"model": "claude-sonnet-4-5-20251001",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello!"}]
}'
Authentication
Every request to Prompt Crunch needs two API keys:
- X-PromptCrunch-Key: your Prompt Crunch key, format pc_live_<64 hex chars>. Find it in your dashboard. This authenticates you to Prompt Crunch.
- Your provider key: x-api-key (Anthropic) or Authorization (OpenAI). Forwarded to the provider as-is.
Base URL
Prompt Crunch exposes three endpoints that mirror Anthropic and OpenAI exactly. Point your SDK's base URL at whichever matches your provider:
# Anthropic SDK
base_url = "https://api.promptcrunch.dev"
# OpenAI SDK (note the /v1 suffix, mirrors openai.com)
base_url = "https://api.promptcrunch.dev/v1"
Anthropic Messages
Drop-in replacement for Anthropic's /v1/messages endpoint. Supports every field the Anthropic API supports, including model, messages, system, max_tokens, temperature, stream, tools, tool_choice, stop_sequences, top_p, top_k, metadata, and vision content blocks.
Headers
- x-api-key: your Anthropic API key (sk-ant-...).
- X-PromptCrunch-Key: your Prompt Crunch API key (pc_live_...).
- anthropic-version: for example 2023-06-01. Passed through to Anthropic.
Example request
curl https://api.promptcrunch.dev/v1/messages \
-H "x-api-key: sk-ant-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{
"model": "claude-sonnet-4-5-20251001",
"max_tokens": 2048,
"system": "You are a senior engineer.",
"messages": [
{"role": "user", "content": "Explain consistent hashing."}
]
}'
Example response
{
"id": "msg_01Wn7EE8WV4ehpNSXssYudKh",
"type": "message",
"role": "assistant",
"model": "claude-sonnet-4-5-20251001",
"content": [
{"type": "text", "text": "Consistent hashing is a..."}
],
"stop_reason": "end_turn",
"usage": {
"input_tokens": 489,
"output_tokens": 362
},
"_promptcrunch": {
"status": "passthrough",
"original_tokens": 506,
"tokens_saved": 0,
"savings_pct": 0,
"credit_remaining_usd": 13.47
}
}
OpenAI Chat Completions
Drop-in replacement for OpenAI's /v1/chat/completions. Supports all standard fields including model, messages, max_tokens, max_completion_tokens, temperature, tools, tool_choice, response_format, stream, and reasoning models like gpt-5.4-thinking and o3.
For reasoning models (gpt-5.4-thinking, o1, o3, etc.), always use max_completion_tokens with plenty of headroom (4k+). These models spend tokens on internal reasoning before producing visible output.
Headers
- Authorization: your OpenAI API key (Bearer sk-...).
- X-PromptCrunch-Key: your Prompt Crunch API key (pc_live_...).
- X-PromptCrunch-Bypass: optional; set to true to skip optimization for this request.
Example request
curl https://api.promptcrunch.dev/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is a Merkle tree?"}
],
"max_completion_tokens": 4096
}'
Example response
{
"id": "chatcmpl-AbCdEf...",
"object": "chat.completion",
"created": 1712345678,
"model": "gpt-5.4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "A Merkle tree is a..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 111,
"completion_tokens": 453,
"total_tokens": 564
},
"_promptcrunch": {
"status": "passthrough",
"original_tokens": 138,
"tokens_saved": 0,
"savings_pct": 0
}
}
OpenAI Responses API
Proxy for OpenAI's newer Responses API, used by gpt-5-pro, gpt-5.4-pro, and reasoning variants. Same authentication as Chat Completions. The Responses format uses an input field instead of messages.
Example request
curl https://api.promptcrunch.dev/v1/responses \
-H "Authorization: Bearer sk-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4-pro",
"input": "Explain how Raft consensus works."
}'
Streaming
Pass stream: true in your request body. Your existing SDK code doesn't need any modifications.
import anthropic

client = anthropic.Anthropic(
    api_key="sk-ant-...",
    base_url="https://api.promptcrunch.dev",
    default_headers={"X-PromptCrunch-Key": "pc_live_..."},
)

with client.messages.stream(
    model="claude-sonnet-4-6-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={"X-PromptCrunch-Key": "pc_live_..."},
)

stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
curl https://api.promptcrunch.dev/v1/messages \
-H "x-api-key: sk-ant-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-N \
-d '{
"model": "claude-sonnet-4-6-20251001",
"max_tokens": 1024,
"stream": true,
"messages": [{"role": "user", "content": "Tell me a story."}]
}'
Streaming responses do not include the _promptcrunch JSON field; read the x-promptcrunch-* response headers instead.
Response metadata
Every non-streaming response includes a _promptcrunch object appended to the JSON. This tells you what Prompt Crunch did with your request.
"_promptcrunch": {
"status": "optimized",
"original_tokens": 12796,
"tokens_saved": 8362,
"savings_pct": 65.3,
"prompt_score": {"score": 8, "reason": "Clear and specific"},
"credit_remaining_usd": 13.47
}
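The savings figures are related by simple arithmetic: savings_pct is tokens_saved divided by original_tokens. A minimal sketch of recomputing it from the metadata above (the savings_pct helper is illustrative, not part of any SDK):

```python
# _promptcrunch metadata, copied from the example response above.
meta = {
    "status": "optimized",
    "original_tokens": 12796,
    "tokens_saved": 8362,
    "savings_pct": 65.3,
}

def savings_pct(meta: dict) -> float:
    """Tokens saved as a percentage of the original prompt size."""
    original = meta["original_tokens"]
    if original == 0:
        return 0.0
    return round(100 * meta["tokens_saved"] / original, 1)

print(savings_pct(meta))  # -> 65.3
```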
- status: one of optimized (we reduced the token count), passthrough (no optimization applied), bypass (skipped via header), or error (optimization failed; your original messages were forwarded).
- prompt_score: a {score, reason} object rating prompt quality 1-10.
Response headers
Every response carries the same metadata on response headers, prefixed with x-promptcrunch-. Use these when you can't or don't want to parse the JSON body.
- x-promptcrunch-status: one of compacted, passthrough, bypass, error.
- x-promptcrunch-tokens-saved: tokens saved (original minus optimized).
Bypass optimization
Sometimes you want to skip optimization entirely: short prompts, A/B testing, or debugging. Pass the bypass header:
"X-PromptCrunch-Bypass": "true"
The request passes straight through to the provider with no optimization and no processing overhead. Bypassed requests are never billed.
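A common pattern is to bypass automatically for short prompts, where there is little to save. A sketch of that heuristic (the 4-characters-per-token estimate and 500-token threshold are arbitrary illustrative choices, not Prompt Crunch recommendations):

```python
def bypass_headers(messages: list[dict], threshold_tokens: int = 500) -> dict:
    """Return extra headers that skip optimization for small prompts.

    Token count is roughly estimated at 4 characters per token; both the
    estimate and the threshold are illustrative, not official guidance.
    """
    chars = sum(len(str(m.get("content", ""))) for m in messages)
    if chars / 4 < threshold_tokens:
        return {"X-PromptCrunch-Bypass": "true"}
    return {}

# Usage with the Anthropic SDK's per-request extra_headers:
#   client.messages.create(..., extra_headers=bypass_headers(messages))
print(bypass_headers([{"role": "user", "content": "Hello!"}]))
# -> {'X-PromptCrunch-Bypass': 'true'}
```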
Zero-retention mode
By default, Prompt Crunch holds a small encrypted optimization state in memory for up to one hour so repeat conversations don't reprocess from scratch. For teams handling regulated data, you can flip on zero-retention mode in your dashboard. When enabled:
- No conversation content is cached on our servers, not even encrypted
- Every request is independently optimized
- For incremental optimization across turns, you pass a client state blob yourself
Client state blob
In zero-retention mode (or any time you want stateless incremental optimization), you can echo back a compact, encrypted state blob between turns. The blob is HMAC-signed, gzipped, and contains no plaintext conversation data.
How it works
- Make your first request. The response includes _promptcrunch.state, a short opaque string.
- Store it client-side.
- On your next request, send it back via the X-PromptCrunch-State header.
- Prompt Crunch picks up where you left off.
# Turn 1: no state yet
resp = client.messages.create(
    model="claude-sonnet-4-6-20251001",
    max_tokens=1024,
    messages=conversation,
)
state = resp.model_extra.get("_promptcrunch", {}).get("state")

# Turn 2+: pass the state back
client = anthropic.Anthropic(
    api_key="sk-ant-...",
    base_url="https://api.promptcrunch.dev",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
        "X-PromptCrunch-State": state,
    },
)
resp = client.messages.create(...)
# Turn 1: no state yet
resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=conversation,
)
state = resp.model_extra.get("_promptcrunch", {}).get("state")

# Turn 2+: pass the state back
client = OpenAI(
    api_key="sk-...",
    base_url="https://api.promptcrunch.dev/v1",
    default_headers={
        "X-PromptCrunch-Key": "pc_live_...",
        "X-PromptCrunch-State": state,
    },
)
resp = client.chat.completions.create(...)
# Turn 1: capture the state from the response
STATE=$(curl -s https://api.promptcrunch.dev/v1/messages \
-H "x-api-key: sk-ant-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "anthropic-version: 2023-06-01" \
-d '{"model":"claude-sonnet-4-6-20251001","max_tokens":1024,"messages":[...]}' \
| jq -r '._promptcrunch.state')
# Turn 2+: send it back via header
curl https://api.promptcrunch.dev/v1/messages \
-H "x-api-key: sk-ant-..." \
-H "X-PromptCrunch-Key: pc_live_..." \
-H "X-PromptCrunch-State: $STATE" \
-H "anthropic-version: 2023-06-01" \
-d '{"model":"claude-sonnet-4-6-20251001","max_tokens":1024,"messages":[...]}'
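In an application that juggles several conversations, you need one state blob per conversation. A tiny in-memory store along these lines (the StateStore name and methods are illustrative, not part of any SDK):

```python
class StateStore:
    """Keep the latest _promptcrunch.state per conversation id."""

    def __init__(self):
        self._states: dict[str, str] = {}

    def headers_for(self, conv_id: str, pc_key: str) -> dict:
        """Build per-request headers, adding the state header once we have one."""
        headers = {"X-PromptCrunch-Key": pc_key}
        state = self._states.get(conv_id)
        if state:
            headers["X-PromptCrunch-State"] = state
        return headers

    def update(self, conv_id: str, meta: dict) -> None:
        """Record the state from a response's _promptcrunch metadata."""
        if meta.get("state"):
            self._states[conv_id] = meta["state"]

store = StateStore()
print(store.headers_for("conv-1", "pc_live_..."))  # first turn: no state header
store.update("conv-1", {"state": "abc123"})
print(store.headers_for("conv-1", "pc_live_..."))  # later turns include it
```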
Get account
Returns your profile, trial credit balance, usage stats, and billing breakdown for the last 30 days. Requires X-PromptCrunch-Key header or active session cookie.
curl https://api.promptcrunch.dev/api/me \
-H "X-PromptCrunch-Key: pc_live_..."
Response
{
"user": {
"id": 42,
"email": "you@example.com",
"name": "Jane Doe",
"plan_status": "trial",
"trial_credit_usd": 5.00,
"trial_used_usd": 0.53,
"trial_credit_remaining_usd": 4.47,
"billing_rate": 0.5,
"zero_retention": false
},
"stats": {
"total_requests": 874,
"total_tokens_saved": 10800000,
"savings_percentage": 62.3
},
"billing": {
"your_savings_usd": 13.90,
"promptcrunch_fee_usd": 13.90,
"gross_savings_usd": 27.80
}
}
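The billing fields follow from billing_rate: Prompt Crunch's fee is a fraction of the gross provider savings, and the rest stays with you. A sketch of that split, assuming fee = gross savings × billing_rate (this matches the example response above, where a rate of 0.5 gives an even split; it is an inference, not an official billing formula):

```python
def split_savings(gross_savings_usd: float, billing_rate: float) -> dict:
    """Split gross provider savings into Prompt Crunch's fee and your net savings.

    Assumes fee = gross * billing_rate, consistent with the example account
    response; not an official billing formula.
    """
    fee = round(gross_savings_usd * billing_rate, 2)
    return {
        "gross_savings_usd": gross_savings_usd,
        "promptcrunch_fee_usd": fee,
        "your_savings_usd": round(gross_savings_usd - fee, 2),
    }

print(split_savings(27.80, 0.5))
# -> {'gross_savings_usd': 27.8, 'promptcrunch_fee_usd': 13.9, 'your_savings_usd': 13.9}
```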
Usage history
Returns your recent request history. Supports limit (max 500). Each entry includes model, token counts, optimization status, and timestamp.
Daily rollup
Supports days (max 365). Useful for dashboards and usage graphs.
Billing summary
Billing summary with dollar-denominated savings and fees for the period.
Warnings
Every proxy response carries a _promptcrunch.warnings array with any actionable signals about your account. Warnings are non-blocking: your request still succeeds, but we flag things you probably want to know about (like running low on credit).
Where warnings appear
- JSON body: _promptcrunch.warnings with the full list of code, message, and action URL
- Response headers: X-PromptCrunch-Warning (primary code) and X-PromptCrunch-Warning-Message (human-readable text)
- Dashboard: a persistent banner at the top of your dashboard
- Email: billing warnings are sent by email, rate-limited to once per 24 hours per type
Warning codes
Codes include credit_low and credit_exhausted. When optimization fails, the _promptcrunch.error field contains details.
Example response with a warning
{
"id": "msg_01...",
"type": "message",
"content": [{"type": "text", "text": "..."}],
"_promptcrunch": {
"status": "optimized",
"original_tokens": 12796,
"tokens_saved": 8362,
"savings_pct": 65.3,
"credit_remaining_usd": 0.47,
"warnings": [
{
"code": "credit_low",
"message": "Credit low: $0.47 remaining. Top up to keep optimization running.",
"severity": "warning",
"action_url": "https://promptcrunch.dev/my#billing"
}
]
}
}
Example client code
A quick pattern for handling warnings in your client:
def call_with_warnings(messages):
    resp = client.messages.create(model="...", max_tokens=1024, messages=messages)
    meta = resp.model_extra.get("_promptcrunch", {})
    for warning in meta.get("warnings", []):
        logger.warning(f"Prompt Crunch: {warning['code']} - {warning['message']}")
        if warning["code"] == "credit_exhausted":
            alert_ops_team("LLM proxy credit exhausted")
    return resp
If your client ignores _promptcrunch.warnings, nothing breaks. Warnings are additive signals, never errors.
Status codes
Prompt Crunch uses conventional HTTP status codes. Any status code returned by the upstream provider is passed back to you unchanged, along with the provider's original error body.
- 200: Success. Check _promptcrunch.status for optimization details.
- 400: Missing required fields (model, messages), or malformed request body.
- 401: Invalid or inactive X-PromptCrunch-Key, or missing provider auth header (x-api-key / Authorization).
Error handling
If our optimization pipeline fails for any reason, we never drop your request. We forward your original, uncompressed messages to the provider and return the response as normal, with _promptcrunch.status: "error". You'll miss the savings on that one request, but your application keeps working.
Errors from the upstream provider are passed back verbatim. If Anthropic returns a 429 rate-limit error, you'll see the exact Anthropic error body with HTTP 429.
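Because provider errors surface unchanged, your normal retry logic applies. A minimal exponential-backoff sketch (the with_backoff helper and RateLimitedError class are illustrative stand-ins; real SDKs raise their own typed exceptions, such as anthropic.RateLimitError for a 429):

```python
import time

class RateLimitedError(Exception):
    """Stand-in for an SDK rate-limit exception (a proxied HTTP 429)."""

def with_backoff(call_fn, max_retries: int = 3, base_delay: float = 1.0):
    """Retry call_fn on rate limits with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call_fn()
        except RateLimitedError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Usage: wrap your normal SDK call in a lambda.
#   resp = with_backoff(lambda: client.messages.create(...))
```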
Errors generated by Prompt Crunch itself (for example, an invalid pc_live_ key) return a simple detail body:
{
"detail": "Invalid or inactive Prompt Crunch API key"
}
Rate limits
Prompt Crunch applies its own per-user rate limits on top of whatever limits your provider enforces.
Need higher limits? Get in touch.
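Until you know your account's limits, a client-side throttle keeps you safely under them. A small token-bucket sketch (the 2-requests-per-minute figure in the demo is a placeholder, not a documented Prompt Crunch limit):

```python
import time

class TokenBucket:
    """Simple client-side rate limiter: allow `rate` requests per `per` seconds."""

    def __init__(self, rate: int = 60, per: float = 60.0):
        self.capacity = rate
        self.tokens = float(rate)
        self.refill_per_sec = rate / per
        self.updated = time.monotonic()

    def acquire(self) -> bool:
        """Take one token if available; returns False when rate-limited."""
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, per=60)
print([bucket.acquire() for _ in range(3)])  # third call is throttled
```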
Supported models
Prompt Crunch is model-agnostic. Any model accessible via the Anthropic or OpenAI APIs works. Below are the models we've benchmarked and explicitly tuned for:
Anthropic
- claude-opus-4-6-*: best quality, highest savings
- claude-sonnet-4-6-*: recommended for most workloads
- claude-haiku-4-5-*: fastest, lowest cost per token
OpenAI
- gpt-5.4, gpt-5.4-thinking, gpt-5.4-pro
- gpt-5.3, gpt-5.3-codex
- gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4
- o3, o4-mini (reasoning models)