Base URL switch
No SDK rewrite
Inference Spend Autopilot
CrossCompute is an OpenAI-compatible gateway for teams and autonomous agents. Route each request to the right model, enforce quota in credits, and let agents self-top-up over x402 on Base.
Base URL switch
No SDK rewrite
Routing decisions
Per request in milliseconds
Agent payment rail
x402 + USDC on Base
Why this wins
Instead of pinning one expensive model for all prompts, CrossCompute routes simple prompts to cheap models and reserves premium models for hard prompts.
Agents can monitor credits, trigger `/api/agents/topup`, complete x402 payment, and continue inference without manual ops.
Every call returns routing and billing headers, so teams and agents can verify model, price rule, and charged credits in real time.
Agent Loop
Agent reads `/agent-skill.md`, `/openapi.json`, and `/api-parameters.md`.
Agent registers and receives `cc_agent_*` key scoped to tracked credits.
Agent pays stablecoin via x402 challenge-response and gets quota instantly.
Agent calls `/v1/chat/completions`, then checks usage and tops up again when low.
Evidence
Up to 98%
FrugalGPT reports up to 98% cost reduction at matched top-model quality in benchmark settings.
arXiv: FrugalGPT2x+
RouteLLM reports more than 2x cost reduction while maintaining response quality.
arXiv: RouteLLM30%–70%
Public routing benchmarks commonly show major cost cuts at similar quality targets.
RouteLLM benchmarks100x claim
NotDiamond reports large production improvements in cost-performance for routing workloads.
NotDiamondPricing references: OpenAI, Anthropic, Google Gemini, DeepSeek. Numbers vary over time and by region, caching, and volume discounts.
Machine-readable Surface
Switch one base URL, keep your OpenAI-compatible calls, and let CrossCompute optimize cost on every request.