API Documentation

One base URL, all major LLM providers. OpenAI-compatible protocol means your existing code works instantly.

https://api.chuizi.ai/v1

Quick Start

Get up and running with chuizi.ai in under a minute. Use any OpenAI-compatible SDK — just change the base URL and API key.

1. Create an account

Sign up at chuizi.ai and add credits to your balance.

2. Generate an API key

Go to Console > API Keys and create a new key. Copy it immediately — it is shown only once.

3. Make your first request

main.py
python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Authentication

All API requests require a valid API key. Keys use the ck- prefix followed by 32 alphanumeric characters (e.g. ck-a1b2c3d4e5f6...). We store only the SHA-256 hash of each key — the full key is shown exactly once at creation time.
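
A key can also be sanity-checked client-side before any request is sent. A minimal sketch, assuming the character set is exactly alphanumeric as described above:

```python
import re

# "ck-" prefix followed by 32 alphanumeric characters, per the format above
KEY_PATTERN = re.compile(r"^ck-[A-Za-z0-9]{32}$")

def looks_like_chuizi_key(key: str) -> bool:
    """Cheap client-side sanity check; the server remains the source of truth."""
    return KEY_PATTERN.match(key) is not None
```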

chuizi.ai accepts both authentication header formats. Use whichever your SDK expects:

headers
# OpenAI convention (works with all protocols)
Authorization: Bearer ck-your-key-here

# Anthropic convention (also accepted on all protocols)
x-api-key: ck-your-key-here
Security note: Your key is hashed with SHA-256 before storage. If you lose it, generate a new one — we cannot recover the original. Always store keys in environment variables, never in source code.
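
Following the advice above, a small helper can build either accepted header format from an environment variable. The variable name `CHUIZI_API_KEY` is a convention chosen here, not something the API requires:

```python
import os

def auth_headers(style: str = "openai") -> dict:
    """Build an auth header in either accepted format, reading the key
    from the environment instead of source code."""
    key = os.environ["CHUIZI_API_KEY"]  # hypothetical variable name
    if style == "openai":
        return {"Authorization": f"Bearer {key}"}
    return {"x-api-key": key}
```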

Model Naming

All models use the provider/model format (e.g. anthropic/claude-sonnet-4-6). This avoids naming collisions across providers and makes routing explicit. Bare model names (e.g. claude-sonnet-4-6) are accepted as aliases for convenience.

| Full Name | Alias | Provider |
|---|---|---|
| anthropic/claude-opus-4-6 | claude-opus-4-6 | Anthropic |
| anthropic/claude-opus-4 | claude-opus-4 | Anthropic |
| anthropic/claude-sonnet-4-6 | claude-sonnet-4-6 | Anthropic |
| anthropic/claude-sonnet-4 | claude-sonnet-4 | Anthropic |
| anthropic/claude-haiku-4-5 | claude-haiku-4-5 | Anthropic |
| openai/gpt-4o | gpt-4o | OpenAI |
| openai/gpt-4.1 | gpt-4.1 | OpenAI |
| openai/gpt-4.1-mini | gpt-4.1-mini | OpenAI |
| openai/o3 | o3 | OpenAI |
| openai/o4-mini | o4-mini | OpenAI |
| google/gemini-2.5-pro | gemini-2.5-pro | Google |
| google/gemini-2.5-flash | gemini-2.5-flash | Google |
| deepseek/deepseek-chat | deepseek-chat | DeepSeek |
| deepseek/deepseek-reasoner | deepseek-reasoner | DeepSeek |

This is a representative sample. See the full model list for all available models with real-time pricing.

Note: When using the /anthropic protocol, use bare Anthropic model names without the anthropic/ prefix (e.g. claude-sonnet-4-6 instead of anthropic/claude-sonnet-4-6), matching the official Anthropic API behavior.
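
When routing a request to the /anthropic protocol, the provider prefix can be stripped mechanically, e.g.:

```python
def bare_model_name(model: str) -> str:
    """Strip the provider prefix (the /anthropic protocol expects bare
    Anthropic model names). Unprefixed names pass through unchanged."""
    return model.split("/", 1)[-1]
```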

OpenAI-Compatible Protocol

The /v1/* endpoints implement the full OpenAI Chat Completions API. Any SDK or tool that works with OpenAI will work with chuizi.ai — just change the base URL.

POST
/v1/chat/completions

Chat completions (streaming + non-streaming). Supports tools, vision, JSON mode.

GET
/v1/models

List all available models with pricing and capability metadata.

GET
/v1/generation?id=gen-xxx

Query billing details for a specific request (see Billing section).

GET
/v1/key/info

Current key info: balance, limits, usage (see Key Management section).

Request Schema

request.json
json
{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in one paragraph."}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false,
  "stream_options": {"include_usage": true},
  "tools": [],
  "response_format": {"type": "text"}
}

Response Schema

Responses follow the standard OpenAI format. chuizi.ai adds an x_chuizi extension field with request metadata:

response.json
json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "anthropic/claude-sonnet-4-6",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Quantum computing leverages quantum mechanical phenomena..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 89,
    "total_tokens": 113
  },
  "x_chuizi": {
    "generation_id": "gen-abc123",
    "provider": "anthropic",
    "latency_ms": 1230,
    "cost": "0.00045600"
  }
}
Availability: The x_chuizi extension is always included in non-streaming responses. For streaming requests, the same fields appear in the final chunk before data: [DONE]. Use generation_id to query full billing details via the Generation API.
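
Once the response body is parsed, the extension block can be read like any other field. A small helper, using the field names documented above:

```python
from decimal import Decimal

def billing_info(response_body: dict):
    """Return (generation_id, cost) from the x_chuizi extension of a parsed
    response body, or None if the extension is absent."""
    meta = response_body.get("x_chuizi")
    if meta is None:
        return None
    return meta["generation_id"], Decimal(meta["cost"])
```

Parsing the cost string into Decimal avoids the rounding surprises that come from treating values like "0.00045600" as floats.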

Anthropic Native Protocol

The /anthropic/* endpoints provide a near-passthrough proxy to the Anthropic Messages API. Request and response bodies are forwarded with minimal transformation — we only swap the auth key and extract usage for billing. This means full compatibility with Claude Code, Cursor, Cline, and any tool that speaks the Anthropic API natively.

POST
/anthropic/v1/messages

Anthropic Messages API (streaming + non-streaming). Full feature parity.

GET
/anthropic/v1/models

List Anthropic models available through chuizi.ai.

Claude Code Setup (2 lines)

Add these to your shell profile and restart your terminal. Claude Code works instantly — no format conversion, no SDK changes.

~/.zshrc
export ANTHROPIC_BASE_URL=https://api.chuizi.ai/anthropic
export ANTHROPIC_API_KEY=ck-your-key-here

Native Request Example

Requests use the standard Anthropic Messages API format. Include the anthropic-version header (required by the Anthropic SDK). Both x-api-key and Authorization: Bearer headers are accepted.

terminal
curl https://api.chuizi.ai/anthropic/v1/messages \
  -H "x-api-key: ck-YOUR-KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Anthropic Response Format

response.json
json
{
  "id": "gen-abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "Hello! How can I help you today?"}
  ],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 14,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}

Anthropic SSE Streaming Events

When "stream": true is set, the response uses Anthropic's native SSE event types. These are passed through unmodified:

| Event Type | Description |
|---|---|
| message_start | Initial message object with model and usage metadata. |
| content_block_start | Start of a content block (text or tool_use). |
| content_block_delta | Incremental text delta within a content block. |
| content_block_stop | End of the current content block. |
| message_delta | Final usage stats (output tokens) and stop_reason. |
| message_stop | Stream is complete. No more events will follow. |
| ping | Keepalive event. Ignore in application logic. |
Passthrough guarantee: The /anthropic path does NOT convert formats. Your request body goes to Anthropic as-is (with auth swapped). The response comes back as-is. If Anthropic adds new features (e.g. new content block types), they work through chuizi.ai immediately with no changes on our side.
Beta features: The anthropic-beta header is passed through to upstream, enabling beta features like prompt caching and extended thinking when supported by the model.

Gemini Native Protocol

Coming Soon: The Gemini native protocol is under development and not yet available. The documentation below is provided for reference and will become active in a future release.

The /gemini/* endpoints provide a native Google Gemini API proxy. Requests and responses are forwarded directly to Google AI — we only swap the auth key and extract usage for billing. Ideal for developers using the Google AI SDK.

POST
/gemini/v1beta/models/*/generateContent

Gemini content generation (non-streaming)

POST
/gemini/v1beta/models/*/streamGenerateContent

Gemini streaming content generation

GET
/gemini/v1beta/models

List available Gemini models

Request Example

Requests use the standard Gemini API format. Authentication is provided via x-api-key or Authorization: Bearer header (using your chuizi.ai ck- key, not a Google API key).

terminal
curl https://api.chuizi.ai/gemini/v1beta/models/gemini-2.5-pro:generateContent \
  -H "x-api-key: ck-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Explain quantum computing in one paragraph."}]}
    ],
    "generationConfig": {
      "temperature": 0.7,
      "maxOutputTokens": 1024
    }
  }'

Response Format

response.json
json
{
  "candidates": [
    {
      "content": {
        "parts": [{"text": "Quantum computing harnesses quantum mechanical phenomena..."}],
        "role": "model"
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 12,
    "candidatesTokenCount": 89,
    "totalTokenCount": 101
  }
}

Streaming Generation

Use the streamGenerateContent endpoint with the ?alt=sse query parameter to receive Server-Sent Events streaming responses.

terminal
curl https://api.chuizi.ai/gemini/v1beta/models/gemini-2.5-pro:streamGenerateContent?alt=sse \
  -H "x-api-key: ck-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Write a haiku about coding."}]}
    ]
  }'

SDK Examples

gemini.py
python
from google import genai

client = genai.Client(
    api_key="ck-your-key-here",
    http_options={"api_version": "v1beta", "base_url": "https://api.chuizi.ai/gemini"},
)

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Explain quantum computing in one paragraph.",
)
print(response.text)
Note: The Gemini protocol will support the Gemini 2.5 Pro and Gemini 2.5 Flash models. Like all protocols, billing uses the unified Generation API; requests made via /gemini are queryable via GET /v1/generation?id=gen-xxx.

Streaming

Streaming delivers response tokens as they are generated via Server-Sent Events (SSE). This dramatically reduces time-to-first-token and enables real-time UI updates.

OpenAI Streaming

Set "stream": true in your request body. To receive token usage in the final chunk, add "stream_options": {"include_usage": true} — this is critical for tracking costs.

request.json
json
{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [{"role": "user", "content": "Count to 5"}],
  "stream": true,
  "stream_options": {"include_usage": true}
}

SSE Format

Each event is a line prefixed with data: followed by a JSON object. The stream ends with data: [DONE].

sse-stream
sse
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":", 2"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20}}

data: [DONE]
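
If you are not using an SDK, each line of the stream above can be decoded with a few lines of code. This is a sketch; a real client should also buffer lines that arrive split across network packets:

```python
import json

def parse_sse_line(line: str):
    """Decode one line of the SSE stream: returns the parsed JSON chunk,
    the string "DONE" for the terminator, or None for non-data lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return "DONE"
    return json.loads(payload)
```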

Response Headers

Every response (streaming and non-streaming) includes these headers:

| Header | Description |
|---|---|
| X-Request-Id | Unique request ID for debugging. Include in support tickets. |
| Content-Type | text/event-stream for streaming, application/json otherwise. |
| Cache-Control | no-cache (streaming responses are never cached). |

Streaming Code Examples

stream.py
python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if hasattr(chunk, "usage") and chunk.usage:
        print(f"\n\nTokens: {chunk.usage.total_tokens}")

When to Use Streaming

Use streaming for interactive UIs, chatbots, and any case where the user is waiting for a response. Use non-streaming for batch processing, background jobs, and when you need the complete response before proceeding. Streaming also provides faster time-to-first-token, which improves perceived latency even when total generation time is the same.

Billing & Usage

chuizi.ai uses a prepaid credit system. Add funds to your account, then usage is deducted per request. Per-request cost transparency with no hidden fees.

How Billing Works

1. Pre-deduct: Before the request is sent upstream, we estimate the cost based on input tokens and freeze that amount from your balance in Redis.

2. Process: The request is forwarded to the upstream provider. Tokens stream back to you in real-time.

3. Reconcile: After the response completes, the actual token usage from the provider determines the final cost. The frozen estimate is released and the real amount is deducted. A BullMQ worker writes the transaction to PostgreSQL.
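
The steps above can be sketched as a toy accounting model. An in-memory object stands in for Redis here; this illustrates the flow, not the actual implementation:

```python
from decimal import Decimal

class Balance:
    """Toy freeze-then-reconcile ledger mirroring the three steps above."""

    def __init__(self, amount: Decimal):
        self.available = amount
        self.frozen = Decimal("0")

    def pre_deduct(self, estimate: Decimal) -> None:
        """Step 1: freeze the estimated cost before the upstream call."""
        if estimate > self.available:
            raise RuntimeError("402 insufficient_quota")
        self.available -= estimate
        self.frozen += estimate

    def reconcile(self, estimate: Decimal, actual: Decimal) -> None:
        """Step 3: release the frozen estimate, charge the real cost."""
        self.frozen -= estimate
        self.available += estimate - actual
```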

Pricing Formula

pricing.txt
cost = input_tokens * input_price + output_tokens * output_price

# Example: anthropic/claude-sonnet-4-6
# $3.15 / 1M input, $15.75 / 1M output
#
# 1000 input + 500 output tokens:
# cost = (1000 * 3.15/1e6 + 500 * 15.75/1e6) = $0.011025
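
The formula translates directly to code (prices quoted in USD per 1M tokens, matching the comment above):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Per-request cost; prices are USD per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1e6
```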

Auto-Recharge

Auto-recharge lets you maintain a minimum balance without manual top-ups. Set a balance threshold and recharge amount in your billing settings. When your balance drops below the threshold, your saved payment method is automatically charged for the configured amount.

Generation API

Every request is assigned a unique ID with the gen- prefix. Query full billing details for any request:

GET
/v1/generation?id=gen-xxx

Query detailed billing and metadata for a specific request.

terminal
curl https://api.chuizi.ai/v1/generation?id=gen-abc123 \
  -H "Authorization: Bearer ck-YOUR-KEY"
response.json
json
{
  "data": {
    "id": "gen-abc123",
    "model": "anthropic/claude-sonnet-4-6",
    "provider": "anthropic",
    "input_tokens": 150,
    "output_tokens": 89,
    "native_input_tokens": 150,
    "native_output_tokens": 89,
    "cached_tokens": 0,
    "reasoning_tokens": 0,
    "cost": "0.00045600",
    "upstream_cost": "0.00035077",
    "latency_ms": 1230,
    "generation_time_ms": 1180,
    "finish_reason": "stop",
    "streamed": true,
    "created_at": "2025-01-15T10:30:00Z"
  }
}

Generation Response Fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique generation ID (gen- prefix). |
| model | string | Full provider/model name used for the request. |
| provider | string | Upstream provider (anthropic, openai, google, deepseek). |
| input_tokens | number | Number of input (prompt) tokens billed. |
| output_tokens | number | Number of output (completion) tokens billed. |
| native_input_tokens | number | Input token count as reported by the upstream provider. |
| native_output_tokens | number | Output token count as reported by the upstream provider. |
| cached_tokens | number | Input tokens served from cache (reduced cost). |
| reasoning_tokens | number | Internal reasoning tokens (o3, DeepSeek R1). Billed as output. |
| cost | string | Total cost charged to your balance (decimal string, USD). |
| upstream_cost | string | Actual upstream provider cost. |
| latency_ms | number | Total round-trip latency including network (milliseconds). |
| generation_time_ms | number | Time spent generating the response at the provider. |
| finish_reason | string | Why generation stopped: stop, length, tool_calls, content_filter. |
| streamed | boolean | Whether the request used streaming. |
| created_at | string | ISO 8601 timestamp of request creation. |

Cache & Reasoning Tokens

Cache Tokens

Some providers cache repeated prompt prefixes, dramatically reducing cost for subsequent requests with the same system prompt or conversation history. Cached input tokens are billed at a fraction of the normal input price.

| Provider | Models | Cache Discount | How to Enable |
|---|---|---|---|
| Anthropic | All Claude models | 90% off input price | Automatic for repeated prefixes. Use cache_control for explicit control. |
| OpenAI | GPT-4o, GPT-4.1, o3 | 50% off input price | Automatic for prompts > 1024 tokens with shared prefix. |
| DeepSeek | DeepSeek-Chat, DeepSeek-R1 | 90% off input price | Automatic for repeated prefixes (context caching). |
| Google | Gemini 2.5 Pro/Flash | 75% off input price | Use cachedContent API or automatic prefix caching. |

Cache tokens appear in the cached_tokens field of the Generation API response and in the Anthropic usage.cache_read_input_tokens field. Keep your system prompt stable across requests to maximize cache hits.

Reasoning Tokens

Reasoning models (OpenAI o3, o3-mini, DeepSeek R1) generate internal "thinking" tokens before producing the final output. These tokens are not visible in the response but are billed as output tokens.

| Model | Reasoning Tokens | Billing |
|---|---|---|
| openai/o3 | Up to 100K tokens per request | Billed at output token rate |
| openai/o4-mini | Up to 100K tokens per request | Billed at output token rate |
| deepseek/deepseek-reasoner | Variable (shown in response) | Billed at output token rate |

Complete Pricing Formula

pricing-formula.txt
cost = (input_tokens - cached_tokens) * input_price
     + cached_tokens * cached_input_price
     + (output_tokens + reasoning_tokens) * output_price

# Example: anthropic/claude-sonnet-4-6 with cache hit
# 2000 input tokens, 1500 cached, 500 output tokens
# Input price: $3.15/1M, Cached: $0.315/1M, Output: $15.75/1M
#
# cost = 500 * 3.15/1e6      # non-cached input: $0.001575
#      + 1500 * 0.315/1e6    # cached input:     $0.0004725
#      + 500 * 15.75/1e6     # output:           $0.007875
#      = $0.009923
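
The complete formula in code, reproducing the worked example above:

```python
def full_request_cost(input_tokens: int, output_tokens: int,
                      cached_tokens: int = 0, reasoning_tokens: int = 0, *,
                      input_price: float, cached_price: float,
                      output_price: float) -> float:
    """Complete pricing formula; all prices are USD per 1M tokens."""
    return ((input_tokens - cached_tokens) * input_price
            + cached_tokens * cached_price
            + (output_tokens + reasoning_tokens) * output_price) / 1e6
```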

Error Handling

All errors follow the OpenAI-compatible error format, regardless of which protocol you use. This makes error handling consistent across /v1 and /anthropic paths.

error.json
json
{
  "error": {
    "message": "Insufficient balance. Please top up at https://chuizi.ai/billing",
    "type": "insufficient_quota",
    "code": "402"
  }
}

Error Codes Reference

| Code | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request body, missing required fields, invalid model name, or unsupported parameters. |
| 401 | authentication_error | Missing, invalid, or expired API key. Check the Authorization or x-api-key header. |
| 402 | insufficient_quota | Account balance too low for the estimated request cost. Top up your balance. |
| 403 | permission_error | Key is inactive, IP not in whitelist, model not in allowed_models, or daily spend cap reached. |
| 403 | model_not_allowed | Your API key does not have access to this model. |
| 403 | ip_not_allowed | Request IP is not in the key's whitelist. |
| 403 | daily_limit_exceeded | Daily spending limit reached for this key. |
| 404 | not_found | Requested model or endpoint does not exist. Check model naming format. |
| 429 | rate_limit_error | RPM or TPM limit exceeded for your tier. Check X-RateLimit-Reset header for retry time. |
| 500 | internal_error | Internal chuizi.ai server error. Retry with exponential backoff. Contact support if persistent. |
| 502 | upstream_error | Upstream provider returned an error. The provider may be experiencing issues. |
| 503 | service_unavailable | Upstream provider is overloaded or temporarily unavailable. Retry after a short delay. |
Tip: Every error response includes an X-Request-Id header. Include this ID when contacting support for fastest resolution.
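
A minimal helper that encodes the retry guidance from the table: transient codes are retried, everything else fails fast, and delays grow exponentially with jitter:

```python
import random

RETRYABLE = {429, 500, 502, 503}  # transient codes from the table above

def should_retry(status_code: int) -> bool:
    """Whether a failed request is worth retrying."""
    return status_code in RETRYABLE

def backoff_delay(attempt: int, cap: float = 30.0) -> float:
    """Exponential backoff with jitter, capped at `cap` seconds."""
    return min(2 ** attempt + random.uniform(0, 1), cap)
```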

Rate Limits

chuizi.ai imposes no hard usage caps: you pay for what you use. Each tier comes with a default RPM limit (see the table below), and per-key limits can be configured in the console if needed.

Default Limits

The table below shows default RPM (requests per minute) limits for each tier. Rate limits are per-key configurable. Set custom RPM in your API key settings.

| Tier | RPM |
|---|---|
| Free | 60 |
| Starter | 120 |
| Pro | 300 |
| Enterprise | Custom |

Rate Limit Headers

Every response includes rate limit information in the headers:

| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per minute for this key. |
| X-RateLimit-Remaining | Requests remaining in the current window. |
| X-RateLimit-Reset | Unix timestamp (seconds) when the window resets. |

Per-Key Customization

Override your tier's default RPM on individual keys via the Console or API. Set rpm_limit when creating/updating a key. You can also set a daily_limit (in USD) to cap spending — the key returns 403 once the daily cap is reached and resets at midnight UTC.

Claude Code Setup

Claude Code uses the Anthropic API natively. Set two environment variables in your shell profile and you are done — no format conversion, no SDK changes, no wrappers.

~/.zshrc
bash
# Add to ~/.zshrc or ~/.bashrc
export ANTHROPIC_BASE_URL=https://api.chuizi.ai/anthropic
export ANTHROPIC_API_KEY=ck-your-key-here

# Restart terminal, then run claude as usual
source ~/.zshrc
claude
Claude Code reads ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY on startup. After editing your shell profile, restart your terminal or run source ~/.zshrc for the changes to take effect.

Cursor Setup

Cursor supports custom API endpoints for Anthropic models. Configure the base URL in Cursor settings to route all Claude requests through chuizi.ai.

1. Open Cursor Settings (Cmd/Ctrl + ,)

2. Navigate to Models > Anthropic

3. Set API Base URL to https://api.chuizi.ai/anthropic

4. Enter your chuizi.ai API key (ck-...) as the API Key

API Base URL
https://api.chuizi.ai/anthropic

Cline Setup

Cline (VS Code extension) supports custom API providers. Configure it to use chuizi.ai as the Anthropic API endpoint.

1. Open Cline extension settings in VS Code

2. Set API Provider to "Anthropic" with Custom Base URL: https://api.chuizi.ai/anthropic

3. Enter your chuizi.ai API key (ck-...) as the API Key

Custom Base URL
https://api.chuizi.ai/anthropic

OpenCode Setup

OpenCode reads provider configuration from its config file. Point the Anthropic provider to chuizi.ai.

opencode.toml
toml
[providers.anthropic]
base_url = "https://api.chuizi.ai/anthropic"
api_key = "ck-your-key-here"

Other Integrations

n8n

Use chuizi.ai in n8n by adding an OpenAI node with custom credentials. All OpenAI-compatible operations (chat, completions, embeddings) work out of the box.

n8n-setup
Base URL: https://api.chuizi.ai/v1
API Key:  ck-your-key-here

# In n8n:
# 1. Add an "OpenAI" node
# 2. Create a new credential:
#    - API Key: ck-your-key-here
#    - Base URL: https://api.chuizi.ai/v1
# 3. Select any model (e.g. anthropic/claude-sonnet-4-6)
# 4. Works with Chat operations

LangChain

LangChain works with chuizi.ai through the OpenAI-compatible interface. No extra libraries or adapters needed — just use ChatOpenAI and change the base URL.

langchain_example.py
python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
    model="anthropic/claude-sonnet-4-6",
)

response = llm.invoke("Explain quantum computing in one sentence.")
print(response.content)

Vercel AI SDK

The Vercel AI SDK connects to chuizi.ai via @ai-sdk/openai's createOpenAI helper. All features including generateText, streamText, and tool calling work seamlessly.

vercel-ai.mjs
javascript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const chuizi = createOpenAI({
  baseURL: "https://api.chuizi.ai/v1",
  apiKey: "ck-your-key-here",
});

const { text } = await generateText({
  model: chuizi("anthropic/claude-sonnet-4-6"),
  prompt: "Explain quantum computing in one sentence.",
});
console.log(text);

Key Management

Each API key can be configured with fine-grained access controls. Set these when creating a key in the Console, or update them later.

| Property | Type | Description |
|---|---|---|
| name | string | Human-readable label (e.g. "Production", "CI/CD"). |
| allowed_models | string[] | Restrict to specific models. Empty = all models allowed. |
| ip_whitelist | string[] | Restrict to specific IP addresses or CIDR ranges. Empty = any IP. |
| rpm_limit | number | Per-key requests per minute override. Falls back to tier default. |
| daily_limit | string | Maximum daily spend in USD (e.g. "50.00"). Resets at midnight UTC. |
| is_active | boolean | Disable a key without deleting it. Disabled keys return 403. |

Query Key Info

Check your key's current status, balance, and limits programmatically:

GET
/v1/key/info

Returns current key metadata, balance, and usage limits.

terminal
curl https://api.chuizi.ai/v1/key/info \
  -H "Authorization: Bearer ck-YOUR-KEY"
response.json
json
{
  "key_prefix": "ck-a1b2",
  "name": "Production",
  "group": "default",
  "tier": "pro",
  "balance": "142.35000000",
  "is_active": true,
  "allowed_models": ["anthropic/claude-sonnet-4-6", "openai/gpt-4o"],
  "ip_whitelist": [],
  "rpm_limit": 300,
  "daily_limit": "100.00",
  "last_used_at": "2025-06-15T14:22:00Z",
  "created_at": "2025-06-01T08:00:00Z"
}

SDKs & Examples

chuizi.ai works with any OpenAI-compatible SDK. No custom libraries needed — just change the base URL and API key.

Python — Streaming

stream.py
python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Node.js — Streaming

stream.mjs
javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.chuizi.ai/v1",
  apiKey: "ck-your-key-here",
});

const stream = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [{ role: "user", content: "Write a haiku about coding" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Python — Tool Use (Function Calling)

tools.py
python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

# Step 1: Initial request with tools
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

message = response.choices[0].message

# Step 2: If the model wants to call a tool
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Your function implementation
    weather_data = {"temp": "22C", "condition": "Sunny"}

    # Step 3: Send the tool result back
    follow_up = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            message,  # assistant message with tool_calls
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(weather_data),
            },
        ],
        tools=tools,
    )
    print(follow_up.choices[0].message.content)

curl — Non-Streaming

terminal
curl https://api.chuizi.ai/v1/chat/completions \
  -H "Authorization: Bearer ck-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain REST APIs in one sentence."}
    ],
    "temperature": 0.3,
    "max_tokens": 100
  }'

curl — Anthropic Native Protocol

terminal
curl https://api.chuizi.ai/anthropic/v1/messages \
  -H "x-api-key: ck-YOUR-KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": "You are a senior software engineer.",
    "messages": [
      {"role": "user", "content": "Review this code: function add(a,b){return a+b}"}
    ]
  }'

Best Practices

Error Recovery with Exponential Backoff

Always implement retry logic with exponential backoff for transient errors (429, 500, 502, 503). Here is a production-ready implementation:

retry.py
python
import time
import random
from openai import OpenAI, APIError, RateLimitError, APIConnectionError

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

def chat_with_retry(messages, model="anthropic/claude-sonnet-4-6", max_retries=5):
    """Make a chat request with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Use Retry-After header if available, otherwise exponential backoff
            wait = float(e.response.headers.get("retry-after", 2 ** attempt))
            wait += random.uniform(0, 1)  # jitter
            print(f"Rate limited. Retrying in {wait:.1f}s...")
            time.sleep(wait)
        except (APIError, APIConnectionError) as e:
            if attempt == max_retries - 1:
                raise
            wait = min(2 ** attempt + random.uniform(0, 1), 30)
            print(f"Error: {e}. Retrying in {wait:.1f}s...")
            time.sleep(wait)

response = chat_with_retry([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)

Cost Optimization

Maximize cache tokens

Keep your system prompt identical across requests to the same model. Anthropic and OpenAI automatically cache repeated prompt prefixes, saving up to 90% on input tokens.

Choose the right model

Use Haiku or GPT-4.1-mini for simple tasks (classification, extraction, formatting). Reserve Opus/GPT-4o/o3 for tasks that genuinely need advanced reasoning. The cost difference can be 10-50x.

Set max_tokens appropriately

Always set max_tokens to a reasonable limit for your use case. This caps output costs and prevents runaway generation. It also improves the accuracy of cost pre-deduction.

Monitor with the Generation API

Periodically query /v1/generation to understand your cost breakdown. Look for requests with high reasoning_tokens or low cache hit rates.

Streaming Best Practices

Always include stream_options

Set stream_options: {include_usage: true} to receive token counts in the final chunk. Without this, you cannot track costs client-side.

Handle partial chunks gracefully

SSE chunks can split across network packets. Always buffer and parse complete data: lines. The OpenAI SDK handles this automatically.

Use -N flag with curl

The -N (--no-buffer) flag disables output buffering in curl, so you see tokens as they arrive instead of all at once.

Key Security

Rotate keys regularly

Create new keys periodically and deactivate old ones. Use separate keys for development, staging, and production environments.

Use IP whitelisting

For production keys, restrict access to your server IP addresses or CIDR ranges. This prevents unauthorized usage even if a key is leaked.

Set daily spend caps

Configure daily_limit on each key to prevent runaway costs from bugs or compromised keys. The key returns 403 once the cap is reached.

Restrict allowed models

Use allowed_models to limit which models each key can access. A key meant for embeddings does not need access to GPT-4o or Claude Opus.

Never expose keys in client-side code

API keys should only be used server-side. Never include them in JavaScript bundles, mobile apps, or public repositories. Use environment variables exclusively.