API Documentation
One base URL, all major LLM providers. OpenAI-compatible protocol means your existing code works instantly.
Base URL: https://api.chuizi.ai/v1

Quick Start
Get up and running with chuizi.ai in under a minute. Use any OpenAI-compatible SDK — just change the base URL and API key.
Create an account
Sign up at chuizi.ai and add credits to your balance.
Generate an API key
Go to Console > API Keys and create a new key. Copy it immediately — it is shown only once.
Make your first request
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
Authentication
All API requests require a valid API key. Keys use the ck- prefix followed by 32 alphanumeric characters (e.g. ck-a1b2c3d4e5f6...). We store only the SHA-256 hash of each key — the full key is shown exactly once at creation time.
chuizi.ai accepts both authentication header formats. Use whichever your SDK expects:
```
# OpenAI convention (works with all protocols)
Authorization: Bearer ck-your-key-here

# Anthropic convention (also accepted on all protocols)
x-api-key: ck-your-key-here
```
Model Naming
All models use the provider/model format (e.g. anthropic/claude-sonnet-4-6). This avoids naming collisions across providers and makes routing explicit. Bare model names (e.g. claude-sonnet-4-6) are accepted as aliases for convenience.
| Full Name | Alias | Provider |
|---|---|---|
| anthropic/claude-opus-4-6 | claude-opus-4-6 | Anthropic |
| anthropic/claude-opus-4 | claude-opus-4 | Anthropic |
| anthropic/claude-sonnet-4-6 | claude-sonnet-4-6 | Anthropic |
| anthropic/claude-sonnet-4 | claude-sonnet-4 | Anthropic |
| anthropic/claude-haiku-4-5 | claude-haiku-4-5 | Anthropic |
| openai/gpt-4o | gpt-4o | OpenAI |
| openai/gpt-4.1 | gpt-4.1 | OpenAI |
| openai/gpt-4.1-mini | gpt-4.1-mini | OpenAI |
| openai/o3 | o3 | OpenAI |
| openai/o4-mini | o4-mini | OpenAI |
| google/gemini-2.5-pro | gemini-2.5-pro | Google |
| google/gemini-2.5-flash | gemini-2.5-flash | Google |
| deepseek/deepseek-chat | deepseek-chat | DeepSeek |
| deepseek/deepseek-reasoner | deepseek-reasoner | DeepSeek |
This is a representative sample. See the full model list for all available models with real-time pricing.
OpenAI-Compatible Protocol
The /v1/* endpoints implement the full OpenAI Chat Completions API. Any SDK or tool that works with OpenAI will work with chuizi.ai — just change the base URL.
| Endpoint | Description |
|---|---|
| /v1/chat/completions | Chat completions (streaming + non-streaming). Supports tools, vision, JSON mode. |
| /v1/models | List all available models with pricing and capability metadata. |
| /v1/generation?id=gen-xxx | Query billing details for a specific request (see Billing section). |
| /v1/key/info | Current key info: balance, limits, usage (see Key Management section). |
Request Schema
{ "model": "anthropic/claude-sonnet-4-6", "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain quantum computing in one paragraph."} ], "temperature": 0.7, "max_tokens": 1024, "stream": false, "stream_options": {"include_usage": true}, "tools": [], "response_format": {"type": "text"} }
Response Schema
Responses follow the standard OpenAI format. chuizi.ai adds an x_chuizi extension field with request metadata:
{ "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1700000000, "model": "anthropic/claude-sonnet-4-6", "choices": [{ "index": 0, "message": { "role": "assistant", "content": "Quantum computing leverages quantum mechanical phenomena..." }, "finish_reason": "stop" }], "usage": { "prompt_tokens": 24, "completion_tokens": 89, "total_tokens": 113 }, "x_chuizi": { "generation_id": "gen-abc123", "provider": "anthropic", "latency_ms": 1230, "cost": "0.00045600" } }
Anthropic Native Protocol
The /anthropic/* endpoints provide a near-passthrough proxy to the Anthropic Messages API. Request and response bodies are forwarded with minimal transformation — we only swap the auth key and extract usage for billing. This means full compatibility with Claude Code, Cursor, Cline, and any tool that speaks the Anthropic API natively.
| Endpoint | Description |
|---|---|
| /anthropic/v1/messages | Anthropic Messages API (streaming + non-streaming). Full feature parity. |
| /anthropic/v1/models | List Anthropic models available through chuizi.ai. |
Claude Code Setup (2 lines)
Add these to your shell profile and restart your terminal. Claude Code works instantly — no format conversion, no SDK changes.
```shell
export ANTHROPIC_BASE_URL=https://api.chuizi.ai/anthropic
export ANTHROPIC_API_KEY=ck-your-key-here
```
Native Request Example
Requests use the standard Anthropic Messages API format. Include the anthropic-version header (required by the Anthropic API). Both x-api-key and Authorization: Bearer headers are accepted.
```shell
curl https://api.chuizi.ai/anthropic/v1/messages \
  -H "x-api-key: ck-YOUR-KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Anthropic Response Format
{ "id": "gen-abc123", "type": "message", "role": "assistant", "content": [ {"type": "text", "text": "Hello! How can I help you today?"} ], "model": "claude-sonnet-4-6", "stop_reason": "end_turn", "usage": { "input_tokens": 12, "output_tokens": 14, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 0 } }
Anthropic SSE Streaming Events
When "stream": true is set, the response uses Anthropic's native SSE event types. These are passed through unmodified:
| Event Type | Description |
|---|---|
| message_start | Initial message object with model and usage metadata. |
| content_block_start | Start of a content block (text or tool_use). |
| content_block_delta | Incremental text delta within a content block. |
| content_block_stop | End of the current content block. |
| message_delta | Final usage stats (output tokens) and stop_reason. |
| message_stop | Stream is complete. No more events will follow. |
| ping | Keepalive event. Ignore in application logic. |
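If you consume the stream without an SDK, the events above can be folded with a small dispatcher keyed on the event type. A minimal sketch that collects text deltas only (tool_use deltas and usage tracking are omitted; the function name is illustrative):

```python
import json

def handle_anthropic_sse(lines):
    """Collect assistant text from raw Anthropic SSE lines
    ("event: ..." / "data: ..." pairs)."""
    text = []
    event = None
    for line in lines:
        line = line.strip()
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data = json.loads(line.split(":", 1)[1])
            if event == "content_block_delta" and data["delta"].get("type") == "text_delta":
                text.append(data["delta"]["text"])
            elif event == "message_stop":
                break  # stream is complete
            # message_start, content_block_start/stop, message_delta, ping: ignored here
    return "".join(text)

sample = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "Hello!"}}',
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
print(handle_anthropic_sse(sample))  # → Hello!
```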
Gemini Native Protocol
The /gemini/* endpoints provide a native Google Gemini API proxy. Requests and responses are forwarded directly to Google AI — we only swap the auth key and extract usage for billing. Ideal for developers using the Google AI SDK.
| Endpoint | Description |
|---|---|
| /gemini/v1beta/models/*:generateContent | Gemini content generation (non-streaming). |
| /gemini/v1beta/models/*:streamGenerateContent | Gemini streaming content generation. |
| /gemini/v1beta/models | List available Gemini models. |
Request Example
Requests use the standard Gemini API format. Authentication is provided via x-api-key or Authorization: Bearer header (using your chuizi.ai ck- key, not a Google API key).
```shell
curl https://api.chuizi.ai/gemini/v1beta/models/gemini-2.5-pro:generateContent \
  -H "x-api-key: ck-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Explain quantum computing in one paragraph."}]}
    ],
    "generationConfig": {
      "temperature": 0.7,
      "maxOutputTokens": 1024
    }
  }'
```

Response Format
{ "candidates": [ { "content": { "parts": [{"text": "Quantum computing harnesses quantum mechanical phenomena..."}], "role": "model" }, "finishReason": "STOP" } ], "usageMetadata": { "promptTokenCount": 12, "candidatesTokenCount": 89, "totalTokenCount": 101 } }
Streaming Generation
Use the streamGenerateContent endpoint with the ?alt=sse query parameter to receive Server-Sent Events streaming responses.
```shell
curl -N "https://api.chuizi.ai/gemini/v1beta/models/gemini-2.5-pro:streamGenerateContent?alt=sse" \
  -H "x-api-key: ck-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Write a haiku about coding."}]}
    ]
  }'
```

SDK Examples
```python
from google import genai

client = genai.Client(
    api_key="ck-your-key-here",
    http_options={"api_version": "v1beta", "base_url": "https://api.chuizi.ai/gemini"},
)

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Explain quantum computing in one paragraph.",
)
print(response.text)
```
Streaming
Streaming delivers response tokens as they are generated via Server-Sent Events (SSE). This dramatically reduces time-to-first-token and enables real-time UI updates.
OpenAI Streaming
Set "stream": true in your request body. To receive token usage in the final chunk, add "stream_options": {"include_usage": true} — this is critical for tracking costs.
{ "model": "anthropic/claude-sonnet-4-6", "messages": [{"role": "user", "content": "Count to 5"}], "stream": true, "stream_options": {"include_usage": true} }
SSE Format
Each event is a line prefixed with data: followed by a JSON object. The stream ends with data: [DONE].
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]} data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]} data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":", 2"},"finish_reason":null}]} data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20}} data: [DONE]
Response Headers
Every response (streaming and non-streaming) includes these headers:
| Header | Description |
|---|---|
| X-Request-Id | Unique request ID for debugging. Include in support tickets. |
| Content-Type | text/event-stream for streaming, application/json otherwise. |
| Cache-Control | no-cache (streaming responses are never cached). |
Streaming Code Examples
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if hasattr(chunk, "usage") and chunk.usage:
        print(f"\n\nTokens: {chunk.usage.total_tokens}")
```
When to Use Streaming
Use streaming for interactive UIs, chatbots, and any case where the user is waiting for a response. Use non-streaming for batch processing, background jobs, and when you need the complete response before proceeding. Streaming also provides faster time-to-first-token, which improves perceived latency even when total generation time is the same.
Billing & Usage
chuizi.ai uses a prepaid credit system. Add funds to your account, then usage is deducted per request. Per-request cost transparency with no hidden fees.
How Billing Works
Pre-deduct: Before the request is sent upstream, we estimate the cost based on input tokens and freeze that amount from your balance in Redis.
Process: The request is forwarded to the upstream provider. Tokens stream back to you in real-time.
Reconcile: After the response completes, the actual token usage from the provider determines the final cost. The frozen estimate is released and the real amount is deducted. A BullMQ worker writes the transaction to PostgreSQL.
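The three steps above amount to a freeze-then-reconcile ledger. As an illustrative in-memory sketch of the logic only (the real implementation uses Redis for the frozen amounts and a BullMQ worker for persistence; the `Balance` class is hypothetical):

```python
from decimal import Decimal

class Balance:
    """Toy freeze/deduct ledger mirroring the billing flow above."""

    def __init__(self, available):
        self.available = Decimal(available)
        self.frozen = Decimal("0")

    def freeze(self, estimate):
        """Pre-deduct: reserve the estimated cost before calling upstream."""
        if self.available - self.frozen < estimate:
            raise RuntimeError("insufficient_quota")  # surfaces as HTTP 402
        self.frozen += estimate

    def reconcile(self, estimate, actual):
        """Release the frozen estimate and deduct the real cost."""
        self.frozen -= estimate
        self.available -= actual

b = Balance("1.00")
b.freeze(Decimal("0.10"))            # estimate from input tokens
b.reconcile(Decimal("0.10"), Decimal("0.04"))  # actual usage was cheaper
print(b.available)  # → 0.96
```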
Pricing Formula
```python
cost = input_tokens * input_price + output_tokens * output_price

# Example: anthropic/claude-sonnet-4-6
# $3.15 / 1M input, $15.75 / 1M output
#
# 1000 input + 500 output tokens:
# cost = (1000 * 3.15/1e6 + 500 * 15.75/1e6) = $0.011025
```
Auto-Recharge
Auto-recharge lets you maintain a minimum balance without manual top-ups. Set a balance threshold and recharge amount in your billing settings. When your balance drops below the threshold, your saved payment method is automatically charged for the configured amount.
Generation API
Every request generates a unique ID with the gen- prefix. Query full billing details for any request:
/v1/generation?id=gen-xxx
Query detailed billing and metadata for a specific request.
```shell
curl "https://api.chuizi.ai/v1/generation?id=gen-abc123" \
  -H "Authorization: Bearer ck-YOUR-KEY"
```
{ "data": { "id": "gen-abc123", "model": "anthropic/claude-sonnet-4-6", "provider": "anthropic", "input_tokens": 150, "output_tokens": 89, "native_input_tokens": 150, "native_output_tokens": 89, "cached_tokens": 0, "reasoning_tokens": 0, "cost": "0.00045600", "upstream_cost": "0.00035077", "latency_ms": 1230, "generation_time_ms": 1180, "finish_reason": "stop", "streamed": true, "created_at": "2025-01-15T10:30:00Z" } }
Generation Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique generation ID (gen- prefix). |
| model | string | Full provider/model name used for the request. |
| provider | string | Upstream provider (anthropic, openai, google, deepseek). |
| input_tokens | number | Number of input (prompt) tokens billed. |
| output_tokens | number | Number of output (completion) tokens billed. |
| native_input_tokens | number | Token count as reported by the upstream provider. |
| native_output_tokens | number | Output token count as reported by the upstream provider. |
| cached_tokens | number | Input tokens served from cache (reduced cost). |
| reasoning_tokens | number | Internal reasoning tokens (o3, DeepSeek R1). Billed as output. |
| cost | string | Total cost charged to your balance (Decimal string, USD). |
| upstream_cost | string | Actual upstream provider cost. |
| latency_ms | number | Total round-trip latency including network (milliseconds). |
| generation_time_ms | number | Time spent generating the response at the provider. |
| finish_reason | string | Why generation stopped: stop, length, tool_calls, content_filter. |
| streamed | boolean | Whether the request used streaming. |
| created_at | string | ISO 8601 timestamp of request creation. |
Cache & Reasoning Tokens
Cache Tokens
Some providers cache repeated prompt prefixes, dramatically reducing cost for subsequent requests with the same system prompt or conversation history. Cached input tokens are billed at a fraction of the normal input price.
| Provider | Models | Cache Discount | How to Enable |
|---|---|---|---|
| Anthropic | All Claude models | 90% off input price | Automatic for repeated prefixes. Use cache_control for explicit control. |
| OpenAI | GPT-4o, GPT-4.1, o3 | 50% off input price | Automatic for prompts > 1024 tokens with shared prefix. |
| DeepSeek | DeepSeek-Chat, DeepSeek-R1 | 90% off input price | Automatic for repeated prefixes (context caching). |
| Google | Gemini 2.5 Pro/Flash | 75% off input price | Use cachedContent API or automatic prefix caching. |
Cache tokens appear in the cached_tokens field of the Generation API response and in the Anthropic usage.cache_read_input_tokens field. Keep your system prompt stable across requests to maximize cache hits.
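For explicit Anthropic caching, mark the end of the stable prefix with a cache_control block. Shown here as a plain request body for /anthropic/v1/messages (the instruction text is a placeholder):

```python
# Explicit prompt caching: everything up to and including the block
# carrying cache_control is cached for subsequent requests.
request_body = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a helpful assistant. <long, stable instructions here>",
            "cache_control": {"type": "ephemeral"},  # cache boundary
        }
    ],
    "messages": [{"role": "user", "content": "Hello!"}],
}
```

Keep the cached prefix byte-identical across requests; any change to it invalidates the cache.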
Reasoning Tokens
Reasoning models (OpenAI o3, o4-mini, DeepSeek R1) generate internal "thinking" tokens before producing the final output. These tokens are not visible in the response but are billed as output tokens.
| Model | Reasoning Tokens | Billing |
|---|---|---|
| openai/o3 | Up to 100K tokens per request | Billed at output token rate |
| openai/o4-mini | Up to 100K tokens per request | Billed at output token rate |
| deepseek/deepseek-reasoner | Variable (shown in response) | Billed at output token rate |
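Putting cache discounts and reasoning tokens together, the complete pricing formula below can be checked client-side with a small helper (prices per million tokens; the function name is illustrative, not part of the API):

```python
from decimal import Decimal

def estimate_cost(input_tokens, cached_tokens, output_tokens, reasoning_tokens,
                  input_price, cached_price, output_price):
    """Cost in USD. Prices are Decimal USD per 1M tokens.
    Reasoning tokens are billed at the output rate."""
    per_million = Decimal(1_000_000)
    return ((input_tokens - cached_tokens) * input_price
            + cached_tokens * cached_price
            + (output_tokens + reasoning_tokens) * output_price) / per_million

# claude-sonnet-4-6 with a cache hit: 2000 input (1500 cached), 500 output
print(estimate_cost(2000, 1500, 500, 0,
                    Decimal("3.15"), Decimal("0.315"), Decimal("15.75")))
# → 0.0099225
```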
Complete Pricing Formula
```python
cost = (input_tokens - cached_tokens) * input_price \
     + cached_tokens * cached_input_price \
     + (output_tokens + reasoning_tokens) * output_price

# Example: anthropic/claude-sonnet-4-6 with cache hit
# 2000 input tokens, 1500 cached, 500 output tokens
# Input price: $3.15/1M, Cached: $0.315/1M, Output: $15.75/1M
#
# cost = 500 * 3.15/1e6     # non-cached input: $0.001575
#      + 1500 * 0.315/1e6   # cached input:     $0.0004725
#      + 500 * 15.75/1e6    # output:           $0.007875
#      = $0.009923
```

Error Handling
All errors follow the OpenAI-compatible error format, regardless of which protocol you use. This makes error handling consistent across /v1 and /anthropic paths.
{ "error": { "message": "Insufficient balance. Please top up at https://chuizi.ai/billing", "type": "insufficient_quota", "code": "402" } }
Error Codes Reference
| Code | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request body, missing required fields, invalid model name, or unsupported parameters. |
| 401 | authentication_error | Missing, invalid, or expired API key. Check the Authorization or x-api-key header. |
| 402 | insufficient_quota | Account balance too low for the estimated request cost. Top up your balance. |
| 403 | permission_error | Key is inactive, IP not in whitelist, model not in allowed_models, or daily spend cap reached. |
| 403 | model_not_allowed | Your API key does not have access to this model. |
| 403 | ip_not_allowed | Request IP is not in the key's whitelist. |
| 403 | daily_limit_exceeded | Daily spending limit reached for this key. |
| 404 | not_found | Requested model or endpoint does not exist. Check model naming format. |
| 429 | rate_limit_error | RPM or TPM limit exceeded for your tier. Check X-RateLimit-Reset header for retry time. |
| 500 | internal_error | Internal chuizi.ai server error. Retry with exponential backoff. Contact support if persistent. |
| 502 | upstream_error | Upstream provider returned an error. The provider may be experiencing issues. |
| 503 | service_unavailable | Upstream provider is overloaded or temporarily unavailable. Retry after a short delay. |
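Only some of these statuses are worth retrying. A minimal classifier consistent with the table (429, 500, 502, and 503 are transient; everything else indicates a request or account problem to fix, not retry):

```python
RETRYABLE_STATUS = {429, 500, 502, 503}

def should_retry(status_code, attempt, max_retries=5):
    """Retry rate limits and server/upstream errors only,
    up to max_retries attempts (use exponential backoff between tries)."""
    return status_code in RETRYABLE_STATUS and attempt < max_retries
```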
Rate Limits
Rate limits are applied per API key. Each key inherits a default RPM (requests per minute) limit from your account tier, and the limit can be raised or lowered per key in the console.
Default Limits
The table below shows the default RPM limit for each tier. Set a custom rpm_limit in your API key settings to override the tier default.
| Type | RPM |
|---|---|
| Free | 60 |
| Starter | 120 |
| Pro | 300 |
| Enterprise | Custom |
Rate Limit Headers
Every response includes rate limit information in the headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per minute for this key. |
| X-RateLimit-Remaining | Requests remaining in the current window. |
| X-RateLimit-Reset | Unix timestamp (seconds) when the window resets. |
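X-RateLimit-Reset is a Unix timestamp, so the wait time before retrying is just the difference from the current clock. A small sketch (the helper name is illustrative; the with_raw_response accessor shown in the comment is part of the official OpenAI Python SDK):

```python
import time

def seconds_until_reset(headers, now=None):
    """Seconds to wait before retrying, from the X-RateLimit-Reset header."""
    now = time.time() if now is None else now
    reset = int(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)

# Usage with the OpenAI SDK (network call; replace with your real key):
# resp = client.chat.completions.with_raw_response.create(...)
# remaining = int(resp.headers["X-RateLimit-Remaining"])
# wait = seconds_until_reset(resp.headers)
```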
Per-Key Customization
Override your tier's default RPM on individual keys via the Console or API. Set rpm_limit when creating/updating a key. You can also set a daily_limit (in USD) to cap spending — the key returns 403 once the daily cap is reached and resets at midnight UTC.
Claude Code Setup
Claude Code uses the Anthropic API natively. Set two environment variables in your shell profile and you are done — no format conversion, no SDK changes, no wrappers.
```shell
# Add to ~/.zshrc or ~/.bashrc
export ANTHROPIC_BASE_URL=https://api.chuizi.ai/anthropic
export ANTHROPIC_API_KEY=ck-your-key-here

# Restart terminal, then run claude as usual
source ~/.zshrc
claude
```
Cursor Setup
Cursor supports custom API endpoints for Anthropic models. Configure the base URL in Cursor settings to route all Claude requests through chuizi.ai.
Open Cursor Settings (Cmd/Ctrl + ,)
Navigate to Models > Anthropic
Set API Base URL to https://api.chuizi.ai/anthropic
Enter your chuizi.ai API key (ck-...) as the API Key
Cline Setup
Cline (VS Code extension) supports custom API providers. Configure it to use chuizi.ai as the Anthropic API endpoint.
Open Cline extension settings in VS Code
Set API Provider to "Anthropic" with Custom Base URL: https://api.chuizi.ai/anthropic
Enter your chuizi.ai API key (ck-...) as the API Key
OpenCode Setup
OpenCode reads provider configuration from its config file. Point the Anthropic provider to chuizi.ai.
```toml
[providers.anthropic]
base_url = "https://api.chuizi.ai/anthropic"
api_key = "ck-your-key-here"
```
Other Integrations
n8n
Use chuizi.ai in n8n by adding an OpenAI node with custom credentials. All OpenAI-compatible operations (chat, completions, embeddings) work out of the box.
```
Base URL: https://api.chuizi.ai/v1
API Key: ck-your-key-here

# In n8n:
# 1. Add an "OpenAI" node
# 2. Create a new credential:
#    - API Key: ck-your-key-here
#    - Base URL: https://api.chuizi.ai/v1
# 3. Select any model (e.g. anthropic/claude-sonnet-4-6)
# 4. Works with Chat operations
```
LangChain
LangChain works with chuizi.ai through the OpenAI-compatible interface. No extra libraries or adapters needed — just use ChatOpenAI and change the base URL.
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
    model="anthropic/claude-sonnet-4-6",
)

response = llm.invoke("Explain quantum computing in one sentence.")
print(response.content)
```
Vercel AI SDK
The Vercel AI SDK connects to chuizi.ai via @ai-sdk/openai's createOpenAI helper. All features including generateText, streamText, and tool calling work seamlessly.
```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const chuizi = createOpenAI({
  baseURL: "https://api.chuizi.ai/v1",
  apiKey: "ck-your-key-here",
});

const { text } = await generateText({
  model: chuizi("anthropic/claude-sonnet-4-6"),
  prompt: "Explain quantum computing in one sentence.",
});
console.log(text);
```
Key Management
Each API key can be configured with fine-grained access controls. Set these when creating a key in the Console, or update them later.
| Property | Type | Description |
|---|---|---|
| name | string | Human-readable label (e.g. "Production", "CI/CD") |
| allowed_models | string[] | Restrict to specific models. Empty = all models allowed. |
| ip_whitelist | string[] | Restrict to specific IP addresses or CIDR ranges. Empty = any IP. |
| rpm_limit | number | Per-key requests per minute override. Falls back to tier default. |
| daily_limit | string | Maximum daily spend in USD (e.g. "50.00"). Resets at midnight UTC. |
| is_active | boolean | Disable a key without deleting it. Disabled keys return 403. |
Query Key Info
Check your key's current status, balance, and limits programmatically:
/v1/key/info
Returns current key metadata, balance, and usage limits.
```shell
curl https://api.chuizi.ai/v1/key/info \
  -H "Authorization: Bearer ck-YOUR-KEY"
```
{ "key_prefix": "ck-a1b2", "name": "Production", "group": "default", "tier": "pro", "balance": "142.35000000", "is_active": true, "allowed_models": ["anthropic/claude-sonnet-4-6", "openai/gpt-4o"], "ip_whitelist": [], "rpm_limit": 300, "daily_limit": "100.00", "last_used_at": "2025-06-15T14:22:00Z", "created_at": "2025-06-01T08:00:00Z" }
SDKs & Examples
chuizi.ai works with any OpenAI-compatible SDK. No custom libraries needed — just change the base URL and API key.
Python — Streaming
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Node.js — Streaming
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.chuizi.ai/v1",
  apiKey: "ck-your-key-here",
});

const stream = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-6",
  messages: [{ role: "user", content: "Write a haiku about coding" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```
Python — Tool Use (Function Calling)
```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

# Step 1: Initial request with tools
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)
message = response.choices[0].message

# Step 2: If the model wants to call a tool
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Your function implementation
    weather_data = {"temp": "22C", "condition": "Sunny"}

    # Step 3: Send the tool result back
    follow_up = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            message,  # assistant message with tool_calls
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(weather_data),
            },
        ],
        tools=tools,
    )
    print(follow_up.choices[0].message.content)
```
curl — Non-Streaming
```shell
curl https://api.chuizi.ai/v1/chat/completions \
  -H "Authorization: Bearer ck-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain REST APIs in one sentence."}
    ],
    "temperature": 0.3,
    "max_tokens": 100
  }'
```

curl — Anthropic Native Protocol
```shell
curl https://api.chuizi.ai/anthropic/v1/messages \
  -H "x-api-key: ck-YOUR-KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": "You are a senior software engineer.",
    "messages": [
      {"role": "user", "content": "Review this code: function add(a,b){return a+b}"}
    ]
  }'
```

Best Practices
Error Recovery with Exponential Backoff
Always implement retry logic with exponential backoff for transient errors (429, 500, 502, 503). Here is a production-ready implementation:
```python
import time
import random
from openai import OpenAI, APIError, RateLimitError, APIConnectionError

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

def chat_with_retry(messages, model="anthropic/claude-sonnet-4-6", max_retries=5):
    """Make a chat request with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Use Retry-After header if available, otherwise exponential backoff
            wait = float(e.response.headers.get("retry-after", 2 ** attempt))
            wait += random.uniform(0, 1)  # jitter
            print(f"Rate limited. Retrying in {wait:.1f}s...")
            time.sleep(wait)
        except (APIError, APIConnectionError) as e:
            if attempt == max_retries - 1:
                raise
            wait = min(2 ** attempt + random.uniform(0, 1), 30)
            print(f"Error: {e}. Retrying in {wait:.1f}s...")
            time.sleep(wait)

response = chat_with_retry([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)
```
Cost Optimization
Maximize cache tokens
Keep your system prompt identical across requests to the same model. Anthropic and OpenAI automatically cache repeated prompt prefixes, saving up to 90% on input tokens.
Choose the right model
Use Haiku or GPT-4.1-mini for simple tasks (classification, extraction, formatting). Reserve Opus/GPT-4o/o3 for tasks that genuinely need advanced reasoning. The cost difference can be 10-50x.
Set max_tokens appropriately
Always set max_tokens to a reasonable limit for your use case. This caps output costs and prevents runaway generation. It also improves the accuracy of cost pre-deduction.
Monitor with the Generation API
Periodically query /v1/generation to understand your cost breakdown. Look for requests with high reasoning_tokens or low cache hit rates.
Streaming Best Practices
Always include stream_options
Set stream_options: {include_usage: true} to receive token counts in the final chunk. Without this, you cannot track costs client-side.
Handle partial chunks gracefully
SSE chunks can split across network packets. Always buffer and parse complete data: lines. The OpenAI SDK handles this automatically.
Use -N flag with curl
The -N (--no-buffer) flag disables output buffering in curl, so you see tokens as they arrive instead of all at once.
Key Security
Rotate keys regularly
Create new keys periodically and deactivate old ones. Use separate keys for development, staging, and production environments.
Use IP whitelisting
For production keys, restrict access to your server IP addresses or CIDR ranges. This prevents unauthorized usage even if a key is leaked.
Set daily spend caps
Configure daily_limit on each key to prevent runaway costs from bugs or compromised keys. The key returns 403 once the cap is reached.
Restrict allowed models
Use allowed_models to limit which models each key can access. A key meant for embeddings does not need access to GPT-4o or Claude Opus.
Never expose keys in client-side code
API keys should only be used server-side. Never include them in JavaScript bundles, mobile apps, or public repositories. Use environment variables exclusively.