Guide
Complete guide to Huddle01 AI inference, endpoint setup, supported workflows, and next steps.
Huddle01 AI Inference gives you one OpenAI-compatible endpoint for multiple model providers.
What is inference?
Inference means sending input (messages, prompt, or multimodal data) to a model and getting generated output in response.
With Huddle01:
- You use one API key.
- You hit one base URL.
- You can switch models by changing only the
modelfield.
Endpoint and auth
- Base URL:
https://gru.huddle01.io/v1 - Auth header:
Authorization: Bearer <HUDDLE_API_KEY> - Compatibility: OpenAI-compatible SDKs and HTTP APIs
Quick navigation
API Examples
Connection setup, OpenAPI-style schema, and SDK request examples.
Pricing and Models
Model-wise input/output pricing and capabilities matrix.
Request Lifecycle
Understand what happens from API request to model response.
Apps and IDEs
Use Huddle01 in OpenCode and other OpenAI-compatible tools.
Best Practices
Production tips for reliability, security, and cost control.
Quick start flow
- Create or copy your Huddle01 API key from dashboard.
- Set your client
base_urltohttps://gru.huddle01.io/v1. - Call
chat/completionswith one of the supported model IDs. - Read usage and spend from your existing dashboard/billing views.
Apps and IDEs
Most apps that support OpenAI-compatible providers need the same three fields:
| Field | Value |
|---|---|
| Provider type | OpenAI-compatible |
| Base URL | https://gru.huddle01.io/v1 |
| API key | Your Huddle01 AI Inference API key |
| Model | Any model ID from Pricing and Models |
OpenCode
Create or update opencode.json in your project root, or place it in your global OpenCode config directory.
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"huddle01": {
"npm": "@ai-sdk/openai-compatible",
"name": "Huddle01 AI Inference",
"options": {
"baseURL": "https://gru.huddle01.io/v1",
"apiKey": "{env:HUDDLE_API_KEY}"
},
"models": {
"qwen3-coder": {
"name": "Qwen3 Coder"
},
"deepseek-v3.2": {
"name": "DeepSeek V3.2"
},
"glm-4.7": {
"name": "GLM 4.7"
},
"minimax-m2.5": {
"name": "MiniMax M2.5"
}
}
}
}
}Then set your key and choose a model in OpenCode:
export HUDDLE_API_KEY="your_huddle01_key"
opencodeInside OpenCode, run /models and select one of the huddle01 models.
Other apps
For tools like Cursor extensions, Continue, Cline, Open WebUI, LangChain, or custom OpenAI SDK clients, use the same setup:
- Choose
OpenAI Compatibleas the provider. - Paste
https://gru.huddle01.io/v1as the base URL. - Paste your Huddle01 API key.
- Set the model ID, for example
qwen3-coder,deepseek-v3.2,glm-4.7, orminimax-m2.5.
If the app asks for a full endpoint instead of a base URL, use https://gru.huddle01.io/v1/chat/completions.
Request lifecycle
- Your app sends a request to
POST /chat/completions. - Huddle01 authenticates the API key and validates request shape.
- The AI Inference service routes your request to the selected model/provider.
- Response is normalized into OpenAI-compatible format.
- You receive generated output and usage metadata.
What you can build
- AI chat and assistant experiences
- Content generation and summarization
- Code generation and developer tools
- Workflow automations with structured prompting
- Multi-model routing with one integration surface
Compatibility
- Works with OpenAI-compatible SDKs
- Works with direct HTTP calls (
curl, backend services, serverless functions) - Supports model switching without client rewrites
Best practices
- Keep API keys on the server side only.
- Use environment variables and secret managers, never hardcode keys.
- Start with lower-cost models for non-critical traffic.
- Add retry + timeout handling in production clients.
- Log
model, latency, token usage, and request ID for debugging.
Production tip
Use one default model for most traffic and route only high-complexity requests to premium models. This keeps latency and cost predictable.
Common mistakes to avoid
- Using the wrong base URL (must be
https://gru.huddle01.io/v1) - Sending API key in custom headers instead of
Authorization: Bearer ... - Assuming every model has identical behavior/capabilities
- Not handling transient upstream/network failures
Next steps
- Go to API Examples for concrete API requests and SDK examples.
- Check Pricing to pick the right models for your workloads.