Huddle01 Cloud

Guide

Complete guide to Huddle01 AI inference, endpoint setup, supported workflows, and next steps.

Huddle01 AI Inference gives you one OpenAI-compatible endpoint for multiple model providers.

What is inference?

Inference means sending input (messages, prompt, or multimodal data) to a model and getting generated output in response.

With Huddle01:

  • You use one API key.
  • You hit one base URL.
  • You can switch models by changing only the model field.

Endpoint and auth

  • Base URL: https://gru.huddle01.io/v1
  • Auth header: Authorization: Bearer <HUDDLE_API_KEY>
  • Compatibility: OpenAI-compatible SDKs and HTTP APIs
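
As a minimal sketch of the endpoint and auth details above, the following Python builds a chat/completions request using only the standard library. The helper names `build_chat_request` and `send` are illustrative and not part of any Huddle01 SDK; the request body assumes the standard OpenAI chat format (`model` plus `messages`).

```python
import json
import urllib.request

BASE_URL = "https://gru.huddle01.io/v1"  # base URL from this guide


def build_chat_request(api_key: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a POST request to the chat/completions route."""
    # Assumption: the body follows the OpenAI chat format (model + messages).
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Switching models should then only require a different `model` argument, for example `build_chat_request(key, "glm-4.7", messages)` instead of `build_chat_request(key, "qwen3-coder", messages)`.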

Quick start flow

  1. Create or copy your Huddle01 API key from the dashboard.
  2. Set your client base_url to https://gru.huddle01.io/v1.
  3. Call chat/completions with one of the supported model IDs.
  4. Read usage and spend from your existing dashboard/billing views.

Apps and IDEs

Most apps that support OpenAI-compatible providers need the same four fields:

Field           Value
Provider type   OpenAI-compatible
Base URL        https://gru.huddle01.io/v1
API key         Your Huddle01 AI Inference API key
Model           Any model ID from Pricing and Models

OpenCode

Create or update opencode.json in your project root, or place it in your global OpenCode config directory.

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "huddle01": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Huddle01 AI Inference",
      "options": {
        "baseURL": "https://gru.huddle01.io/v1",
        "apiKey": "{env:HUDDLE_API_KEY}"
      },
      "models": {
        "qwen3-coder": {
          "name": "Qwen3 Coder"
        },
        "deepseek-v3.2": {
          "name": "DeepSeek V3.2"
        },
        "glm-4.7": {
          "name": "GLM 4.7"
        },
        "minimax-m2.5": {
          "name": "MiniMax M2.5"
        }
      }
    }
  }
}

Then set your key and choose a model in OpenCode:

export HUDDLE_API_KEY="your_huddle01_key"
opencode

Inside OpenCode, run /models and select one of the huddle01 models.

Other apps

For tools like Cursor extensions, Continue, Cline, Open WebUI, LangChain, or custom OpenAI SDK clients, use the same setup:

  1. Choose OpenAI Compatible as the provider.
  2. Paste https://gru.huddle01.io/v1 as the base URL.
  3. Paste your Huddle01 API key.
  4. Set the model ID, for example qwen3-coder, deepseek-v3.2, glm-4.7, or minimax-m2.5.

If the app asks for a full endpoint instead of a base URL, use https://gru.huddle01.io/v1/chat/completions.
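
If you maintain your own client configuration, a small helper can accept either form. This is a sketch only; `chat_completions_url` is a hypothetical name, not an API provided by Huddle01.

```python
def chat_completions_url(base_or_full: str) -> str:
    """Return the full chat/completions URL for a base URL or a full endpoint."""
    suffix = "/chat/completions"
    url = base_or_full.rstrip("/")  # tolerate a trailing slash
    return url if url.endswith(suffix) else url + suffix
```

Both `https://gru.huddle01.io/v1` and `https://gru.huddle01.io/v1/chat/completions` resolve to the same full endpoint.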

Request lifecycle

  1. Your app sends a request to POST /chat/completions.
  2. Huddle01 authenticates the API key and validates the request shape.
  3. The AI Inference service routes your request to the selected model/provider.
  4. The response is normalized into an OpenAI-compatible format.
  5. You receive generated output and usage metadata.
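
The last step can be sketched as follows. This example assumes the response uses the standard OpenAI chat completion field names (`choices[0].message.content`, `usage.prompt_tokens`, and so on); the exact fields Huddle01 returns are not listed in this guide, so verify them against a real response. `summarize_response` is an illustrative helper name.

```python
from typing import Any


def summarize_response(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the generated text and token counts out of a chat completion body."""
    choice = response["choices"][0]
    # Usage metadata may be absent; fall back to an empty dict so lookups return None.
    usage = response.get("usage") or {}
    return {
        "model": response.get("model"),
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```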

What you can build

  • AI chat and assistant experiences
  • Content generation and summarization
  • Code generation and developer tools
  • Workflow automations with structured prompting
  • Multi-model routing with one integration surface

Compatibility

  • Works with OpenAI-compatible SDKs
  • Works with direct HTTP calls (curl, backend services, serverless functions)
  • Supports model switching without client rewrites

Best practices

  • Keep API keys on the server side only.
  • Use environment variables or a secret manager; never hardcode keys.
  • Start with lower-cost models for non-critical traffic.
  • Add retry and timeout handling in production clients.
  • Log model, latency, token usage, and request ID for debugging.
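
One way to approach the retry recommendation is exponential backoff around whatever function performs the request. The sketch below is not Huddle01-specific: the set of retryable status codes, the default delays, and the names `is_transient` and `call_with_retries` are all assumptions to adjust for your own traffic.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: these HTTP status codes are treated as transient.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def is_transient(exc: Exception) -> bool:
    """Decide whether an error is worth retrying."""
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code in RETRYABLE_STATUS
    return isinstance(exc, (urllib.error.URLError, TimeoutError, ConnectionError))


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that will not fix themselves.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError("unreachable")
```

The wrapped function is expected to set its own timeout (for example, the `timeout` argument of `urllib.request.urlopen`), so a hung connection surfaces as an error that this loop can retry.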

Production tip

Use one default model for most traffic and route only high-complexity requests to premium models. This keeps latency and cost predictable.
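
A rough sketch of this routing idea is shown below. Which model counts as "default" or "premium" is an assumption made for illustration (check Pricing for actual tiers), and the character threshold is a hypothetical stand-in for whatever complexity signal fits your workload.

```python
DEFAULT_MODEL = "qwen3-coder"      # assumption: your everyday model
PREMIUM_MODEL = "deepseek-v3.2"    # assumption: your premium model
COMPLEXITY_THRESHOLD_CHARS = 4000  # hypothetical cutoff; tune for your traffic


def pick_model(messages: list[dict], force_premium: bool = False) -> str:
    """Choose a model ID from total prompt size, with an explicit override."""
    # Assumes each message's content is plain text (not multimodal parts).
    total_chars = sum(len(m.get("content", "")) for m in messages)
    if force_premium or total_chars > COMPLEXITY_THRESHOLD_CHARS:
        return PREMIUM_MODEL
    return DEFAULT_MODEL
```

Because only the model field changes between requests, the result of `pick_model` can be passed straight into the request body without touching the rest of the client.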

Common mistakes to avoid

  • Using the wrong base URL (must be https://gru.huddle01.io/v1)
  • Sending the API key in a custom header instead of Authorization: Bearer ...
  • Assuming every model has identical behavior/capabilities
  • Not handling transient upstream/network failures
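
The first two mistakes can be caught before any request is sent. The following preflight check is a sketch; `check_client_config` is a hypothetical helper, and it validates only the base URL and the Authorization header described in this guide.

```python
EXPECTED_BASE_URL = "https://gru.huddle01.io/v1"  # base URL from this guide


def check_client_config(base_url: str, headers: dict[str, str]) -> list[str]:
    """Return a list of configuration problems (empty if none were found)."""
    problems = []
    if base_url.rstrip("/") != EXPECTED_BASE_URL:
        problems.append(f"base URL should be {EXPECTED_BASE_URL}")
    # Header names are case-insensitive, so compare them in lowercase.
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), None)
    if auth is None:
        problems.append("missing Authorization header")
    elif not auth.startswith("Bearer ") or not auth[len("Bearer "):].strip():
        problems.append("Authorization header must be 'Bearer <key>'")
    return problems
```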

Next steps

  1. Go to API Examples for concrete API requests and SDK examples.
  2. Check Pricing to pick the right models for your workloads.