Error Handling & Troubleshooting

Comprehensive guide to handling errors, implementing retries, and troubleshooting common issues with the HiveOps API.


HTTP Status Codes

| Code | Meaning | Description | Action |
|------|---------|-------------|--------|
| 200 | OK | Request successful | Continue normally |
| 400 | Bad Request | Invalid request format or parameters | Fix request, don't retry |
| 401 | Unauthorized | Invalid or missing API key | Check API key |
| 403 | Forbidden | Insufficient balance or blocked account | Add funds or contact support |
| 429 | Too Many Requests | Rate limit exceeded | Implement exponential backoff |
| 500 | Internal Server Error | Server-side error | Retry with exponential backoff |
| 503 | Service Unavailable | Temporary overload or maintenance | Retry after delay |
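The retry guidance in the table boils down to one decision per status code. A minimal sketch (`is_retryable` is a name introduced here, not part of any SDK):

```python
# Map the status-code table to a simple retry decision.
RETRYABLE_STATUSES = {429, 500, 503}   # back off and retry
FATAL_STATUSES = {400, 401, 403}       # fix the request or account first

def is_retryable(status_code: int) -> bool:
    """Return True if the request should be retried with backoff."""
    return status_code in RETRYABLE_STATUSES
```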

Error Response Format

All errors return JSON in this format:

{
  "error": {
    "message": "Human-readable error description",
    "type": "error_type",
    "code": "error_code"
  }
}
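If you call the API over raw HTTP rather than through an SDK, you can unpack this envelope yourself. A sketch (`parse_error` is a helper name introduced here):

```python
import json

def parse_error(body: str) -> tuple[str, str, str]:
    """Extract (message, type, code) from a HiveOps error body."""
    err = json.loads(body).get("error", {})
    return err.get("message", ""), err.get("type", ""), err.get("code", "")

body = '{"error": {"message": "Invalid API key provided", "type": "invalid_request_error", "code": "invalid_api_key"}}'
message, err_type, code = parse_error(body)
# code == "invalid_api_key"
```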

Example Errors

401 Unauthorized:

{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

429 Rate Limit:

{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

403 Insufficient Balance:

{
  "error": {
    "message": "Insufficient account balance. Please add funds to continue.",
    "type": "insufficient_quota",
    "code": "insufficient_balance"
  }
}

Error Types

1. Invalid Request Errors (400)

Common Causes:

  • Missing required parameters
  • Invalid parameter values
  • Malformed JSON
  • Model name not recognized

Example:

from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR-API-KEY",
    base_url="https://ai.hiveops.io"
)

try:
    response = client.chat.completions.create(
        model="invalid-model-name",  # Wrong model
        messages=[{"role": "user", "content": "Hi"}]
    )
except Exception as e:
    print(f"Error: {e}")
    # Error: Invalid model 'invalid-model-name' specified

Solution:

  • Validate request parameters before sending
  • Use correct model names (see API Reference)
  • Check request format against documentation
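Validating locally before sending catches most 400s without burning a request. A sketch (the model set mirrors the models named elsewhere in this guide; `validate_request` is an illustrative helper, not an SDK function):

```python
KNOWN_MODELS = {
    "llama3:8b-instruct-q8_0",
    "llama-3-70b-instruct",
    "mistral-7b-instruct-v0.3",
    "gemma-2-9b-it",
}

def validate_request(model: str, messages: list) -> None:
    """Raise ValueError before the request ever leaves the client."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"Unknown model: {model!r}")
    if not messages:
        raise ValueError("messages must not be empty")
    for m in messages:
        if m.get("role") not in {"system", "user", "assistant"}:
            raise ValueError(f"Invalid role: {m.get('role')!r}")
        if "content" not in m:
            raise ValueError("Each message needs a 'content' field")
```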

2. Authentication Errors (401)

Common Causes:

  • Missing Authorization header
  • Invalid API key
  • Expired API key (90 days)
  • Revoked API key

Example:

from openai import AuthenticationError

try:
    response = client.chat.completions.create(
        model="llama3:8b-instruct-q8_0",
        messages=[{"role": "user", "content": "Hi"}]
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")

Solutions:

  1. Verify API Key:
import os

api_key = os.getenv("HIVEOPS_API_KEY")
if not api_key or not api_key.startswith("sk-"):
    raise ValueError("Invalid or missing API key")
  2. Check Key Expiration:

    • Keys expire after 90 days. Generate a new key in the dashboard if yours has expired.
  3. Ensure Correct Header:

curl https://ai.hiveops.io/models \
  -H "Authorization: Bearer sk-YOUR-API-KEY"  # Must include "Bearer "

3. Insufficient Balance (403)

Cause: Your account balance is $0 or negative.

Error Message:

Insufficient account balance. Please add funds to continue.

Solutions:

  1. Check Balance:

    • Go to Dashboard
    • View current balance in the top right
  2. Add Funds:

    • Click "Add Funds" in dashboard
    • Minimum top-up: $10
    • Maximum: $1,000 per transaction
  3. Handle in Code:

from openai import APIError

try:
    response = client.chat.completions.create(...)
except APIError as e:
    if "insufficient_balance" in str(e).lower():
        print("⚠️ Balance too low! Add funds at https://hiveops.io/developer/billing")
        # Notify admin, pause processing, etc.
    raise

4. Rate Limit Errors (429)

Limits:

  • 60 requests per minute
  • 150,000 tokens per minute

Error Response Headers:

X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 0
X-RateLimit-Reset-Requests: 2026-03-20T12:30:00Z
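When the headers are available, `X-RateLimit-Reset-Requests` tells you exactly how long to wait, so you can sleep until the window resets instead of guessing. A sketch (`seconds_until_reset` is a helper name introduced here):

```python
from datetime import datetime, timezone

def seconds_until_reset(reset_header: str, now: datetime = None) -> float:
    """Seconds to wait until the rate-limit window resets (never negative)."""
    reset_at = datetime.fromisoformat(reset_header.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())

wait = seconds_until_reset(
    "2026-03-20T12:30:00Z",
    now=datetime(2026, 3, 20, 12, 29, 30, tzinfo=timezone.utc),
)
# wait == 30.0
```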

Solution: Implement Exponential Backoff

Python

import time
import random
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="sk-YOUR-API-KEY",
    base_url="https://ai.hiveops.io"
)

def call_with_retry(max_retries=5):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="llama3:8b-instruct-q8_0",
                messages=[{"role": "user", "content": "Hello"}]
            )
            return response

        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # Give up after max retries

            # Exponential backoff: 2^attempt seconds + jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {wait_time:.2f}s...")
            time.sleep(wait_time)

        except Exception as e:
            print(f"Unexpected error: {e}")
            raise

response = call_with_retry()
print(response.choices[0].message.content)

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-YOUR-API-KEY",
  baseURL: "https://ai.hiveops.io",
});

async function callWithRetry(maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await client.chat.completions.create({
        model: "llama3:8b-instruct-q8_0",
        messages: [{ role: "user", content: "Hello" }],
      });
      return response;
    } catch (error) {
      if (error.status === 429 && attempt < maxRetries - 1) {
        // Exponential backoff
        const waitTime = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
        console.log(`Rate limited. Retrying in ${waitTime / 1000}s...`);
        await new Promise((resolve) => setTimeout(resolve, waitTime));
      } else {
        throw error;
      }
    }
  }
}

const response = await callWithRetry();
console.log(response.choices[0].message.content);

5. Server Errors (500, 503)

Causes:

  • Temporary server overload
  • Model inference timeout
  • Internal system errors

Solution: Retry with Backoff

from openai import OpenAI, APIError
import time

client = OpenAI(
    api_key="sk-YOUR-API-KEY",
    base_url="https://ai.hiveops.io"
)

def call_with_server_retry(max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="llama3:8b-instruct-q8_0",
                messages=[{"role": "user", "content": "Hello"}],
                timeout=30  # 30 second timeout
            )
            return response

        except APIError as e:
            if e.status_code in [500, 503] and attempt < max_retries - 1:
                wait_time = 2 ** attempt  # 1s, 2s, 4s
                print(f"Server error. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise

response = call_with_server_retry()

Complete Error Handling Template

Python Production Template

import time
import random
from openai import OpenAI, APIError, RateLimitError, AuthenticationError
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

client = OpenAI(
    api_key="sk-YOUR-API-KEY",
    base_url="https://ai.hiveops.io"
)

def robust_api_call(
    messages,
    model="llama3:8b-instruct-q8_0",
    max_retries=5,
    timeout=30
):
    """
    Call HiveOps API with comprehensive error handling
    """
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                timeout=timeout
            )

            logger.info(f"Success after {attempt + 1} attempt(s)")
            return response

        except AuthenticationError as e:
            # Don't retry auth errors
            logger.error(f"Authentication failed: {e}")
            raise

        except RateLimitError as e:
            if attempt == max_retries - 1:
                logger.error("Rate limit exceeded after max retries")
                raise

            wait_time = (2 ** attempt) + random.uniform(0, 1)
            logger.warning(f"Rate limited. Retrying in {wait_time:.2f}s...")
            time.sleep(wait_time)

        except APIError as e:
            # Check for specific error types
            if "insufficient_balance" in str(e).lower():
                logger.error("Insufficient balance. Cannot retry.")
                raise

            if e.status_code in [500, 503] and attempt < max_retries - 1:
                wait_time = 2 ** attempt
                logger.warning(f"Server error {e.status_code}. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                logger.error(f"API error: {e}")
                raise

        except Exception as e:
            logger.error(f"Unexpected error: {e}")
            raise

    raise Exception("Max retries exceeded")

# Usage
try:
    response = robust_api_call(
        messages=[{"role": "user", "content": "Hello!"}],
        model="llama3:8b-instruct-q8_0"
    )
    print(response.choices[0].message.content)

except Exception as e:
    print(f"Failed after all retries: {e}")
    # Handle gracefully (log, alert, fallback, etc.)

TypeScript Production Template

import OpenAI from "openai";
import type { ChatCompletionMessageParam } from "openai/resources/chat";

const client = new OpenAI({
  apiKey: process.env.HIVEOPS_API_KEY!,
  baseURL: "https://ai.hiveops.io",
});

interface RetryOptions {
  maxRetries?: number;
  timeout?: number;
  onRetry?: (attempt: number, error: any) => void;
}

async function robustApiCall(
  messages: ChatCompletionMessageParam[],
  model: string = "llama3:8b-instruct-q8_0",
  options: RetryOptions = {},
) {
  const { maxRetries = 5, timeout = 30000, onRetry } = options;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await client.chat.completions.create(
        { model, messages },
        { timeout }, // per-request timeout is a request option, not a body parameter
      );

      console.log(`Success after ${attempt + 1} attempt(s)`);
      return response;
    } catch (error: any) {
      // Authentication errors - don't retry
      if (error.status === 401) {
        console.error("Authentication failed");
        throw error;
      }

      // Insufficient balance - don't retry
      if (error.message?.includes("insufficient_balance")) {
        console.error("Insufficient balance");
        throw error;
      }

      // Rate limit - retry with backoff
      if (error.status === 429 && attempt < maxRetries - 1) {
        const waitTime = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
        console.warn(`Rate limited. Retrying in ${waitTime / 1000}s...`);
        onRetry?.(attempt, error);
        await new Promise((resolve) => setTimeout(resolve, waitTime));
        continue;
      }

      // Server errors - retry
      if ([500, 503].includes(error.status) && attempt < maxRetries - 1) {
        const waitTime = Math.pow(2, attempt) * 1000;
        console.warn(
          `Server error ${error.status}. Retrying in ${waitTime / 1000}s...`,
        );
        onRetry?.(attempt, error);
        await new Promise((resolve) => setTimeout(resolve, waitTime));
        continue;
      }

      // Give up
      console.error("API call failed:", error);
      throw error;
    }
  }

  throw new Error("Max retries exceeded");
}

// Usage
try {
  const response = await robustApiCall(
    [{ role: "user", content: "Hello!" }],
    "llama3:8b-instruct-q8_0",
    {
      maxRetries: 5,
      onRetry: (attempt, error) => {
        console.log(`Retry ${attempt + 1}: ${error.message}`);
      },
    },
  );

  console.log(response.choices[0].message.content);
} catch (error) {
  console.error("Failed after all retries:", error);
  // Handle gracefully
}

Common Issues & Solutions

Issue 1: "Model not found"

Error:

Invalid model 'gpt-3.5-turbo' specified

Cause: Using OpenAI model names instead of HiveOps models.

Solution: Use HiveOps model names:

| OpenAI Model | HiveOps Equivalent |
|---|---|
| gpt-3.5-turbo | llama3:8b-instruct-q8_0 |
| gpt-4 | llama-3-70b-instruct |
# Wrong
model="gpt-3.5-turbo"

# Correct
model="llama3:8b-instruct-q8_0"
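If you are migrating existing code, the mapping above can be applied mechanically. A sketch (`to_hiveops_model` is a helper name introduced here; the mapping mirrors the table):

```python
# OpenAI model names -> HiveOps equivalents (from the table above).
OPENAI_TO_HIVEOPS = {
    "gpt-3.5-turbo": "llama3:8b-instruct-q8_0",
    "gpt-4": "llama-3-70b-instruct",
}

def to_hiveops_model(name: str) -> str:
    """Translate an OpenAI model name; pass HiveOps names through unchanged."""
    return OPENAI_TO_HIVEOPS.get(name, name)
```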

Issue 2: Requests Still Going to OpenAI

Symptom: Requests work, but you're being charged by OpenAI, not HiveOps.

Cause: base_url not set.

Solution:

# Make sure base_url is set
client = OpenAI(
    api_key="sk-YOUR-HIVEOPS-KEY",  # HiveOps key, not OpenAI
    base_url="https://ai.hiveops.io"  # Must include this!
)

Issue 3: Slow Response Times

Symptoms:

  • Requests timing out
  • High latency (>5 seconds)

Causes & Solutions:

  1. Large max_tokens:
# Slow (generates up to 4000 tokens)
response = client.chat.completions.create(
    model="llama3:8b-instruct-q8_0",
    messages=[...],
    max_tokens=4000
)

# Faster (limit to what you need)
response = client.chat.completions.create(
    model="llama3:8b-instruct-q8_0",
    messages=[...],
    max_tokens=500  # Only generate what you need
)
  2. Long conversation history:
# Slow (processing 50+ messages)
messages = conversation_history  # 50 messages

# Faster (use sliding window)
messages = [
    {"role": "system", "content": system_prompt},
    *conversation_history[-10:]  # Last 10 messages only
]
  3. Choose a faster model:
| Model | Avg Latency |
|---|---|
| mistral-7b-instruct-v0.3 | ~200ms |
| llama3:8b-instruct-q8_0 | ~300ms |
| gemma-2-9b-it | ~350ms |
| llama-3-70b-instruct | ~800ms |

Issue 4: Streaming Not Working

Problem: Streaming doesn't output incrementally.

Cause: Not handling streaming correctly.

Solution (Python):

# Correct streaming implementation
stream = client.chat.completions.create(
    model="llama3:8b-instruct-q8_0",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)  # flush=True is key

Solution (JavaScript):

const stream = await client.chat.completions.create({
  model: "llama3:8b-instruct-q8_0",
  messages: [{ role: "user", content: "Count to 10" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content); // Write incrementally
}

Issue 5: High Costs

Problem: Bills higher than expected.

Solutions:

  1. Monitor token usage:
response = client.chat.completions.create(...)

print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total: {response.usage.total_tokens}")

# Calculate cost (Llama 3 8B)
input_cost = (response.usage.prompt_tokens / 1_000_000) * 0.01
output_cost = (response.usage.completion_tokens / 1_000_000) * 0.02
print(f"Cost: ${input_cost + output_cost:.6f}")
  2. Set budget alerts (coming soon)

  3. Use cheaper models for simple tasks:

| Task Complexity | Model | Cost Multiplier |
|---|---|---|
| Simple | mistral-7b-instruct-v0.3 | 1x (baseline) |
| Medium | llama3:8b-instruct-q8_0 | 10x |
| Complex | llama-3-70b-instruct | 100x |
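One way to act on this table is a tiny router that picks the cheapest adequate model per task. A sketch (`pick_model` and the tier names are illustrative; the tiers mirror the table):

```python
# Cheapest adequate model per task tier (from the table above).
MODEL_BY_COMPLEXITY = {
    "simple": "mistral-7b-instruct-v0.3",
    "medium": "llama3:8b-instruct-q8_0",
    "complex": "llama-3-70b-instruct",
}

def pick_model(complexity: str) -> str:
    """Return the model for a task tier ('simple', 'medium', 'complex')."""
    return MODEL_BY_COMPLEXITY[complexity.lower()]
```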

Debugging Checklist

When things aren't working:

  • Is base_url set to https://ai.hiveops.io?
  • Is API key correct and starts with sk-?
  • Is model name correct? (Use llama3:8b-instruct-q8_0, not gpt-3.5-turbo)
  • Is account balance positive?
  • Are you within rate limits? (60 req/min)
  • Is request format valid? (Check JSON syntax)
  • Did you implement retry logic for 429/500/503 errors?
  • Are you using the latest SDK version?
    • Python: pip install --upgrade openai
    • JavaScript: npm install openai@latest
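The first few checklist items can be automated as a preflight check run at startup. A sketch (`preflight` is a name introduced here; it checks only local configuration, not the live API):

```python
import os

def preflight(api_key, base_url: str) -> list:
    """Return a list of configuration problems (empty list means OK)."""
    problems = []
    if not api_key:
        problems.append("HIVEOPS_API_KEY is not set")
    elif not api_key.startswith("sk-"):
        problems.append("API key should start with 'sk-'")
    if base_url.rstrip("/") != "https://ai.hiveops.io":
        problems.append(f"base_url is {base_url!r}, expected https://ai.hiveops.io")
    return problems

issues = preflight(os.getenv("HIVEOPS_API_KEY"), "https://ai.hiveops.io")
```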

Testing Error Handling

Test Rate Limiting

import time
from openai import RateLimitError  # assumes `client` is configured as above

# Send 65 requests quickly to trigger rate limit
for i in range(65):
    try:
        response = client.chat.completions.create(
            model="llama3:8b-instruct-q8_0",
            messages=[{"role": "user", "content": f"Message {i}"}]
        )
        print(f"Request {i}: Success")
    except RateLimitError as e:
        print(f"Request {i}: Rate limited!")
        break
    time.sleep(0.1)

Test Balance Check

from openai import APIError  # assumes `client` is configured as above

# Check balance before expensive operation
def check_balance_first():
    try:
        # Make a cheap request to verify account is active
        client.chat.completions.create(
            model="mistral-7b-instruct-v0.3",
            messages=[{"role": "user", "content": "test"}],
            max_tokens=1
        )
        return True
    except APIError as e:
        if "insufficient_balance" in str(e).lower():
            print("⚠️ Balance too low!")
            return False
        raise

if check_balance_first():
    # Proceed with main operation
    response = client.chat.completions.create(...)

Support & Resources

When to Contact Support

Contact [email protected] if:

  • Persistent 500 errors (not resolved by retries)
  • Suspected API key compromised
  • Billing discrepancies
  • Account locked or suspended
  • Feature requests

Do NOT contact support for:

  • How to use the API (read documentation first)
  • Model quality issues (models are third-party)
  • Rate limit increases (standard for all users)


Last Updated: March 20, 2024