Error Handling & Troubleshooting

Comprehensive guide to handling errors, implementing retries, and troubleshooting common issues with the HiveOps API.


HTTP Status Codes

| Code | Meaning | Description | Action |
|------|---------|-------------|--------|
| 200 | OK | Request successful | Continue normally |
| 400 | Bad Request | Invalid request format or parameters | Fix request, don't retry |
| 401 | Unauthorized | Invalid or missing API key | Check API key |
| 403 | Forbidden | Insufficient balance or blocked account | Add funds or contact support |
| 429 | Too Many Requests | Rate limit exceeded | Implement exponential backoff |
| 500 | Internal Server Error | Server-side error | Retry with exponential backoff |
| 503 | Service Unavailable | Temporary overload or maintenance | Retry after delay |
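The retry guidance in the table boils down to one decision per status code. A minimal sketch (`is_retryable` is a name introduced here, not part of any SDK):

```python
# Map the status-code table to a simple retry decision.
RETRYABLE_STATUSES = {429, 500, 503}   # back off and retry
FATAL_STATUSES = {400, 401, 403}       # fix the request or account first

def is_retryable(status_code: int) -> bool:
    """Return True if the request should be retried with backoff."""
    return status_code in RETRYABLE_STATUSES
```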

Error Response Format

All errors return JSON in this format:

{
  "error": {
    "message": "Human-readable error description",
    "type": "error_type",
    "code": "error_code"
  }
}
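If you call the API over raw HTTP rather than through an SDK, you can unpack this envelope yourself. A sketch (`parse_error` is a helper name introduced here):

```python
import json

def parse_error(body: str) -> tuple[str, str, str]:
    """Extract (message, type, code) from a HiveOps error body."""
    err = json.loads(body).get("error", {})
    return err.get("message", ""), err.get("type", ""), err.get("code", "")

body = '{"error": {"message": "Invalid API key provided", "type": "invalid_request_error", "code": "invalid_api_key"}}'
message, err_type, code = parse_error(body)
# code == "invalid_api_key"
```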

Example Errors

401 Unauthorized:

{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

429 Rate Limit:

{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

403 Insufficient Balance:

{
  "error": {
    "message": "Insufficient account balance. Please add funds to continue.",
    "type": "insufficient_quota",
    "code": "insufficient_balance"
  }
}

Error Types

1. Invalid Request Errors (400)

Common Causes:

  • Missing required parameters
  • Invalid parameter values
  • Malformed JSON
  • Model name not recognized

Example:

from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR-API-KEY",
    base_url="https://ai.hiveops.io"
)

try:
    response = client.chat.completions.create(
        model="invalid-model-name",  # Wrong model
        messages=[{"role": "user", "content": "Hi"}]
    )
except Exception as e:
    print(f"Error: {e}")
    # Error: Invalid model 'invalid-model-name' specified

Solution:

  • Validate request parameters before sending
  • Use correct model names (see API Reference)
  • Check request format against documentation
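Validating locally before sending catches most 400s without burning a request. A sketch (the model set mirrors the models named elsewhere in this guide; `validate_request` is an illustrative helper, not an SDK function):

```python
KNOWN_MODELS = {
    "llama3:8b-instruct-q8_0",
    "llama-3-70b-instruct",
    "mistral-7b-instruct-v0.3",
    "gemma-2-9b-it",
}

def validate_request(model: str, messages: list) -> None:
    """Raise ValueError before the request ever leaves the client."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"Unknown model: {model!r}")
    if not messages:
        raise ValueError("messages must not be empty")
    for m in messages:
        if m.get("role") not in {"system", "user", "assistant"}:
            raise ValueError(f"Invalid role: {m.get('role')!r}")
        if "content" not in m:
            raise ValueError("Each message needs a 'content' field")
```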

2. Authentication Errors (401)

Common Causes:

  • Missing Authorization header
  • Invalid API key
  • Expired API key (90 days)
  • Revoked API key

Example:

from openai import AuthenticationError

try:
    response = client.chat.completions.create(
        model="llama3:8b-instruct-q8_0",
        messages=[{"role": "user", "content": "Hi"}]
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")

Solutions:

  1. Verify API Key:
import os

api_key = os.getenv("HIVEOPS_API_KEY")
if not api_key or not api_key.startswith("sk-"):
    raise ValueError("Invalid or missing API key")
  2. Check Key Expiration:

    • Keys expire after 90 days. Generate a new key in the dashboard if yours has expired.
  3. Ensure Correct Header:

curl https://ai.hiveops.io/models \
  -H "Authorization: Bearer sk-YOUR-API-KEY"  # Must include "Bearer "

3. Insufficient Balance (403)

Cause: Your account balance is $0 or negative.

Error Message:

Insufficient account balance. Please add funds to continue.

Solutions:

  1. Check Balance:

    • Go to Dashboard
    • View current balance in the top right
  2. Add Funds:

    • Click "Add Funds" in dashboard
    • Minimum top-up: $10
    • Maximum: $1,000 per transaction
  3. Handle in Code:

from openai import APIError

try:
    response = client.chat.completions.create(...)
except APIError as e:
    if "insufficient_balance" in str(e).lower():
        print("⚠️ Balance too low! Add funds at https://hiveops.io/developer/billing")
        # Notify admin, pause processing, etc.
    raise

4. Rate Limit Errors (429)

Limits:

  • 60 requests per minute
  • 150,000 tokens per minute

Error Response Headers:

X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 0
X-RateLimit-Reset-Requests: 2026-03-20T12:30:00Z
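When the headers are available, `X-RateLimit-Reset-Requests` tells you exactly how long to wait, so you can sleep until the window resets instead of guessing. A sketch (`seconds_until_reset` is a helper name introduced here):

```python
from datetime import datetime, timezone

def seconds_until_reset(reset_header: str, now: datetime = None) -> float:
    """Seconds to wait until the rate-limit window resets (never negative)."""
    reset_at = datetime.fromisoformat(reset_header.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())

wait = seconds_until_reset(
    "2026-03-20T12:30:00Z",
    now=datetime(2026, 3, 20, 12, 29, 30, tzinfo=timezone.utc),
)
# wait == 30.0
```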

Solution: Implement Exponential Backoff

Python

import time
import random
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="sk-YOUR-API-KEY",
    base_url="https://ai.hiveops.io"
)

def call_with_retry(max_retries=5):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="llama3:8b-instruct-q8_0",
                messages=[{"role": "user", "content": "Hello"}]
            )
            return response

        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # Give up after max retries

            # Exponential backoff: 2^attempt seconds + jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {wait_time:.2f}s...")
            time.sleep(wait_time)

        except Exception as e:
            print(f"Unexpected error: {e}")
            raise

response = call_with_retry()
print(response.choices[0].message.content)

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-YOUR-API-KEY",
  baseURL: "https://ai.hiveops.io",
});

async function callWithRetry(maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await client.chat.completions.create({
        model: "llama3:8b-instruct-q8_0",
        messages: [{ role: "user", content: "Hello" }],
      });
      return response;
    } catch (error) {
      if (error.status === 429 && attempt < maxRetries - 1) {
        // Exponential backoff
        const waitTime = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
        console.log(`Rate limited. Retrying in ${waitTime / 1000}s...`);
        await new Promise((resolve) => setTimeout(resolve, waitTime));
      } else {
        throw error;
      }
    }
  }
}

const response = await callWithRetry();
console.log(response.choices[0].message.content);

5. Server Errors (500, 503)

Causes:

  • Temporary server overload
  • Model inference timeout
  • Internal system errors

Solution: Retry with Backoff

from openai import OpenAI, APIError
import time

client = OpenAI(
    api_key="sk-YOUR-API-KEY",
    base_url="https://ai.hiveops.io"
)

def call_with_server_retry(max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="llama3:8b-instruct-q8_0",
                messages=[{"role": "user", "content": "Hello"}],
                timeout=30  # 30 second timeout
            )
            return response

        except APIError as e:
            if e.status_code in [500, 503] and attempt < max_retries - 1:
                wait_time = 2 ** attempt  # 1s, 2s, 4s
                print(f"Server error. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise

response = call_with_server_retry()

Complete Error Handling Template

Python Production Template

import time
import random
from openai import OpenAI, APIError, RateLimitError, AuthenticationError
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

client = OpenAI(
    api_key="sk-YOUR-API-KEY",
    base_url="https://ai.hiveops.io"
)

def robust_api_call(
    messages,
    model="llama3:8b-instruct-q8_0",
    max_retries=5,
    timeout=30
):
    """
    Call HiveOps API with comprehensive error handling
    """
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                timeout=timeout
            )

            logger.info(f"Success after {attempt + 1} attempt(s)")
            return response

        except AuthenticationError as e:
            # Don't retry auth errors
            logger.error(f"Authentication failed: {e}")
            raise

        except RateLimitError as e:
            if attempt == max_retries - 1:
                logger.error("Rate limit exceeded after max retries")
                raise

            wait_time = (2 ** attempt) + random.uniform(0, 1)
            logger.warning(f"Rate limited. Retrying in {wait_time:.2f}s...")
            time.sleep(wait_time)

        except APIError as e:
            # Check for specific error types
            if "insufficient_balance" in str(e).lower():
                logger.error("Insufficient balance. Cannot retry.")
                raise

            if e.status_code in [500, 503] and attempt < max_retries - 1:
                wait_time = 2 ** attempt
                logger.warning(f"Server error {e.status_code}. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                logger.error(f"API error: {e}")
                raise

        except Exception as e:
            logger.error(f"Unexpected error: {e}")
            raise

    raise Exception("Max retries exceeded")

# Usage
try:
    response = robust_api_call(
        messages=[{"role": "user", "content": "Hello!"}],
        model="llama3:8b-instruct-q8_0"
    )
    print(response.choices[0].message.content)

except Exception as e:
    print(f"Failed after all retries: {e}")
    # Handle gracefully (log, alert, fallback, etc.)

TypeScript Production Template

import OpenAI from "openai";
import type { ChatCompletionMessageParam } from "openai/resources/chat";

const client = new OpenAI({
  apiKey: process.env.HIVEOPS_API_KEY!,
  baseURL: "https://ai.hiveops.io",
});

interface RetryOptions {
  maxRetries?: number;
  timeout?: number;
  onRetry?: (attempt: number, error: any) => void;
}

async function robustApiCall(
  messages: ChatCompletionMessageParam[],
  model: string = "llama3:8b-instruct-q8_0",
  options: RetryOptions = {},
) {
  const { maxRetries = 5, timeout = 30000, onRetry } = options;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await client.chat.completions.create(
        { model, messages },
        { timeout }, // per-request timeout is a request option, not a body parameter
      );

      console.log(`Success after ${attempt + 1} attempt(s)`);
      return response;
    } catch (error: any) {
      // Authentication errors - don't retry
      if (error.status === 401) {
        console.error("Authentication failed");
        throw error;
      }

      // Insufficient balance - don't retry
      if (error.message?.includes("insufficient_balance")) {
        console.error("Insufficient balance");
        throw error;
      }

      // Rate limit - retry with backoff
      if (error.status === 429 && attempt < maxRetries - 1) {
        const waitTime = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
        console.warn(`Rate limited. Retrying in ${waitTime / 1000}s...`);
        onRetry?.(attempt, error);
        await new Promise((resolve) => setTimeout(resolve, waitTime));
        continue;
      }

      // Server errors - retry
      if ([500, 503].includes(error.status) && attempt < maxRetries - 1) {
        const waitTime = Math.pow(2, attempt) * 1000;
        console.warn(
          `Server error ${error.status}. Retrying in ${waitTime / 1000}s...`,
        );
        onRetry?.(attempt, error);
        await new Promise((resolve) => setTimeout(resolve, waitTime));
        continue;
      }

      // Give up
      console.error("API call failed:", error);
      throw error;
    }
  }

  throw new Error("Max retries exceeded");
}

// Usage
try {
  const response = await robustApiCall(
    [{ role: "user", content: "Hello!" }],
    "llama3:8b-instruct-q8_0",
    {
      maxRetries: 5,
      onRetry: (attempt, error) => {
        console.log(`Retry ${attempt + 1}: ${error.message}`);
      },
    },
  );

  console.log(response.choices[0].message.content);
} catch (error) {
  console.error("Failed after all retries:", error);
  // Handle gracefully
}

Common Issues & Solutions

Issue 1: "Model not found"

Error:

Invalid model 'gpt-3.5-turbo' specified

Cause: Using OpenAI model names instead of HiveOps models.

Solution: Use HiveOps model names:

| OpenAI Model | HiveOps Equivalent |
|---|---|
| gpt-3.5-turbo | llama3:8b-instruct-q8_0 |
| gpt-4 | llama-3-70b-instruct |
# Wrong
model="gpt-3.5-turbo"

# Correct
model="llama3:8b-instruct-q8_0"
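If you are migrating existing code, the mapping above can be applied mechanically. A sketch (`to_hiveops_model` is a helper name introduced here; the mapping mirrors the table):

```python
# OpenAI model names -> HiveOps equivalents (from the table above).
OPENAI_TO_HIVEOPS = {
    "gpt-3.5-turbo": "llama3:8b-instruct-q8_0",
    "gpt-4": "llama-3-70b-instruct",
}

def to_hiveops_model(name: str) -> str:
    """Translate an OpenAI model name; pass HiveOps names through unchanged."""
    return OPENAI_TO_HIVEOPS.get(name, name)
```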

Issue 2: Requests Still Going to OpenAI

Symptom: Requests work, but you're being charged by OpenAI, not HiveOps.

Cause: base_url not set.

Solution:

# Make sure base_url is set
client = OpenAI(
    api_key="sk-YOUR-HIVEOPS-KEY",  # HiveOps key, not OpenAI
    base_url="https://ai.hiveops.io"  # Must include this!
)

Issue 3: Slow Response Times

Symptoms:

  • Requests timing out
  • High latency (>5 seconds)

Causes & Solutions:

  1. Large max_tokens:
# Slow (generates up to 4000 tokens)
response = client.chat.completions.create(
    model="llama3:8b-instruct-q8_0",
    messages=[...],
    max_tokens=4000
)

# Faster (limit to what you need)
response = client.chat.completions.create(
    model="llama3:8b-instruct-q8_0",
    messages=[...],
    max_tokens=500  # Only generate what you need
)
  2. Long conversation history:
# Slow (processing 50+ messages)
messages = conversation_history  # 50 messages

# Faster (use sliding window)
messages = [
    {"role": "system", "content": system_prompt},
    *conversation_history[-10:]  # Last 10 messages only
]
  3. Choose a faster model:
| Model | Avg Latency |
|---|---|
| mistral-7b-instruct-v0.3 | ~200ms |
| llama3:8b-instruct-q8_0 | ~300ms |
| gemma-2-9b-it | ~350ms |
| llama-3-70b-instruct | ~800ms |

Issue 4: Streaming Not Working

Problem: Streaming doesn't output incrementally.

Cause: Not handling streaming correctly.

Solution (Python):

# Correct streaming implementation
stream = client.chat.completions.create(
    model="llama3:8b-instruct-q8_0",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)  # flush=True is key

Solution (JavaScript):

const stream = await client.chat.completions.create({
  model: "llama3:8b-instruct-q8_0",
  messages: [{ role: "user", content: "Count to 10" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content); // Write incrementally
}

Issue 5: High Costs

Problem: Bills higher than expected.

Solutions:

  1. Monitor token usage:
response = client.chat.completions.create(...)

print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total: {response.usage.total_tokens}")

# Calculate cost (Llama 3 8B)
input_cost = (response.usage.prompt_tokens / 1_000_000) * 0.01
output_cost = (response.usage.completion_tokens / 1_000_000) * 0.02
print(f"Cost: ${input_cost + output_cost:.6f}")
  2. Set budget alerts (coming soon)

  3. Use cheaper models for simple tasks:

| Task Complexity | Model | Cost Multiplier |
|---|---|---|
| Simple | mistral-7b-instruct-v0.3 | 1x (baseline) |
| Medium | llama3:8b-instruct-q8_0 | 10x |
| Complex | llama-3-70b-instruct | 100x |
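One way to act on this table is a tiny router that picks the cheapest adequate model per task. A sketch (`pick_model` and the tier names are illustrative; the tiers mirror the table):

```python
# Cheapest adequate model per task tier (from the table above).
MODEL_BY_COMPLEXITY = {
    "simple": "mistral-7b-instruct-v0.3",
    "medium": "llama3:8b-instruct-q8_0",
    "complex": "llama-3-70b-instruct",
}

def pick_model(complexity: str) -> str:
    """Return the model for a task tier ('simple', 'medium', 'complex')."""
    return MODEL_BY_COMPLEXITY[complexity.lower()]
```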

Debugging Checklist

When things aren't working:

  • Is base_url set to https://ai.hiveops.io?
  • Is API key correct and starts with sk-?
  • Is model name correct? (Use llama3:8b-instruct-q8_0, not gpt-3.5-turbo)
  • Is account balance positive?
  • Are you within rate limits? (60 req/min)
  • Is request format valid? (Check JSON syntax)
  • Did you implement retry logic for 429/500/503 errors?
  • Are you using the latest SDK version?
    • Python: pip install --upgrade openai
    • JavaScript: npm install openai@latest
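The first few checklist items can be automated as a preflight check run at startup. A sketch (`preflight` is a name introduced here; it checks only local configuration, not the live API):

```python
import os

def preflight(api_key, base_url: str) -> list:
    """Return a list of configuration problems (empty list means OK)."""
    problems = []
    if not api_key:
        problems.append("HIVEOPS_API_KEY is not set")
    elif not api_key.startswith("sk-"):
        problems.append("API key should start with 'sk-'")
    if base_url.rstrip("/") != "https://ai.hiveops.io":
        problems.append(f"base_url is {base_url!r}, expected https://ai.hiveops.io")
    return problems

issues = preflight(os.getenv("HIVEOPS_API_KEY"), "https://ai.hiveops.io")
```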

Testing Error Handling

Test Rate Limiting

import time
from openai import RateLimitError  # assumes `client` is configured as above

# Send 65 requests quickly to trigger rate limit
for i in range(65):
    try:
        response = client.chat.completions.create(
            model="llama3:8b-instruct-q8_0",
            messages=[{"role": "user", "content": f"Message {i}"}]
        )
        print(f"Request {i}: Success")
    except RateLimitError as e:
        print(f"Request {i}: Rate limited!")
        break
    time.sleep(0.1)

Test Balance Check

from openai import APIError  # assumes `client` is configured as above

# Check balance before expensive operation
def check_balance_first():
    try:
        # Make a cheap request to verify account is active
        client.chat.completions.create(
            model="mistral-7b-instruct-v0.3",
            messages=[{"role": "user", "content": "test"}],
            max_tokens=1
        )
        return True
    except APIError as e:
        if "insufficient_balance" in str(e).lower():
            print("⚠️ Balance too low!")
            return False
        raise

if check_balance_first():
    # Proceed with main operation
    response = client.chat.completions.create(...)

Support & Resources

When to Contact Support

Contact [email protected] if:

  • Persistent 500 errors (not resolved by retries)
  • Suspected API key compromised
  • Billing discrepancies
  • Account locked or suspended
  • Feature requests

Do NOT contact support for:

  • How to use the API (read documentation first)
  • Model quality issues (models are third-party)
  • Rate limit increases (standard for all users)


Last Updated: March 20, 2024