Build custom chat applications using the Comput3 Network API with OpenAI-compatible endpoints.
Quick Start
Make Your First Request
Send a simple chat completion request:

curl -X POST "https://api.comput3.ai/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes4:70b",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
Handle the Response
Process the JSON response:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "hermes4:70b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
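In Python, the useful fields can be pulled straight out of the parsed response. A minimal sketch, assuming data holds the decoded JSON dict shown above:

# `data` is assumed to be the parsed JSON response from above
reply = data["choices"][0]["message"]["content"]
finish_reason = data["choices"][0]["finish_reason"]
total_tokens = data["usage"]["total_tokens"]

print(f"Assistant: {reply}")
print(f"Finish reason: {finish_reason}, tokens used: {total_tokens}")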
API Endpoints
Chat Completions
Endpoint: POST /v1/chat/completions
Create a chat completion with conversation context.
model (string, required): The model to use for the completion. Available models:
hermes4:70b - Hermes 4 model (70B parameters) for advanced reasoning
hermes4:405b - Largest Hermes 4 model (405B parameters) for complex tasks
deepseek-v3.1 - Latest DeepSeek model for coding and general tasks
kimi-k2 - Kimi K2 model for general conversation
qwen3-coder:480b - Massive Qwen3 Coder model for advanced coding tasks
qwen3-max - Large-scale reasoning and analysis
grok-code-fast-1 - Fast coding assistance
claude-sonnet-4 - Creative writing and analysis
messages (array, required): Array of message objects representing the conversation history. Each message object contains:
role - The role of the message author: system, user, or assistant
content - The content of the message
name - Optional name for the message author
temperature (number, optional): Controls randomness. Range: 0.0 to 2.0
max_tokens (integer, optional): Maximum number of tokens to generate
stream (boolean, optional): Whether to stream partial message deltas
stop (string or array, optional): Sequences where the API will stop generating tokens
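Putting these together, a request that sets several of the optional parameters might look like the following sketch. The values are illustrative, and client is the SDK client configured in the examples below:

response = client.chat.completions.create(
    model="hermes4:70b",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the benefits of caching."},
    ],
    temperature=0.3,   # lower values give more deterministic output
    max_tokens=200,    # cap on generated tokens
    stream=False,      # set True to receive partial deltas
    stop=["\n\n"],     # stop generating at the first blank line
)
print(response.choices[0].message.content)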
SDK Examples
Python
Point the official OpenAI SDK at the Comput3 base URL:
import openai

client = openai.OpenAI(
    api_key="YOUR_COMPUT3_API_KEY",
    base_url="https://api.comput3.ai/v1",
)

response = client.chat.completions.create(
    model="hermes4:70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing"},
    ],
    temperature=0.7,
    max_tokens=500,
)

print(response.choices[0].message.content)
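If you would rather avoid the SDK, the same call can be made with the requests library. A minimal sketch based on the curl example above, with no retries or error handling:

import requests

resp = requests.post(
    "https://api.comput3.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer YOUR_COMPUT3_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "hermes4:70b",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])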
JavaScript/Node.js
The same pattern works with the OpenAI SDK for Node.js:
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_COMPUT3_API_KEY',
  baseURL: 'https://api.comput3.ai/v1'
});

const response = await client.chat.completions.create({
  model: 'hermes4:70b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain blockchain technology' }
  ],
  temperature: 0.7,
  max_tokens: 500
});

console.log(response.choices[0].message.content);
Streaming Responses
Enable real-time response streaming for better user experience:
import openai

client = openai.OpenAI(
    api_key="YOUR_COMPUT3_API_KEY",
    base_url="https://api.comput3.ai/v1",
)

stream = client.chat.completions.create(
    model="hermes4:70b",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
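To keep the complete reply (for logging or conversation history) while still streaming, a variant of the loop above can accumulate the deltas as they arrive:

chunks = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        chunks.append(delta)
        print(delta, end="", flush=True)

full_reply = "".join(chunks)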
Advanced Features
Function Calling
Enable the model to call external functions:
functions = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
]

response = client.chat.completions.create(
    model="hermes4:70b",
    messages=[
        {"role": "user", "content": "What's the weather in New York?"}
    ],
    functions=functions,
    function_call="auto"
)
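The model does not run the function itself; when it decides to call one, the response message carries a function_call with a name and JSON-encoded arguments. A sketch of the full round trip using the legacy functions format shown above, assuming you supply your own get_weather(location) implementation:

import json

message = response.choices[0].message

if message.function_call is not None:
    # Parse the arguments the model produced
    args = json.loads(message.function_call.arguments)
    result = get_weather(args["location"])  # your own implementation

    # Send the function result back so the model can answer in natural language
    followup = client.chat.completions.create(
        model="hermes4:70b",
        messages=[
            {"role": "user", "content": "What's the weather in New York?"},
            {"role": "assistant", "content": None, "function_call": {
                "name": message.function_call.name,
                "arguments": message.function_call.arguments,
            }},
            {"role": "function", "name": "get_weather",
             "content": json.dumps(result)},
        ],
    )
    print(followup.choices[0].message.content)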
Conversation Memory
Maintain conversation context across multiple requests:
class ChatSession:
    def __init__(self, system_prompt=None):
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def send_message(self, content):
        self.messages.append({"role": "user", "content": content})
        response = client.chat.completions.create(
            model="hermes4:70b",
            messages=self.messages,
        )
        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

# Usage
chat = ChatSession("You are a helpful programming assistant.")
response1 = chat.send_message("How do I create a Python list?")
response2 = chat.send_message("Can you show me an example?")
Error Handling
Implement robust error handling for production applications:
import time

import openai
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_COMPUT3_API_KEY",
    base_url="https://api.comput3.ai/v1",
)

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="hermes4:70b",
                messages=messages,
            )
            return response.choices[0].message.content
        except openai.RateLimitError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            raise
        except openai.APIError as e:
            print(f"API error: {e}")
            raise
        except Exception as e:
            print(f"Unexpected error: {e}")
            raise
Rate Limiting and Optimization
Managing Rate Limits
Implement a queue system for high-volume applications:

import asyncio
from asyncio import Semaphore

from openai import AsyncOpenAI

class RateLimitedClient:
    def __init__(self, max_concurrent=5):
        self.semaphore = Semaphore(max_concurrent)
        # AsyncOpenAI is required so the create() call can be awaited
        self.client = AsyncOpenAI(
            api_key="YOUR_COMPUT3_API_KEY",
            base_url="https://api.comput3.ai/v1",
        )

    async def chat_completion(self, messages):
        async with self.semaphore:
            response = await self.client.chat.completions.create(
                model="hermes4:70b",
                messages=messages,
            )
            return response.choices[0].message.content
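A usage sketch that fans out several prompts concurrently while the semaphore caps in-flight requests (the prompts are illustrative):

async def main():
    limited = RateLimitedClient(max_concurrent=5)
    prompts = ["Explain DNS", "Explain TLS", "Explain HTTP/2"]
    tasks = [
        limited.chat_completion([{"role": "user", "content": p}])
        for p in prompts
    ]
    for answer in await asyncio.gather(*tasks):
        print(answer[:80])

asyncio.run(main())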
Optimize token usage to reduce costs:

def optimize_messages(messages, max_tokens=4000):
    """Truncate conversation history to fit within token limits."""
    # Rough estimate: ~1.3 tokens per whitespace-separated word
    total_tokens = sum(len(msg["content"].split()) * 1.3 for msg in messages)
    while total_tokens > max_tokens and len(messages) > 2:
        # Drop the oldest message, but never the system message
        if messages[0]["role"] == "system":
            messages.pop(1)
        else:
            messages.pop(0)
        total_tokens = sum(len(msg["content"].split()) * 1.3 for msg in messages)
    return messages
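For example, trimming a ChatSession history before each request (copying first, since optimize_messages mutates the list it is given):

trimmed = optimize_messages(list(chat.messages), max_tokens=4000)
response = client.chat.completions.create(
    model="hermes4:70b",
    messages=trimmed,
)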
Best Practices
Security
Store API keys as environment variables (see the sketch after this list)
Use HTTPS for all requests
Implement proper authentication
Validate and sanitize user inputs
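A minimal sketch of loading the key from the environment instead of hard-coding it; the variable name COMPUT3_API_KEY is an assumption, not a fixed convention:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["COMPUT3_API_KEY"],  # hypothetical variable name
    base_url="https://api.comput3.ai/v1",
)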
Performance
Use appropriate models for each task
Implement response caching (see the sketch after this list)
Use streaming for long responses
Monitor token usage and costs
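One simple approach to response caching is an in-memory dictionary keyed on the model and conversation. A sketch, suitable only for a single process and for deterministic, low-temperature requests:

import hashlib
import json

_cache = {}

def cached_completion(messages, model="hermes4:70b"):
    # Identical (model, messages) pairs reuse the stored reply
    key = hashlib.sha256(
        json.dumps([model, messages], sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=0,  # near-deterministic output makes caching meaningful
        )
        _cache[key] = response.choices[0].message.content
    return _cache[key]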
Error Handling
Implement retry logic with exponential backoff
Handle rate limiting gracefully
Log errors for debugging
Provide fallback responses
User Experience
Show loading states during API calls
Implement typing indicators
Cache frequent responses
Provide offline functionality where possible