Model Deployment

Deploy and access AI models from the Kloud Team catalog

Deploying AI Models

Learn how to deploy models from the catalog and integrate them into your applications using the Kloud Team API.

Deployment Process

1. Select a Model

Click the model card you want to deploy in the Models Catalog and review its specifications and pricing.

2. Use the Model

On the model's catalog details page, click the Use This Model button, then click Deploy. This launches your model endpoint and generates an access key for you.

3. Access Your Model

Once deployed, your model endpoint and access key are available from the dashboard.

Getting Your Credentials:

Navigate to AI [Machine Learning] > Models > My Library in the dashboard to view your deployed models. Each model displays:

  • Endpoint URL – Your model's API endpoint (e.g., https://llm.kloud.team)
  • Access Key – Your authentication token for API requests
  • Model Name – The identifier to use in API calls

You can copy both the endpoint URL and access key directly from the dashboard by clicking the copy icons.

Dashboard Features:

  • Monitor usage and performance
  • View token consumption
  • Copy endpoint and access key

API Usage

Chat Completion API

The standard endpoint for chat-based models (Claude, GPT, Grok, etc.):

curl https://llm.kloud.team/v1/chat/completions \
  -H "Authorization: Bearer YOUR_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Replace YOUR_ACCESS_KEY with the access key from your dashboard and your-model-name with your deployed model identifier.

Request Parameters

Parameter     Type      Required  Description
model         string    Yes       Your deployed model identifier
messages      array     Yes       Array of message objects with role and content
temperature   float     No        Sampling temperature (0-2); default: 1
max_tokens    integer   No        Maximum tokens in the response
top_p         float     No        Nucleus sampling parameter (0-1)
stream        boolean   No        Enable streaming responses; default: false
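
For instance, a request that tunes the optional sampling parameters might look like the following sketch; the parameter values are illustrative, not recommendations:

import requests

url = "https://llm.kloud.team/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_ACCESS_KEY",
    "Content-Type": "application/json"
}
# Optional parameters from the table above; values are illustrative.
data = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Summarize this in one line."}],
    "temperature": 0.2,  # lower values make output more deterministic
    "max_tokens": 64,    # cap the length of the response
    "top_p": 0.9         # nucleus sampling cutoff
}

response = requests.post(url, headers=headers, json=data)
print(response.json())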

Response Format

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "your-model-name",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
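
For reference, a short sketch of pulling the useful fields out of this structure, using the example payload above as a plain Python dict:

# Example response payload from above, as returned by response.json().
result = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant",
                    "content": "Hello! How can I help you today?"},
        "finish_reason": "stop"
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
}

# The generated text lives in the first choice's message.
print(result["choices"][0]["message"]["content"])

# usage reports token counts, useful for cost tracking.
usage = result["usage"]
print(usage["total_tokens"], "total tokens")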

Code Examples

Python

import requests

url = "https://llm.kloud.team/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_ACCESS_KEY",
    "Content-Type": "application/json"
}
data = {
    "model": "your-model-name",
    "messages": [
        {"role": "user", "content": "Hello!"}
    ]
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

Node.js

const axios = require('axios');

const url = 'https://llm.kloud.team/v1/chat/completions';
const headers = {
  'Authorization': 'Bearer YOUR_ACCESS_KEY',
  'Content-Type': 'application/json'
};
const data = {
  model: 'your-model-name',
  messages: [
    { role: 'user', content: 'Hello!' }
  ]
};

axios.post(url, data, { headers })
  .then(response => console.log(response.data))
  .catch(error => console.error(error));

cURL

curl https://llm.kloud.team/v1/chat/completions \
  -H "Authorization: Bearer YOUR_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'

Streaming Responses

Enable streaming for real-time response generation:

import requests
import json

url = "https://llm.kloud.team/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_ACCESS_KEY",
    "Content-Type": "application/json"
}
data = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": True
}

response = requests.post(url, headers=headers, json=data, stream=True)

for line in response.iter_lines():
    if line:
        decoded_line = line.decode('utf-8')
        if decoded_line.startswith('data: '):
            payload = decoded_line[6:]
            # The stream ends with a "data: [DONE]" sentinel, which is not JSON.
            if payload.strip() == '[DONE]':
                break
            chunk = json.loads(payload)
            if chunk['choices'][0]['delta'].get('content'):
                print(chunk['choices'][0]['delta']['content'], end='', flush=True)

Authentication

All API requests require authentication using your Access Key:

Authorization: Bearer YOUR_ACCESS_KEY

Get your access key from AI [Machine Learning] > Models > My Library in the dashboard.

Security:

Keep your access key secure and never commit it to version control. Use environment variables to store credentials.
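
For example, you might export the key once in your shell and read it at runtime. A minimal sketch, assuming the key is stored in an environment variable named KLOUD_ACCESS_KEY (the name is a convention for this example, not something the platform requires):

import os
import requests

# KLOUD_ACCESS_KEY is an illustrative variable name. os.environ[...] raises
# KeyError if it is unset, so a missing credential fails fast.
access_key = os.environ["KLOUD_ACCESS_KEY"]

headers = {
    "Authorization": f"Bearer {access_key}",
    "Content-Type": "application/json"
}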

Error Handling

Common Error Codes

Code  Description          Solution
401   Unauthorized         Check your access key
404   Model not found      Verify the model name in the dashboard
429   Rate limit exceeded  Reduce request frequency
500   Server error         Retry with exponential backoff

Error Response Format

{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
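
A minimal sketch of handling these errors in Python, retrying 429 and 500 with exponential backoff; the retry count and delays are illustrative:

import time
import requests

def chat(url, headers, data, max_retries=5):
    """POST a chat completion, retrying transient errors with backoff."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data, timeout=60)
        if response.status_code == 200:
            return response.json()
        if response.status_code in (429, 500):
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... then retry
            continue
        # 401, 404, and other client errors will not succeed on retry.
        error = response.json().get("error", {})
        raise RuntimeError(f"{response.status_code}: {error.get('message', 'request failed')}")
    raise RuntimeError("retries exhausted")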

Rate Limits

  • Default rate limits apply per access key
  • Monitor usage in the dashboard
  • Contact support for increased limits

Best Practices

API Integration

  1. Handle Errors Gracefully – Implement retry logic with exponential backoff
  2. Monitor Token Usage – Track consumption to manage costs
  3. Use Appropriate Models – Match model capabilities to your use case
  4. Cache Responses – Reduce redundant API calls when possible
  5. Set Timeouts – Prevent hanging requests (points 4 and 5 are illustrated in the sketch after this list)
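
A minimal sketch of points 4 and 5 together: an in-memory cache keyed on the request payload, plus a request timeout. The cache policy and timeout value are illustrative; a production service might use Redis or an LRU with expiry instead:

import json
import requests

_cache = {}  # in-memory and unbounded, so suitable only as a sketch

def cached_chat(url, headers, data, timeout=30):
    """Return a cached response for identical requests; otherwise call the API."""
    key = json.dumps(data, sort_keys=True)  # stable key for identical payloads
    if key not in _cache:
        response = requests.post(url, headers=headers, json=data, timeout=timeout)
        response.raise_for_status()
        _cache[key] = response.json()
    return _cache[key]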

Performance Optimization

  • Use streaming for long responses
  • Implement request queuing for high-volume applications
  • Choose faster models (Haiku, Grok) for latency-sensitive tasks
  • Batch similar requests when possible (see the sketch below)
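
One way to batch similar requests is a small thread pool that sends them concurrently; a sketch, with a pool size chosen to stay well inside typical rate limits:

import requests
from concurrent.futures import ThreadPoolExecutor

def ask(prompt):
    """Send one chat completion request and return the reply text."""
    response = requests.post(
        "https://llm.kloud.team/v1/chat/completions",
        headers={"Authorization": "Bearer YOUR_ACCESS_KEY",
                 "Content-Type": "application/json"},
        json={"model": "your-model-name",
              "messages": [{"role": "user", "content": prompt}]},
        timeout=60
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

prompts = ["Summarize document A", "Summarize document B", "Summarize document C"]
# A small pool keeps the number of in-flight requests modest.
with ThreadPoolExecutor(max_workers=3) as pool:
    replies = list(pool.map(ask, prompts))
print(replies)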