Deploying AI Models
Learn how to deploy models from the catalog and integrate them into your applications using the Kloud Team API.
Deployment Process
Select a Model
In the Models Catalog, click the card of the model you want to deploy, then review its specifications and pricing.
Use a Model
From the model's catalog details page, click the Use This Model button to create an access token and a deployment for you.

Click Deploy to launch your model endpoint and generate an access token.

Access Your Model
Once deployed, you can access your model's endpoint and access key from the dashboard.
Getting Your Credentials:
Navigate to AI [Machine Learning] > Models > My Library in the dashboard to view your deployed models. Each model displays:
- Endpoint URL – Your model's API endpoint (e.g., https://llm.kloud.team)
- Access Key – Your authentication token for API requests
- Model Name – The identifier to use in API calls
You can copy both the endpoint URL and access key directly from the dashboard by clicking the copy icons.

Dashboard Features:
- Monitor usage and performance
- View token consumption
- Copy endpoint and access key
API Usage
Chat Completion API
The standard endpoint for chat-based models (Claude, GPT, Grok, etc.):
```bash
curl https://llm.kloud.team/v1/chat/completions \
  -H "Authorization: Bearer YOUR_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
Replace YOUR_ACCESS_KEY with the access key from your dashboard and your-model-name with your deployed model identifier.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Your deployed model identifier |
| messages | array | Yes | Array of message objects with role and content |
| temperature | float | No | Sampling temperature (0-2), default: 1 |
| max_tokens | integer | No | Maximum tokens in response |
| top_p | float | No | Nucleus sampling parameter (0-1) |
| stream | boolean | No | Enable streaming responses, default: false |
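To see how these parameters fit together, here is a minimal Python sketch that sets the optional sampling controls. The model name is a placeholder, and the `KLOUD_ACCESS_KEY` environment variable is only a suggested naming convention, not something the platform requires:

```python
import os
import requests

# Request body exercising the optional parameters from the table above.
payload = {
    "model": "your-model-name",  # your deployed model identifier
    "messages": [{"role": "user", "content": "Explain nucleus sampling in one line."}],
    "temperature": 0.7,   # lower values make output more deterministic (0-2)
    "max_tokens": 200,    # cap the length of the response
    "top_p": 0.9,         # nucleus sampling parameter (0-1)
    "stream": False       # set to True for streaming (see below)
}

response = requests.post(
    "https://llm.kloud.team/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['KLOUD_ACCESS_KEY']}"},
    json=payload,  # requests sets Content-Type: application/json automatically
    timeout=30
)
print(response.json())
```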
Response Format
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "your-model-name",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
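Assuming `response` holds the reply from one of the examples in this guide, a short sketch of pulling the useful fields out of this structure:

```python
data = response.json()

# The assistant's reply is in the first choice's message.
reply = data["choices"][0]["message"]["content"]
finish_reason = data["choices"][0]["finish_reason"]  # e.g. "stop"

# Token counts are handy for cost tracking (see Dashboard Features above).
total_tokens = data["usage"]["total_tokens"]

print(f"{reply} (finish_reason={finish_reason}, tokens={total_tokens})")
```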
Code Examples
Python
```python
import requests

url = "https://llm.kloud.team/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_ACCESS_KEY",
    "Content-Type": "application/json"
}
data = {
    "model": "your-model-name",
    "messages": [
        {"role": "user", "content": "Hello!"}
    ]
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```
Node.js
```javascript
const axios = require('axios');

const url = 'https://llm.kloud.team/v1/chat/completions';
const headers = {
  'Authorization': 'Bearer YOUR_ACCESS_KEY',
  'Content-Type': 'application/json'
};
const data = {
  model: 'your-model-name',
  messages: [
    { role: 'user', content: 'Hello!' }
  ]
};

axios.post(url, data, { headers })
  .then(response => console.log(response.data))
  .catch(error => console.error(error));
```
cURL
```bash
curl https://llm.kloud.team/v1/chat/completions \
  -H "Authorization: Bearer YOUR_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'
```
Streaming Responses
Enable streaming for real-time response generation:
```python
import requests
import json

url = "https://llm.kloud.team/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_ACCESS_KEY",
    "Content-Type": "application/json"
}
data = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": True
}

response = requests.post(url, headers=headers, json=data, stream=True)
for line in response.iter_lines():
    if line:
        decoded_line = line.decode('utf-8')
        if decoded_line.startswith('data: '):
            payload = decoded_line[6:]
            # OpenAI-style streams end with a "data: [DONE]" sentinel,
            # which is not JSON; skip it before parsing.
            if payload == '[DONE]':
                break
            chunk = json.loads(payload)
            if chunk['choices'][0]['delta'].get('content'):
                print(chunk['choices'][0]['delta']['content'], end='')
```
Authentication
All API requests require authentication using your Access Key:
```
Authorization: Bearer YOUR_ACCESS_KEY
```
Get your access key from AI [Machine Learning] > Models > My Library in the dashboard.
Security:
Keep your access key secure and never commit it to version control. Use environment variables to store credentials.
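For example, a minimal Python pattern for keeping the key out of source code (the variable name `KLOUD_ACCESS_KEY` is only a suggested convention):

```python
import os

# Read the access key from the environment; fail fast if it is missing.
access_key = os.environ.get("KLOUD_ACCESS_KEY")
if not access_key:
    raise RuntimeError("Set the KLOUD_ACCESS_KEY environment variable first")

headers = {"Authorization": f"Bearer {access_key}"}
```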
Error Handling
Common Error Codes
| Code | Description | Solution |
|---|---|---|
| 401 | Unauthorized | Check your access key |
| 404 | Model not found | Verify model name in dashboard |
| 429 | Rate limit exceeded | Reduce request frequency |
| 500 | Server error | Retry with exponential backoff |
Error Response Format
```json
{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
```
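A short sketch of surfacing these fields in Python, reusing `url`, `headers`, and `data` from the Python example above and assuming failed requests return the error body shown here:

```python
import requests

response = requests.post(url, headers=headers, json=data, timeout=30)

if response.ok:
    print(response.json()["choices"][0]["message"]["content"])
else:
    # Failed requests carry the error object shown above.
    err = response.json().get("error", {})
    print(f"Request failed ({response.status_code}): "
          f"{err.get('code')}: {err.get('message')}")
```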
Rate Limits
- Default rate limits apply per access key
- Monitor usage in the dashboard
- Contact support for increased limits
Best Practices
API Integration
- Handle Errors Gracefully – Implement retry logic with exponential backoff (see the sketch after this list)
- Monitor Token Usage – Track consumption to manage costs
- Use Appropriate Models – Match model capabilities to your use case
- Cache Responses – Reduce redundant API calls when possible
- Set Timeouts – Prevent hanging requests
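A minimal sketch of the retry and timeout advice above; the backoff constants and the exact set of retryable status codes are illustrative choices, not platform requirements:

```python
import time
import requests

def post_with_retries(url, headers, payload, max_retries=5):
    """POST with a timeout and exponential backoff on retryable failures."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            resp = requests.post(url, headers=headers, json=payload, timeout=30)
        except requests.Timeout:
            pass  # treat timeouts as retryable
        else:
            if resp.status_code == 200:
                return resp.json()
            if resp.status_code not in (429, 500, 502, 503):
                resp.raise_for_status()  # 401/404 etc. will not improve on retry
        time.sleep(delay)
        delay *= 2  # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError(f"Request failed after {max_retries} attempts")
```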
Performance Optimization
- Use streaming for long responses
- Implement request queuing for high-volume applications (see the sketch after this list)
- Choose faster models (Haiku, Grok) for latency-sensitive tasks
- Batch similar requests when possible
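As one way to sketch the queuing and batching ideas, the snippet below fans a batch of prompts through a small worker pool so several requests overlap without flooding the endpoint. It reuses the hypothetical `post_with_retries` helper and `headers` from the sketches above, and the pool size is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def complete(prompt):
    # post_with_retries and headers come from the earlier sketches.
    return post_with_retries(
        "https://llm.kloud.team/v1/chat/completions",
        headers,
        {"model": "your-model-name",
         "messages": [{"role": "user", "content": prompt}]}
    )

prompts = ["Summarize doc A", "Summarize doc B", "Summarize doc C"]

# A small pool keeps a few requests in flight while respecting rate limits.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(complete, prompts))
```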