Open Source Models, Enterprise Infrastructure
We run the most popular open source models on our servers. Same API, zero setup, full control.
Data stays on our servers
Your data never leaves our infrastructure. No third-party routing for self-hosted models.
Dedicated GPU allocation
Models run on dedicated NVIDIA A100/H100 GPUs with guaranteed compute capacity.
99.9% SLA
Enterprise-grade availability with automated failover and health monitoring.
20 Production-Ready Models
All models run on our optimized inference stack and dedicated GPU infrastructure. Use them with the same OpenAI-compatible API you already know.
Model | Provider · Size | Input / Output (per 1M tokens) | Model ID
Llama 4 Behemoth | Meta · 2T MoE | $0.90 / $2.70 | airouter-cloud/llama-4-behemoth
Llama 4 Maverick | Meta · 400B MoE | $0.20 / $0.60 | airouter-cloud/llama-4-maverick
Llama 4 Scout | Meta · 109B MoE | $0.10 / $0.30 | airouter-cloud/llama-4-scout
Qwen 3 235B | Alibaba · 235B MoE | $0.25 / $0.75 | airouter-cloud/qwen-3-235b
Qwen 3 32B | Alibaba · 32B | $0.08 / $0.16 | airouter-cloud/qwen-3-32b
Qwen 3 8B | Alibaba · 8B | $0.02 / $0.04 | airouter-cloud/qwen-3-8b
DeepSeek V3.2 | DeepSeek · 685B MoE | $0.27 / $1.10 | airouter-cloud/deepseek-v3.2
DeepSeek R1.1 | DeepSeek · 685B MoE | $0.55 / $2.19 | airouter-cloud/deepseek-r1.1
Mistral Large 3 | Mistral · 160B MoE | $2.00 / $6.00 | airouter-cloud/mistral-large-3
Mistral Medium 3 | Mistral · 70B | $0.40 / $2.00 | airouter-cloud/mistral-medium-3
Mistral Small 3.2 | Mistral · 24B | $0.10 / $0.30 | airouter-cloud/mistral-small-3.2
Gemma 3 27B | Google · 27B | $0.07 / $0.14 | airouter-cloud/gemma-3-27b
Gemma 3 12B | Google · 12B | $0.04 / $0.08 | airouter-cloud/gemma-3-12b
Phi-5 | Microsoft · 14B | $0.04 / $0.08 | airouter-cloud/phi-5
Command A | Cohere · 111B | $0.50 / $1.50 | airouter-cloud/command-a
Qwen 3 Coder 32B | Alibaba · 32B | $0.08 / $0.16 | airouter-cloud/qwen-3-coder-32b
DeepSeek Coder V2.5 | DeepSeek · 236B MoE | $0.14 / $0.28 | airouter-cloud/deepseek-coder-v2.5
StarCoder3 | BigCode · 22B | $0.05 / $0.10 | airouter-cloud/starcoder3
Llama Guard 4 | Meta · 12B | $0.03 / $0.06 | airouter-cloud/llama-guard-4
Mistral Nemo 2 | Mistral · 12B | $0.03 / $0.06 | airouter-cloud/mistral-nemo-2
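To see how these per-1M-token rates translate into per-request cost, here is a minimal sketch. The prices are copied from the table above; the helper function, dictionary, and token counts are illustrative, not part of the platform's API:

```python
# Per-1M-token rates (input, output) in USD, taken from the pricing table above.
# Only a few models are included here for illustration.
PRICES = {
    "airouter-cloud/llama-4-maverick": (0.20, 0.60),
    "airouter-cloud/qwen-3-8b": (0.02, 0.04),
    "airouter-cloud/deepseek-v3.2": (0.27, 1.10),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# Example: a 10,000-token prompt with a 2,000-token completion on Llama 4 Maverick.
cost = request_cost("airouter-cloud/llama-4-maverick", 10_000, 2_000)
print(f"${cost:.4f}")  # → $0.0032
```

Output tokens are priced roughly 2-4x higher than input tokens across the lineup, so long completions dominate the bill for generation-heavy workloads.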
Custom Model Deployment
Need a specific model? We can deploy any Hugging Face model on our infrastructure within 24 hours. Fine-tuned models, custom architectures, private weights — we handle it all.
Same API you already use
Self-hosted models use the same endpoint and request format. Just use airouter-cloud/ as the provider prefix.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.airouter.kz/api/v1",
    api_key="air_live_your_key_here"
)

# Use a self-hosted model — same API as any other model
response = client.chat.completions.create(
    model="airouter-cloud/llama-4-maverick",
    messages=[
        {"role": "user", "content": "Write a Python quicksort function"}
    ]
)

print(response.choices[0].message.content)

Ready to get started?
Start using self-hosted models today with the same API key and endpoint. No configuration needed.