20 models · Dedicated GPUs · Same API

Open Source Models, Enterprise Infrastructure

We run the most popular open source models on our servers. Same API, zero setup, full control.

Data stays on our servers

Your data never leaves our infrastructure. No third-party routing for self-hosted models.

Dedicated GPU allocation

Models run on dedicated NVIDIA A100/H100 GPUs with guaranteed compute capacity.

99.9% uptime SLA

Enterprise-grade availability with automated failover and health monitoring.

20 Production-Ready Models

All models run on our optimized inference stack and dedicated GPU infrastructure. Use them with the same OpenAI-compatible API you already know.
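
The catalog below can also be pulled programmatically. A minimal sketch, assuming the standard OpenAI-compatible model listing endpoint is exposed at the same base URL used for chat completions (the base URL and key format are taken from the chat example further down):

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.airouter.kz/api/v1",
    api_key="air_live_your_key_here"
)

# List available models and keep the self-hosted ones.
# Assumes the standard OpenAI-compatible /models listing route is available.
for model in client.models.list():
    if model.id.startswith("airouter-cloud/"):
        print(model.id)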

Llama 4 Behemoth
Meta · 2T MoE · 1M context · General, Code, Reasoning
Input $0.90 / 1M tokens · Output $2.70 / 1M tokens
airouter-cloud/llama-4-behemoth

Llama 4 Maverick
Meta · 400B MoE · 1M context · General, Vision, Code
Input $0.20 / 1M tokens · Output $0.60 / 1M tokens
airouter-cloud/llama-4-maverick

Llama 4 Scout
Meta · 109B MoE · 10M context · General, Vision
Input $0.10 / 1M tokens · Output $0.30 / 1M tokens
airouter-cloud/llama-4-scout

Qwen 3 235B
Alibaba · 235B MoE · 128K context · General, Code, Reasoning
Input $0.25 / 1M tokens · Output $0.75 / 1M tokens
airouter-cloud/qwen-3-235b

Qwen 3 32B
Alibaba · 32B · 128K context · General, Code
Input $0.08 / 1M tokens · Output $0.16 / 1M tokens
airouter-cloud/qwen-3-32b

Qwen 3 8B
Alibaba · 8B · 128K context · General
Input $0.02 / 1M tokens · Output $0.04 / 1M tokens
airouter-cloud/qwen-3-8b

DeepSeek V3.2
DeepSeek · 685B MoE · 128K context · General, Code
Input $0.27 / 1M tokens · Output $1.10 / 1M tokens
airouter-cloud/deepseek-v3.2

DeepSeek R1.1
DeepSeek · 685B MoE · 128K context · Reasoning
Input $0.55 / 1M tokens · Output $2.19 / 1M tokens
airouter-cloud/deepseek-r1.1

Mistral Large 3
Mistral · 160B MoE · 256K context · General, Code, Reasoning
Input $2.00 / 1M tokens · Output $6.00 / 1M tokens
airouter-cloud/mistral-large-3

Mistral Medium 3
Mistral · 70B · 256K context · General
Input $0.40 / 1M tokens · Output $2.00 / 1M tokens
airouter-cloud/mistral-medium-3

Mistral Small 3.2
Mistral · 24B · 128K context · General
Input $0.10 / 1M tokens · Output $0.30 / 1M tokens
airouter-cloud/mistral-small-3.2

Gemma 3 27B
Google · 27B · 128K context · General, Vision
Input $0.07 / 1M tokens · Output $0.14 / 1M tokens
airouter-cloud/gemma-3-27b

Gemma 3 12B
Google · 12B · 128K context · General, Vision
Input $0.04 / 1M tokens · Output $0.08 / 1M tokens
airouter-cloud/gemma-3-12b

Phi-5
Microsoft · 14B · 128K context · General, Reasoning
Input $0.04 / 1M tokens · Output $0.08 / 1M tokens
airouter-cloud/phi-5

Command A
Cohere · 111B · 256K context · General, Code
Input $0.50 / 1M tokens · Output $1.50 / 1M tokens
airouter-cloud/command-a

Qwen 3 Coder 32B
Alibaba · 32B · 128K context · Code
Input $0.08 / 1M tokens · Output $0.16 / 1M tokens
airouter-cloud/qwen-3-coder-32b

DeepSeek Coder V2.5
DeepSeek · 236B MoE · 128K context · Code
Input $0.14 / 1M tokens · Output $0.28 / 1M tokens
airouter-cloud/deepseek-coder-v2.5

StarCoder3
BigCode · 22B · 128K context · Code
Input $0.05 / 1M tokens · Output $0.10 / 1M tokens
airouter-cloud/starcoder3

Llama Guard 4
Meta · 12B · 128K context · Safety
Input $0.03 / 1M tokens · Output $0.06 / 1M tokens
airouter-cloud/llama-guard-4

Mistral Nemo 2
Mistral · 12B · 128K context · General
Input $0.03 / 1M tokens · Output $0.06 / 1M tokens
airouter-cloud/mistral-nemo-2

Custom Model Deployment

Need a specific model? We can deploy any HuggingFace model on our infrastructure within 24 hours. Fine-tuned models, custom architectures, private weights — we handle it all.

Same API you already use

Self-hosted models use the same endpoint and request format. Just use airouter-cloud/ as the provider prefix.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.airouter.kz/api/v1",
    api_key="air_live_your_key_here"
)

# Use a self-hosted model — same API as any other model
response = client.chat.completions.create(
    model="airouter-cloud/llama-4-maverick",
    messages=[
        {"role": "user", "content": "Write a Python quicksort function"}
    ]
)

print(response.choices[0].message.content)
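
Streaming works through the same endpoint. The sketch below continues with the client from the example above and assumes the self-hosted models honor the standard OpenAI-style stream parameter:

Python
# Continues with `client` from the previous example.
# Assumes standard OpenAI-compatible streaming (stream=True) is supported.
stream = client.chat.completions.create(
    model="airouter-cloud/qwen-3-32b",
    messages=[
        {"role": "user", "content": "Summarize what a mixture-of-experts model is"}
    ],
    stream=True
)

for chunk in stream:
    # Each chunk carries an incremental delta; print tokens as they arrive.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)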

Ready to get started?

Start using self-hosted models today with the same API key and endpoint. No configuration needed.