20 models · Dedicated GPUs · Same API

Open Source Models, Enterprise Infrastructure

We run the most popular open source models on our servers. Same API, zero setup, full control.

Data stays on our servers

Your data never leaves our infrastructure. No third-party routing for self-hosted models.

Dedicated GPU allocation

Models run on dedicated NVIDIA A100/H100 GPUs with guaranteed compute capacity.

99.9% uptime SLA

Enterprise-grade availability with automated failover and health monitoring.

20 Production-Ready Models

All models run on our optimized inference stack and dedicated GPU infrastructure. Use them with the same OpenAI-compatible API you already know.
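
The catalog below can also be pulled programmatically. A minimal sketch, assuming the standard OpenAI-compatible model listing endpoint is exposed at the same base URL used for chat completions (the base URL and key format are taken from the chat example further down):

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.airouter.kz/api/v1",
    api_key="air_live_your_key_here"
)

# List available models and keep the self-hosted ones.
# Assumes the standard OpenAI-compatible /models listing route is available.
for model in client.models.list():
    if model.id.startswith("airouter-cloud/"):
        print(model.id)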

Llama 4 Behemoth
Meta · 2T MoE · 1M context · General, Code, Reasoning
Input $0.90 / 1M tokens · Output $2.70 / 1M tokens
airouter-cloud/llama-4-behemoth

Llama 4 Maverick
Meta · 400B MoE · 1M context · General, Vision, Code
Input $0.20 / 1M tokens · Output $0.60 / 1M tokens
airouter-cloud/llama-4-maverick

Llama 4 Scout
Meta · 109B MoE · 10M context · General, Vision
Input $0.10 / 1M tokens · Output $0.30 / 1M tokens
airouter-cloud/llama-4-scout

Qwen 3 235B
Alibaba · 235B MoE · 128K context · General, Code, Reasoning
Input $0.25 / 1M tokens · Output $0.75 / 1M tokens
airouter-cloud/qwen-3-235b

Qwen 3 32B
Alibaba · 32B · 128K context · General, Code
Input $0.08 / 1M tokens · Output $0.16 / 1M tokens
airouter-cloud/qwen-3-32b

Qwen 3 8B
Alibaba · 8B · 128K context · General
Input $0.02 / 1M tokens · Output $0.04 / 1M tokens
airouter-cloud/qwen-3-8b

DeepSeek V3.2
DeepSeek · 685B MoE · 128K context · General, Code
Input $0.27 / 1M tokens · Output $1.10 / 1M tokens
airouter-cloud/deepseek-v3.2

DeepSeek R1.1
DeepSeek · 685B MoE · 128K context · Reasoning
Input $0.55 / 1M tokens · Output $2.19 / 1M tokens
airouter-cloud/deepseek-r1.1

Mistral Large 3
Mistral · 160B MoE · 256K context · General, Code, Reasoning
Input $2.00 / 1M tokens · Output $6.00 / 1M tokens
airouter-cloud/mistral-large-3

Mistral Medium 3
Mistral · 70B · 256K context · General
Input $0.40 / 1M tokens · Output $2.00 / 1M tokens
airouter-cloud/mistral-medium-3

Mistral Small 3.2
Mistral · 24B · 128K context · General
Input $0.10 / 1M tokens · Output $0.30 / 1M tokens
airouter-cloud/mistral-small-3.2

Gemma 3 27B
Google · 27B · 128K context · General, Vision
Input $0.07 / 1M tokens · Output $0.14 / 1M tokens
airouter-cloud/gemma-3-27b

Gemma 3 12B
Google · 12B · 128K context · General, Vision
Input $0.04 / 1M tokens · Output $0.08 / 1M tokens
airouter-cloud/gemma-3-12b

Phi-5
Microsoft · 14B · 128K context · General, Reasoning
Input $0.04 / 1M tokens · Output $0.08 / 1M tokens
airouter-cloud/phi-5

Command A
Cohere · 111B · 256K context · General, Code
Input $0.50 / 1M tokens · Output $1.50 / 1M tokens
airouter-cloud/command-a

Qwen 3 Coder 32B
Alibaba · 32B · 128K context · Code
Input $0.08 / 1M tokens · Output $0.16 / 1M tokens
airouter-cloud/qwen-3-coder-32b

DeepSeek Coder V2.5
DeepSeek · 236B MoE · 128K context · Code
Input $0.14 / 1M tokens · Output $0.28 / 1M tokens
airouter-cloud/deepseek-coder-v2.5

StarCoder3
BigCode · 22B · 128K context · Code
Input $0.05 / 1M tokens · Output $0.10 / 1M tokens
airouter-cloud/starcoder3

Llama Guard 4
Meta · 12B · 128K context · Safety
Input $0.03 / 1M tokens · Output $0.06 / 1M tokens
airouter-cloud/llama-guard-4

Mistral Nemo 2
Mistral · 12B · 128K context · General
Input $0.03 / 1M tokens · Output $0.06 / 1M tokens
airouter-cloud/mistral-nemo-2

Custom Model Deployment

Need a specific model? We can deploy any HuggingFace model on our infrastructure within 24 hours. Fine-tuned models, custom architectures, private weights — we handle it all.

Same API you already use

Self-hosted models use the same endpoint and request format. Just use airouter-cloud/ as the provider prefix.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.airouter.kz/api/v1",
    api_key="air_live_your_key_here"
)

# Use a self-hosted model — same API as any other model
response = client.chat.completions.create(
    model="airouter-cloud/llama-4-maverick",
    messages=[
        {"role": "user", "content": "Write a Python quicksort function"}
    ]
)

print(response.choices[0].message.content)
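
Streaming works through the same endpoint. The sketch below continues with the client from the example above and assumes the self-hosted models honor the standard OpenAI-style stream parameter:

Python
# Continues with `client` from the previous example.
# Assumes standard OpenAI-compatible streaming (stream=True) is supported.
stream = client.chat.completions.create(
    model="airouter-cloud/qwen-3-32b",
    messages=[
        {"role": "user", "content": "Summarize what a mixture-of-experts model is"}
    ],
    stream=True
)

for chunk in stream:
    # Each chunk carries an incremental delta; print tokens as they arrive.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)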

Ready to get started?

Start using self-hosted models today with the same API key and endpoint. No configuration needed.