Dec 17, 2025 · 8 min read

Model Selection for Compliance: How to Build a Fact Pack

Model selection for compliance is easier to approve when you bring facts: logs, risks, retention periods, access rules, and a list of controls.


Why compliance asks for facts, not promises

Compliance does not look at the model’s flashy name, but at the data flow. The LLM name alone does not explain whether personal data will leave the country, who will see the requests, or whether the action history can be restored later.

That is why model selection is rarely approved based on phrases like “the provider is reliable” or “the data is protected.” Those words sound confident, but they cannot be checked. A compliance officer does not need a general conclusion; they need a set of facts that can go into the approval folder and be shown during an internal audit.

The questions are usually very simple:

  • who sends the requests and on whose behalf
  • where requests, responses, logs, and cache are processed
  • how long the data is kept
  • who can access the logs and under what rules
  • what measures cover the risks of leaks, model errors, and disputed answers

If the team brings only a presentation and the promise that “we control everything,” approval almost always slows down. Compliance sends the document back with follow-up questions because it needs to check access boundaries, retention periods, and audit trails. It is easier to put the fact pack together once than to answer the same questions for weeks in email and meetings.

This is easy to see in practice. Suppose a bank wants to launch a support chat. The phrase “we chose a well-known model” does not help. But a card with the model version, provider, log storage location, deletion period, PII masking, and audit logs already gives a basis for the decision.

What to include in the model selection card

The card is not for decoration. Compliance should be able to quickly understand why you chose this model, what data it will see, and where the weak points are. It is better to keep it short, but every statement should be verifiable.

Start by describing the use case itself. Not “a customer chatbot,” but the exact task: answers for support agents, case review, internal search, draft emails. Next to it, state the goal in one line: reduce response time, improve answer completeness, or remove manual ticket sorting. If the goal is vague, the rest of the card loses its purpose.

Then record the data used in prompts. Compliance does not need an abstract “user text,” but clear categories: full name, contract number, case history, internal documents, anonymized logs. It is useful to note right away what must not be sent to the model and what gets masked before the call.

A minimal card usually includes:

  • use case, business goal, and process owner
  • what data goes into the prompt, response, and logs
  • the chosen model and 1–2 alternatives with a short comparison
  • provider, processing region, retention period, and deletion method
  • known limitations, assumptions, and manual checks
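
If the team keeps cards in version control, the same fields can live as a small structured record. A minimal sketch, assuming illustrative field names, model names, and providers (none of them come from a required schema):

```python
# Illustrative model selection card; field names, model names, and the
# provider are assumptions -- adapt them to your approval template.
MODEL_CARD = {
    "use_case": "draft replies for support agents",
    "business_goal": "reduce first-response time",
    "process_owner": "support product owner",
    "prompt_data": ["ticket text", "case history", "knowledge-base article"],
    "never_sent": ["card numbers", "IIN", "raw attachments"],
    "chosen_model": "model-a-v2",            # hypothetical model name
    "alternatives": ["model-b-v1", "model-c-mini"],
    "provider": "provider-x",                # hypothetical provider
    "processing_region": "Kazakhstan",
    "retention_days": 14,
    "deletion_method": "automatic purge after the retention window",
    "known_limitations": ["mixes up tariff names", "needs review for legal text"],
}
```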

It is best to keep the comparison with alternatives to one page. In many cases, four criteria are enough: answer quality on your examples, price, latency, and data retention requirements. That is usually enough to show why one model fits better than another, not just that “the team liked it.”

If the team works through a single gateway such as AI Router, the card should still record not only the gateway name, but also the exact model, provider, and processing location. That removes confusion during approval.

It is better to write limitations directly. A model may mix up terms, struggle with rare document formats, or require human review. Such notes do not weaken the card. On the contrary, they show that the team understands the solution’s boundaries.

How to describe logs and tracing

The phrase “we have logs” is not enough. You need a clear scheme: what exactly you log, where it is stored, who can see it, and how to trace a disputed request in 10 minutes instead of half a day.

Keep the same set of fields in every record; that helps during incidents and internal checks. A request_id for end-to-end search, the model and provider name, request and response timestamps, call status, error code, and latency in milliseconds are usually enough.

These fields should be tied to your application. Then the team can open a ticket, take the request_id, and see the whole chain: who sent the request, which service processed it, which model was called, how long the response took, and at which step the problem appeared.
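A minimal sketch of such a record in Python, assuming a JSON-lines log sink; the field names mirror the list above but are not a fixed schema:

```python
import json
import time
import uuid

def log_llm_call(model, provider, status, latency_ms, error_code=None):
    """Write one structured record per model call and return its request_id."""
    record = {
        "request_id": str(uuid.uuid4()),   # end-to-end search key
        "model": model,
        "provider": provider,
        "request_time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "status": status,                  # e.g. "ok", "timeout", "rate_limited"
        "error_code": error_code,          # None when the call succeeded
        "latency_ms": latency_ms,
    }
    print(json.dumps(record, ensure_ascii=False))  # replace with your log sink
    return record["request_id"]
```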

Mask PII before writing to the log. Do not rely on manual discipline. The rule should trigger automatically: phone numbers, IIN, card numbers, email, and other sensitive fields are replaced with masks before storage. For compliance, this is usually more important than the full prompt text.
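A sketch of that automatic rule, with deliberately simplified patterns; production rules need stricter validation (card checksums, full IIN format checks) and should run on every record before storage:

```python
import re

# Simplified masking rules; real patterns need stricter validation.
MASKS = [
    (re.compile(r"\b\d{4}(?: \d{4}){3}\b"), "[CARD]"),  # 4x4 card format
    (re.compile(r"\b\d{16}\b"), "[CARD]"),              # bare 16-digit card number
    (re.compile(r"\b\d{12}\b"), "[IIN]"),               # Kazakhstan IIN: 12 digits
    (re.compile(r"\+?\d[\d \-()]{9,14}\d"), "[PHONE]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
]

def mask_pii(text):
    """Replace sensitive fields with masks before the text is stored."""
    for pattern, token in MASKS:
        text = pattern.sub(token, text)
    return text
```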

Access to logs should also be described clearly. The developer sees technical fields and errors. The security or compliance officer can view audit records and access history. The operations team looks for failures but does not read unnecessary user data. If someone opened a record, that should also stay in the journal.

It is useful to add a small investigation scenario. For example: “A customer complaint came in at 14:20. The team finds the request_id from the CRM, locates the call at 14:18, sees the model, provider, status 429, and latency of 18,000 ms. Then the team checks retries and key limits.” One example like this removes many unnecessary questions.
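Assuming the JSON-lines format from the logging sketch above, the 10-minute trace can be as simple as filtering by request_id (the file name and fields are illustrative):

```python
import json

def trace_request(log_path, request_id):
    """Collect every record for one request_id across a JSON-lines log."""
    chain = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record.get("request_id") == request_id:
                chain.append(record)
    # Sort by timestamp so the incident reads as a sequence of steps.
    return sorted(chain, key=lambda r: r.get("request_time", ""))
```

In the scenario above, trace_request would surface the 14:18 call with its model, provider, status 429, and 18,000 ms latency in one query.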

How to break down the risks

It is easier for compliance to approve a solution when it sees not a long list of threats, but a simple matrix: risk, impact, likelihood, control measure, and risk owner. That is usually enough if the wording is precise and free of vague language.

It helps to divide risks into three groups. The first is data risks: personal data leaks, requests stored outside the needed jurisdiction, sensitive fields entering logs. The second is response risks: made-up facts, unsafe advice, missing prohibited content, incorrect classification. The third is access risks: excessive employee permissions, weak API keys, no restrictions by environment or team.

For each line, state at least two values: the impact and the likelihood. Impact is better described in business language: fine, customer complaint, process interruption, manual rechecking of hundreds of applications. Likelihood should also be simple and clear: rare, sometimes, often. The phrase “the risk is high” explains very little.

Then tie each risk to a specific control measure. For data risks, this may be PII masking before sending, in-country storage, separate audit logs for requests and responses. For response risks, output policy checks, task-type restrictions, manual confirmation for sensitive actions. For access risks, you need roles, key limits, separate keys for teams, and an administrator activity log.
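The matrix itself can be kept as data next to the service, which makes it easy to render into the approval document and to diff when a control changes. A sketch with illustrative rows, one per risk group:

```python
# One row per risk: the five columns compliance expects to see.
# Entries are illustrative examples, not a complete register.
RISK_MATRIX = [
    {
        "risk": "PII reaches the provider in raw prompts",
        "impact": "regulatory fine, customer complaint",
        "likelihood": "sometimes",
        "control": "PII masking before sending; in-country storage",
        "owner": "security",
    },
    {
        "risk": "model invents facts in customer answers",
        "impact": "wrong advice, manual rechecking of replies",
        "likelihood": "often",
        "control": "output policy checks; human review for sensitive topics",
        "owner": "process owner",
    },
    {
        "risk": "shared API key with no per-team limits",
        "impact": "cost spike, untraceable access",
        "likelihood": "rare",
        "control": "separate keys per service; key-level limits; admin log",
        "owner": "platform team",
    },
]
```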

Fallback deserves a separate block

Fallback is better placed in a separate block, not hidden in a note. Describe when the system moves to another model: on timeout, rising errors, low confidence, or failed response checks. Specify which backup model is used, whether the same logging rules remain in place, and who gets notified.

If you have model routing, these rules should be fixed at the route level so the backup scenario works the same for every team.
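A sketch of what such a route-level rule might look like; the model names, thresholds, and notification channel are assumptions:

```python
# Route-level fallback rules, fixed once so every team inherits them.
# Model names, thresholds, and the notification channel are assumptions.
SUPPORT_CHAT_ROUTE = {
    "primary_model": "model-a-v2",
    "fallback_model": "model-b-v1",
    "switch_on": {
        "timeout_ms": 10_000,          # move over when a call exceeds this
        "error_rate_pct": 5,           # or when errors rise above this share
        "failed_output_checks": True,  # or when response checks fail
    },
    "same_logging_rules": True,        # fallback calls hit the same audit log
    "notify": ["#llm-incidents"],      # who hears about every switch
}
```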

At the end, name the person who accepts the risk. Usually it is not one person. The process owner accepts business risk, security is responsible for access and logs, and compliance is responsible for data and regulatory requirements. If this is not stated, the package is almost always sent back for revision.

How to describe storage and deletion of data


Compliance usually wants not a general phrase about security, but a data map. Show what data you store, where exactly it lives, and when it disappears from the system.

First, split data by type. Describe prompts and responses separately from audit logs, because they usually have different lifetimes and different access groups. If you use a gateway or your own infrastructure, specify the hosting country, storage type, encryption, and who can read the records.

Usually four blocks are enough:

  • prompts and responses: where they are stored, in what form, and for how many days
  • audit logs: what fields are written, whether they include full text or only metadata
  • attachments and files: where original and derived copies are kept
  • backups: how often they are created and when they are deleted

Be especially precise with audit logs. State what goes into the record: request_id, time, model, provider, response status, masked user ID, triggered policies, errors. If the full request text is not written to the log, say so directly. If it is written, explain why.

Set retention periods for each layer, not one line for the whole service. For example, request text may be kept for 14 days to investigate incidents, audit logs for 180 days for checks, backups for 30 days before automatic deletion. That format is much easier to approve than the phrase “we store it briefly.”
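The layered periods can be fixed in one place so every cleanup job reads the same values. A sketch using the example periods above:

```python
# Retention per layer, not one number for the whole service.
# Periods mirror the example above; treat them as a starting point.
RETENTION_DAYS = {
    "request_text": 14,   # kept only to investigate incidents
    "audit_logs": 180,    # kept for internal and external checks
    "backups": 30,        # deleted automatically after this window
}

def is_expired(layer, age_days):
    """True when a record in this layer has outlived its retention period."""
    return age_days > RETENTION_DAYS[layer]
```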

The deletion process also needs to be explicit: who starts deletion, what event triggers it, which systems it affects, how caches and indexes are removed, and when backups are cleared. If data must be stored inside Kazakhstan, say that directly.

One more useful block is what you do not store. For example, you do not write PII into logs, you do not keep full text in analytics, and you do not leave test dumps after debugging. These phrases often remove half the questions during approval.

What controls and access rules are needed

Compliance does not check abstract “AI security,” but a set of rules: who can call the model, which models are allowed for the service, what the system removes from the request before sending, and where all of this is visible in the logs.

For each service, keep a short allowlist of models. A support chat, an internal search tool, and a legal assistant should not have access to the entire catalog. In the LLM selection card, it is better to state which models are allowed, for which data they may be used, and who approved the list. If the service needs a backup option, that should be approved separately too.

Access is better granted not to “the whole team,” but by service, environment, and role. One key for production, one for testing, one for a partner. Limits should be set at the key level so one use case does not consume all resources and cause a cost spike. A simple example: the test bot gets a low limit, while the live support service gets its own request and token cap.
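A sketch of that split, with illustrative services, model allowlists, and caps:

```python
# Keys split by service and environment, each with its own limits.
# Service names, model lists, and caps are illustrative.
API_KEYS = {
    "support-chat-prod": {
        "allowed_models": ["model-a-v2", "model-b-v1"],  # approved allowlist
        "requests_per_minute": 300,
        "tokens_per_day": 2_000_000,
    },
    "support-chat-test": {
        "allowed_models": ["model-b-v1"],
        "requests_per_minute": 20,   # low cap so tests cannot cause a cost spike
        "tokens_per_day": 50_000,
    },
    "legal-assistant-prod": {
        "allowed_models": ["model-a-v2"],
        "requests_per_minute": 60,
        "tokens_per_day": 500_000,
    },
}
```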

Mask personal data before calling the model, not after. Phone numbers, IIN, card numbers, addresses, and email addresses are better replaced with tokens at the input stage. For compliance, it is useful to show not only the fact of masking, but also the rule version: who changed it, when, and for which fields.
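A sketch of tokenization before the call, again with simplified patterns; unlike plain masking for logs, it keeps a mapping so the application can restore values in the final answer, and it records the rule version:

```python
import re

MASKING_RULES_VERSION = "2025-12-01"  # recorded with every masked request

# Simplified patterns; real rules need stricter validation.
PATTERNS = {
    "CARD": re.compile(r"\b\d{4}(?: \d{4}){3}\b"),
    "IIN": re.compile(r"\b\d{12}\b"),
}

def tokenize_pii(text):
    """Replace sensitive fields with tokens before the model call.

    Returns the cleaned text plus a token-to-value map so the application
    can restore originals in the final answer if it needs to.
    """
    mapping = {}
    for kind, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text), start=1):
            token = f"[{kind}_{i}]"
            mapping[token] = match
            text = text.replace(match, token, 1)
    return text, mapping
```

On the bank example later in this article, “Check card 4400 1234 5678 9999 and IIN 990101300123” becomes “Check card [CARD_1] and IIN [IIN_1]”, and the mapping stays inside your environment.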

An access change log is needed in most cases. It records who granted access, who removed it, under which request, for how long, and with whose approval. If access is not reviewed for months, questions appear quickly.

If the product must label AI content, that rule should also be described as a control measure, not a wish. State where the user will see the label, in which scenarios it is mandatory, and who is responsible for its presence. For companies in Kazakhstan, it is better not to leave this point “for later.”

A good set of rules looks boring, and that is normal. For approval, that is a plus: the reviewer sees not promises, but a list of restrictions, logs, and responsibility points.

How to prepare the approval package

The package moves faster when compliance sees not a presentation, but a set of verifiable artifacts. Phrases like “the model is safe” do not work. You need documents that show which data goes into the model, what the team writes into logs, where everything is stored, and who is responsible for each control measure.

Start with one use case, not with “using LLMs across the company” in general. For example: a chat for support agents that receives the message text, ticket number, and an internal knowledge-base article. Immediately separate the data by type: what may be sent to the model, what must be masked, and what must never be passed under any conditions.

Attach a sample log separately and remove PII in advance. Compliance usually looks not only at the request itself, but also at the metadata: time, service, model, prompt version, response status, user or service ID, triggered limits, and content labels.

It is convenient to bring the package together in a simple table:

Block | What to show | Who is responsible
Use case | goal, users, input data, restrictions | product owner
Logs | sample record without PII, tracing fields, retention period | tech lead or SRE
Risks and controls | risk, what may happen, how it is covered, how it is checked | compliance and security
Storage | where prompts, responses, logs, and backups are kept | architecture team
Deletion | timelines, deletion trigger, who starts it and who checks it | data owner

It is better to attach the storage and deletion flow as a separate diagram in the internal package, but in the text a short description is enough: where the data appears, where it goes next, and when it disappears. If the data must be stored inside Kazakhstan, write that directly, without vague language.

At the end, check one thing: every item should have a specific person or team assigned. Then model selection looks like a managed process, not a debate about a “trusted” model.

Example: a bank support chat


A bank launches a customer support chat. The bot answers common questions about tariffs, limits, fees, and processing times for transactions. For compliance, that is not enough. It needs a diagram showing which data passes through the model, what the system stores, and who is responsible.

In the working version, the bot does not send the customer’s raw text as is. Before the call, the system masks the IIN, card number, and other details that could identify the person. If the customer writes: “Check card 4400 1234 5678 9999 and IIN 990101300123,” the model receives already cleaned text with markers instead of sensitive fields.

The team separately records what is needed for incident investigation and internal checks: request_id, request time, and case category. That is usually enough to reconstruct the sequence of events, understand which route the system chose, and check why the bot gave that answer. Full request text should not be stored without a clear reason and retention period.

Some requests are immediately classified as sensitive. These include disputed charges, card blocking, customer identification, and suspicious transactions. The system sends such requests to a separate environment where data is stored inside the country. If the team uses AI Router, these calls can be routed to models in local infrastructure and kept with unified audit logs for that scenario.

For approval, compliance usually receives not a presentation, but a short set of facts: a data flow diagram, retention periods, a list of roles, and a table of controls. In this example, the roles are simple: product is responsible for use cases, security for masking and access, the platform for logs and routing, and compliance for retention rules. Such a document is read faster and approved more calmly.

Where teams most often make mistakes

Most often, teams write too broadly. The phrase “the data is protected” sounds confident, but it is almost useless for approval. You need a scheme: where the request comes from, which fields the system masks, what goes to the provider, where the logs are stored, and who has access.

Confusion around logs happens all the time. The team puts application logs, API gateway logs, and the provider’s model activity traces into one section. As a result, compliance does not understand where to look for an incident. It is better to separate these layers right away: what your service writes, what the gateway writes, and what is not stored by you at all and remains outside your environment.

Another common mistake is silence about the backup model. While everything works, this is invisible. But on the day of a failure, the service may switch to another route, and the new model may have different retention rules, a different processing region, or a different risk profile. This switch must be described in advance and approved separately.

Test data is created quickly and forgotten just as quickly. Pilot prompts, evaluation exports, and examples with personal data often stay longer than production logs. Compliance will almost immediately ask for the deletion period, the cleanup method, and the person who checks it.

The audit log also often ends up being “shared” and no one’s responsibility. Until an owner is assigned by name or role, nobody watches record completeness, retention periods, or access. Because of that, even a good document falls apart at the last step.

Usually six things are enough:

  • a step-by-step data flow diagram
  • separate descriptions for application logs, gateway logs, and provider logs
  • the rule for switching to a backup model
  • the deletion period for test and evaluation data
  • the audit log owner
  • the review process

Without this, the package is almost always sent back for revision.

Quick check before sending


Compliance often sends the package back not because the risk is high, but because it lacks verifiable facts. Before approval, remove phrases like “the data is protected” and replace them with things that can be shown: a table, a log, an access diagram, a retention period.

Before sending, check five things:

  • the “risk – control measure – owner” table has no empty cells
  • there is a real log sample without PII, preferably 5–10 lines with request_id, time, model, response status, call source, and masking label
  • each storage layer has a region listed: application, API gateway, vector database, logs, backups
  • the access grant and revocation process is described step by step
  • there is a plan for model failure and incidents: what the system does on timeout, where traffic goes, who gets notified, and who decides when to enable the backup scenario

One small detail often decides the outcome of approval: model names, providers, and environments must match across all files. If one model appears in the card and another in the log, the package is almost always returned.

What to do next

Start with one document, not a pile of emails and spreadsheets. If the team has one LLM selection card for each scenario, approval will move faster: compliance will see the same fields, and product and IT will not have to rebuild the package from scratch each time.

First, fix the required template: scenario goal, data type, allowed models, jurisdiction restrictions, service owner, and retention period. Then mark the places where data cannot leave the country. After that, run one set of real requests through several models and compare risk, latency, price, answer quality, and the share of manual checks on the same test. Only then check the control measures before the pilot: PII masking, role-based access, key limits, activity logs, and a clear incident review process.
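As a sketch, the comparison run can be a short script against an OpenAI-compatible endpoint; the base_url, key, prompts, and model names below are placeholders, not real values:

```python
import time
from openai import OpenAI  # pip install openai

# Endpoint URL, API key, and model names are assumptions;
# substitute your gateway's real values.
client = OpenAI(base_url="https://example-gateway.local/v1", api_key="YOUR_KEY")

TEST_PROMPTS = ["What is the fee for an international transfer?"]  # your real set
CANDIDATE_MODELS = ["model-a-v2", "model-b-v1"]

for model in CANDIDATE_MODELS:
    for prompt in TEST_PROMPTS:
        start = time.monotonic()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        latency_ms = int((time.monotonic() - start) * 1000)
        # Record the same fields for every model so the comparison is fair.
        print(model, latency_ms, response.choices[0].message.content[:80])
```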

That is how model selection stops being a debate about promises and becomes a comparison of facts. For an internal bank assistant, one model may be slightly more accurate but store logs outside the needed jurisdiction. Another may answer 300 ms faster and pass approval without exceptions.

If the team needs one API to work with different models and still needs unified audit logs, PII masking, and data storage in Kazakhstan, they can look at AI Router on airouter.kz. The service has one OpenAI-compatible endpoint, but the approval card should still record the specific models, providers, and retention rules for each scenario.