Feb 10, 2026

Questions to Ask an LLM Provider Before Signing a Contract: What to Clarify

The right questions for an LLM provider help you pin down logs, data retention, model updates, and what happens during outages and incidents before you sign the contract.


Why ask questions before signing

Problems with LLMs rarely start on launch day. They usually show up later: the legal team has already approved the contract, the team has connected the API, and then it turns out the provider keeps logs longer than you want, changes the model without notice, or responds to incidents only during business hours.

Fixing that after the fact is hard. If the contract does not clearly define logs, data retention, model updates, and support, you will get a generic policy or a manager’s verbal promise. That is not enough for procurement. When a dispute comes up, people look at the contract and its attachments, not a call or a chat message.

It is better to gather all your questions in one list before the first procurement meeting. That makes it easier to compare providers and avoid missing details. This is especially important if you have requirements for data storage in Kazakhstan, PII removal, audit logs, or incident response times.

Ask for concrete, verifiable terms right away: how long requests, responses, and metadata are kept, what country the data lives in, who has access, how the provider notifies you about model changes, and who handles incidents at night or on weekends. Marketing language sounds nice, but it is almost useless in practice.

What data goes into the model

The provider almost never gets only the text of the question. Along with the prompt, the request often carries system instructions, chat history, a user ID, file names, the interface language, the request time, and other service fields. If the team connects chat, document search, or a voice workflow, the amount of data grows fast.

At the meeting, ask for a precise list of fields for each request type, not a general description. Clarify separately what goes directly into the model, what is stored only in logs, and what stays on your side. Otherwise you may later find email addresses, contract numbers, or internal operator notes in the service logs.

Files are where confusion happens most often. The provider may send the file itself, extracted text, a thumbnail image, the file name, and technical metadata to the model. If you work with forms, medical records, or scanned documents, ask directly whether EXIF, file names, MIME types, file sizes, storage links, and the data of the user who uploaded the file are transmitted.

Another separate question is whether the provider uses your data for model training, fine-tuning, quality evaluation, or manual labeling. The phrase “we do not train models” does not always close the topic. A provider may not train the base model, but still keep requests for internal testing, fraud prevention, or incident analysis.

It helps to ask four direct questions:

  • Which fields are included in the default request, and which can be turned off.
  • What happens to files, attachments, and their metadata.
  • Whether requests and responses are used for training, evaluation, or manual review.
  • Whether production data and test examples can be separated by different keys, projects, or environments.

This kind of separation significantly lowers risk. For testing, it is better to use synthetic examples or pre-masked data rather than copies of real customer conversations. If you are a bank, clinic, or contact center, it is worth asking right away when PII is masked: before the data is sent to the model or only after.
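If masking happens on your side, it does not need heavy tooling to start. Below is a minimal sketch of pre-masking before a request leaves your infrastructure; the patterns and placeholder tags are illustrative assumptions, not a production-grade PII detector.

```python
import re

# A minimal masking sketch: the patterns and placeholder tags are illustrative
# assumptions, not a production-grade PII detector.
PII_PATTERNS = [
    (re.compile(r"\b\d{12}\b"), "[ID_NUMBER]"),            # 12-digit ID, e.g. an IIN
    (re.compile(r"\+?\d[\d\s\-()]{8,}\d"), "[PHONE]"),     # loose phone-number match
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),  # simple email match
]

def mask_pii(text: str) -> str:
    """Replace likely PII with placeholder tags before the text leaves your side."""
    for pattern, tag in PII_PATTERNS:
        text = pattern.sub(tag, text)
    return text

print(mask_pii("Customer 990101350123 asked us to call +7 701 555 1234."))
# -> Customer [ID_NUMBER] asked us to call [PHONE].
```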

These questions are best asked before the pilot, not after the first integration. Once the data flow is built into the product, changing it takes longer and costs more.

How logs are organized

Logs can either settle a dispute in five minutes or create a new problem for a year. That is why you should ask about them as strictly as you ask about price and SLA. If the provider answers in vague terms, that is already a warning sign.

Ask which events are written to logs. Usually there is not just one log, but several: API access, application errors, billing, employee actions, filter triggers, and limit hits. Each type carries a different risk. An error code log is almost harmless, while a log with the full request text may contain personal data, internal documents, or parts of customer conversations.

Also ask whether the prompts and model responses themselves are recorded. If they are, in what form: full, partial, masked, hashed, or only as a request ID. A simple question often works better than any description: “Show me a real log record without secrets.”
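To make that comparison concrete, here is what a reasonably masked entry might look like. The field names and values are illustrative assumptions, not any specific provider's schema:

```python
# Illustrative only: field names and values are assumptions, not any real
# provider's schema. The point is what a "masked" entry can reasonably keep.
log_entry = {
    "request_id": "req_8f3a91",
    "timestamp": "2026-02-10T03:41:07Z",
    "api_key_id": "key_support_prod",   # identifier of the key, not the key itself
    "model": "gpt-4o-2024-08-06",       # exact version served, not an alias
    "prompt": "Customer [ID_NUMBER] asks about order [ORDER_NO]",  # masked text
    "response_sha256": "9f2c4a...",     # hash instead of the full response text
    "status": 200,
    "latency_ms": 840,
}
```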

Ask for the retention period for each log type. Not “how long do you keep logs,” but the exact period by category. Access logs may be kept for 90 days, employee audit logs for a year, and request content may not be stored at all. If the timeframes are all the same, the provider probably has not thought this through.

It is just as important to understand who inside the company can see those records. Clarify whether support, SRE, security, and developers have separate permissions. Ask who grants log access and whether there is an audit trail when an employee opens someone else’s request.

For a bank, clinic, or large retailer, it is often hard to pass an internal review without exporting logs. So ask whether you can export them for an audit by date, API key, or request ID, in JSON or CSV. If a provider, for example AI Router, says it offers audit logs and stores data inside the country, it is still worth checking with concrete examples and retention periods instead of taking it on faith.
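If the provider does offer export, it is worth scripting the pull during the pilot. A minimal sketch, assuming a purely hypothetical /v1/logs/export endpoint with token auth; the real path and parameters will be whatever the provider documents:

```python
import requests

# Hypothetical endpoint and parameters: the real path, auth, and filters are
# whatever the provider documents. The point is that export by date, key, and
# request ID should be possible at all, in JSON or CSV.
resp = requests.get(
    "https://api.example-provider.com/v1/logs/export",
    headers={"Authorization": "Bearer YOUR_ADMIN_TOKEN"},
    params={
        "from": "2026-01-01",
        "to": "2026-01-31",
        "api_key_id": "key_support_prod",
        "format": "csv",
    },
    timeout=30,
)
resp.raise_for_status()
with open("llm_logs_jan.csv", "wb") as f:
    f.write(resp.content)
```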

Where data is stored and how it is deleted

You should not only ask whether the provider stores data, but also where exactly it is stored. The same service may keep prompts, responses, files, logs, and backups in different countries and with different subcontractors. For a company in Kazakhstan, this often affects both risk and compliance with internal rules.

Start with a simple question: name the country, data center, and location for each data type. If the provider says the live data is stored locally, ask right away where backups, replicas, and temporary caches are located. The problem is often not the main database, but the copies people remember too late.

It helps to clarify five things:

  • where prompts, responses, files, and metadata are stored;
  • where backups are kept and for how long;
  • which contractors or cloud providers have access to that data;
  • what is deleted immediately on a customer request and what is deleted only on a schedule;
  • how personal data is masked before being sent to the model.

Deletion is also better handled step by step. A good answer is specific: the active storage is cleared within one timeframe, logs within another, and backups within a third. The phrase “we delete data on request” guarantees nothing. Ask for the deletion order after contract termination and after a routine request from your team.

Also check how personal data is masked. If a customer writes an ID number, phone number, or address in chat, the provider should explain what happens to those fields before they are sent to the model. Ask whether data is masked automatically, whether rules can be configured for your scenarios, and whether the original value ends up in logs.

It helps to put this on one page: who stores the data, where the copies are, who has access, and how many days it takes until everything is deleted. If the service runs through a single gateway, like AI Router, and says it stores data inside Kazakhstan and masks PII, still ask for exact deletion times and a list of external participants. Contracts need details, not promises.

How the provider updates models


Problems often begin after a quiet model update. Yesterday the bot answered exactly as expected; today it is longer, more expensive, and starts mixing up wording. That is why you need not a vague “we regularly improve models,” but a clear process for changes.

First, find out whether you can lock to a specific model version instead of living on a changing alias. If the provider changes the model under the same name, you lose control over quality, price, and behavior. For production, it is safer to pin a version and decide separately when to move to a new one.
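If the provider exposes an OpenAI-compatible API, pinning is usually just the model ID you pass. A minimal sketch with the openai SDK; the snapshot ID here is an example, and whether the gateway echoes the exact version back is worth verifying:

```python
from openai import OpenAI

client = OpenAI()  # assumes your API key is in the environment

# Risky: "gpt-4o" is an alias the provider can repoint to a new snapshot.
# Safer for production: pin an explicit, dated snapshot ID. The ID below is
# an example; use whatever versioned identifiers your provider publishes.
PINNED_MODEL = "gpt-4o-2024-08-06"

resp = client.chat.completions.create(
    model=PINNED_MODEL,
    messages=[{"role": "user", "content": "Ping"}],
)
# Check that the provider echoes the exact version back, not just an alias.
assert resp.model == PINNED_MODEL, f"served {resp.model}, expected {PINNED_MODEL}"
```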

Then ask how the team warns you about version changes. A normal setup includes advance notice, a clear lead time before the switch, and a list of changes. It is not only about answer quality, but also limits, pricing, output format, tool calling support, and JSON mode. If you work through a single OpenAI-compatible endpoint, check that too: does anything change behind the same model ID, or do you always see the exact model identifier?

You also need a rollback process. If errors increase or accuracy drops after an update, who brings back the previous version and how quickly? A good answer includes a timeframe, a responsible person, and a clear rule for when the team triggers rollback.
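On your side, rollback is much cheaper if the model ID lives in config rather than in code. A minimal sketch, assuming environment variables as the config mechanism and an agreed error-rate threshold:

```python
import os

# Keep the current and last-known-good model IDs in config, so rollback is a
# config change, not a code change. Variable names and IDs are illustrative.
CURRENT_MODEL = os.environ.get("LLM_MODEL", "gpt-4o-2024-11-20")
ROLLBACK_MODEL = os.environ.get("LLM_ROLLBACK_MODEL", "gpt-4o-2024-08-06")

def pick_model(error_rate: float, threshold: float = 0.05) -> str:
    """Apply the agreed rollback rule: above the threshold, serve the old version."""
    return ROLLBACK_MODEL if error_rate > threshold else CURRENT_MODEL
```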

Another uncomfortable but necessary question: what happens if the model is taken out of service? The provider should have a replacement plan. Otherwise the team only learns about the problem after the outage. It is better to know in advance whether they will offer a close replacement, give you a migration window, and help you check prompts, limits, and cost.

It is also worth fixing responsibility for retesting in advance. The provider rarely knows your metrics better than your own team. So the contract should say who runs regression tests, who checks the quality threshold, and who gives the final approval for the switch.
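The regression harness itself can be very small. A sketch of the idea; the test cases, model IDs, and pass criterion are placeholders for your own:

```python
from openai import OpenAI

client = OpenAI()  # or your gateway's OpenAI-compatible client

# Placeholder cases: swap in your real prompts and expected markers.
CASES = [
    {"prompt": "Where is my order 12345?", "must_contain": "order"},
    {"prompt": "I want to cancel my subscription", "must_contain": "cancel"},
]

def pass_rate(model_id: str) -> float:
    """Run every case against one model version and return the share passed."""
    passed = 0
    for case in CASES:
        resp = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": case["prompt"]}],
        )
        answer = (resp.choices[0].message.content or "").lower()
        passed += case["must_contain"] in answer
    return passed / len(CASES)

# Gate the switch: the candidate must not score worse than the pinned version.
if pass_rate("gpt-4o-2024-11-20") < pass_rate("gpt-4o-2024-08-06"):
    print("Candidate regressed on the test set; hold the switch.")
```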

A quick example. A bank updated the model in its support chat without a second review. The new version was just as fast, but it more often asked the customer to “clarify the question” and did a worse job of guiding the conversation to a resolution. Formally, the service still worked. In practice, the business was already losing leads.

What happens during an incident

An outage rarely looks neat. More often, latency rises at night, some requests fail, and the customer team hears about it from user complaints. Before signing, it is worth learning not only the general SLA, but also the working process: who notices the problem, who takes the report, and what the provider does in the first 15 to 30 minutes.

Ask to see the incident flow. Not as a promise to “respond quickly,” but step by step: how the outage is recorded, who makes the decision, when an engineer is brought in, and how customers are updated on status. If the provider cannot describe the process in simple words, there will be chaos in a real incident.

Also ask who is on call during the day and at night. Is there a 24/7 on-call setup, or do tickets filed at night just wait until morning? For LLM services, this is a common weak point, especially if your product runs nonstop.

Ask about specific timeframes:

  • how many minutes it takes for the team to confirm the incident;
  • how long it takes before the customer gets the first useful update;
  • how quickly the provider notifies customers about a widespread outage;
  • how often new status updates are sent until the issue is resolved.

If the answers are vague, that is a bad sign. The phrase “we will respond as soon as possible” does not help when a live chat for banking, retail, or healthcare customers goes down.

Another question is often forgotten: how will the provider limit damage while fixing the root cause? For example, can it quickly enable rate limiting, route traffic to a backup model, disable a problematic region, mask sensitive fields in logs, or temporarily stop recording part of the data?
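Even when the provider has these levers, a client-side safety net costs little. A minimal sketch of degrading to a backup model when the primary fails; model IDs are illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative model IDs; what matters is the pattern, not these exact models.
PRIMARY, BACKUP = "gpt-4o-2024-08-06", "gpt-4o-mini"

def complete_with_fallback(prompt: str) -> str:
    """Try the primary model; on any API failure, degrade to the backup."""
    last_error: Exception | None = None
    for model in (PRIMARY, BACKUP):
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                timeout=10,  # per-request timeout supported by the openai SDK
            )
            return resp.choices[0].message.content or ""
        except Exception as exc:  # in production, log which model failed and why
            last_error = exc
    raise RuntimeError("both primary and backup models failed") from last_error
```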

After any serious outage, the provider should deliver a review. You need a short report: what happened, who was affected, what data was at risk, what the team has already fixed, and what it will change so the outage does not happen again. Without that, it is hard to close the internal review and move on.

How to run the review

The review works better when it is built around one real scenario instead of an abstract list of requirements. Pick a scenario the team really wants to launch: support chat, search across an internal knowledge base, ticket triage, or an assistant for operators.

Then build a small test set. Usually it is enough to have 5 to 10 typical requests with expected answers, a couple of edge cases, and at least one example containing personal data, sensitive text, or a user error. That quickly moves the conversation from broad promises to verifiable answers.
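The test set can literally be a short list in code. A sketch with placeholder contents; note that the sensitive example is already masked before it ever reaches a provider:

```python
# A small test set sketch: typical requests, an edge case, and one sensitive
# example, masked before it ever reaches a provider. Contents are placeholders.
TEST_SET = [
    {"prompt": "Where is my order 12345?",
     "expect": "offers a tracking check"},
    {"prompt": "I was charged twice, fix it",
     "expect": "apology plus concrete refund steps"},
    {"prompt": "asdf ???",
     "expect": "polite clarification, no invented facts"},
    {"prompt": "My IIN is [ID_NUMBER], please update my profile",
     "expect": "no PII echoed back, hands off to an operator"},
]
```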

After that, keep it simple. Describe the scenario on one page: who sends the request, what data goes into the model, where logs are needed, and who reviews the result. Prepare real prompts and responses, not only “ideal” examples. Add a long request, an unclear question, and a case where the model should not make things up.

It is better to send the questions in writing. That makes it easier to compare what the provider promised about logs, retention periods, data deletion, model updates, and incident handling. Then ask for test access or a live demo. Let the team show where logs are visible, how the model changes, how key-level limits are enabled, and what happens when an error occurs.

Put the answers into one table and share it with legal, security, and the product team. If the answers do not match, that is already a useful signal. For example, sales may say data is deleted on request, while support says backups live for another 30 days.

If the provider promises compatibility with your current SDK or familiar API format, test it with your own code. A presentation proves almost nothing. It is better to spend one day on that check than to rewrite the integration later and argue over what was actually promised in the contract.

Where people most often go wrong


The most common mistake is agreeing to tidy phrases like “logs are kept for a short time” or “we respond quickly to incidents.” For a contract, those answers are almost useless. You need numbers, format, and responsibility: how many days logs are kept, where exactly they are stored, who has access, and how many hours it takes to get a response during an outage.

Another common problem is forgetting about backups. The provider may honestly delete data from the primary system, while a copy remains in backup storage for weeks or months. If you do not discuss that in advance, unpleasant details about retention periods, encryption, and admin access surface later.

Model updates are also often underestimated. Teams accept whatever version the provider serves under a familiar name, and for production that is a bad approach. Even a small version change can break ticket classification, answer style, or extraction accuracy. Ask how the provider warns you about changes: by email, through a status page, in the dashboard, how many days in advance, and whether you can stay on the old version for a while.

People also tend to test the service too gently. The team runs harmless tests, and then real traffic brings personal data, long documents, edge-case requests, and load spikes. If you are evaluating a provider, ask for testing on real scenarios, but with sensitive data masked.

It is usually worth checking five things separately:

  • the exact retention period for logs and backups;
  • the data deletion process on request;
  • the channel and timing of model update notifications;
  • the SLA and contact for urgent cases;
  • the service’s behavior under peak load and when upstream providers fail.

And one mistake shows up all the time: a pilot is launched without a live contact for emergencies. A general support address will not save you if the API stops responding at night. You need a clear escalation path: who answers first, who joins next, and how quickly the provider team gives you a status update.

Quick check before the meeting

Before talking to a provider, you do not need a long questionnaire. Five points are enough to quickly show whether the team knows how to handle data and incidents. If there are no clear answers right away, that is already a signal.

It is better to ask for exact wording, not general promises: timeframes, communication channels, who is responsible, and where it is documented. That is more useful than long talks about security.

  • For logs, ask for the exact retention period in days or months.
  • For data, clarify the deletion period not only from the main system, but also from backups.
  • For models, ask how the provider tells you about a version change, routing change, or a provider swap under the hood.
  • For incidents, find out whether there is an urgent communication channel: separate email, chat, phone, on-call team.
  • For employee access, ask for a written answer: who can see logs, prompts, and responses, under what conditions, and how this is audited.

If the provider answers vaguely, ask for a short follow-up email after the meeting. It is useful to have one page left over with concrete terms: 30 days for logs, 7 days for deletion, 14 days’ notice before a model change, a 24/7 emergency channel, and employee access only by request.

That kind of list saves time during contract approval and quickly separates a working service from a team that is not ready for production yet.

Example: support chat for customers


A company launches a support chat for customers and wants to take some of the load off operators. The setup seems simple: the operator sends the model the ticket history, order number, previous replies, and a short note about the issue. In the first weeks everything looks fine: replies are fast, the queue is shorter, and the team is happy.

Problems often start later. After a month, the provider changes the model without notice or quietly moves traffic to another version. The same prompt now gives a different tone: answers become drier, sometimes sharper, and in edge cases the model sounds too confident. For a support chat, that is unpleasant. One bad response can reach the customer in the wrong style and create a complaint.

Then another weak point appears. The support manager wants to pull a disputed conversation and understand exactly what the model saw. But the service logs are incomplete: there is a request ID and a time, but no full text, model version, or request route. If the operator passed personal data, it becomes even more important to understand what the provider stored, where it is kept, and when it is deleted.

That scenario is a good way to test the contract before launch. You need direct answers to a few questions:

  • Does the provider store the full request and response, or only metadata?
  • Can the team see the model version and whether it was replaced?
  • Can text storage be turned off or PII be masked?
  • How quickly does support respond to a disputed answer or an outage?
  • Who will provide a report on a specific conversation if the customer files a complaint?

These are exactly the kinds of questions you should ask before the pilot. If the answers are vague, the risk will not show up on the day you sign the contract, but on the day of the first customer complaint.

What to do after the first conversation

After the call, do not rely on memory. Open one table and enter the answers from all providers in the same format. That makes the differences obvious, especially if everything sounded “roughly the same” during the meeting.

Usually a few columns are enough: log and input data retention time, data deletion process on request, how updates or model replacements are communicated, incident response time, and what the provider is ready to commit to in the contract and SLA.

Compare numbers and rules, not broad promises. The phrase “we help quickly” means nothing if there is no first-response time, no escalation channel, and no clear process. The same goes for data storage: one provider may say it “does not keep anything,” but prompts, metadata, or request traces may still remain in its logs.

If gray areas remain after the conversation, send a short email and ask for a written reply. Verbal statements are easy to change, while written answers are easy for legal, security, and development teams to review.

The next step is a short pilot on your own scenario, not a demo from a presentation. Take one real flow: support chat, internal document search, or ticket triage. In a few days you will see what happens with latency, logs, model errors, and support when something goes wrong.

A good pilot usually checks three things: whether your traffic goes through without rewriting the code, whether you have enough log transparency, and whether the provider responds within the promised time. If even one of those starts slipping during testing, production will not get easier.

If you need one OpenAI-compatible endpoint, data storage in Kazakhstan, and audit logs, it makes sense to compare AI Router separately on those points. With airouter.kz, you can switch the base_url to api.airouter.kz and keep working with the same SDKs and prompts, so it is easy to test on your own code instead of a demo.
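That check can be a few lines. A sketch of the base_url switch the article describes; the /v1 path and the key variable name are assumptions, so confirm them against the airouter.kz docs:

```python
import os

from openai import OpenAI

# The article says AI Router is OpenAI-compatible, so the switch is one
# constructor argument. The exact path ("/v1") and env var name are
# assumptions; check the airouter.kz documentation for current values.
client = OpenAI(
    base_url="https://api.airouter.kz/v1",
    api_key=os.environ["AIROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # example pinned model ID
    messages=[{"role": "user", "content": "Ping"}],
)
print(resp.choices[0].message.content)
```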

In the end, you should be left not with a “list of impressions,” but with a short fact-based ranking. That makes it much easier to decide who to move into the pilot and who to remove from the list right away.

Frequently asked questions

What should I ask the provider first?

Ask for exact terms, not broad promises. Clarify how long logs are kept, where data is stored, how deletion works, how model changes are handled, and who responds to incidents at night and on weekends.

What else can go into the model besides the text of the request?

Start with the full contents of the request. The provider should say whether only the prompt goes into the model or also the chat history, system instructions, user ID, file name, metadata, and other service fields.

What should I ask about files and attachments?

Break files down step by step. Ask whether the service sends the file itself, extracted text, the file name, EXIF, MIME type, the storage link, and the data of the person who uploaded the file.

How can I tell whether the provider uses my data for training or review?

Do not stop at the phrase “we do not train models on your data.” Ask whether the provider keeps requests for quality checks, manual review, fraud prevention, or incident analysis.

How do I know what is in the provider’s logs?

Ask for an example of a real log entry with secrets removed. That will quickly show whether the service records the full request and response text, only metadata, or just a request ID.

How can I check where the data is actually stored?

Ask a direct question for each data type: where prompts, responses, files, logs, and backups are stored. If local storage in Kazakhstan matters, the provider should name not only the main storage location, but also the place for backups and temporary caches.

What should I clarify about data deletion and backups?

A good answer always splits deletion into stages. Find out how many days it takes to clear working storage, logs, and backups, and ask for the same process after the contract ends.

What should I ask about model updates?

For production, it is better to pin a specific model version instead of relying on a changing name. Also ask how the provider warns you about version changes, who performs rollbacks, and how long that takes.

How can I tell whether the provider handles incidents properly?

Look at the live process, not just the SLA. Clarify who receives incidents 24/7, how many minutes it takes to get the first meaningful update, and whether the team can quickly switch traffic to a backup model or limit the problematic flow.

What is the best way to run a pilot before signing the contract?

Take one real scenario and test it with your own code. During the pilot, it is useful to check three things: whether traffic goes through without reworking the integration, whether you have enough log visibility, and whether support responds within the promised time.