Dec 30, 2024·8 min read

Deleting Data at a Provider: What to Ask Before Buying

Data deletion at a provider should not be checked by word of mouth. Before buying, ask for contract clauses, logs, cleanup timelines, and the audit process.


Why the provider’s word is not enough

When you check how a provider deletes data, a promise from a call or an email after a demo is not enough. A manager may sound confident, but in a dispute, the contract, the data processing appendix, retention periods, and the deletion process are what matter.

With an LLM provider, the risk is higher than it may seem. Even if the service does not store the prompt text in the main system, copies often remain in nearby layers. This is not an unusual failure; it is a normal part of operations. Data traces can live in access logs, error logs, traces, monitoring, backup copies, retry queues, and temporary dumps.

That is why the phrase “we do not store anything” is almost always too vague. It does not answer what counts as data, where it appears automatically, and who is responsible for clearing each copy. For procurement, that is not enough.

The deletion timeline also cannot be left vague. If the provider says “we delete on request” or “we delete within a reasonable time,” you will not be able to verify the result. One vendor clears data within 24 hours, another only after 90 days together with the backup cycle. For your team, that is a very different risk.

Without proof, the dispute quickly comes down to words. You will say the data should have disappeared. The provider will say it deleted everything. If there is no audit log, retention policy, record of the deletion procedure, or at least a clear incident report, it will be hard to prove anything.

This matters especially before buying an API gateway or a production model. Suppose a bank sends parts of customer messages in requests, and the provider later says everything was erased. If fragments of text show up a month later in support logs or a backup, it is too late to go back and find out what sales meant by the word “deletion.”

A reliable sign is simple: the provider already states what it deletes, where it deletes it, within what timeframe, and how it proves it. If that is missing before signing, it will not get better after launch.

What counts as data

When a team hears the promise “we delete data,” it often thinks only about the prompt text. That is not enough for verification. In an LLM environment, data should include everything that can be read, reconstructed, found in logs, or restored from backup.

The first layer is obvious: prompts and model responses. But teams often forget the system prompt, the conversation history, tool call results, and chunks of context from RAG. If an employee sends a contract to the model and the model returns a short summary, deletion should cover the original text, the response, and any intermediate fragments the system added to the request.

Then come files and service fields. Data includes PDFs, images, audio, CSV files, as well as file names, user ID, project ID, request ID, IP address, request time, selected model, tokens, and routing tags. If you work through an API gateway, the same request may appear both at the gateway and at the final model provider. The deletion promise has to be checked across the whole chain, not just in the main interface.

Another risk area is logs and error traces. When a request fails, developers often write part of the request body, the file name, or the model response into debug logs. Support then searches those records to find the issue. On paper, the provider may not store “content,” but that same content lives in logs, APM, the alerting system, or support tickets.

There are also less visible places: caches, queues, and temporary storage. A service may keep the request in a retry queue, a prompt cache, a temporary bucket for large attachments, or on a preprocessing node. These copies do not live long, but they are still storage. If the data can be retrieved, it is stored.

The final layer is backups and snapshots. A team deletes a record from the main database and considers the matter closed, even though a copy still sits in backup storage for 30, 60, or 90 days. That timeline is not always a problem if the cleanup process is clearly described. The problem starts when the provider talks only about deletion from the “live system” and says nothing about anything else.

A useful question is simple: in which exact places can one request, file, or response remain, even temporarily? Until the provider names those places one by one, the picture is incomplete.
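That question can be turned into a simple inventory. A minimal sketch in Python, with hypothetical layer names and retention periods standing in for the provider's real answers:

```python
# Hypothetical storage-layer inventory for one request.
# Replace the layer names and day counts with the provider's actual answers.
RETENTION_DAYS = {
    "primary_db": 0,     # deleted on request
    "access_logs": 30,
    "debug_logs": 14,
    "prompt_cache": 1,
    "retry_queue": 2,
    "backups": 90,
}

def layers_still_holding(days_since_deletion: int) -> list[str]:
    """Layers where a copy of the request may still exist."""
    return [layer for layer, keep in RETENTION_DAYS.items()
            if keep > days_since_deletion]
```

Until every layer in a table like this has a number next to it, the picture the section describes is still incomplete.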

What to lock into the contract

The contract should answer one question: what exactly does the provider consider client data? If that is missing, the dispute will start after the first incident or at termination. Many vendors write things broadly and then interpret them too narrowly.

It is better to list the categories directly in the contract or an appendix. This usually includes model prompts and responses, uploaded files, tables, images, audio, embeddings, fine-tuning datasets, caches, request metadata, and log content if it contains prompt text or parts of the response.

Also set the deletion timeline clearly. One timeline is needed after a client request, and another after the contract ends. A phrase like “within a reasonable time” does not help. You need numbers. For example, 30 days for production systems and 90 days for backups, if the provider cannot clear them faster.

The most common problem hides in backups and logs. The provider may delete the data from the main database but leave it in journals, task queues, or archived copies. The contract should state three things: where such data may remain, how long it is kept, and what happens when the period ends — deletion or anonymization. If some logs must be kept for security, billing, or legal reasons, ask for the exact contents of those logs.

You also need a clear deletion confirmation format. Not “we will confirm on request,” but a specific document: an email or report with the date, the list of deleted data sets, the systems affected, and a description of any exceptions. Otherwise, you get a short email reply that proves nothing later.
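The confirmation format can be agreed on as a fixed set of fields. A sketch, assuming the fields listed above (date, deleted data sets, affected systems, exceptions); the field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class DeletionConfirmation:
    request_id: str
    completed_at: str            # ISO 8601 date
    datasets: list[str]          # what was deleted
    systems: list[str]           # where it was deleted
    exceptions: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A confirmation that names no datasets or systems proves nothing later.
        return bool(self.datasets and self.systems)
```

A short "everything has been deleted" email fails this check by design: it names neither datasets nor systems.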

For teams in Kazakhstan, this is especially sensitive. Often you need to show not just the fact of deletion, but the path data took inside the infrastructure.

If you connect the LLM through a gateway like AI Router, check the responsibility chain right away. Who deletes the data in the gateway itself, who is responsible for the external model provider, and who gives confirmation if the request passed through several systems.

A full audit is not necessary for everyone. In practice, it is often enough to agree in advance on a set of evidence: an excerpt from the retention policy, a fragment of the audit log for your deletion request, a ticket or task ID, backup and cleanup timing confirmation, and the contact details of the responsible person on the provider’s side.

If the vendor is not willing to put this on paper, their promise about deletion does not mean much.

What technical signals to request

If the provider says, “we delete data on request,” ask for proof of real work, not a promise. You need artifacts from a live system, not slides. They show where data passes through, where it gets delayed, and who can actually reach it.

For an LLM provider or API gateway, this is especially important. Data rarely lives in one database. It may go into API logs, a task queue, a cache, a monitoring system, file storage, backups, and a contractor that provides inference or support.

Usually, it is enough to ask for five things:

  1. A data flow diagram across all systems, from the incoming request to logs, backups, and external providers.
  2. A retention table for each layer, with runtime logs, debug logs, cache, files, backups, and the audit log listed separately.
  3. A list of contractors with a simple role for each: who stores data, who processes it, who sees metadata, and who receives the full request.
  4. A sample audit log for deleting one record, with a timestamp, request ID, system, result, and the person or service that started the deletion.
  5. The PII masking process in logs: what is hidden, at what stage, and whether the full text can reach the debug channel before masking.
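The audit-log sample in item 4 can be checked mechanically. A sketch with hypothetical field names; the point is that every field is present and non-empty:

```python
import json

# Fields the article asks for in a deletion audit entry.
REQUIRED_FIELDS = {"timestamp", "request_id", "system", "result", "initiated_by"}

sample_entry = json.loads("""{
  "timestamp": "2024-12-30T10:15:00Z",
  "request_id": "req-7f3a",
  "system": "primary_db",
  "result": "deleted",
  "initiated_by": "svc-retention"
}""")

def entry_is_verifiable(entry: dict) -> bool:
    """True only if every required field exists and is non-empty."""
    return REQUIRED_FIELDS <= entry.keys() and all(entry[f] for f in REQUIRED_FIELDS)
```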

A good data flow diagram is not just one box labeled “cloud.” It shows each layer separately. If you are buying an LLM provider, the diagram should include the inbound endpoint, routing, prompt cache, error log, analytics, attachment storage, and downstream models. That is exactly where copies often remain, even after the main record has already been deleted.

The retention table also has to be specific. A phrase like “logs are kept for a limited time” does not help. You need day counts for each layer and a rule for backups: is the record deleted immediately, marked for deletion, or removed only after rotation.

The audit log has a simple test. Ask for an example of one deletion where the full path is visible: the request arrived, the system found the object, started deletion across the needed layers, recorded the outcome, and recorded any error if one layer did not respond. A screenshot without an ID and time proves very little.

The provider should also answer directly about PII masking. Ask whether phone numbers, email addresses, IINs, card numbers, and names are masked before they are written to logs, or only afterward. If the full text is stored first and then “cleaned,” that is a weak point.
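Masking before the log writer can be sketched as a small filter. The patterns below are illustrative only; a real deployment needs a reviewed pattern set and tests against real log samples:

```python
import re

# Order matters: longer digit runs (IIN, card) are masked before phones.
PATTERNS = [
    (re.compile(r"\b\d{12}\b"), "[IIN]"),               # Kazakhstan IIN: 12 digits
    (re.compile(r"\b(?:\d[ -]?){16}\b"), "[CARD]"),     # 16-digit card numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\+?\d{10,11}\b"), "[PHONE]"),
]

def mask(text: str) -> str:
    """Apply masking BEFORE the text is written to any log channel."""
    for pattern, label in PATTERNS:
        text = pattern.sub(label, text)
    return text
```

The key question for the provider is where this step runs: if the full text is stored first and cleaned afterward, the debug channel has already seen the original.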

If the provider cannot gather these materials quickly, the data storage review is almost certainly based on words rather than process.

How to check it step by step


Start with one shared questionnaire. Procurement, security, and legal should build it together, otherwise the provider will give three different answers to the same question. In one document, record where prompts and responses are stored, how long logs live, what goes into backups, who can restore records, and what the deletion request looks like.

Then compare the answers not with a presentation, but with the documents. If the manager says in an email that data will be deleted in 7 days, but the data processing agreement says “within a reasonable time,” follow the contract. Compare the wording in the contract, retention policy, log description, and SLA. It is better to resolve any mismatch before procurement than after launch.

Next, a simple scenario helps:

  1. Collect one list of questions and send it to the provider as a single package.
  2. Put the answers into a table and mark where they match the contract and where they differ.
  3. Run a test on a safe dataset. It is better to use synthetic records with rare markers so they are easy to find.
  4. Submit a deletion request for those test records and ask for the ticket number.
  5. Ask for confirmation: when they deleted it, what they deleted, what remained in backups, and when that will disappear too.
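Step 3 works best with markers you can search for later. A sketch of generating such records and checking an export for survivors (the naming scheme is hypothetical):

```python
import uuid

def make_test_records(n: int, run_id: str) -> list[dict]:
    """Synthetic records tagged with rare markers, easy to find in any copy."""
    records = []
    for i in range(n):
        marker = f"CANARY-{run_id}-{uuid.uuid4().hex[:8]}"
        records.append({
            "prompt": f"Test conversation {i}. Marker: {marker}",
            "marker": marker,
        })
    return records

def surviving_markers(records: list[dict], haystack: str) -> list[str]:
    """After the deletion deadline, search logs/exports for the markers."""
    return [r["marker"] for r in records if r["marker"] in haystack]
```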

The test is not just for show. Send the data through the same path you plan to use in production: through the API, the chosen model, logs, cache, and, if there is one, the intermediate gateway. For teams in Kazakhstan, this should not be overlooked. You need to understand the full route of the data, including storage inside the country, PII masking, and the audit trail.

A good sign is when the provider shows not just an email saying “everything has been deleted,” but a proper confirmation with the request time, execution time, list of affected systems, and the backup cleanup timeline. If the service has an audit-log export on request, ask for a sample in advance.

If the vendor cannot explain the confirmation format, the backup deletion timeline, or the boundaries of log storage, their promise is still worth very little.

Where teams make the most mistakes

The most common mistake is simple: the team hears “we do not train models on your data” and takes that to mean data deletion. Those are not the same thing. A provider may not use your prompts for training, but still store them in logs, queues, caches, object storage, and service copies.

The second mistake appears when teams check only what is visible in the interface. The dashboard may have a delete button, a disabled logging status, or a short retention setting. But procurement needs an answer not about the screen, but about the backend: what happens to the request after the API call, where it sits temporarily, who writes the audit logs, how long traces live, and how service tables are cleared.

Teams often forget backups and disaster copies. The provider honestly deletes the record from the main database, but a day later it is still present in a backup with a 30- or 90-day retention cycle. For banks, telecom, and the public sector, that is no small matter. If you have data residency requirements, ask where those copies are physically stored too.

Another typical mistake is accepting a general certificate or a nice security policy instead of a precise answer. SOC 2, ISO, or an internal security deck can be useful, but they do not answer the direct question: which data is deleted, from which systems, within what timeframe, and how that is confirmed. If the provider cannot name the tables, storage systems, log types, and retention periods, you do not have a review; you have a promise.

The fifth mistake is about contractors. Many LLM providers route requests further themselves: to a model, to cloud storage, to a log pipeline, to a monitoring system, or to external storage. In that case, deletion must be checked not with one counterparty, but across the whole chain. Otherwise, the main provider deletes the data on its side, but a copy remains with a downstream partner.

Warning signs usually look like this: they answer only “we do not train on your data”; they do not give deletion timelines for each system; they do not describe backup and disaster recovery policies; they hide the list of subprocessors; instead of an answer, they send a generic PDF about compliance.

The rule is strict but useful: if the vendor cannot walk you through the data path from request to deletion and name the systems, the data storage review has not been passed. For procurement, that is already enough to put the deal on hold.

A simple scenario before buying


A bank tests an LLM service that suggests replies for a support chat operator. The team does not use live customer messages in the pilot. First, it builds an anonymized set of conversations: it removes full names, account numbers, phone numbers, addresses, and anything that could identify a person.

Even with that dataset, the argument usually starts not with answer quality, but with data deletion. The lawyer is not satisfied with the vague phrase “we do not keep anything.” They need the retention period after contract termination, the deletion confirmation process, and wording about backups, temporary files, and logs.

Security looks at it from another angle. The team asks the provider for a list of storage locations where traces of the pilot may remain: the main database, object storage, cache, queues, backups, and access logs. Then they request an example audit event: who deleted the data, when it happened, for which dataset, and what status the system returned.

Before the test, it is enough to fix four conditions:

  • the provider confirms in writing the deletion timeline after the pilot ends or the contract is terminated;
  • the provider shows which storage systems are involved in processing;
  • the team receives a sample deletion audit-log entry;
  • the parties agree in advance on a deletion test using a separate dataset.

The test itself will feel ordinary, and that is a good thing. The bank uploads a small control set of conversations, marks the exact time and request IDs, and after the agreed period sends the deletion command. After that, the provider should do more than reply by email; it should show the traces of the operation: a log entry, the deletion task status, and confirmation that the data is no longer accessible through search, re-export, or recovery from the working environment.

If the provider works through a gateway, the questions do not change. You still need to check where the data is stored, who keeps the audit logs, and how the team will receive deletion confirmation.

The buying decision should be made only after such a test. If the provider drags its feet, gets confused about timelines, or cannot name all the storage systems, that is also a pilot result.

Short checklist before signing


Before signing, do not ask for a broad promise like “we delete everything.” Ask for a short set of verifiable points. If the vendor answers vaguely or turns to marketing, that is already a signal.

For LLM procurement, this minimum is usually enough to tell whether the team has a real process. That is especially true if requests pass through several layers: API gateway, logs, monitoring, backups, and internal queues.

  • The contract includes a deletion timeline in days. Not “within a reasonable time,” but a specific number for production data, logs, and temporary files.
  • The provider gives a list of systems where copies of the data may remain: main database, logs, cache, task queues, backups, and test environments.
  • The backup cleanup process is described. If an archived record cannot be deleted immediately, the provider should say exactly when the archive will be overwritten and who has access until then.
  • There is an activity log for client requests. You need a record of who received the deletion request, when the process was started, and how it ended.
  • A responsible person and escalation channel are named. If the issue gets stuck between the account manager and support, the deletion deadline loses all meaning.

A good sign is when the same process is described both in the contract and in the technical materials. A bad sign is when the lawyers promise deletion in 30 days, but the engineers say backups live for 90.
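A mismatch like "lawyers: 30, engineers: 90" is easy to surface once the answers are written down side by side. A sketch with hypothetical figures:

```python
def timeline_mismatches(answers: dict[str, dict[str, int]]) -> list[str]:
    """answers maps a source (contract, engineering, ...) to {layer: days}.
    Returns the layers where the sources disagree."""
    layers = {layer for src in answers.values() for layer in src}
    return sorted(
        layer for layer in layers
        if len({src.get(layer) for src in answers.values()}) > 1
    )

answers = {
    "contract":    {"production": 30, "backups": 30},
    "engineering": {"production": 30, "backups": 90},
}
```

Any layer this returns is a question to resolve before signing, not after launch.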

A simple example: a bank sends customer messages to an LLM API and then asks to delete a specific set of requests. If the provider cannot show where that data may have been stored and who will confirm cleanup, the data storage review has failed.

If you are buying a service that routes requests to different models or runs its own models on its own infrastructure, run this checklist for each layer separately. The problem usually appears exactly where everyone hoped a general statement would be enough.

What to do next

After the first meeting with the provider, do not wait for the final tender. Build your own question template right away and use it on every candidate. That makes answers easier to compare, and promises about deletion do not get lost in emails and calls.

A template like this usually needs four blocks: where requests, responses, and logs are stored; what the default retention period is and whether it can be changed; how the provider masks PII and who sees the original data; which documents confirm deletion and who signs off on those terms.
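The four blocks can be kept as one structure and reused for every candidate. The wording below is illustrative; adapt it to your procurement process:

```python
QUESTION_TEMPLATE = {
    "storage": [
        "Where are requests, responses, and logs stored?",
        "Do the data stay in Kazakhstan or move to another region?",
    ],
    "retention": [
        "What is the default retention period per layer?",
        "Can the retention period be changed by contract?",
    ],
    "pii": [
        "How is PII masked, and at what stage?",
        "Who can see the original, unmasked data?",
    ],
    "deletion": [
        "Which documents confirm deletion?",
        "Who signs off on the deletion terms?",
    ],
}

def unanswered(template: dict, answers: dict) -> list[str]:
    """Blocks where the provider has not answered every question."""
    return [block for block, qs in template.items()
            if len(answers.get(block, [])) < len(qs)]
```

With one template per candidate, the comparison table builds itself, and promises about deletion stop getting lost in emails and calls.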

Then run a small pilot, but not on live data. Use a safe dataset without personal data and without trade secrets. In the pilot, check not only model quality, but also service behavior: what goes into logs, which events are visible in audit, whether access can be limited by key, and how quickly support answers questions about retention and deletion.

If the provider passes the pilot, put the requirements into the contract. A manager’s email or a chat message will not help in a dispute. You need wording in the data processing agreement, in the retention policy appendix, and, if possible, in a separate deletion-by-request procedure.

When comparing candidates, do not look only at price and model list. For LLM procurement, local storage, audit logs, and PII masking are often more important. That is where projects most often get stuck in banks, telecom, and the public sector.

For teams in Kazakhstan, the difference between “regional storage” and storage inside the country is often critical. So ask directly: do the data stay in Kazakhstan or move to another region, is content flagged under AI law requirements, are audit logs available, and can PII masking be enabled.

If you are comparing gateways like AI Router, check the same things in the documents and in the test, not through general promises. airouter.kz states data storage inside Kazakhstan, PII masking, audit logs, key-level rate limits, and a single OpenAI-compatible endpoint, but even then the exact contract terms and supporting artifacts decide the outcome.

The good ending here is simple: one question template, one safe pilot, and one contract where retention periods and deletion procedures are written without ambiguous wording.

Frequently asked questions

Why can’t you trust the phrase “we don’t store anything”?

Because that kind of statement proves almost nothing. A provider may not keep the prompt text in the main database, but copies often remain in logs, traces, queues, caches, and backups.

Ask for a precise answer: what is deleted, from which systems, within what timeframe, and how it is confirmed.

What should a provider count as client data?

Look wider than the prompt alone. Client data usually includes prompts and responses, system prompts, conversation history, files, RAG fragments, embeddings, request metadata, and pieces of content that end up in logs.

If the service can read it, find it, or restore it, it is still data, and the provider should name it clearly.

What must the contract say about data deletion?

Lock down the data categories, the deletion timelines after a request and after termination, the process for clearing logs and backups, and the form of deletion confirmation. Don’t leave wording like “within a reasonable time.”

A proper contract names the systems and the timeline in days. That gives you fewer disputes after an incident or at project close.

What deletion timeline is acceptable?

There is no single number for everyone, but the timeframe must be specific. Providers often set one timeline for live systems and another for backups.

If they answer without numbers, treat it as if there is no deadline at all. For procurement, that is not enough.

Should you ask separately about logs and backups?

Yes, and it is better to ask separately. That is where copies most often remain after deletion from the main system.

Ask which logs the service keeps, how long they live, what goes into backups, and when archives are overwritten. If the provider stays vague about these layers, the picture is incomplete.

What technical proof should you request from the provider?

Ask for artifacts from a live system, not a presentation. Usually a data flow diagram, a retention table, a sample deletion audit log, a list of contractors, and an explanation of PII masking are enough.

A good sign is when the example includes time, request ID, affected systems, and the outcome of the operation. A screenshot without details is of little value.

How can you test the deletion promise before signing?

Run a small test on safe data with distinctive markers. Send it through the same path you plan to use in production, then submit a deletion request and ask for the ticket number.

After that, check the confirmation: what was deleted, when it was deleted, what remained in backups, and when that will disappear too.

Does “we do not train on your data” guarantee anything?

Those are two different things. A provider may not use your data for training, but still store it in logs, caches, files, and service copies.

For you, storage matters just as much as training. So ask both questions separately.

What if the request goes through an API gateway and then to an external model?

In that setup, check the whole chain, not just one screen or one contract. The request may pass through the gateway, the external model provider, the log pipeline, and support.

Find out right away who deletes data at each stage, who is responsible for downstream partners, and who issues the final confirmation.

When should you stop the deal and stop trusting the promises?

Pause the deal if the provider cannot name the systems, does not give timelines for each layer, hides subprocessors, or falls back on generic compliance language. That kind of answer will not help in a dispute.

Another bad sign is when the sales team promises one thing and the contract or engineering team says another. It is better to stop before purchase than to sort it out after launch.