Domain Search Glossary: Often More Useful Than the Model
A domain search glossary helps the system understand company terms, synonyms, and codes. Often it improves accuracy more than switching models does.

What goes wrong without a shared glossary
Without a shared glossary, search quickly loses the meaning of the query. In one company, the same object may be called different things: "contract", "agreement", "deal", "form for signature". For a person, these are almost the same. For a search system, they are different words and different paths to documents.
Because of that, an employee often finds not the best material, but the one where the wording just happened to match. In the knowledge base, sales writes "lead", lawyers write "prospective client", and support writes "new account". The meaning is close, but the results become uneven: one person gets an exact answer, another asks about almost the same thing and finds nothing useful.
Abbreviations are another problem. Inside a company, they live their own life. One team writes the full name, another has already shortened it to three letters, and a third uses an older version. Search and the model do not know that "DMS", "voluntary health insurance", and the internal tag in your HR system mean the same thing.
The same thing happens with codes, roles, and product names. To a model, "P-214", "Loan 2.0", or "point curator role" is just a string of characters if you have not explained in advance what it means. It fills in the meaning from broad language patterns. For internal work, that is usually not enough.
That is where the effect appears that teams do not notice right away: employees get different answers to the same question. One asks about the "SME limit", another writes "conditions for small and medium business", and a third searches by the name of an internal program. The intent is the same, the wording is different, and the results diverge.
Without a glossary, the system only looks smart on simple queries. As soon as internal jargon, old abbreviations, or product codes come into play, RAG quality drops. And the problem is usually not a weak model. More often, it simply was not given the company's glossary, which everyday work is built on.
What a corporate glossary should contain
A good corporate terminology dictionary is not a showcase of "correct words" from a presentation. It is a map of how people actually write in emails, requests, policies, chats, and old documents. Its goal is simple: help search understand where there is one meaning and where there are different entities.
One entry usually includes:
- the main term;
- working variants used by teams;
- abbreviations and colloquial forms;
- codes, form numbers, tariff names, service names, and document numbers;
- a short note on where the term is appropriate and what it should not be confused with.
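The shape of such an entry can be sketched as a small data structure. Everything below is illustrative: the field names, the class, and the example values are assumptions for the sketch, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    term: str                                           # main, approved term
    variants: list = field(default_factory=list)        # working variants used by teams
    abbreviations: list = field(default_factory=list)   # short and colloquial forms
    codes: list = field(default_factory=list)           # form numbers, tariffs, document codes
    note: str = ""                                      # where it fits, what not to confuse it with

# Illustrative entry; the values are invented for the example.
entry = GlossaryEntry(
    term="voluntary health insurance",
    variants=["health insurance", "medical insurance"],
    abbreviations=["DMS"],
    codes=["HR-INS-01"],
    note="Employee benefit; do not confuse with mandatory health insurance.",
)
```

A flat structure like this is deliberately boring: it can live in a spreadsheet just as easily as in code, which matters when process owners, not engineers, maintain the entries.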
Without this, search across internal data quickly loses accuracy. An employee enters "KYC form", the database has "customer identification form", and in a conversation someone writes just "questionnaire". A person can easily see the connection. Search without a glossary cannot.
The benefit is especially clear where Russian, English, and internal jargon are mixed. One team says "dashboard", another says "catalog", and a third says "product feed". If this is one object, the glossary should connect them. If they are different things, it should separate them and briefly explain the difference.
Codes also cannot be ignored. People often search not by name, but by "F-12", "TP-3", "tariff 451", or an old order number. These labels give a strong boost in quality because they constantly appear in Excel, templates, scans, and tickets.
It is useful to store outdated and undesired wording separately. An old product name, an old form code, or one team's slang still lives in archives. There is no need to delete it. It is better to mark it so it can be searched for, but the current term is ranked higher in answers and results.
And one more thing that is often underestimated: a short explanation. Just one line can sometimes help more than fine-tuning the model. For example, "card limit" and "credit limit" sound similar, but in a specific company they may mean different things.
If you cannot show an entry to a new employee and have them understand when to use the term, the entry is not ready yet.
Why a glossary is often more useful than switching models
The model gets the query too late. If a person searches for "HRD", but all the documents say "human resources administration", the mistake starts before the answer is even generated. A glossary removes this ambiguity in advance: it brings the query and the documents to the same language.
A newer model may write more neatly, hold context better, and confuse general meaning less often. But it will not guess the company's internal language. It does not know that "plastic" in a bank may mean a bank card, that "dashboard" in retail is not a shelf but a table in storage, or that employees still call an old system by the project name from five years ago.
That is why switching models often gives only a modest improvement, while a glossary fixes the source of the confusion itself. When search understands that "DMS", "voluntary health insurance", and an internal service code mean the same thing, it finds the right documents more often. RAG, meaning answers based on retrieved documents, gets a more accurate context. The chatbot asks fewer unnecessary follow-up questions and makes fewer terminology mistakes.
One list of terms helps in several places at once: it expands the query using synonyms and abbreviations, normalizes names in indexes and documents, suggests which fragments are close in meaning, and makes answers more consistent.
A glossary has another practical advantage: it is easier to measure. The team adds 40 terms and a week later checks whether there were fewer empty results, whether first-answer accuracy improved, and whether the number of manual clarifications went down. With a model switch, the picture is usually blurrier: prompts, context format, temperature, and the test set itself all affect the result at the same time.
A glossary also often wins on cost. It is cheaper to maintain a list of terms than to constantly migrate between models, repeat tests, and pay for extra tokens. For a bank, telecom company, SaaS product, or internal knowledge base, that is a pretty practical conclusion: agree on the words first, then argue about the model.
Where a glossary gives the biggest effect
The biggest gain appears where people's language does not match the language of the documents. An employee writes briefly and plainly, while the system stores long official wording, old names, and internal codes. Without a glossary, search misses the mark on the first step.
This is usually easy to see in the knowledge base, instructions, and policies. A person searches for "day off", but the document says "additional day of rest". They search for "work from home", but the rule is written as "remote work mode". The meaning is the same, the wording is different.
The second common area is products, tariffs, and internal labels. In a bank, telecom, or SaaS team, employees rarely remember the exact service name. They write "old premium business tariff" or enter a code known only to part of the team. If the glossary knows that "B2B Pro 2024", "premium for legal entities", and the product code refer to the same object, the system finds the right document faster.
The effect after renaming is especially noticeable. After a rebrand, team merger, or system replacement, old terms live on for years. An employee searches for "CRM-2", even though in the new documents the product already has a different name. Or they enter the old department name, which no longer exists in the org chart but still appears in emails and tickets. A glossary merges such versions into one entity and removes extra noise.
A glossary also works well where people mix everyday and official words. This is common in HR, finance, procurement, and support. A doctor writes "medical record", a lawyer writes "medical documentation", and an operator searches for "patient card". Sometimes all three mean the same set of documents.
There is a simple way to find these areas quickly. If employees often ask again about a document name, if one product has two or three working names, if queries are full of unresolved abbreviations and old words, then a glossary will have a fast effect.
This is where RAG quality improves the most. If search retrieves the wrong passage, even a strong model will usually just restate the mistake with more confidence.
How to build the first version
The first version of the glossary does not need to be big. If you try to describe the entire language of the company at once, the work will stall. To start, it is better to work from real traces of everyday work rather than abstract term lists.
Start with the queries people already enter. Export common wording from internal search, support bots, corporate chats, and the service desk. That is where real abbreviations, conversational names, old product codes, typos, and team habits quickly show up.
Then add terms from documents where the language is already fixed: instructions, email templates, CRM cards, field names in forms, policies, and the knowledge base. That shows the difference between the official form and how employees actually search for the entity in daily life.
Then bring everything into one simple table. For each entry, five fields are usually enough:
- the main term;
- synonyms and abbreviations;
- common incorrect variants;
- the department or process where the word is used;
- a short explanation if the term is ambiguous.
At this stage, duplicates almost always appear. One team writes "DMS", another writes "voluntary health insurance", and the full name appears in documents. For search, these are not three different entities, but one entry with several variants.
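Merging such duplicates boils down to a variant-to-canonical lookup. The sketch below assumes rows collected from different teams; the sources and the helper name are invented for the example.

```python
# Hypothetical rows collected from different teams before deduplication.
rows = [
    {"term": "DMS", "source": "sales chat"},
    {"term": "VHI", "source": "HR system"},
    {"term": "voluntary health insurance", "source": "policy docs"},
]

# After review, all three are marked as variants of one canonical term.
canonical = "voluntary health insurance"
variant_to_canonical = {row["term"].lower(): canonical for row in rows}

def normalize(term: str) -> str:
    """Map any known variant to the approved form; pass unknown terms through unchanged."""
    return variant_to_canonical.get(term.lower(), term)

print(normalize("DMS"))  # → voluntary health insurance
```

Passing unknown terms through unchanged is the safe default: the glossary should never make a query worse than it arrived.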
It is better not to sort out disputed words alone. Send a short list to process owners: sales, support, legal, finance, IT. Ask them to quickly mark which terms are equivalent and which must not be mixed. One such review often removes mistakes that would otherwise hurt search for weeks.
The first iteration should not be stretched out. Usually 100-200 entries are enough if you start with the most frequent and most painful cases. After launch, watch what people search for without results and update the table once a month. It sounds boring, but that is how the glossary starts to bring real value.
A simple company example
In a bank, an employee opens internal search and types: "card limit". They need an exact answer for a customer: where the spending boundary is, who can change it, and what exceptions exist. But in the policy, the needed parameter has long been written as "credit threshold" because that was its name in the old system.
Search without a shared glossary often does not see the connection between these words. It looks for almost literal matches and surfaces news, call center notes, or general product instructions. The needed document exists, but the employee does not reach it. Then they spend time asking colleagues or answering the customer from memory. That becomes a risk.
A glossary fixes the situation quite simply. It records that "card limit" and "credit threshold" mean the same thing in this company. After that, search links both expressions and ranks the right document higher. Changing the model for such a case is often not necessary.
The same mechanism helps the chatbot. An employee asks: "What card limit can be increased without manual approval?" The bot searches internal data, finds the section about the "credit threshold", and answers according to the policy, not a guess. The answer is shorter, more accurate, and calmer.
In practice, there are many such pairs: an old term from an accounting system, the business's everyday word, or the name employees use in conversation. A corporate glossary removes that gap. Sometimes one such layer has a bigger effect than moving to a new model, because it fixes not the style of the answer, but the path to the right document.
How to integrate the glossary into search
The glossary should work on the input side. First, the system brings the query to the approved form, and only then does it run full-text search, vector search, or RAG. If you normalize the words after the answer, the right documents may already have been left out of the selection.
A simple example: an employee writes "DBO for legal entities", while the documents all say "remote banking service". If search sees only the original abbreviation, it will miss part of the relevant passages. If you first expand the abbreviation and add the approved form, the chance of finding the right text is much higher.
Usually the process looks like this: the system receives the original query, replaces abbreviations, typos, and local names with canonical forms, adds close synonyms with different weights, and only then sends the normalized query to search.
It is important that not all synonyms are equal. One variant is almost a perfect match in meaning, while another is only partially related. That is why it is better to store the glossary not as a simple list, but as a set of mappings with weights. Then "digital signature" and "electronic digital signature" can be treated as almost a full match, while broader or more conversational forms are weakened so they do not pull the result off course.
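The normalization step described above can be sketched as a weighted expansion that runs before retrieval. The terms, weights, and function name here are illustrative assumptions, not a reference implementation.

```python
# Weighted synonym map: 1.0 = near-exact match in meaning, lower = weaker, broader relation.
# All terms and weights are illustrative.
SYNONYMS = {
    "digital signature": [
        ("electronic digital signature", 1.0),  # near-perfect match
        ("e-signature", 0.8),                   # close, slightly broader
        ("signature", 0.3),                     # too broad; kept with a low weight
    ],
}

def expand_query(query: str) -> list:
    """Return the original query plus weighted synonym variants, ready for retrieval."""
    expanded = [(query, 1.0)]
    for canonical, variants in SYNONYMS.items():
        if canonical in query.lower():
            for variant, weight in variants:
                expanded.append((query.lower().replace(canonical, variant), weight))
    return expanded

for text, weight in expand_query("digital signature requirements"):
    print(f"{weight:.1f}  {text}")
```

The weights can feed directly into whatever scoring the retriever uses, so a broad conversational form contributes recall without being allowed to dominate ranking.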
What to pass on
The normalized query should go into the same flow where you build the answer and calculate metrics. If RAG receives one wording while quality reports look at another, the team sees a distorted picture.
It is useful to keep both versions of the query: the original and the normalized one. The original shows how people really write. The normalized one helps you understand what effect the glossary had and which substitutions work poorly.
For teams that test several LLMs in parallel, this is especially important. You can switch the model quickly, but if search did not retrieve the right context, even a strong LLM will not fix the omission.
Mistakes that break the result
When search performs poorly, the culprit is usually not the model but the glossary itself. The team changes the prompt, adjusts reranking, tests a new LLM, while documents still keep missing the query. The answer looks smooth, but it relies on the wrong sources.
The first common mistake is collecting terms without an owner and an update date. Then old product names, retired abbreviations, and one team's local shorthand live in the glossary for months. Six months later, nobody remembers why the entry was added, and search pulls in extra documents.
The second mistake is mixing a product name and an ordinary word in one entry. If an internal service is called "Dashboard", you cannot equate it with every everyday use of the word "dashboard". Otherwise, results will start mixing in documents not about the service, but about reports, interfaces, and screen descriptions.
The third mistake is adding too many equal synonyms. On paper, this looks neat, but in real use different words often mean neighboring, not identical, things. If you equate "customer card", "customer profile", and "customer form" without any notes, the system will start expanding queries too broadly.
Another problem is ignoring typos, word forms, and the habit of writing things in different ways. A user enters an abbreviation, an old name, a plural form, or a word missing one letter. If the glossary knows only the perfect form, search misses the needed document.
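One inexpensive way to catch such near-misses is a close-match lookup over the glossary keys before giving up on a term. The sketch below uses the standard library's `difflib`; the glossary terms and the cutoff value are illustrative.

```python
import difflib
from typing import Optional

# Hypothetical glossary keys in their canonical spelling.
glossary_terms = ["credit threshold", "card limit", "remote banking service"]

def fuzzy_lookup(query_term: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest glossary term if the query is a near-miss (typo, missing letter)."""
    matches = difflib.get_close_matches(query_term.lower(), glossary_terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_lookup("credit treshold"))  # query with a missing letter
```

The cutoff is worth tuning on real queries: too low and unrelated terms start matching, too high and common one-letter typos slip through.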
And the most expensive mistake is trying to fix the answer with the model instead of search data. If the retriever did not find the right document, even a strong model will not guess the internal term from thin air.
To help the glossary live longer and create less noise, each entry should ideally keep five things: the main term, allowed variants and abbreviations, conflicting meanings or exception words, the owner of the entry, and the date of the last review.
Checks before launch
Before launch, it is better to test search on real queries from chats, tickets, and email. Not on pretty examples, but on the way people actually write every day.
If the glossary is built well, that shows right away. The user enters a familiar query and finds the right document on the first try. If they have to rephrase it, expand the abbreviation, or guess the official name, the glossary is still rough.
A few simple checks are enough. Take 20 common queries and see whether the right document opens immediately. Compare results for an abbreviation and the full name. Check old names of products, teams, and processes. Work through ambiguous words with two meanings. And ask two or three colleagues to add a new term without your help: if they do not understand where to write it and who approves it, the glossary will get outdated quickly.
It is useful to look not only at success, but also at the type of mistake. If search returns a nearly correct document, one synonym is often enough. If it misses completely, the term is attached to the wrong thing or the old name is missing from the glossary.
This kind of mini-test is usually enough to catch the most expensive failures before release. One evening of checks often brings more value than another week of arguing about the model.
What to do next
After the first version, do not try to cover the whole company at once. Take one process where search affects daily work: support answers, policy search, procurement, or the internal knowledge base.
Then test the system on live queries. Collect 30-50 phrasings from employees who really search for documents, instructions, or customer records. In such a sample, old names, internal abbreviations, and words that people write differently quickly show up.
After that, the work is fairly practical: record the current result, add terms and unwanted substitutions, run the same sample again, and compare. Look at simple metrics: how many queries found the right document, whether the correct answer appeared in the top 3, and whether there were fewer empty results and strange matches.
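That before-and-after comparison can be sketched as a small evaluation loop over a labeled sample. The sample, document ids, and the stubbed search functions below are all invented for the example; in practice `run_search` would be the real retriever returning ranked document ids.

```python
# Each sample item: the employee's query and the id of the document that should be found.
sample = [
    {"query": "card limit", "expected": "doc-credit-threshold"},
    {"query": "work from home", "expected": "doc-remote-work"},
]

def evaluate(run_search, sample, top_k=3):
    """Count how often the expected document lands in the top-k and how often nothing returns."""
    hits = empty = 0
    for item in sample:
        results = run_search(item["query"])
        if not results:
            empty += 1
        elif item["expected"] in results[:top_k]:
            hits += 1
    return {"top_k_accuracy": hits / len(sample), "empty_results": empty}

# Stubs standing in for search before and after the glossary was added.
before = evaluate(lambda q: [], sample)
after = evaluate(lambda q: ["doc-credit-threshold", "doc-remote-work"], sample)
print(before, after)
```

Running the same fixed sample through both configurations is the point: the only variable that changes is the glossary, so the difference in the numbers is attributable to it.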
Then you need a short policy. Three rules are enough: who can suggest a new term, who approves it, and how often the team reviews disputed substitutions. If a word changes meaning across departments, do not force it together. It is better to keep two versions with a context note.
The working rhythm is simple too: review new no-result queries once a week and clean up duplicates once a month. It does not take much time, but it keeps search in working shape.
If the team is testing several LLMs in parallel, it is convenient to keep routing and quality checks in one layer. For example, AI Router on airouter.kz provides a single OpenAI-compatible endpoint, so the same test set can be run through different models without rewriting the SDK, code, or prompts. But even in that setup, the glossary remains the base: if search did not find the right context, it is already too late to change the model.
If the glossary still does not help after that, the problem is often not the terms, but the data itself: poor document names, extra copies, broken fields, or weak chunking. In that case, you need to fix the index and the content, not the word list.