Semantic Debt in AI‑Assisted Codebases
Systematic mistakes that can derail AI developers
Eric Hestenes is a technology industry and product development veteran with experience in artificial intelligence, media tech, non-profit, open source, ed-tech, and financial services (fintech).
AI tools can write a lot of code quickly. The trap is that they can also mint a lot of names quickly—names that are plausible in the moment but become confusing (and bug‑shaped) as the product grows. With AI you can code super-fast, and generate defects and debts just as fast. This post is guidance for people learning to code with AI (and learning to avoid AI‑created messes).
When used to write code, large language models (LLMs) make systematic mistakes. One of them is poor naming of functions, variables, and similar identifiers. Even the expensive, premium LLMs make these mistakes, because every LLM operates with incomplete context, and performance degrades when key context is missing or ambiguous. For example, an LLM often chooses the most obvious name, without reference to other similar names and without any specificity or qualification. Then it does that again and again, so after a while you may have several different parts that all share the same obvious name.
Over the last few weeks, many of my “fixes” weren’t logic fixes or performance work. They were cleanups of what I’ve started calling semantic debt: places where the code looks good but where the LLM is obviously confused and keeps making mistakes due to the wording in the code. The code may have worked in a particular context at one time, but in a larger context, the vague names became a source of repetitive and costly technical mistakes by the LLM.
Typically the names are "almost right", to the point that they are deceptive, and specifically they mislead the LLM when it reads its own code later. These are just some examples that happened recently:
- The word CLIENT is used by the LLM to make sense of code that has multiple clients and services.
- A function named handle_max_iterations was named based on recent work instead of the actual behavior that was implemented.
- Sometimes an LLM writes code that the LLM does not know how to edit.
- Sometimes log statements say things that are false.
All these examples are confusing to LLMs.
What “semantic debt” means (and why it’s different from technical debt)
Semantic debt is the gap between what a name suggests and what the code actually does. "Semantic debt" isn't a legacy term you’ll find in a 1990s textbook, but it is a fitting descriptor for the "hallucination-adjacent" logic errors that plague AI-assisted development.
- Technical debt: “This works, but it’s messy or slow or hard to maintain.”
- Semantic debt: “This might work, but the LLM makes mistakes with it.”
Semantic debt is uniquely dangerous because it causes real bugs:
- Developers (and machines) trust names during debugging.
- Reviewers infer behavior from function names.
- Search results become polluted with false matches.
- Logs mislead on-call responders.
- Bulk edits can make bulk mistakes.
If you can’t trust the names, you can’t safely change the system.
The lifecycle of semantic debt (a pattern I keep seeing)
Semantic debt tends to accumulate in stages:
- A vague or non-specific initial name is chosen
- A semantic mismatch arises between the name and reality
- Semantic collisions appear because the name is ambiguous
- Semantically-driven edits wreak havoc
- A clean-up iteration is needed to fix the thing that confused the LLM
This is why semantic debt isn’t just pedantry. It creates operational mistakes.
Example 1: When “Client” means three different things
In one system, we had three peer systems:
- A React UI running in the browser
- A Tauri desktop runtime, which hosts a React front end and also acts as a file server for native files
- A hosted Python backend that serves as a gateway to multiple cloud-hosted tools
Tauri is a framework for bridging web technology to native platforms, similar to Electron and React Native. An AI assistant helped build the Tauri system and labeled the Tauri layer the client, because it was the WebSocket client in that specific flow.
Reasonable—until you notice in the code that the phrase "client" is ambiguous. We also had:
- “Client role” (network topology: client vs server)
- “Client-side JavaScript” (running on the web)
- “Client-side JavaScript” (running in Tauri)
- “Client application” (anything calling our API)
So the LLM ended up making reasoning mistakes such as:
If tool location equals CLIENT, run it on the client; otherwise run it on the backend.
Which client? The browser? The desktop runtime? Some third-party integrator? The answer depended on context. And too often the LLM cannot figure that out because it is learning by a shallow or syntactic scan of the code, rather than actually understanding the code. In other words, the LLM infers a semantic match between the name used and what is happening nearby.
How to fix semantic defects
Typically, the first thing to happen would be a really nasty code edit by the LLM, in which it writes new code based on shallow semantic meaning rather than actual functionality. To prevent this, it was necessary to review ambiguous vocabulary and make adjustments.
In some cases, we would simply change the name to a semantically accurate one. For example, handle_max_iterations did not accurately describe what the function was doing, so it was renamed to force-tool-loop-completion.
In other cases, we added a qualifying word to a generic word such as directory, turning it into workspace-directory, which has a particular and specific meaning in the codebase.
We replaced the ambiguous vocabulary with layer names that have proper semantics. Where the system was using CLIENT a certain way, we renamed the uses to NATIVE or CLOUD, and suddenly the LLM could see where it had been mixing up logic that must be handled in the native file system.
And in another case, we had renamed some functions, but the LLM had peppered the code with literal strings for those function names, so that when the main function name changed, the LLM could no longer make sense of the literal strings. In this case, the fix was to stop using magic strings and instead move all the literal strings into a file of constants. After that, any tool name changes were immediately obvious.
This isn’t about being clever. It’s about choosing words that are not ambiguous and stay stable as you add more features. Rule of thumb: architectural layer names should be semantically accurate.
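The CLIENT rename described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the enum name ToolLocation, the function choose_runner, and the returned layer labels are hypothetical. Only the member names NATIVE and CLOUD come from the rename itself.

```python
from enum import Enum


class ToolLocation(Enum):
    """Where a tool executes (hypothetical enum; only the member
    names NATIVE and CLOUD come from the rename described above)."""

    NATIVE = "native"  # the Tauri desktop runtime, with native file access
    CLOUD = "cloud"    # the hosted Python backend


def choose_runner(location: ToolLocation) -> str:
    """Return a label for the layer that should run the tool."""
    if location is ToolLocation.NATIVE:
        return "tauri-desktop-runtime"
    if location is ToolLocation.CLOUD:
        return "python-backend"
    # Fail loudly instead of falling through to a default layer.
    raise ValueError(f"unknown tool location: {location!r}")


print(choose_runner(ToolLocation.NATIVE))  # prints tauri-desktop-runtime
```

Because neither member is called "client", a reader (human or LLM) no longer has to guess which of the several clients is meant.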
Example 2: Next.js chooses runtime performance over LLM performance.
The Next.js App Router forces you to use the same filename for every route module:
- app/dashboard/page.tsx
- app/settings/page.tsx
- app/profile/page.tsx
Every route is called page.tsx. Every layout is called layout.tsx. Every loading state is called loading.tsx.
When an LLM sees: "Edit the page component to add a new button", the LLM cannot tell which page is which. There are 50 files all named page.tsx.
When searching for context: "Find where the user profile page renders", the LLM sees page.tsx in search results 50 times. It has to open each one and read the content to figure out which is the profile page.
When making changes: "Update the page to include error handling", the LLM picks the wrong page.tsx because the filename gives zero semantic information. It has to rely entirely on directory path context, which often gets truncated in tool outputs or lost in conversation context.
The LLM wastes tool calls opening wrong files, makes changes to the wrong page, or asks clarifying questions that wouldn't be needed if files were named dashboard-page.tsx, settings-page.tsx, etc. Hopefully in a future generation Next.js is enhanced so that it does not keep this anti-pattern that confuses the LLM.
This is an example of framework-enforced semantic debt. As such, it shows how Next.js may be more costly to use compared to other frameworks with lesser performance. In the case of AI, using an LLM with Next.js may directly multiply the cost of Next.js maintenance. It might work well in small repos, but it does not scale.
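One possible mitigation, which is my own assumption and not something the framework provides, is to hand the LLM a small index that labels each file by its route, so the directory context cannot get lost. The sketch below assumes route files live under an app/ root. The function name route_label is hypothetical, and it ignores Next.js route groups (folders in parentheses) and other special folder names.

```python
from pathlib import PurePosixPath


def route_label(file_path: str, app_root: str = "app") -> str:
    """Turn 'app/settings/page.tsx' into the label '/settings (page)'."""
    path = PurePosixPath(file_path)
    # Directory parts below the app root form the route.
    route_parts = path.parent.relative_to(app_root).parts
    route = "/" + "/".join(route_parts)
    # The file stem says which kind of module it is: page, layout, loading.
    return f"{route} ({path.stem})"


files = [
    "app/dashboard/page.tsx",
    "app/settings/page.tsx",
    "app/profile/page.tsx",
]
for file_path in files:
    print(f"{route_label(file_path)} -> {file_path}")
```

An index like this could be pasted into a prompt or kept in a project notes file, so that "the profile page" resolves to exactly one path.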
Example 3: Magic strings—how renaming a function made code unmaintainable
Thanks to the LLM, we had LLM tool names scattered across the codebase as plain strings:
if toolName == "ripgrep_search" then cost = 15
if toolName == "ripgrep_search" then format output differently
if toolName == "ripgrep_search" then log to backchannel
Twelve files. Dozens of string literals.
Then we refactored for clarity:
ripgrep_search needed to become native_ripgrep_search to distinguish it from cloud tools.
We updated the definition and the main registry and thought we were done.
We were not done.
Half the scattered string comparisons silently stopped matching. Cost tracking broke. Logging broke. Routing logic broke. The failure mode was runtime behavior—not compile-time errors.
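The silent failure can be reproduced in a few lines of Python. This is a reduced sketch, not the original code: the registry set, the function name tool_cost, and the fallback value of 0 are assumptions made for illustration. The cost of 15 and the two tool names come from the example above.

```python
# Registry after the rename: the tool is now "native_ripgrep_search".
TOOL_REGISTRY = {"native_ripgrep_search"}


def tool_cost(tool_name: str) -> int:
    """Cost lookup that still compares against the old literal."""
    if tool_name == "ripgrep_search":  # stale string, never matches now
        return 15
    return 0  # silent fallback: no exception, no warning


for name in TOOL_REGISTRY:
    print(name, tool_cost(name))  # prints native_ripgrep_search 0
```

Nothing crashes and nothing is logged. The renamed tool simply costs 0 from now on, which is why this kind of break tends to surface late.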
The fix: create a single source of truth
Instead of repeating string literals, define canonical constants in one place, a constants file.
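A minimal sketch of such a constants file in Python might look like this. The module name tool_names.py, the TOOL_COSTS table, and the choice to raise on unknown names are assumptions for illustration. The tool name and the cost of 15 come from the example above.

```python
# tool_names.py (hypothetical module): the only place the spelling lives.
NATIVE_RIPGREP_SEARCH = "native_ripgrep_search"

# Per-tool cost table keyed by the constant, not by a repeated literal.
TOOL_COSTS = {NATIVE_RIPGREP_SEARCH: 15}


def tool_cost(tool_name: str) -> int:
    """Look up a tool's cost; unknown names raise instead of returning 0."""
    try:
        return TOOL_COSTS[tool_name]
    except KeyError:
        raise KeyError(f"no cost registered for tool {tool_name!r}") from None


print(tool_cost(NATIVE_RIPGREP_SEARCH))  # prints 15
```

Raising on an unknown name is a design choice in this sketch: a stale spelling then fails at the first call instead of quietly falling back to a default.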
Now:
- Refactors are one edit, not a scavenger hunt.
- IDE autocomplete prevents typos.
- The “official spelling” exists in exactly one location.
Semantic debt here wasn’t “strings are bad.” It was “names are duplicated without a canonical home.”
To be blunt, this is not a new coding mistake, except that the LLM makes this mistake without hesitation. It will rename something and then lose track of all the impacted code. Literally, it will orphan whole modules with a change or two, and without recognizing what it did, leaving a mess behind. So one has to guide the system to write better code. The point is that some code patterns the LLM may use also cannot be understood by the same LLM. It is well known that LLMs make mistakes due to lack of context, and this pattern is one of those types of mistakes: code orphans.
How to avoid semantic debt (a practical checklist)
1) Make sure names are specific and not generic
LLMs tend to choose obvious names, and as a rule, a specific name is much better than an obvious name because specific names are easy to work with on subsequent iterations. Obvious names pollute search results and this in turn can misdirect the LLM away from the content that matters most.
2) Pair generic words with a qualifier
Generic words are simply not specific. Because they are generic, they are more likely to cause collisions and LLM confusion. If you pair a generic term with something that is specific in your given context, that can greatly help to disambiguate the name.
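As a small illustration in Python, here is the generic word directory paired with a qualifier. The class ProjectPaths, the cache_directory field, and the helper list_workspace_files are hypothetical. The comment on workspace_directory states an assumed meaning, since the real meaning depends on your codebase.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProjectPaths:
    # Before: a single field called `directory` could have meant either one.
    workspace_directory: Path  # assumed meaning: root of the user's project files
    cache_directory: Path      # hypothetical second directory that would collide


def list_workspace_files(paths: ProjectPaths) -> list[str]:
    """List file names directly under the workspace directory, sorted."""
    return sorted(p.name for p in paths.workspace_directory.iterdir() if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "workspace"
    cache = Path(tmp) / "cache"
    workspace.mkdir()
    cache.mkdir()
    (workspace / "notes.txt").write_text("hello")
    paths = ProjectPaths(workspace_directory=workspace, cache_directory=cache)
    print(list_workspace_files(paths))  # prints ['notes.txt']
```

A search for workspace_directory now returns only the uses that mean the workspace, which is the property the qualifier buys you.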
3) Treat semantic mismatches as a defect
If a function or variable name is misleading, then it will lead to problems.
If you are unsure what name to use, ask the LLM to describe what the function does, and then write a name that is more accurate. This is not difficult.
4) Replace magic numbers and strings with constants.
LLMs may struggle with magic numbers and strings. By replacing them with constants, you give the LLM a single named definition to find and follow, instead of scattered literals.
5) Be careful about masking how things work with code layers.
Layered code is part of our reality, but keep in mind that the LLM cannot reasonably trace through many layers of code unless that is the primary task it is working on. Most of the time, it is relying on the semantic names to determine which direction to move in.
6) Acknowledge Semantic Debt
Some code may be technically correct and still hard for humans or LLMs to understand. Consider that a mistake. Code should be easy to understand. Writing obfuscated code might be fun, but obtuse code is not a best practice. In most cases, it is not necessary to write or keep confusing code; just fix it.
Treat semantic debt as a P1 maintenance bug—because it’s a bug factory.
Why AI makes this worse (and how it can also help)
AI assistants are especially prone to semantic debt because they:
- pick locally-reasonable names without global context
- default to professional-sounding generic nouns
- don’t anticipate future naming collisions
- generate quickly—before ambiguity becomes obvious
But AI can also be part of the solution:
- it can scan the entire codebase for collisions instantly
- it can suggest names based on usage patterns
- it can automate refactors once you’ve chosen the right concept
The trick is to prompt for it. Ask:
- “Name this by what it does, not when it’s called.”
- “Check if this name already means something else in the codebase.”
- “Suggest a name that will still be accurate if we add a second backend.”
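A collision check does not have to go through an LLM at all. As a rough sketch, and assuming a Python codebase, the standard ast module can report function names that are defined in more than one file. The function name find_function_name_collisions and the two example sources are hypothetical.

```python
import ast
from collections import defaultdict


def find_function_name_collisions(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each function name defined in more than one file to those files.

    `sources` maps a file name to its Python source text.
    """
    defined_in: dict[str, set[str]] = defaultdict(set)
    for file_name, source in sources.items():
        # ast.walk visits nested definitions too, so methods are counted.
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defined_in[node.name].add(file_name)
    return {
        name: sorted(files)
        for name, files in defined_in.items()
        if len(files) > 1
    }


example_sources = {
    "ui.py": "def handle_event():\n    pass\n",
    "server.py": "def handle_event():\n    pass\n\ndef start():\n    pass\n",
}
print(find_function_name_collisions(example_sources))
# prints {'handle_event': ['server.py', 'ui.py']}
```

Not every shared name is a problem, so treat the output as a list of candidates to review by hand or to pass to the LLM along with the prompts above.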
The real cost
Every semantic-debt fix I’ve done had a predictable price tag:
- 30–60 minutes to understand what the code actually does
- updating 5–20 call sites
- adjusting logs, dashboards, and docs
- worrying about missed references
But the bigger cost is quieter: time wasted because someone trusted a name, formed the wrong mental model, and made a change that couldn’t possibly fix the problem.
Good names make bugs obvious. Bad names hide them.
Semantic debt is technical debt’s sneaky cousin.
- Technical debt: messy but honest.
- Semantic debt: neat-looking but misleading.
If you’re building with AI, assume this failure mode will show up—then build habits to catch it early.



