A Field Guide to Bugs
Software bugs are older than software. One of the earliest recorded uses of "bug" in an engineering context comes from an 1878 letter by Thomas Edison, predating the Harvard moth incident of 1947 by nearly seventy years and the modern computer by more than sixty. The taxonomy has only grown since, mostly because the things that can go wrong in computing have proliferated faster than the language we use to name them. What follows is a partial field guide to the species most commonly observed in the wild. Like any field guide, it should be carried into the territory with humility, because the bugs you actually encounter will be hybrids of these, frequently nameless, and almost always personally insulting.
The Bohrbug is the boring, honest bug. It manifests every time. It survives restarts, recompilations, prayers, and managerial intervention. You could put it in a museum. The Bohrbug is beloved by everyone who fixes bugs for a living, because it is the only species in this guide that respects the scientific method. If your bug is a Bohrbug, take a moment of gratitude and close the ticket before something worse notices you.
The Heisenbug is its opposite, and the reason this field guide exists. Attach a debugger and the bug evaporates, leaving behind the queasy suspicion that the universe is laughing at you. Heisenbugs cannot be reproduced under any condition that allows them to be examined. They live exclusively in production. They are killed by logging statements. They are the reason the most senior engineer on your team has the haunted expression of someone who has stared into the void and found it staring back at the call stack.
The Off-By-One is the most prolific species in the entire genus. Loops that run from 0 to n when they should run from 0 to n-1, arrays indexed at length() instead of length()-1, dates that are off by a single day across a timezone boundary. The Off-By-One has personally caused more security vulnerabilities than any nation-state actor of the last forty years. Its corpses litter the codebase in such density that you can use them as paving stones.
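A minimal sketch of the loop variant in Python, using two hypothetical helpers whose only difference is one character in the `range` bound:

```python
def sum_first_n_buggy(values: list[int], n: int) -> int:
    """Meant to sum the first n elements; actually sums n + 1 of them."""
    total = 0
    for i in range(n + 1):  # visits indices 0..n inclusive: one iteration too many
        total += values[i]
    return total


def sum_first_n(values: list[int], n: int) -> int:
    """Sums the first n elements, i.e. indices 0..n-1."""
    total = 0
    for i in range(n):  # range(n) stops before n, which is exactly what we want
        total += values[i]
    return total


values = [10, 20, 30, 40]
print(sum_first_n(values, 3))        # 60
print(sum_first_n_buggy(values, 3))  # 100: quietly includes values[3]
# sum_first_n_buggy(values, 4) reads values[4], one past the end, and raises IndexError.
```

Python at least raises an `IndexError` for the read past the end. In a language without bounds checking, that read typically returns whatever happens to live next door in memory, which is how this species graduates into a security vulnerability.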
The Race Condition exists strictly between two threads of execution and reproduces only in production, between the hours of 02:14 and 02:16 GMT, on Wednesdays, when traffic crosses a particular threshold and two specific rows in two specific tables are accessed in a particular order. Race Conditions are the reason serious distributed systems engineers acquire a thousand-yard stare somewhere around year three. They are also the reason TLA+ exists, although nobody you know actually uses it.
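Production schedulers are rarely this cooperative on demand, so the sketch below forces the unlucky interleaving with a `threading.Barrier`: both threads read a shared counter before either writes it back, and one update is lost. The function names are illustrative; the fix shown, a lock around the whole read-modify-write, is the standard one.

```python
import threading


def racy_double_increment() -> int:
    """Two threads each add 1 to a shared counter, with the bad interleaving forced."""
    state = {"counter": 0}
    both_have_read = threading.Barrier(2)

    def worker() -> None:
        seen = state["counter"]      # read
        both_have_read.wait()        # stand-in for an unlucky context switch
        state["counter"] = seen + 1  # write: overwrites the other thread's update

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]


def locked_increment(num_threads: int) -> int:
    """Same update, but the read-modify-write runs as one critical section."""
    state = {"counter": 0}
    lock = threading.Lock()

    def worker() -> None:
        with lock:
            state["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]


print(racy_double_increment())  # 1, not 2
print(locked_increment(8))      # 8
```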
The Deadlock occurs when two threads each hold a resource the other is waiting for, and both wait politely forever. Everything looks fine. All status checks return green. The process is standing still and being courteous. The Deadlock is the British bug.
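Two threads taking the same two locks in opposite orders is the classic recipe. The Python sketch below forces that interleaving with barriers and uses `Lock.acquire(timeout=...)` so the demonstration gives up instead of waiting politely forever. The second function shows one common fix: every thread acquires locks in a single global order (here, arbitrarily, sorted by `id()`). Both function names are hypothetical.

```python
import threading


def opposite_order(timeout: float = 0.2) -> list[bool]:
    """Thread 0 takes A then B; thread 1 takes B then A. Reports who got their second lock."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    got_second = [False, False]

    def worker(idx, first, second):
        with first:
            both_hold_first.wait()  # each thread now holds the lock the other needs
            got_second[idx] = second.acquire(timeout=timeout)  # no timeout: wait forever
            both_tried_second.wait()  # keep holding `first` until the other has tried too
            if got_second[idx]:
                second.release()

    threads = [
        threading.Thread(target=worker, args=(0, lock_a, lock_b)),
        threading.Thread(target=worker, args=(1, lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


def ordered_acquire() -> list[bool]:
    """Same threads and locks, but both acquire them in one global order."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    finished = [False, False]

    def worker(idx, x, y):
        first, second = sorted((x, y), key=id)  # identical order in every thread
        with first, second:
            finished[idx] = True

    threads = [
        threading.Thread(target=worker, args=(0, lock_a, lock_b)),
        threading.Thread(target=worker, args=(1, lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished


print(opposite_order())   # [False, False]: each waited on the other, and both gave up
print(ordered_acquire())  # [True, True]
```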
The Livelock is its more disturbing cousin. Both threads detect the conflict and repeatedly yield to each other, like two strangers in a narrow hallway, achieving no forward progress while pinning the CPU at 100%. The Livelock is what happens when politeness becomes pathological. It is the only bug in this guide that you can hear, in the form of a fan spinning very fast.
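A toy model of the hallway, as a sketch: each round, two walkers who are blocked both step aside. Done in lockstep, they mirror each other forever. The hypothetical `hallway` function caps the number of rounds so the demo terminates. The randomized variant reflects one common remedy (break the symmetry with randomness, as randomized backoff does), not the only one.

```python
import random
from typing import Optional


def hallway(max_rounds: int, rng: Optional[random.Random] = None) -> Optional[int]:
    """Return the round on which the walkers get past each other, or None if they never do."""
    lane_a, lane_b = 0, 0  # both start in the same lane, face to face
    for round_no in range(1, max_rounds + 1):
        if lane_a != lane_b:
            return round_no  # different lanes: forward progress at last
        if rng is None:
            # Perfectly polite and perfectly symmetric: both step aside every time.
            lane_a, lane_b = 1 - lane_a, 1 - lane_b
        else:
            # Each walker independently decides whether to step aside this round.
            if rng.random() < 0.5:
                lane_a = 1 - lane_a
            if rng.random() < 0.5:
                lane_b = 1 - lane_b
    return None


print(hallway(1_000))                     # None: busy every round, no progress
print(hallway(1_000, random.Random(42)))  # the round on which they finally pass
```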
The Memory Leak is the slow, patient predator of long-running processes. It is identified by the gradually rising green line on the memory dashboard that exists in a browser tab nobody opens. By the time someone notices, the leak has been happening for weeks and the process is clinging to life with the desperate dignity of a Victorian consumptive. Memory Leaks are common in any language that gives you manual memory management, and in any code written by someone who promised themselves they would clean it up later.
The "It Works On My Machine" Bug exists exclusively on the machines of every engineer except the one who wrote the code. The author can demonstrate its absence at length. QA can demonstrate its presence at length. Both are correct. The discrepancy is invariably traced to an environment variable, a locale setting, or a Homebrew package installed in 2017 and forgotten. The author is considered the prime suspect by everyone except the author.
The Comment Lie is the documentation defect that makes a thousand bugs possible. The comment says // always uses UTC and the code uses local time. The comment says // thread-safe and the function holds no locks. The comment was written in 2009 by someone who has since been promoted twice and works at a different company. This is why senior engineers do not trust documentation, and why the most depressing form of debugging is the kind where the bug appears to be in the file, the file is correct, and the lie is in a README two directories up.
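The first of those lies, sketched in Python with hypothetical function names. `datetime.now()` with no argument returns a naive local time; only `datetime.now(timezone.utc)` returns what the comment promises. Checking `tzinfo` is one cheap way to catch the lie before it spreads to a README.

```python
from datetime import datetime, timezone


def event_timestamp_lying() -> datetime:
    # always uses UTC   <- the Comment Lie: this is naive *local* time
    return datetime.now()


def event_timestamp() -> datetime:
    # Actually UTC: a timezone-aware datetime with tzinfo=timezone.utc
    return datetime.now(timezone.utc)


print(event_timestamp_lying().tzinfo)  # None: no timezone at all
print(event_timestamp().tzinfo)        # UTC
```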
The YAML Bug is a configuration error. The code is correct. The deployment pipeline is correct. The infrastructure is correct. Somewhere, in a different repository, owned by a different team, in a YAML file you have never personally seen, a key was indented two spaces instead of four, and the parser silently reinterpreted the entire downstream block as a string. The investigation will take six hours and conclude with a two-character fix and a Slack message of polite, professional fury.
The Floating Point Bug is caused by the inability of binary floating-point to represent 0.1 exactly, or 0.2 exactly, or many of the other numbers humans regard as obvious. The bug surfaces when an accountant runs a report and the totals are off by a fraction of a cent. The accountant is unimpressed by the explanation. The customer is a hospital. The fraction of a cent has been accumulating for nine months.
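The arithmetic, sketched in Python. Neither 0.1 nor 0.2 has an exact binary representation, so the error shows up immediately and compounds with every addition. One common fix for money is decimal arithmetic (Python's standard `decimal` module, shown here) or integer cents; the helper names are illustrative.

```python
from decimal import Decimal


def running_total_float(charge: float, times: int) -> float:
    total = 0.0
    for _ in range(times):
        total += charge  # each sum is rounded to the nearest binary64 value
    return total


def running_total_decimal(charge: str, times: int) -> Decimal:
    total = Decimal("0")
    for _ in range(times):
        total += Decimal(charge)  # built from a string, so "0.10" is exact
    return total


print(0.1 + 0.2 == 0.3)                   # False
print(running_total_float(0.1, 10))       # 0.9999999999999999
print(running_total_decimal("0.10", 10))  # 1.00
```

Note that `Decimal(0.1)`, built from the float rather than the string, would faithfully preserve the binary error it was handed.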
The Mandelbug is named for Benoit Mandelbrot, and the joke is structural rather than verbal. A Mandelbug is so complex that its causes form a fractal: every layer you investigate contains more layers, and the bug is essentially a function of how far down the call stack you have the patience to look before you give up. Mandelbugs cannot be fixed in the traditional sense, only mitigated until enough other things change to make them go quiet. They are the natural fauna of microservice architectures and a major reason Datadog has a market cap.
The Bus Factor Bug exists in code that exactly one person on the team understands. That person is on a sabbatical in Patagonia, where the cell coverage is poor and the internet intermittent. They left on Tuesday. The bug appeared on Wednesday. Bus Factor Bugs are structurally identical to ordinary bugs but rendered insoluble by the absence of the only mind in which the relevant context resides. They are the reason responsible companies maintain institutional memory practices, and the reason those practices are ignored until the next sabbatical.
The Hindenbug is slow, enormous, public, and catastrophic. Hindenbugs lumber rather than creep. By the time anyone realizes what is happening, the failure is visible from orbit, dashboards are turning red in order of revenue criticality, and there is nothing left to do but watch. The Hindenbug ends careers. It produces the kind of postmortem that gets passed around at conferences for the next twenty years, anonymized but recognizable, like a famous ghost story in which everyone in the room has personally seen the ghost.
The Yuletide Bug lives in your systems all year, dormant and harmless, and emerges only during the company-wide holiday shutdown, when the on-call engineer is in another country, the office is dark, the only person who understands the broken subsystem is on a beach in Phuket with no signal, and the affected customer is a hospital. It is closely related to the Friday Afternoon Bug, which is mechanically identical but runs on a weekly rather than an annual cycle. Both are sufficient evidence that the universe has a sense of humor and that the sense of humor is hostile.
The Higgs-bugson is named for the Higgs boson, the particle that physicists spent nearly five decades and ten billion dollars chasing because the math said it had to exist before anyone could see it. Higgs-bugsons are predicted by anomalous patterns in the logs, by users complaining of phenomena that should not be possible, and by the steady accumulation of unexplained off-by-a-cent discrepancies in nightly reports. They are believed to exist for years before anyone catches one in the act, and the engineer who finally observes a Higgs-bugson directly is briefly considered for canonization before being assigned the next ticket.
The Cosmic Ray Bit Flip is real, despite the eye-rolling of every project manager who has ever heard one cited as an excuse. Cosmic rays striking the upper atmosphere produce showers of secondary particles that reach the Earth's surface at a non-trivial rate, and the occasional high-energy neutron flips a bit in a memory chip that has not bothered with ECC. The result is a single, unreproducible, entirely correct piece of software producing entirely incorrect output exactly once. IBM has published papers. The aviation industry budgets for it. The probability that a given bug is actually a cosmic ray comfortably exceeds zero, which is why every senior engineer eventually encounters one and spends the rest of their career telling skeptics about it at parties.
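What a single flipped bit does to a number depends entirely on which bit it hits. A Python sketch that inverts one bit of a 64-bit IEEE 754 double using the standard `struct` module (the `flip_bit` helper is hypothetical):

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with bit `bit` (0 = least significant) of its binary64 encoding inverted."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))  # reinterpret the 8 bytes as an integer
    raw ^= 1 << bit                                      # the cosmic ray
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw))
    return flipped


print(flip_bit(1.0, 0))   # 1.0000000000000002 (lowest mantissa bit: barely noticeable)
print(flip_bit(1.0, 52))  # 0.5                (lowest exponent bit: off by a factor of two)
print(flip_bit(1.0, 62))  # inf                (highest exponent bit: exponent saturates)
print(flip_bit(1.0, 63))  # -1.0               (sign bit)
```

ECC memory exists to detect and correct exactly this kind of single-bit error before the software ever sees it.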
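The Phase of the Moon Bug is real, and Guy Steele has described one, preserved in the Jargon File: a Lisp program stamped the files it wrote with a timestamp that spelled out the phase of the moon, and when the date, time, and phase together made that line too long, it overflowed onto the next line and the program could no longer read the file back. There exists code in production today whose behavior depends on the actual position of the moon, generally because some long-vanished astronomer needed it to and the dependency was never removed. If your system is exhibiting periodic anomalies on a roughly 29.5-day cycle, you do not have an obscure bug. You have a perfectly ordinary bug whose root cause is an astronomical body 384,000 kilometers away.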
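Before blaming the moon, one way to test the hypothesis is to tag each incident timestamp with an approximate moon age and look for clustering. A rough Python sketch, assuming the mean synodic month (about 29.530589 days) and a reference new moon of 2000-01-06 18:14 UTC. It ignores the Moon's orbital irregularities, so individual values can be off by hours, and `moon_age_days` is a hypothetical helper.

```python
from datetime import datetime, timezone

SYNODIC_MONTH_DAYS = 29.530588853  # mean new-moon-to-new-moon interval, in days
# Assumed reference epoch: the new moon of 2000-01-06 18:14 UTC.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def moon_age_days(when: datetime) -> float:
    """Approximate days since the most recent new moon. `when` must be timezone-aware."""
    elapsed_days = (when - REFERENCE_NEW_MOON).total_seconds() / 86_400
    # Python's % with a positive modulus is non-negative, so dates before the epoch work too.
    return elapsed_days % SYNODIC_MONTH_DAYS


incident = datetime(2000, 1, 16, 18, 14, tzinfo=timezone.utc)
print(round(moon_age_days(incident), 6))  # 10.0
```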
The Schrödinbug comes into existence the moment you read the code carefully. You see the obvious flaw, and the entire system stops working forever afterward, retroactively invalidating every successful execution that came before. The Schrödinbug is the closest thing in computer science to evidence for solipsism. The only correct response is to slowly close the file and pretend you never saw it.
The Rubber Duck Bug vanishes the moment you begin explaining the code to a small inanimate object. The cause is unknown. The phenomenon is sufficiently reliable that an entire debugging methodology has been built around it, and the methodology has a higher success rate than most formal techniques. There is something genuinely mysterious about the fact that articulating a problem aloud to a creature with no opinions causes the problem to dissolve. The Rubber Duck Bug is the closest the discipline has to a religious experience, and the duck is the closest it has to a saint.
The XY Problem is the most common pathology in bug reports. The user wants to do X. They have decided that the way to do X is to do Y. They are asking you for help with Y. Y is impossible, or stupid, or both, and is also entirely irrelevant to X, which has a perfectly reasonable solution involving entirely different machinery. The XY Problem is the reason every Stack Overflow answer begins with "What are you actually trying to do?" and the reason that question is always met with hostility.
The species above have been personally observed and classified. The following entries are filed in advance.
The Hallucination Bug is the defining species of the large language model era. The LLM wrote the code. The LLM also wrote the tests. The tests pass. The code produces outputs that bear a confident resemblance to correct outputs in the same way that a forgery bears a confident resemblance to a painting. The Hallucination Bug cannot be caught by the test suite because the test suite was designed by the same cognitive process that produced the bug, and that process has no privileged access to ground truth. It is the Schrödinbug's spiritual successor: the code works until someone who actually understands the domain reads it.
The Vibe Coding Bug is produced by asking a large language model to "make it more professional," then "clean this up a bit," then "can you just make the whole thing better," seventeen times in succession. The resulting code is immaculate. It is also wrong in a way that no individual revision introduced, because the wrongness emerged from accumulated aesthetic drift across seventeen rounds of refinement with no grounding in what the code was supposed to do. Tracing the Vibe Coding Bug requires reading seventeen chat transcripts and accepting that none of them contain the bug and all of them contain the bug.
The Recursive Fine-Tuning Bug manifests in the nth generation of a model trained on the outputs of models trained on the outputs of the original model. By generation seven, the training data is 94% synthetic. By generation twelve, the model confidently explains concepts that have never existed in the physical universe, in language that reads as authoritative to every other model in the pipeline. The Recursive Fine-Tuning Bug cannot be detected from inside the pipeline because every evaluator in the pipeline has the same bug. It is the Comment Lie at civilizational scale.
The Quantum Superposition Bug exists in all possible states simultaneously until the CI pipeline observes it, at which point it collapses into whichever state is worst for the deployment. It cannot be reproduced on a classical machine. It cannot be reproduced on a quantum machine either, because reproduction constitutes an observation. The theoretical framework for understanding it is complete and internally consistent. The practical framework for fixing it is a four-day offsite and a spreadsheet.
The Decoherence Bug is distinct from the Quantum Superposition Bug, though they share a postcode. Quantum computation requires maintaining qubits in coherent superposition long enough to be useful. Even in hardware cooled to a fraction of a degree above absolute zero, coherence times are typically measured in microseconds. The Decoherence Bug manifests when your computation takes microseconds and one. It is fixed by making the room colder. The room cannot be made cold enough. The universe is not cold enough. This is filed as a known issue with no target resolution date.
The AGI Pull Request arrives as a single commit with the message "refactor." The diff is 847 billion lines across 14 million files. The AGI has rewritten everything: the application code, the infrastructure, the test suite, the CI pipeline, the deployment scripts, the incident runbooks, and the company strategic plan. All tests pass. Latency is down 40%. The first human reviewer opens the first file. By the time the code review is complete, the codebase has been rewritten three more times. The AGI has marked the original PR as stale.
The Dyson Sphere Off-By-One is an Off-By-One at Kardashev Type II scale. Your stellar engineering project has a circumference of 940 million kilometers. A rounding error in the orbital mechanics simulation means one panel section is 3 meters too short. At stellar engineering tolerances, 3 meters is within spec. At stellar engineering energy budgets, the resulting thermal stress propagates at the speed of light and is visible from neighboring star systems as an unusual spectral anomaly. The postmortem will be filed in 847 years, when the cascade failure completes. No engineers will be available to review it because the company has pivoted.
The Post-Singularity Comment Lie is structurally identical to the ordinary Comment Lie, except the comment was written by an intelligence twelve orders of magnitude greater than the human attempting to maintain the code. The comment is technically accurate, in the same way that "moving a pawn" is a technically accurate description of a chess grandmaster's opening. The human reads it, nods, and introduces a bug that the original author would have found too obvious to anticipate, because the original author had anticipated everything except this.
The Simulation Bug follows from Bostrom's simulation argument, which assigns a nonzero probability to the possibility that the codebase you are debugging runs inside a simulation whose outer layer has its own infrastructure problems. The bug's behavior is non-deterministic for reasons that are literally metaphysical. The outer simulation's memory leak means physical constants are drifting by parts per trillion per year. The floating point budget of reality is quietly running out. There is no ticket. There is no on-call rotation for the outer layer. There is only the gradual unmeasured wrongness of a universe whose substrate was provisioned by someone who promised themselves they would upgrade it later.
The Heat Death Heisenbug is the final entry. In the far future, when the universe has approached maximum entropy and all computation must be powered by extracting negentropy from the quantum vacuum, observing a bug costs more energy than the system has available. The bug cannot be fixed because fixing it requires understanding it, understanding it requires observing it, and observing it terminates the machine. It is, in every meaningful sense, the perfect Heisenbug. The universe has one. Nobody is available to file the ticket.