Testing in Production — Building Observable Distributed Systems
Complex systems exhibit unexpected behavior.
— John Gall, The Systems Bible
One reaction to the myriad failure cases we encounter with distributed systems is to add more testing. Unfortunately, testing is only a best-effort verification of system correctness: we simply cannot predict every failure case that will happen in production. What’s more, any environment that we use to verify system behaviour is, at best, a pale imitation of our production environment.