Testing in Production — Building Observable Distributed Systems

Complex systems exhibit unexpected behavior. — John Gall, The Systems Bible One reaction to the myriad failure cases we encounter with distributed systems is to add more testing. Unfortunately, testing is a best-effort verification of system correctness — we simply cannot predict the failure cases that will happen in production. What’s more, any environment that we use to verify system behaviour is — at best — a pale imitation of our production environment....

September 25, 2018 · 8 min · Kevin Sookocheff