Root Cause Analysis as Storytelling Humans are great storytellers and even better story-listeners. We love to hear stories so much that when there aren't any available, we make them up on our own. From an early age, children grasp the idea of narrative. Even if they don't understand the forms of storytelling so much, you can hear a four-year-old weave a linked list of events from her day. We look for stories behind everything.
Continue Reading » -
Release It Second Edition in Beta I’m excited to announce the beta of Release It! Second Edition. It’s been ten years since the first edition was released. Many of the lessons in that book hold strong. Some are even more relevant today than in 2007. But a few things have changed. For one thing, capacity management is much less of an issue today. The rise of the cloud means that developers are more exposed to networks than ever.
Continue Reading » -
Spectrum of Change I’ve come to believe that every system implicitly defines a spectrum of changes, ordered by their likelihood. As designers and developers, we make decisions about what to embody as architecture, code, and data based on known requirements and our experience and intuition. We pick some kinds of changes and say they are so likely that we should represent the current choice as data in the system. For instance, who are the users?
Continue Reading » -
Queuing for QA Queues are the enemy of high-velocity flow. When we see them in our software, we know they will be a performance limiter. We should look at them in our processes the same way. I've seen meeting rooms full of development managers with a chart of the year, trying to allocate which week each dev project will enter the QA environment. Any project that gets done too early just has to wait its turn in QA.
Continue Reading » -
Availability and Stability Last post covered technical definitions of fault, error, and failure. In this post we will apply these definitions in a system. Our context is a long-running service or server. It handles requests from many different consumers. Consumers may be human users, as in the case of a web site, or they may be other programs. Engineering literature has many definitions of "availability." For our purpose we will use observed availability.
Continue Reading » -
Fault, Error, Failure Our systems suffer many insults when they contact the real world. Flaky inputs, unreliable networks, and misbehaving users, to name just a few. As we design our components and systems to thrive in the only environment that matters, it pays to have mental schema and language to discuss the issues. A fault is an incorrect internal state in your software. Faults are often introduced at component, module, or subsystem boundaries.
Continue Reading » -
Power Systems This is an excerpt from something I'm working on this Labor Day holiday: Large-scale power outages act a lot like software failures. An outage starts with a small event, like a power line grounding out on a tree. Ordinarily that would be no big deal, but under high-stress conditions it can turn into a cascading failure that affects millions of people. We can also learn from how power gets restored after an outage.
Continue Reading » -
Remember DAT? Do you remember Digital Audio Tape? DAT was supposed to have all the advantages of digital audio—high fidelity and perfect reproduction—plus the "advantages" of tape. (Presumably those advantages did not include melting on the dashboard of your Chevy Chevelle or spontaneously turning into The Best of Queen after a fortnight.) In hindsight, we can see that DAT was a twilight product. As the sun set on the cassette era, DAT was an attempt to bridge the discontinuous technology change to digital music production.
Continue Reading » -
QA Instability Implies Production Instability Many companies that have trouble delivering software on time exhibit a common pathology. Developers working on the next release are frequently interrupted for production support issues with the current release. These interrupts never appear in project schedules but can take up half of the developers' hours. When you include the cost of task-switching, this means less than half of their available time is spent on the new feature work.
Continue Reading » -
Wittgenstein and Design What does a philosopher born in the 19th Century have to say about software design? More than you might think, particularly his ideas about family resemblance. Wittgenstein used the subject of "games" to illustrate an idea. We'll start with a counter-example. Suppose we operate with the then-prevailing notion that words are defined like sets in axiomatic set theory. Then there is a decision procedure that will let us decide whether something is a member of the set "
Continue Reading »