Lately, I’ve been grooving on emergent behavior. This fuzzy term comes from the equally fuzzy field of complexity studies. Mix complex rules together with non-linear effects (like humans) and you are likely to observe emergent behavior.

Recent example: web browser security holes. Any program inherently constitutes a complex system. Add in some dynamic reprogramming, downloadable code, system-level scripting, and millions upon millions of users and you’ve got a perfect petri dish. Sit back and watch the show. Unpredictable behavior will surely result.

In fact, "emergent" sometimes gets used as a synonym for "unpredictable". By and large, I believe that’s true. In traditional systems design, "unpredictable" definitely equals "sloppy". Command-and-control, baby. Emergent behavior is what happens when your program goes off the rails.

The thing is, emergent behavior is where all the really interesting things happen. Predictable programs are boring. Big batch runs are predictable.

But, you have to consider the complete system. In a big batch run, the system is linear: inputs, transformation, outputs. No feedback. No humans. When you include humans in your view of the system, all these messy feedback loops start to appear. It gets even worse when you have multiple humans connected via the programs. Feedback loops that stretch from one person, through at least two programs, out to another person and back.

Any system that involves humans will exhibit emergent behaviors – and this is a very good thing.

Are "designed" behavior and "emergent" behavior inherently incompatible? I don’t think so. I think it may be possible to design for emergent behavior. I mean that certain designs will encourage some kinds of emergent behavior, whereas other designs encourage other kinds of emergent behavior. We can study the behaviors produced by various systems and designs to build a compendium of factors that are likely to facilitate one class of behavior or another.

For example: In every corporation, I see large volumes of data stored and shared in two different formats. The nature of the two systems encourages very different behaviors.

First we have relational databases. These tend to be large, expensive systems. As a result, they are centralized to one degree or another. The nature of relational algebra is that of a static schema. Therefore, changes are rigidly controlled. Centralized, rigidly controlled assets require guardians (DBAs) and gatekeepers (data modelers). Because the schema is well-defined and changes slowly, the database gains a degree of transparency. Applications are integrated through their databases. Generic tools for backup, reporting, extraction, and modeling become possible. The data can be accessed from a variety of applications in a relatively generic fashion.

The other data storage tool I see used widely is the spreadsheet. I almost never see a spreadsheet used to calculate numbers. Instead, most are used as a schema-less data storage tool. Often created directly by the business analysts, these spreadsheets are very conducive to change. Sharing is as simple as sending the file through email. Of course, this leads to version conflicts and concurrent update issues that have to be settled by hand (usually by printing a timestamp on the hardcopies!) There is not a central definition of the data structure. Indeed, neither the data nor the structures from spreadsheets can be reused. A spreadsheet makes the 2-dimensional structure of a table obvious, but it makes relationships difficult, if not impossible, to represent. Ergo, spreadsheet users don’t do relationships. Access to the spreadsheets is always mediated by a single application.

So, two different systems. Both store structured (or at least semi-structured) data. The nature of each produces very different emergent behaviors. In one case, we find the evolution of acolytes of the RDBMS. In the other case, we find that a numeric analysis tool is being used for widespread data storage and sharing.

Given enough examples, enough time, and enough study, can we not learn to extrapolate from the essential nature of our designs to the most probable emergent behaviors? Even perhaps, to select the emergent behaviors that we desire first, and, starting from those, decide what essential nature our designs must embody to most likely to encourage those behaviors?