Note: This piece originally appeared in the "Marbles Monthly" newsletter in April 2003
I caught an incredibly entertaining special on The Learning Channel last week. A bunch of academics decided that they were going to build an authentic Roman-style catapult, based on some ancient descriptions. They had great plans, engineering expertise, and some really dedicated and creative builders. The plan was to hurl a 57 pound stone 400 yards, with a machine that weighed 30 tons. It was amazing to see the builders faces swing between hope and fear. The excitement mingled with apprehension.
At one point, the head carpenter said that it would be wonderful to see it work, but "I'm fairly certain it's going to boink." I immediately knew what he meant. "Boink" sums up all the myriad ways this massive device could go horribly wrong and wreak havoc upon them all. It could fall over on somebody. It could break, releasing all that kinetic energy in the wrong direction, or in every direction. The ball could fly off backwards. The rope might relax so much that it just did nothing. One of the throwing arms could break. They could both break. In other words, it could do anything other than what it was intended to do.
That sounds pretty familiar. I see the same expressions on my teammates' faces every day. This enormous project we're slaving on could fall over and crush us all into jelly. It could consume our hours, our minds, and our every waking hour. Worst case, it might cost us our families, our health, our passion. It could embarrass the company, or cost it tons of money. In fact, just about the most benign thing it could do is nothing.
So how do you make a system that don't boink? It is hard enough just making the system do what it is supposed to. The good news is that some simple "do's and don'ts" will take us a long way toward non-boinkage.
Automation is Your Friend #1: Runs lots of tests -- and run them all the time
Automated unit tests and automated functional tests will guarantee that you don't backslide. They provide concrete evidence of your functionality, and they force you to keep your code integrated.
Automation is Your Friend #2: Be fanatic about build and deployment processes
A reliable, fully automated build process will prevent headaches and heartbreaks. A bad process--or a manual process--will introduce errors and make it harder to deliver on an iterative cycle.
Start with a fully automated build script on day one. Start planning your first production-class deployment right away, and execute a deployment within the first three weeks. A build machine (it can be a workstation) should create a complete, installable numbered package. That same package should be delivered into each environment. That way, you can be absolutely certain that QA gets exactly the same build that went into integration testing.
Avoid the temptation to check out the source code to each environment. An unbelievable amount of downtime can be traced to a version label being changed between when the QA build and the production build got done.
Everything In Its Place
Keep things separated that either change at different speeds. Log files change very fast, so isolate them. Data changes a little less quickly but is still dynamic. "Content" changes slower yet, but is still faster than code. Configuration settings usually come somewhere between code and content. Each of these things should go in their own location, isolated and protected from each other.
Be transparent
Log everything interesting that happens. Log every exception or warning. Log the start and end of long-running tasks. Always make sure your logs include a timestamp!
Be sure to make the location of your log files configurable. It's not usually a good idea to keep log files in the same filesystem as your code or data. Filling up a filesystem with logs should not bring your system down.
Keep your configuration out of your code
It is always a good idea to separate metadata from code. This includes settings like host names, port numbers, database URLs and passwords, and external integrations.
A good configuration plan will allow your system to exist in different environments -- QA versus production, for example. It should also allow for clustered or replicated installations.
Keep your code and your data separated
The object-oriented approach is a good wasy to build software, but it's a lousy way to deploy systems. Code changes at a different frequency than data. Keep them separated. For example, in a web system, it should be easy to deploy a new code drop without disrupting the content of the site. Likewise, new content should not affect the code.