ReadWriteWeb on Dirty Data
A short while back, I did a brief series on the value of "dirty data"---copious amounts of unstructured, non-relational data created by the many interactions users have with your site and each other. ReadWriteWeb has a post up about Four Ad-Free Ways that Mined Data Can Make Money, along very similar lines. Well worth a read.
-
97 Things Every Software Architect Should Know
O'Reilly is creating a new line of "community-authored" books. One of them is called "97 Things Every Software Architect Should Know". All of the "97 Things" books will be created by wiki, with the best entries being selected from all the wiki contributions. I've contributed several axioms that have been selected for the book: "Talk about the arch, but see the scaffolding beneath it"; "You're negotiating more often than you think"; "Software architecture has ethical consequences"; "Everything will ultimately fail"; and "Engineer in the white spaces". Long-time readers of this blog may recognize some of these themes.
Continue Reading » -
How Buildings Learn
Stewart Brand's famous book How Buildings Learn has been on my reading queue for a while, possibly a few years. Now that I've begun reading it, I wish I had gotten it sooner. Listen to this: "The finished-looking model and visually obsessive renderings dominate the let's-do-it meeting, so that shallow guesses are frozen as deep decisions. All the design intelligence gets forced to the earliest part of the building process, when everyone knows the least about what is really needed."
Continue Reading » -
Dan Pritchett on Availability
Dan Pritchett is a man after my own heart. His latest post talks about the path to availability enlightenment. The obvious path--reliable components and vendor-supported commercial software--leads only to tears. You can begin on the path to enlightenment when you set aside dreams of perfect software running on perfect hardware, talking over perfect networks. Instead, embrace the reality of fallible components. Don't design around them, design for them. How do you design for failure-prone components?
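One common tactic, sketched here only as an illustration and not necessarily the approach Dan describes: wrap every call to an unreliable dependency so that repeated failures trip a breaker and later calls fail fast instead of piling up against a dead component. All class and parameter names below are hypothetical.

```java
// Minimal circuit-breaker sketch. After too many consecutive failures,
// calls are rejected immediately for a cool-down period rather than
// waiting on a dependency that is already known to be sick.
import java.util.concurrent.Callable;

public class CircuitBreaker<T> {
    private final int failureThreshold;
    private final long cooldownMillis;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    public CircuitBreaker(int failureThreshold, long cooldownMillis) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
    }

    public T call(Callable<T> protectedCall) throws Exception {
        if (isOpen()) {
            throw new IllegalStateException("circuit open: failing fast");
        }
        try {
            T result = protectedCall.call();
            consecutiveFailures = 0;                   // success closes the circuit
            return result;
        } catch (Exception e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis(); // trip the breaker
            }
            throw e;
        }
    }

    private boolean isOpen() {
        return consecutiveFailures >= failureThreshold
                && System.currentTimeMillis() - openedAt < cooldownMillis;
    }
}
```

The particular class doesn't matter; what matters is that the dependency's failure mode is an explicit part of the design instead of something you hope never happens.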
Continue Reading » -
Agile Tool Vendors
There seems to be something inherently contradictory about "Enterprise" agile tool vendors. There's never been a tool invented that's as flexible in use or process as the 3x5 card. No matter what, any tool must embed some notion of a process, or at least a meta-process. I've looked at several of the "agile lifecycle management" and "agile project management" tools this week. To me, they all look exactly like regular project management tools.
Continue Reading » -
Beyond the Village
As an organization scales up, it must navigate several transitions. If it fails to make these transitions well, it will stall out or disappear. One of them happens when the company grows larger than "village-sized". In a village of about 150 people or fewer, it's possible for you to know everyone else. Larger than that, and you need some kind of secondary structures, because personal relationships don't reach from every person to every other person.
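A quick bit of arithmetic makes the limit concrete: the number of pairwise relationships grows quadratically with headcount.

```latex
\text{pairs}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
\text{pairs}(150) = \frac{150 \times 149}{2} = 11{,}175
```

At 150 people there are already more than eleven thousand potential one-to-one relationships, and doubling the headcount roughly quadruples that number.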
Continue Reading » -
S3 Outage Report and Perspective
Amazon has issued a more detailed statement explaining the S3 outage from June 20, 2008. In my company, we'd call this a "Post Incident Report" or PIR. It has all the necessary sections: observed behavior, root cause analysis, and follow-up actions (corrective and operational). This is exactly what I'd expect from any mature service provider. There are a few interesting bits from the report. First, the condition seems to have arisen from an unexpected failure mode in the platform's self-management protocol.
Continue Reading » -
Article on Building Robust Messaging Applications
I've talked before about adopting a failure-oriented mindset. That means you should expect every component of your system or application to someday fail. In fact, they'll usually fail at the worst possible times. When a component does fail, whatever unit of work it's processing at the time will most likely be lost. If that unit of work is backed up by a transactional database, well, you're in luck. The database will do its Omega-13 bit on the transaction and it'll be like nothing ever happened.
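To make that concrete, here is a minimal sketch of one way to survive a crash and redelivery: record the message's identity in the same transaction as the work itself, so the rollback-and-retry dance can't apply the work twice. This is my illustration, not the technique from the linked article, and the table and method names are hypothetical.

```java
// Idempotent message handler sketch. The "already seen" marker and the
// business work commit in one transaction, so a message redelivered after
// a crash is detected instead of being applied a second time.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class IdempotentHandler {

    public void handle(Connection db, String messageId, String payload) throws SQLException {
        db.setAutoCommit(false);
        try (PreparedStatement seen = db.prepareStatement(
                "INSERT INTO processed_messages (message_id) VALUES (?)")) {
            seen.setString(1, messageId);
            seen.executeUpdate();              // duplicate key here means "already processed"

            applyBusinessLogic(db, payload);   // same transaction as the marker row

            db.commit();                       // both become durable, or neither does
        } catch (SQLException duplicateOrFailure) {
            db.rollback();                     // the database does its Omega-13 bit
            throw duplicateOrFailure;
        }
    }

    private void applyBusinessLogic(Connection db, String payload) throws SQLException {
        // Placeholder for the real unit of work.
    }
}
```

In practice you'd distinguish a duplicate-key violation (acknowledge the message and move on) from a genuine failure (let the broker redeliver it later).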
Continue Reading » -
Kingpins of Filthy Data
If large amounts of dirty data are actually valuable, how do you go about collecting it? Who's in the best position to amass huge piles? One strategy is to scavenge publicly visible data. Go screen-scrape whatever you can from web sites. That's Google's approach, along with one camp of the Semantic Web tribe. Another approach is to give something away in exchange for that data. Position yourself as a connector or hub.
Continue Reading » -
Inverting the Clickstream
Continuing my theme of filthy data. A few years ago, there was a lot of excitement around clickstream analysis. This was the idea that, by watching a user's clicks around a website, you could predict things about that user. What a backwards idea. For any given user, you can imagine a huge number of plausible explanations for any given browsing session. You'll never enumerate all the use cases that motivate someone to spend ten minutes on seven pages of your web site.
Continue Reading »