Every single person I've heard talk about Continuous Delivery says you have to change your system's architecture to succeed with it. Despite that, we keep seeing "lift and shift" efforts. So I was happy to be invited to join a panel to discuss architecture for Continuous Delivery. We had an online discussion last Tuesday on the C9D9 series, hosted by Electric Cloud.
They made the recording available immediately after the panel, along with a shiny new embed code.
Best of all, they supplied a transcript, so I can share some excerpts here. (Lightly edited for grammar, since I have relatives who are editors and I must face them with my head held high.)
Pipeline Orchestration
It's easy to focus on the pipeline as the thing that delivers code into production. But I want to talk about two other central roles that it plays. One is risk management. To me the pipeline is not so much about ushering code out to production; it's about finding every opportunity to reject a harmful or flawed change before it gets into production. So I view the pipeline as an essential part of risk management.
I've also had a lot of lean training, so I'd look on the deployment pipeline as the value stream that developers use to deliver value to their customers. In that respect we need to think about the pipeline as production-grade infrastructure, and we need to treat it with production-like SLAs.
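As a rough sketch of that "series of rejection opportunities" view, here is how I picture the pipeline in code. The stage names and checks are placeholders of my own, not anything from the panel; the point is only that every stage is a chance to throw a change out before it reaches production.

```python
# A minimal sketch of a pipeline as a chain of rejection gates: each stage
# gets a chance to reject a change before it reaches production.
# Stage names and checks are illustrative placeholders.

def run_pipeline(change, stages):
    """Run a change through each stage; stop at the first rejection."""
    for name, check in stages:
        if not check(change):
            print(f"Rejected at {name}: {change['id']}")
            return False
    print(f"Promoted to production: {change['id']}")
    return True

stages = [
    ("unit tests",        lambda c: c["tests_pass"]),
    ("static analysis",   lambda c: c["lint_clean"]),
    ("integration tests", lambda c: c["integration_pass"]),
    ("capacity tests",    lambda c: c["meets_latency_budget"]),
]

run_pipeline({"id": "change-42", "tests_pass": True, "lint_clean": True,
              "integration_pass": True, "meets_latency_budget": False}, stages)
```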
Cattle, Not Pets
I think a lot has been said about "cattle versus pets" over the last ten years or so. I just want to add one thing: the real challenge is identity. There are a ton of systems and frameworks that implicitly assume stable identity on machines, particularly a lot of distributed software toolkits. When you have the cattle model, a machine identity may disappear and never come back again. I just really hope you're not building up a queue of undelivered messages for that machine.
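To make the identity problem concrete, here is a tiny sketch, assuming a hypothetical service registry and role-based queues rather than any particular broker, of routing work by role instead of by machine identity, so messages never pile up for an instance that will never return.

```python
# Sketch: publish messages to a role-based queue rather than a per-machine
# queue, so work never accumulates for a machine identity that has been
# terminated. `registry` and the queue layout are hypothetical stand-ins
# for a real service registry and message broker.

from collections import defaultdict, deque

registry = {"billing": ["i-0a1", "i-0b2"]}   # live instances per role
role_queues = defaultdict(deque)             # one queue per role, not per machine

def send(role, message):
    if not registry.get(role):
        raise RuntimeError(f"no live instances for role {role!r}")
    role_queues[role].append(message)        # any live instance may consume this

def receive(role):
    return role_queues[role].popleft() if role_queues[role] else None

send("billing", {"invoice": 1001})
print(receive("billing"))
```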
Service Orientation and Decoupling
Having teams running in parallel and being able to develop more or less independently: I talk about team-scale autonomy. But if there are very long builds, large artifacts, and a large number of artifacts, I regard that as a consequence of using languages and tools that are early bound and early linked. I don't think it's any accident that the first people I heard of doing continuous delivery were using PHP. You can regard each PHP file as its own deployable artifact, and so things move very quickly. If everything we wrote was extremely late bound, then our deployment would be an rsync command. So to an extent, breaking things down into services is a response to large artifacts and long build times. That's one side of it.
The other side is team-scale autonomy and the fact that you can't beat Conway's Law, which absolutely holds true. (Conway's Law: an organization is constrained to produce software that recapitulates the structure of the organization itself. If you have four teams working on a compiler, you're going to get a four-pass compiler.)
Now, when we talk about decoupling, I need to talk about two different types of decoupling, both important.
The bigger your team gets, the higher the communication overhead. We have known that since the 1960s, so breaking the team down makes sense. But then we have to recompose things at runtime, and that's when coupling becomes a big issue. Operational coupling happens minute by minute by minute. If I have service A calling service B, and service B goes down, I have to have some response. If I don't do anything else, service A is also going to go down. So I need to build in some mechanisms to provide operational decoupling, maybe that's a cache, maybe it's timeouts, maybe it's a circuit breaker, something along those lines, to protect one service from the failure of another service.
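For concreteness, here is a minimal sketch of one of those mechanisms, a circuit breaker wrapped around a call from one service to another. The thresholds, names, and fallback behavior are illustrative choices of mine, not anything prescribed on the panel.

```python
# Sketch of a circuit breaker protecting service A from a failing service B.
# After `max_failures` consecutive errors the breaker opens and calls fail
# fast for `reset_after` seconds, instead of hanging on an unresponsive
# dependency. Thresholds are illustrative.

import time

class CircuitBreaker:
    def __init__(self, call, max_failures=3, reset_after=30.0):
        self.call = call
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def invoke(self, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None                  # half-open: try the call again
        try:
            result = self.call(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0
        return result
```

Wrap the client call in a breaker (for example, a hypothetical `CircuitBreaker(call_billing_service)`), and when the dependency is down, or mid-deployment, callers get a fast failure they can handle rather than stacking up requests and falling over themselves.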
It's not just the failure of the service! A deployment to the other service looks exactly like a failure from the perspective of the consumer. It's simply not responding to requests within an acceptable time.
So we have to pay attention to the operational decoupling.
Semantic coupling is even more insidious, and that's what plays out over a span of months and years. We talk about API versioning quite a bit, but there are other kinds of semantic coupling that creep in. I've been harping a lot lately about identifiers. If I have to pass an itemID to another system, then I'm implicitly saying there is one universe of itemIDs, that system has them all, and I can only talk to that system for items with those IDs.
Similarly, with many of the services we create, we build them as though there is only one instance of the service. We'd be better off writing code that can instantiate that service many times for many consumers. So if you create a calendar service, don't make one calendar that everyone has eventIDs on. Make a calendar service where a consumer can ask for a new calendar and get back a URL for a whole new calendar that is theirs and only theirs. That's the way you would build it if you were building a SaaS business, and that's how you need to think about decoupled services internally.
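A minimal sketch of that instantiable-calendar idea is below; the class, the URL scheme, and the method names are hypothetical, just to show each consumer getting its own calendar instead of sharing one global pool of eventIDs.

```python
# Sketch of an instantiable calendar service: instead of one global calendar
# that owns every eventID, each consumer asks for its own calendar and gets
# back a URL that scopes all further requests. Names and URL scheme are
# illustrative.

import uuid

class CalendarService:
    def __init__(self, base_url="https://calendars.example.com"):
        self.base_url = base_url
        self.calendars = {}                 # calendar_id -> {event_id: event}

    def create_calendar(self):
        calendar_id = uuid.uuid4().hex
        self.calendars[calendar_id] = {}
        url = f"{self.base_url}/calendars/{calendar_id}"   # yours and only yours
        return calendar_id, url

    def add_event(self, calendar_id, event):
        event_id = uuid.uuid4().hex
        self.calendars[calendar_id][event_id] = event
        return event_id                     # meaningful only within this calendar

service = CalendarService()
cal_id, url = service.create_calendar()
service.add_event(cal_id, {"title": "C9D9 panel", "when": "Tuesday"})
```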
Messaging and Data Management
If I'm truly deploying continuously, then I've got version N and version N+1 running against the same data source, so I need some way to accommodate that. In older, less-flexible kinds of databases, that means triggers, shims, extra views, that kind of scaffolding.
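As a hedged sketch of what that scaffolding can look like from the application side, here is an expand/contract style step in which version N+1 writes both the old and new shapes while version N keeps reading the old one. It assumes a DB-API style connection (e.g., sqlite3), and the table and column names are hypothetical.

```python
# Sketch of an expand/contract migration step: version N+1 writes both the
# old single-column form and the new split form, so version N can keep
# reading the old column while the two versions run side by side.
# Table and column names are hypothetical.

def save_user_name(db, user_id, full_name):
    first, _, last = full_name.partition(" ")
    db.execute(
        "UPDATE users SET full_name = ?, first_name = ?, last_name = ? "
        "WHERE id = ?",
        (full_name, first, last, user_id),
    )

def read_user_name(db, user_id):
    # Version N+1 prefers the new columns but falls back to the old one
    # for rows that have not been backfilled yet.
    row = db.execute(
        "SELECT full_name, first_name, last_name FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    if row[1] is not None:
        return f"{row[1]} {row[2]}"
    return row[0]
```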
I heard a great story, I think from Pinterest at Velocity a couple of years back. They had started with a monolithic user database and found they needed to split the table, after they already had 60 million users! But they were able to make many small deployments, each adding one step of an incremental migration. Once they got that in place, they let it sit for three months; at the end of that, they found who was left and did a batch migration of those. Then they did a series of incremental deployments to remove the extra data management stuff.
So it's one of those cases where doing continuous delivery requires you to be more sophisticated about your data changes, but it also gives you new tools to accomplish those changes.
There is a whole crop of databases that don't require that kind of care and feeding when you make deployments. If you are truly architecting for operational ease and delivery, that might be sufficient reason to choose one of the newer databases over one of the less flexible relational stores.
Conclusion
The C9D9 discussion was quite enjoyable. The hosts ran the panel well, and even though all of us are pretty long-winded, nobody was able to filibuster. I'll be happy to join them again for another discussion some time.