Simple Pieces, Dying Quickly

Late. Again. I assure you there are perfectly good reasons for this that, I further assure you, you do not want to know. Good news is things have radically improved in the last few days and the future is extremely bright. I’m squinting right now, in fact.

There are some architectural patterns that just Feel Right. That feeling of rightness isn’t always reliable — it can lead you pretty badly astray — but still, it’s there. And sometimes, what do you know, it even is right.

For years now I’ve been following research into microkernels, starting back in the early nineties when I was reading about Amoeba, an old-school distributed system structured as a set of small components. The idea of structuring an operating system as a set of isolated pieces, protected from each other and passing data only by well-defined messages, seems conceptually clean. It’s started to gain traction lately with projects such as the L4 kernel, the Coyotos project, and Microsoft’s Singularity, and IBM’s K32.

Interestingly, one of the biggest obstacles to the wider use of microkernels is Linus Torvalds. In late 2006 he had an online spat with Alex Tanenbaum, developer of Amoeba (back in the day) and Minix3 (new hotness). Linus has been saying for years that microkernels are a crock, and that operating systems are best built with extensive use of shared memory, because — in his view — what operating systems do is provide coherent views of shared state to multiple processes, and without a single shared state available to the whole kernel, providing a coherent view becomes much, much harder. Linus draws a parallel between microkernels and distributed systems, pointing out that distributed protocols are really hard to implement, precisely because you don’t have common state.

Personally, I agree strongly with Tanenbaum’s (and Shapiro’s) rebuttals. Tanenbaum points out that distributed protocols also have to deal with partial failure and reordering, which are not problems that microkernels have. Inter-component communication within a single machine’s operating system can be radically simpler than communication over a network (even though multicore machines do start introducing some timing variability). Also, of course, shared state in a monolithic kernel still requires concurrency management, and the complexity of managing concurrent access to shared kernel state is by far the biggest single source of Linux kernel bugs. Just look at all the work on massaging the locking patterns in the Linux kernel (getting rid of the Big Kernel Lock, etc., etc.). For Linus to claim that micro-component kernel development is more complex than monolithic concurrent state management is… well… not as obvious as he seems to think.

Shapiro points out that all robust engineering practice has shown that high reliability requires high isolation between small, robust components. He claims, and I agree, that there are no large-scale highly reliable systems that are not built from small, modular, isolated pieces. Certainly Erlang provides another data point that high reliability comes from many small interacting components, rather than from large-scale shared state.

Small, modular components get you other benefits, such as upgradeability (if your microkernel tracks references between components and manages inter-component message passing, you can upgrade individual pieces of your system without shutting things down — Erlang applications also work this way). Security is also enhanced if the compromise of a single component doesn’t expose the entire state of your kernel.

There are also interesting parallels between building single-machine operating systems as sets of modular services, and building large-scale Internet systems as sets of modular services. In both cases, you want to build something big and reliable from individual pieces that communicate over explicit interfaces, and that can be individually quickly restarted when they fail. In an operating system, your drivers are the flakiest piece, and when they die you want to be able to reload them without the rest of the system batting an eye. In an Internet service, your individual servers are the flakiest piece, and when they die you want to be able to fail over to other (mostly identical) servers without batting an eye. Isolation between components is critical in both cases, and you want to build your whole system in a layered way with redundancy and restartability at all levels.

Linus is correct that distributed transactions (for example) are hard to build out of individual components. But it’s also true that as you scale, distributed transactions are one of the biggest architectural system-breakers. Amazon, for instance, doesn’t use them (see also here), instead relying on careful ordering of service updates to preserve consistency in the face of intermediate failures. And many microkernel operating systems are also structured to avoid multi-component consistency interactions wherever possible.

So I think this is a rare example of where Linus is squarely on the wrong side of history, and where Linux will likely fall behind other systems that push towards greater modularity and greater internal componentization. It’s an interesting question whether large-scale Internet services will, over time, make individual-server operating systems less important — in other words, whether most applications will migrate to a highly-managed cloud, in which case most computers will wind up being more like thin clients, with all the action happening on a virtualized pool of services. But even in that case, the companies building those services will still want to leverage multi-core technology to the max, which essentially means building their individual service instances using highly isolated component architectures. Having such an architecture at the base of the operating system can only help achieve that goal, and it’s sad to think that Linus (on current evidence) will get in the way of that for Linux.

Written by robjellinghaus

2008/02/19 at 05:11

Posted in Uncategorized

robjsoftware.info