A blog about software – researching it, developing it, and contemplating its future.


Conscientious Software, part 3


(Sigh, keeping to a schedule is pretty darn tough. Sorry for the week-long hiatus. Onwards! Finally finishing off this particular sub-series of posts.)

Conscientious Software, Part 3: Sick Software

Final part in a 3-part series. Part 1, Part 2.

An interesting thing happened when garbage collection came into wider use. Back when I was doing mostly C++ programming, a memory error often killed the program dead with a seg fault. You’d deallocate something and then try to use a dangling pointer, and whammo, your program core dumps. Fast hard death. Even once you fixed the dangling pointer bugs, if your program had a memory leak, you’d use more and more memory and then run out — and whammo, your program would crash, game over, man!

Once garbage collection came along, I remember being delighted. No more running out of heap! No more memory leaks! Whoa there, not so fast. You could still leak memory if, say, your program had a cache that kept accumulating stale data. But now, instead of killing your program immediately, the garbage collector would just start running more and more often. Your program started running slower and slower. It was still working… kind of. But it wasn’t healthy.
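To make the leaky-cache scenario concrete, here is a minimal Python sketch. The class names (UnboundedCache, BoundedCache) and the simulate helper are made up for illustration; nothing here comes from a real library beyond the standard collections module. The point is only that a cache which never evicts keeps every entry reachable, so a garbage collector can never reclaim it, while a cache with an eviction policy stays the same size.

```python
from collections import OrderedDict


class UnboundedCache:
    """Hypothetical cache that never evicts: every entry stays reachable,
    so the garbage collector is never allowed to reclaim it."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def __len__(self):
        return len(self._data)


class BoundedCache:
    """Hypothetical cache that evicts the least recently used entry
    once it holds more than max_entries items."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._data = OrderedDict()

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)          # mark as most recently used
        while len(self._data) > self._max:
            self._data.popitem(last=False)   # drop the oldest entry

    def __len__(self):
        return len(self._data)


def simulate(cache, requests):
    """Store one result per request id and return the final cache size."""
    for request_id in range(requests):
        cache.put(request_id, "result for %d" % request_id)
    return len(cache)


leaky_size = simulate(UnboundedCache(), 10_000)
bounded_size = simulate(BoundedCache(max_entries=100), 10_000)
print(leaky_size, bounded_size)
```

The unbounded version grows with every request and never crashes on its own; it just makes the collector work harder and harder, which is the fail-slow behavior described above.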

It was sick.

Sick software is software that is functioning poorly because of essentially autopoietic defects. It’s fail-slow software, instead of fail-fast software. A computer with a faulty virus checker, infected with trojans and spyware that are consuming its resources, is sick. A memory-leaking Java program is sick. These systems still function, but poorly. And the fix can be harder to identify, because the problems, whatever they are, are more difficult (or impossible) to trace back to your own (allopoietic) code. Fail-fast environments make it clear when your code is at fault. Fail-slow environments, self-sustaining environments, are trying to work around problems but failing to do so.

Now, there is every likelihood that as systems scale up further, we will have to deal more and more with sick software. Perhaps it’s an inevitability. I hear that Google, for instance, is having to cope with unbelievable amounts of link spam — bogus sites that link to each other, often running on botnets or other synthesized / stolen resources. In some sense this is analogous to a memory leak at Internet scale — huge amounts of useless information that somehow has to be detected as such. Except this is worse, because it’s a viral attack on Google’s resources, not a fault in Google’s code itself. (I realize I’m conflating issues here, but like I said, this posting series is about provocative analogies, not conceptual rigor.)

I think there is a real and fundamental tension here. We want our systems to be more adaptive, more reactive, more self-sustaining. But we also want guarantees that our systems will reliably perform. Those two desires are, at the extremes, fundamentally in opposition. Speaking for myself as an engineer, and even just as a human being, I find that certainty is comforting… humans fundamentally want reassurance that Things Won’t Break. Because when things stop working, especially really large systems, it causes Big Problems.

So we want our systems to be self-sustaining, but not at the cost of predictability and reliability. And as we scale up, it becomes more and more difficult to have both. Again, consider Google. Google has GFS and BigTable, which is great. But once it has those systems, it then needs more systems on top of them to track all the projects and the storage that exist in those systems. It needs garbage collection within its storage pool. The system needs more self-knowledge in order to effectively manage (maybe even self-manage) its autopoietic resources. And in the face of link spam, the system needs more ability to detect and manage maliciously injected garbage.

Returning to the original paper: the authors spend a fair amount of time discussing the desire for software to be aware of and compliant with its environment. They give many potential examples of applications reconfiguring their interfaces, their plugins, and their overall functioning to work more compatibly with the user’s pre-existing preferences and other software. While this is extremely thought-provoking, I also find myself somewhat boggled by the level of coupling they seem to propose. Exactly how does their proposed computing environment describe and define the application customizations that they want to share? How are the boundaries set between the environment and the applications within the environment?

In fact, that’s a fundamental question in an autopoietic system. Where are the boundaries? Here’s another autopoietic/allopoietic tension. In an allopoietic system, we immediately think in terms of boundaries, interfaces, modules, layers. We handle problems by decomposition, breaking them down into smaller sub-problems. But in an autopoietic system, the system itself requires self-management knowledge, knowledge of its own structure. Effective intervention in its own operation may require a higher layer to operate on a lower layer. This severely threatens modularity, and introduces feedback loops which can either stabilize the system (if used well) or destabilize the system (if used poorly). This is one of the main reactions I have when reading the original paper: how do you effectively control a system composed of cross-module feedback loops? How do you define proper functioning for such a system, and how do you manage its state space to promote (and if possible, even guarantee) homeostasis? This circles back to my last post, in that what we may ultimately want is a provable logic (or at least a provable mathematics) of engineered autopoietic systems.
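A toy illustration of how the same feedback loop can stabilize or destabilize a system, sketched in Python. Everything here (the function names, the target of 100, the gains of 0.5 and 2.5) is an assumption chosen for the demo. The loop repeatedly nudges a value toward a target by gain times the current error, so the error is multiplied by (1 - gain) on every step: it shrinks when the gain is between 0 and 2, and grows when the gain is larger.

```python
def run_feedback_loop(gain, target=100.0, start=0.0, steps=20):
    """Repeatedly correct `value` toward `target` by gain * error.

    Hypothetical proportional controller, for illustration only.
    """
    value = start
    history = [value]
    for _ in range(steps):
        error = target - value
        value += gain * error     # the correction feeds back into the state
        history.append(value)
    return history


def final_error(gain):
    """Distance from the target after the loop has run."""
    return abs(100.0 - run_feedback_loop(gain)[-1])


gentle = final_error(0.5)       # error halves on every step
aggressive = final_error(2.5)   # error grows 1.5x per step, flipping sign
print(gentle, aggressive)
```

With a gain of 0.5 the value settles onto the target; with a gain of 2.5 each correction overshoots by more than the last, and the loop that was supposed to keep the system in line is what drives it off the rails.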

There are some means of achieving these goals. You can characterize the valid operating envelope of a large system, and provide mechanisms for rate throttling, load shedding, resource repurposing, and other autopoietic functions that enable the system to preserve its valid operating regime. It’s likely that autopoietic systems will be initially defined in terms of the allopoietic service they provide to the outside world, and that their safe operating limits will be characterized in terms of service-level agreements that can be defined clearly and monitored in a tractable way. This is like a thermometer that, when the system starts getting overworked, tells the system that it’s feverish and it needs to take some time off.
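The thermometer idea can be sketched in a few lines of Python. This is a hypothetical load shedder, not any real system's mechanism: the class name, the 200 ms latency limit, and the window of five samples are all assumptions. It tracks recent request latencies and refuses new work while the average is over the limit, then starts admitting work again once the fever breaks.

```python
from collections import deque


class LoadShedder:
    """Hypothetical 'thermometer': rejects new work while the average of
    recent latencies exceeds an agreed service-level limit."""

    def __init__(self, latency_limit_ms, window=5):
        self.latency_limit_ms = latency_limit_ms
        self.recent = deque(maxlen=window)   # sliding window of latencies

    def record(self, latency_ms):
        self.recent.append(latency_ms)

    def feverish(self):
        """True when the system is outside its valid operating envelope."""
        if not self.recent:
            return False
        return sum(self.recent) / len(self.recent) > self.latency_limit_ms

    def admit(self):
        """True if a new request should be accepted right now."""
        return not self.feverish()


shedder = LoadShedder(latency_limit_ms=200)
decisions = []
for latency in [50, 60, 400, 500, 600, 80, 70, 60, 50, 40]:
    shedder.record(latency)
    decisions.append(shedder.admit())
print(decisions)
```

Note the lag in both directions: the shedder keeps admitting work for one sample after the latency spike begins, and keeps rejecting it for a while after latencies recover. Tuning that window is exactly the kind of feedback-loop design question raised earlier.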

So the destiny of large systems is clear: we need to be more deliberate about thinking of large systems as self-monitoring and to some extent self-aware. We need to characterize large systems in terms of their expected operating parameters, and we need to clearly define the means by which these systems cope with situations outside those parameters. We need to use strong engineering techniques to ensure as much reliability as possible wherever it is possible, and we need to use robust distributed programming practices to build the whole system in an innately decomposed and internally robust way. Over time, we will need to apply multi-level control theory to these systems, to enhance their self-control capabilities. And ultimately, we will need to further enhance our ability to describe the layered structure of these systems, to allow self-monitoring at multiple levels: the basic issues of hardware and failure management, the intermediate issues of application upgrade and service-level monitoring, and the higher-level application issues of spam control, customer provisioning, and creation of new applications along with their self-monitoring capabilities.

We’ve made much progress, but there’s a lot more to do, and the benefits will be immense. I hope I’ve shown how the concept of conscientious software is both valid and very grounded in many areas of current practice and current research. I greatly look forward to our future progress!


Written by robjellinghaus

2007/10/23 at 04:14

Conscientious Software, Part 2: Provable Robustness


Second post in a series. Part 1.

Onwards! Now, what’s really interesting to me is the further implications of this autopoietic / allopoietic dichotomy throughout the software stack. The Conscientious Software paper talks about potential means of implementing autopoietic software in terms of more vaguely defined membranes and other sorts of perception-oriented interfaces between software components, along the lines of Jaron Lanier’s work.

To me, that is a lot harder to think about than the Google example, in which a lot of allopoietic engineering (very clear definition of subsystems, logical contracts, operational proofs, and other extremely “rigid” software engineering techniques) was used to create a system that, to a large extent, is reliably self-sustaining.

In other words, it seems to me that we want our autopoietic systems to be as reliable as we can make them. And the best techniques we have for designing something to be reliable are essentially allopoietic techniques.

Let’s look at this another way. A lot of programming language research right now is devoted to integrating proof systems into various languages. In some sense, every statically typed language has a proof system built into it. A type system is a proof system (see Wadler’s paper “Proofs are Programs”). The power of these proof systems and type systems is steadily increasing: a lot of work lately is going into logics for proving file system safety, data structure heap correctness, distributed messaging consistency, and on and on.
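A small taste of the "types as proofs" reading, sketched with Python type hints. Big caveats: Python does not enforce these annotations at runtime (an external checker such as mypy is assumed), and its type system is far weaker than the research systems mentioned above. The function names are made up for illustration. The idea is that a function of type A -> B can be read as evidence that "A implies B", so applying it is modus ponens and composing two of them shows that implication is transitive.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def modus_ponens(implication: Callable[[A], B], evidence: A) -> B:
    """From 'A implies B' and a witness of A, produce a witness of B."""
    return implication(evidence)


def chain(first: Callable[[A], B],
          second: Callable[[B], C]) -> Callable[[A], C]:
    """From 'A implies B' and 'B implies C', build 'A implies C'."""
    return lambda a: second(first(a))


# len maps str to int, and str maps int back to str,
# so chaining them gives a function from str to str.
length_as_text = chain(len, str)
print(modus_ponens(len, "abc"), length_as_text("abcd"))
```

The bodies are almost forced by the signatures, which is the interesting part: once the types are written down, there is very little room left to write the wrong program.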

We are continually developing conceptual techniques to increase the accuracy and correctness with which we can describe the desired properties of a system, and prove that the system in fact has those properties as implemented.

It is a fascinating question, to me, whether all those tools can be turned to the purpose of developing autopoietic systems. Let’s take one example: the above-linked work on logically proving that a file system is reliable.

The intention of this work is to model the state of the computer’s memory relative to the known state of the computer’s disk, in terms of what the memory believes to be true versus what the disk’s contents declare as true. Logical statements relate changes in the memory beliefs to changes in the disk’s beliefs. Overall, there is an error if at any point one can prove that the system’s disk beliefs are inconsistent with memory.
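As a rough picture of the memory-beliefs-versus-disk-beliefs idea, here is a toy Python model. It is an assumption-laden sketch, not the linked work's actual logic: the BufferCache class, its dirty set, and the deliberately broken buggy_write method are all invented for the demo. The invariant is that every cached block is either recorded as dirty or identical to what the disk holds. The real research proves a property like this for all executions ahead of time; this sketch only checks it at runtime for one execution.

```python
class BufferCache:
    """Toy model of in-memory beliefs about on-disk blocks."""

    def __init__(self, disk):
        self.disk = disk          # block number -> contents on disk
        self.memory = {}          # block number -> contents held in memory
        self.dirty = set()        # blocks that memory knows differ from disk

    def read(self, block):
        if block not in self.memory:
            self.memory[block] = self.disk[block]
        return self.memory[block]

    def write(self, block, data):
        self.memory[block] = data
        self.dirty.add(block)     # record that disk is now out of date

    def buggy_write(self, block, data):
        self.memory[block] = data  # bug: forgets to mark the block dirty

    def flush(self):
        for block in sorted(self.dirty):
            self.disk[block] = self.memory[block]
        self.dirty.clear()

    def consistent(self):
        """Every clean cached block must match what the disk holds."""
        return all(
            block in self.dirty or self.disk.get(block) == data
            for block, data in self.memory.items()
        )


cache = BufferCache(disk={0: "old"})
cache.read(0)
cache.write(0, "new")
ok_while_dirty = cache.consistent()
cache.flush()
ok_after_flush = cache.consistent()
cache.buggy_write(0, "lost")
ok_after_bug = cache.consistent()
print(ok_while_dirty, ok_after_flush, ok_after_bug)
```

After the buggy write, memory believes the block is clean while the disk holds something else, and a later flush would silently skip it. That mismatch is the kind of error a proof system is meant to rule out by construction.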

In a sense, this proof work is intended to make the file system robust in the face of any and all environmental failures. The robustness comes from construction, not necessarily from recovery. This is building reliability into the system at the very bottom.

It is not too difficult to postulate that such logics could be further developed, for example, to characterize the consistency properties of the Google File System, or of other large-scale distributed software infrastructures. (Boy, what a handwaving sentence! Yes, it’s not too difficult to postulate that. Actually doing it, on the other hand, is likely to be extremely difficult indeed. But over time even hard problems can get solved….)

Another direction in which to deliver increased reliability is internal self-protection. Almost all major organisms on Earth are composed of cellular structures. A cell is defined by its boundary; without a boundary, a cell does not exist. Some programming languages have similarly oriented themselves around the concept of bounded, encapsulated units which are composed in large numbers to create a greater whole. Possibly the most industrially tested and publicly available example of such a language is Erlang, which composes large and immensely reliable systems out of many small processes, which communicate only by asynchronous message passing. There is no global state in an Erlang program, and no shared state among independent processes. This makes Erlang innately fault-tolerant, insofar as the failure of any one component cannot immediately cause other components to fail. This also allows the system to be upgraded while continuing to run. A really successful autopoietic system eventually reaches a point where it needs to provide continuous availability, with no externally visible downtime at all, ever. Systems decomposed into pure message-passing subcomponents inherently support this, since individual components can be replaced cleanly while the rest of the system continues normal operation.

Of course, autopoietic layers need to be built as other Erlang processes (a common Erlang idiom is to have a number of worker processes overseen by a smaller pool of supervisor processes; this is exactly an autopoietic software layer), but the innate properties of the language contribute to the enforced internal modularity which leads not only to improved stability but also to ease of distribution. (Autopoietic systems will necessarily also be distributed and parallel systems, since any individual hardware unit may fail, and if the overall system is to survive it must have redundant parallel hardware.)
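The worker/supervisor idiom can be imitated in Python, with the loud caveat that this is only an analogy. Python threads share memory, so the isolation here holds by convention, whereas Erlang enforces it; the Supervisor class and worker function below are hypothetical and are not a port of Erlang's supervisor behaviour. The worker talks to the outside world only through a mailbox queue and a results queue, and the supervisor restarts it whenever it crashes.

```python
import queue
import threading


def worker(mailbox, results):
    """Handle messages until told to stop. A message of 0 crashes it."""
    while True:
        message = mailbox.get()
        if message == "stop":
            return
        results.put(10 // message)    # ZeroDivisionError when message is 0


class Supervisor:
    """Hypothetical supervisor: restarts the worker after every crash."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.results = queue.Queue()
        self.restarts = 0
        self.crashed = False

    def _run_worker(self):
        try:
            worker(self.mailbox, self.results)
            self.crashed = False
        except Exception:
            self.crashed = True       # the crash stays inside this worker

    def supervise(self):
        """Run workers one at a time until one exits normally."""
        while True:
            thread = threading.Thread(target=self._run_worker)
            thread.start()
            thread.join()
            if not self.crashed:
                return
            self.restarts += 1        # start a fresh worker on the same mailbox


supervisor = Supervisor()
for message in [5, 0, 2, "stop"]:
    supervisor.mailbox.put(message)
supervisor.supervise()

outputs = []
while not supervisor.results.empty():
    outputs.append(supervisor.results.get())
print(outputs, supervisor.restarts)
```

The message that caused the crash is simply lost, and the replacement worker picks up with the next one. Whether that is acceptable is a design decision for the layer above, which is the "let it crash" trade-off this style of system makes.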

Large-scale autopoietic systems, both today and in the near future, will combine these properties. They will be systems that provide a general-purpose computing environment with well-defined and consistent contracts, that are engineered with as much built-in and provable correctness as we can give them, that are modularized into independent and well-protected components, and that provide sufficiently reliable guarantees to the allopoietic programs running in those environments.

What is especially interesting is that this view flies in the face of the original paper’s claim that what’s needed is more vagueness, more inconsistency if you will, more tolerance between components for partial failures or partial communication. I’m not convinced by this part of their argument, at all. I think that we currently lack the ability to engineer in that style. Perhaps ultimately we’ll make greater use of artificial evolution for engineering design, and I would expect such design tools to be far better than we are at leveraging interfaces that are ambiguous or otherwise not rigidly defined.

Next: what happens if software can get sick?

Written by robjellinghaus

2007/09/18 at 04:47

Conscientious Software, Part 1: Allopoiesis and Autopoiesis


Recently some Sun researchers wrote a very interesting paper about the future evolution of software. The paper seeks to characterize a new type of software, which they term “conscientious”. I want to dig into this concept a bit.

This will be a multi-post series; this first post lays the groundwork, then the subsequent posts will delve into some further implications of the basic concepts.

This post will make much more sense if you read their paper first; otherwise, hang on 🙂 Their paper is very much a manifesto, in the sense that it aggressively commingles concepts from biology, engineering, and computer science to propose a very ambitious and not-fully-understood thesis. So, this post is going to do that too, and conceptual rigor be damned. It’s brainstorming time.

Conscientious software might be described (paraphrasing their paper heavily) as software which has a currently unknown degree of self-awareness, in the sense that it has the ability to test itself, to analyze its own functioning, to address external or internal malfunctions appropriately, and to maintain its operation under a variety of adverse conditions. And, of course, to actually do some job that we consider valuable.

There are two kinds of systems they talk about extensively. These are allopoietic systems and autopoietic systems. The technical definitions here come from biology, but I’ll paraphrase heavily and just say that an autopoietic system creates itself, sustains itself, and produces itself, whereas an allopoietic system is externally created and produces something other than itself.

An example they give is the distinction between a living amoeba and a watch. An amoeba consists of a large number of cellular structures and chemical processes. A living amoeba is a dynamic system — its continued existence depends on its active metabolism, by which it takes in nutrients, processes them, excretes material toxic to it, seeks out hospitable environments, and avoids dangerous ones. All the metabolic cycles and other processes that constitute the amoeba are primarily oriented towards self-sustainment.

A watch, on the other hand, has almost no ability to sustain itself. Its sole purpose is to tell time, and that purpose is only useful to the users of the watch, not to the watch itself. If the watch malfunctions, it must be repaired externally.

What does this have to do with software? They claim — and I think I agree — that software needs to develop in a more autopoietic direction. Fragile software, with frequent malfunctions, tricky configuration, and inadequate self-monitoring and self-repair capabilities, is increasingly hard to scale to larger and more complex functions. Yet we clearly want software to be capable of more than it currently is. Autopoiesis is a very thought-provoking concept when considering how to design such self-sustaining systems.

They make another point, which is that a purely autopoietic system — one which is only concerned with self-sustainment — is of little value to us. We want a system which can both sustain itself and produce something valuable. Their paper draws a picture of an autopoietic system containing an allopoietic core — the self-sustaining system devotes a large part of its resources to solving specific, externally defined problems.

This is all terribly abstract. What would such a system look like? Are we talking about some kind of muzzy fuzzy genetically engineered software? What are some examples of autopoietic software today?

They start with garbage collection as an example of an autopoietic software function. In some sense, memory is the environment in which software exists. As software runs, it populates that memory. Without lots of careful management, the memory can easily become cluttered with no-longer-used but not-yet-deallocated objects. It’s no big stretch to consider garbage objects as waste products of the software’s metabolism. (Well, OK, it is a stretch, but like I said, we’re brainstorming here.)

Garbage collection was originally developed to enable LISP programs to be developed in the way that was conceptually cleanest — in other words, in a largely functional, allocation-intensive style — without complicating them with memory reclamation logic. Externalizing the garbage collection code meant that the LISP program itself could more or less just do its job, with the underlying metabolism of its environment responsible for cleaning up its waste.
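The "environment cleans up the program's waste" idea can be watched in action with a short Python sketch. It relies on CPython-specific behavior (reference counting plus a cycle collector exposed through the standard gc module), so treat the details as assumptions about that one runtime; the Node class and helper function are made up for the demo. Two objects that point at each other become unreachable, and only the collector, not the program, reclaims them.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.partner = None


def make_garbage_cycle():
    """Create two objects that reference each other, then drop them."""
    a, b = Node(), Node()
    a.partner, b.partner = b, a
    return weakref.ref(a)      # a weak reference does not keep `a` alive


gc.disable()                   # pause automatic collection for a clear demo
probe = make_garbage_cycle()
alive_before = probe() is not None   # unreachable, but not yet reclaimed
gc.collect()                   # the environment cleans up the waste
alive_after = probe() is not None
gc.enable()
print(alive_before, alive_after)
```

The program itself contains no reclamation logic at all; it just makes a mess and moves on, which is the division of labor described above.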

Another example (mine, not theirs), and probably one of the most well-known (or at least well-documented) large-scale examples, is Google’s infrastructure. Google has developed numerous systems — the Google File System, the BigTable large-scale database and the Chubby lock system — that are inherently widely distributed and fault-tolerant. These systems actively monitor themselves, replicate their data around failed components, and generally self-manage on top of a widely distributed pool of faulty servers. It is even less of a stretch to think of these systems as implementing a metabolic environment for user programs that is composed from cells (servers) that routinely die and are replaced, while the entire system has a dynamic, self-sustaining continuity. (I’m sure many Google admins would laugh very hard at the phrase “self-sustaining” here, but nonetheless these systems do internally monitor and repair themselves.)
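To give a feel for "replicating data around failed components", here is a toy Python sketch. It is emphatically not how the Google File System actually works; the repair function, the server and chunk names, and the replication factor of 3 are all assumptions for illustration. Given a map of which servers hold which chunks and the set of servers still alive, it tops every chunk back up to the desired number of live copies.

```python
def repair(placement, live_servers, replicas=3):
    """Return a new placement where each chunk has `replicas` live copies,
    when enough servers are available.

    `placement` maps a chunk id to the set of servers holding a copy.
    Hypothetical helper, for illustration only.
    """
    repaired = {}
    for chunk, holders in sorted(placement.items()):
        survivors = set(holders) & live_servers
        spares = sorted(live_servers - survivors)   # deterministic choice
        needed = max(0, replicas - len(survivors))
        if survivors:                 # need at least one copy left to clone
            survivors.update(spares[:needed])
        repaired[chunk] = survivors
    return repaired


placement = {
    "chunk-a": {"s1", "s2", "s3"},
    "chunk-b": {"s2", "s3", "s4"},
}
live = {"s1", "s3", "s4", "s5"}       # server s2 has died
healed = repair(placement, live)
print(healed)
```

A real system would also balance load across servers, throttle the copy traffic, and worry about the failure detector being wrong. The sketch only shows the metabolic loop itself: notice a dead cell, then regrow what it was carrying.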

The allopoietic component of the Google environment is the applications that actually run on the underlying autopoietic Google services. These applications can (mostly) ignore the details of the hardware they run on and the storage they use; they can simply consume the resources they need (in both persistent storage and CPU) to perform their job.

OK. So that’s autopoiesis and allopoiesis as applied to software as we currently know it. Next post: allopoiesis as the foundation of autopoiesis.

Written by robjellinghaus

2007/09/07 at 03:22