IT World
E-Business in the Enterprise – May 11, 2004

The demise of Hara-Kiri computing

By Sean Mc Grath

If you disembowel any run-of-the-mill software application and examine its entrails - the source code - you will notice that a significant amount of effort goes into dealing with errors. Pass within hearing distance of a whiteboard when application developers are sketching a design and you will hear lots of questions of the form "what if X happens?". Most of this "what if" talk has to do with handling errors.

A non-specialist might conclude that all the what-if analysis is aimed at ensuring that the system can detect and then recover from errors gracefully. However, in the vast majority of cases, this is not what is motivating the development team. Instead, the name of the error handling game is to detect errors and then *die* with as much grace as possible. The emphasis is on graceful death, hara-kiri style, rather than graceful error detection followed by correction/resumption.
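To make the distinction concrete, here is a minimal Python sketch. The function names and the default value are hypothetical, made up purely for illustration. The first function does what most error handling does today: it spots the bad input and dies. The second spots the same error, corrects it and keeps going.

```python
import logging


def parse_quantity_harakiri(text: str) -> int:
    """Hara-kiri style: detect the error, log it, then die."""
    try:
        return int(text)
    except ValueError:
        logging.error("bad quantity %r; shutting down", text)
        raise SystemExit(1)  # graceful death: the whole program stops here


def parse_quantity_recover(text: str, default: int = 0) -> int:
    """Recovery style: detect the error, correct it, then resume."""
    try:
        return int(text)
    except ValueError:
        # Hypothetical correction: assume a default quantity is acceptable.
        logging.warning("bad quantity %r; using default %d", text, default)
        return default  # resumption: the caller carries on working
```

The recovery version only works because we assumed that a default of 0 is a sensible answer. Knowing what the right correction is, in the general case, is exactly the hard part.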

The unfortunate fact of the matter is that building software that can gracefully recover from errors and continue working is beyond the state of the art. The good news is that we at least have a name for where we want to get to. It is called autonomic computing[1], and it is an active field of research at the moment.

The bad news is that we really, really need to have a handle on how to make autonomic systems before we can begin to realize the dream of plug-n-play web services.

Why? Let me make a domestic analogy. Imagine having one four-year-old kid to look after in a china shop. One kid is a pretty easy thing to track. It has finite bounds and all the non-transient component pieces are always co-located. Now imagine having two four-year-old kids in the china shop. Things are now tricky but still manageable. After all, you have two hands and two eyes. It's tough but you can cope - just. Now add a third kid. Well, now you are in deep trouble. The difference between managing two kids and managing three kids is enormous, as any mother of three or more will tell you.

A standalone application on a PC is like the one-kid scenario. All the application is in one place. If bad things happen, the application stops, you fix it and restart it. Simple.

A client/server application is like the two-kid scenario. When something goes wrong, the client, the server or both shut down. If it is the client, you may be able to fix the problem without bringing the server down but you need to be very careful. If it is the server that dies, you may be able to fix it without interrupting all the clients but you need to be very careful. Error handling in a client/server application is trickier than in a standalone application but still manageable - just.

A web services application is like the three-kid scenario. You have three or more completely independent applications working together. They may be miles apart physically and/or organizationally. If something goes wrong, you do not have the luxury of just stopping everything, fixing it and starting again. You do not have the luxury of upgrading all the component pieces. (They are not all your kids!)

It is perhaps tempting to hope that some web services error handling pattern will emerge and the geeks will just make it all work. Perhaps. I'm doubtful that the geeks can - or indeed should - address this problem on their own, because it is not a purely technological problem.

Let us return to our whiteboarding developers. Ask any developer where the troublesome error conditions in web service applications come from. You will be told that a significant proportion can be traced back to *change*: changes in one component that have knock-on effects in other components. And the major source of change? The real-world business environment. An environment in which the only constant is change.

When working out an error handling strategy, we need to address the problem of business-related change from the get-go. The bad news is that this is rarely done at the moment. The good news is that there is an increasing awareness that it needs to be done. An important first step on the road to truly autonomic computing for web services is forward compatibility[2]. This entails coding now so that future changes in the way your service talks to other services will not cause your own service to fall over. If we can make all web services well-behaved with respect to forward compatibility, we will be in much better shape than we are right now.
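Here is a minimal Python sketch of one common forward-compatibility technique: ignore message elements you do not understand instead of treating them as errors. The message format, the field names and the idea that a newer sender has added a <priority> element are all assumptions invented for illustration; they do not come from any real web service.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the fields this version of our service understands.
KNOWN_FIELDS = {"item", "qty"}


def read_order_strict(xml_text: str) -> dict:
    """Falls over as soon as a sender adds anything new."""
    root = ET.fromstring(xml_text)
    order = {}
    for child in root:
        if child.tag not in KNOWN_FIELDS:
            raise ValueError(f"unexpected element <{child.tag}>")
        order[child.tag] = child.text
    return order


def read_order_tolerant(xml_text: str) -> dict:
    """Forward compatible: unknown elements are skipped, not fatal."""
    root = ET.fromstring(xml_text)
    order = {}
    for child in root:
        if child.tag in KNOWN_FIELDS:
            order[child.tag] = child.text
        # Anything else was added by a newer sender; ignore it.
    return order


# A message from a hypothetical newer sender that added <priority>.
v2_message = "<order><item>widget</item><qty>3</qty><priority>high</priority></order>"
```

Here read_order_tolerant(v2_message) returns {'item': 'widget', 'qty': '3'}, while read_order_strict(v2_message) raises ValueError. The tolerant reader does not solve the general problem (it still breaks if a field it relies on is renamed or removed), but it does stop one class of harmless change from knocking the service over.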

[1] http://www.autonomic-conference.org/
[2] http://www.pacificspirit.com/blog/2003/12/22/formal%20compatibility%20definition

 

http://seanmcgrath.blogspot.com