IT World
E-Business in the Enterprise – November 11, 2003

The impotence of numbers

By Sean McGrath

I once heard a lecture by Marvin Minsky[1] in which he said that any time you see a number on its own you should think to yourself 'how sad'. The point is that a number is the result of some sort of calculation or measurement. If the resultant number is all you can see, you have lost potentially useful information. Is "4" the result of adding two and two, or of adding three and one? Perhaps it is irrelevant; perhaps it isn't.

Regardless of the provenance of any given number, what is undoubtedly relevant to most data processing is the stuff that invariably surrounds it - context. Numbers do not work well on their own - they require context in order to be properly interpreted. Indeed, context can be said to give raw numbers their true meaning. There is a big difference - as anyone putting an orbiter around Mars will tell you - between metric measurements and imperial measurements[2]. There is a big difference between 1 million US dollars of turnover and 1 million US dollars of gross profit, as any business person will tell you.
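To make the metric-versus-imperial point concrete, here is a minimal Python sketch. The `Measurement` class, its `in_meters` method and the unit labels "m" and "ft" are hypothetical names invented for this illustration; the only outside fact used is that one foot is 0.3048 meters.

```python
from dataclasses import dataclass

# One international foot is exactly 0.3048 meters.
FEET_TO_METERS = 0.3048


@dataclass(frozen=True)
class Measurement:
    """A raw number plus the context (a unit label) needed to interpret it."""
    value: float
    unit: str  # hypothetical labels for this sketch: "m" or "ft"

    def in_meters(self) -> float:
        """Interpret the raw value using its unit context."""
        if self.unit == "m":
            return self.value
        if self.unit == "ft":
            return self.value * FEET_TO_METERS
        raise ValueError(f"unknown unit: {self.unit!r}")


# The same raw number, two different contexts, two different meanings.
metric = Measurement(100.0, "m")
imperial = Measurement(100.0, "ft")
print(metric.in_meters())
print(imperial.in_meters())
```

Strip the unit label away and both values collapse into a bare 100.0, with nothing left to say which interpretation was intended.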

Context matters. Context matters a whole lot. The concept of context brings us by commodius vicus of recirculation back to the idea of looking at numbers and being sad as a consequence. You see, from time to time I see someone looking at an XML document and saying something like 'that is the integer value 4'. My response is, usually, a private intonation of 'how sad'. It is sad because the four-ness of four is the least interesting thing about it. All the really interesting stuff is the context that *surrounds* the number four.

Communicating that context information is essentially XML's gift to mankind, in my opinion. XML allows you to attach arbitrarily luscious and plump context onto the insipid bones of mere numbers - it's called markup. A wonderful, expressive facility that is there for all of us to use.

In my interpretation of a markup view of the world, there is an uncountably large number of contexts and basically only one type of raw data - text. All data is basically text that has been annotated with context information - markup - that allows you to view the data as a hierarchical structure. A structure in which you drill down into layers of context to locate pieces of raw information - text.
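As a rough sketch of that drill-down, the Python below walks a small XML document with the standard library's `xml.etree.ElementTree` and pairs each piece of raw text with the layers of markup that surround it. The document, its element and attribute names, and the helper `text_with_context` are all made up for this illustration.

```python
import xml.etree.ElementTree as ET

# A made-up document: the same raw text "4" appears in two contexts.
DOC = """
<report currency="USD">
  <balance-sheet>
    <off-balance-sheet-item>4</off-balance-sheet-item>
  </balance-sheet>
  <income>
    <turnover>4</turnover>
  </income>
</report>
"""


def text_with_context(element, path=()):
    """Yield (context_path, text) for each element holding non-blank text.

    Only an element's leading text is considered; text that follows a
    child element (its "tail") is ignored to keep the sketch short.
    """
    here = path + (element.tag,)
    text = (element.text or "").strip()
    if text:
        yield "/".join(here), text
    for child in element:
        yield from text_with_context(child, here)


root = ET.fromstring(DOC)
for context, text in text_with_context(root):
    print(f"{context}: {text!r}")
```

Both leaves hold the identical text '4'; only the path of enclosing elements, together with the currency attribute on the root, says what each one means.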

Having found some text, you use the surrounding context information to put flesh on the bones of the raw data - to interpret its meaning. Is the piece of text to be interpreted as an off balance sheet item in US dollars? Perhaps it is the time it takes for a photon of light to pass from one side of a chocolate brownie to the other? Perhaps it is the average wing size of a pre-puberty Balrog[3]? How about the total number of creatures living on the lip of a lobster[4]? Any of these pieces of information is significantly more interesting than the fact that the text can be interpreted as a floating point number. Wouldn't you agree? If so, the next time you see an XML application that is festooned with concepts like 'field' and 'type' and 'gYearMonth', you might join me in this private susurration: 'how sad'.

[1] http://web.media.mit.edu/~minsky/
[2] http://www.canoe.ca/CNEWSSpace9910/01_metric.html
[3] http://fan.theonering.net/middleearthtours/balrog.html
[4] http://www.microscopy-uk.org.uk/mag/articles/pandora.html