CTO Articles

IT World
XML IN PRACTICE --- 23/05/2001

Type is a Four Letter Word

By Sean McGrath

Suppose you are traveling on the Trans-Siberian Railway. Suppose something bad happens to the locomotive just east of Moscow and you are told that it could be many days before you reach the next stop to stretch your legs. So you sit back in your cabin, resigning yourself to a long wait.

You need something to amuse yourself, but all you have at your disposal is the attentions of a smiling XML technologist you share the cabin with. So you search for a topic of conversation that will keep both of you from going mad with boredom, a good meaty topic that will run and run....

Should you find yourself in such a situation, I have a topic of conversation that fits the bill.

Suppose I show you this XML fragment:
<weight>12</weight>

Now I ask you the question, "What type is the piece of information known as 'weight'?"

I am not exaggerating when I say that this simple-sounding question opens up vistas of debate that could fill a library with dissertations. The concept of "type" is a real doozie; a harmless-looking four-letter word that packs a massive punch! More than enough to keep the conversation flowing from Moscow to Vladivostok.

One possible answer is that "weight" is a variable of type "string" with the value "12". Another possible answer is that "weight" is a variable of type "positive integer" with the value 12.

Now let us make a little modification and ask the same question: "What type is the piece of information known as 'weight'?"
<weight>+0012</weight>

Is it the string "+0012" or is it still the positive integer 12?

I suspect you can see the slippery slope we are on here. The "twelveness" of the weight variable is an interpretation of the underlying characters "+0012" in the XML document, a form of semantic Polaroid through which we are seeing the XML. At its lowest level, XML has but one single, universal type -- string. An XML document is first and foremost a string; all else is layers of interpretation.

The question of types in XML boils down to this: Is the twelveness of the weight variable something that XML should be intimately concerned with, or should that be left to higher levels of processing?
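To make the "layers of interpretation" point concrete, here is a minimal Python sketch using the standard library's ElementTree parser. The helper name read_weight is made up for illustration. The parser hands back nothing but character data, and the integer only appears when a higher layer decides to call int() on it.

```python
import xml.etree.ElementTree as ET


def read_weight(fragment):
    """Return the raw character data of a <weight> element (hypothetical helper)."""
    return ET.fromstring(fragment).text


raw_a = read_weight("<weight>12</weight>")
raw_b = read_weight("<weight>+0012</weight>")

# At the XML level both values are strings, and the two strings differ.
print(repr(raw_a), repr(raw_b), raw_a == raw_b)  # '12' '+0012' False

# "Twelveness" shows up only once we choose to interpret the text as an integer.
print(int(raw_a), int(raw_b), int(raw_a) == int(raw_b))  # 12 12 True
```

Whether that int() call belongs inside XML itself or in the application sitting on top of it is exactly the question on the table.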

If XML gets involved in such things, it affects not only XML but also all its surrounding courtiers and handmaidens such as XPath, XLink, XQuery, and XSLT. In order for them to act in concert on the "twelveness" of the weight variable, they all need to share the same set of types.

In order to do this, something has to establish "twelveness" as a concept that everyone can agree on. W3C XML Schema does this with its set of data types. Then, the twelveness of the variable "weight" needs to be exposed in a form that XPath, XQuery, etc. can see. The Post Schema Validation Infoset (PSVI) does this.
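As a rough illustration of those two steps (agree on a type, then expose the typed value), here is a toy Python sketch. It is not W3C XML Schema validation and it does not produce a real PSVI. The names positive_integer, DECLARED_TYPES, and typed_view are invented for this example, and the lexical rule (optional "+", then digits) is an assumption chosen so that "+0012" is accepted.

```python
import re
import xml.etree.ElementTree as ET


def positive_integer(text):
    """Interpret text as an integer greater than zero (hypothetical shared type)."""
    stripped = (text or "").strip()
    # Assumed lexical form: an optional '+' followed by decimal digits.
    if not re.fullmatch(r"\+?[0-9]+", stripped):
        raise ValueError(f"not a positive integer: {text!r}")
    value = int(stripped)
    if value <= 0:
        raise ValueError(f"not a positive integer: {text!r}")
    return value


# Hypothetical agreement on which type each element name carries.
DECLARED_TYPES = {"weight": ("positiveInteger", positive_integer)}


def typed_view(fragment):
    """Return (element name, type name, typed value) for a one-element fragment."""
    element = ET.fromstring(fragment)
    type_name, convert = DECLARED_TYPES[element.tag]
    return element.tag, type_name, convert(element.text)


print(typed_view("<weight>12</weight>"))     # ('weight', 'positiveInteger', 12)
print(typed_view("<weight>+0012</weight>"))  # ('weight', 'positiveInteger', 12)
```

Every tool that consults the same table sees the same twelve. The price is that every tool now has to carry the table, and the rules behind it, around.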

Unfortunately, nailing down "twelveness" comes at significant cost, as anyone who has read the W3C XML Schema documents and PSVI specification will tell you. So much so that some are inclined to think that maybe all this data typing in the bowels of XML just isn't worth it.

The strongest form of this argument is that all this data typing stuff is not really in keeping with the ethos of XML. Moreover, if what you want is tight universal agreement of data types and a tight API for talking to strongly typed data, XML is not a good place to start. Try ASN.1 or Java Object Serialization!

Let's not get caught up in that debate here. Let's ask a more fundamental question. Did XML succeed *precisely* because it did not have strong data typing or *in spite* of not having strong data typing?

To get an answer to that question, we will need more time than a mere Trans-Siberian Railway crossing, however slow, would afford. I can see the advertisement now:

Wanted: XML geek to argue merits of data typing on trip to Alpha Centauri. Strong constitution and sense of humor required.

 

Sean is co-founder and Chief Technology Officer of Propylon and is an industry-recognised XML expert.