IT World
Ebusiness in the Enterprise – March 25, 2003

A study in XML culture and evolution

By Sean McGrath

There is a spoken language in Africa - I believe it is Malinke but memory (and Google) may have failed me here - which has evolved a very interesting alternative to designating people by name. A hyper-pronoun system if you like.

Speakers of Malinke prefer not to name people directly in speech. To do so would give the evil spirits a direct connection with the named individual. Not good. Instead, speakers of Malinke embark on a circumlocutory route to identifying the individual. For example, if Mr. X has just come in through the door, he might be identified in speech as "the man who has just come through the door" rather than "Mr. X". Thus throwing the evil spirits off the hunt.

Closer to home for me, the Irish language makes it well nigh impossible to say "hello" without invoking God. In Irish, "God be with you"[1] is the most common form of greeting. Without even thinking about it, Irish speakers go around the place invoking the powers that be, to look kindly on the people they meet and greet. A sort of built-in protection mechanism against the forces of evil, right in the heart of the language.

These are examples, of course, of culture impacting human language. Nothing controversial there. More likely to be controversial is the assertion I am about to make, that culture impacts computer languages too. XML provides a good example of this phenomenon.

Mr. X comes through the door as in the Malinke example. How would we capture the details of Mr. X in a computer program - say a Time and Attendance system?

We might start - as so many HR and CRM systems do - with the idea that people have names and addresses. But what is a name? What is it about a name that distinguishes it from an address?

Perhaps a name is nothing more than a synonym for the innermost part of an address. Let's use my address as an example and hope the evil spirits do not read ITworld articles:

Sean McGrath
Propylon Ltd.
45 Blackbourne Square
Rathfarnham Gate
Dublin 14
Ireland
Europe
Earth

Let's turn this upside down:

Earth
Europe
Ireland
Dublin 14
Rathfarnham Gate
45 Blackbourne Square
Propylon Ltd.
Sean McGrath

The latter form of my address is a form that Malinke speakers would perhaps prefer. It consists of a layer by layer zooming in on an individual through ever tighter - ever more qualified - contexts. The "name" is nothing more than the tightest qualifier. Malinke speakers could stop short of that final, innermost part of the address and say something like "The CTO" or "The tall bearded guy with the faraway look in his eyes" etc. Thus uniquely identifying me.
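A minimal sketch of the same idea in Python may make the "zooming in" concrete. The address values come from the example above; the names `postal_order`, `zoom_in` and `without_name` are illustrative assumptions, not part of any real system.

```python
# Postal order: the name first, then ever wider contexts (values from the article).
postal_order = [
    "Sean McGrath",
    "Propylon Ltd.",
    "45 Blackbourne Square",
    "Rathfarnham Gate",
    "Dublin 14",
    "Ireland",
    "Europe",
    "Earth",
]


def zoom_in(address):
    """Return the address widest-context-first, ending at the name."""
    return list(reversed(address))


def without_name(address, description):
    """Swap the innermost qualifier (the name) for a description."""
    path = zoom_in(address)
    return path[:-1] + [description]


# The upside-down address, and a variant that stops short of the name.
print(" > ".join(zoom_in(postal_order)))
print(" > ".join(without_name(postal_order, "The CTO")))
```

In this sketch the name is just the last item of the list: replacing it with a description leaves every enclosing context untouched.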

There is an entire culture in IT that is Malinke-like in its approach to identifying things. The proto-language for that culture is called SGML and the most common dialects spoken today are HTML and XML.

Speakers of the XML dialect are the "doc-heads" who brought markup into the new world of e-commerce and Web Services.

XML doc-heads are very fond of addressing - as opposed to unique names - as a way of uniquely identifying things. Their cultural preference is a direct result, I believe, of the impossibility of allocating unique names for things in richly complex hierarchical structures.

Think of the complex structures you find in corpora of legislation, exegesis of biblical texts, financial reports and so on. How do you identify 'the third paragraph of the written judgement of Judge McGrath in the case of X versus Y heard in the Queens Court on the 1st of February 2001'? You do it Malinke-style. It has no unique name. It only has an *address*.
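As a rough illustration, the paragraph in that example could be reached purely by address using the limited XPath support in Python's standard `xml.etree.ElementTree` module. The element and attribute names (`court`, `case`, `judgement`, `para`, `judge` and so on) are hypothetical, invented for this sketch.

```python
import xml.etree.ElementTree as ET

# A hypothetical document; none of these element names come from a real schema.
DOC = """
<court name="Queens Court">
  <case parties="X versus Y" heard="2001-02-01">
    <judgement judge="McGrath">
      <para>First paragraph.</para>
      <para>Second paragraph.</para>
      <para>Third paragraph.</para>
    </judgement>
  </case>
</court>
"""

root = ET.fromstring(DOC.strip())

# No paragraph carries a unique name; the path from the top is its address.
third = root.find(
    "./case[@parties='X versus Y']/judgement[@judge='McGrath']/para[3]"
)
print(third.text)
```

Every `para` element has the same tag, yet the path picks out exactly one of them.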

Something very interesting happened to XML when it crossed over into the world of e-commerce and Web Services. That land turned out to be heavily populated with an aboriginal population of "data-heads". A culture with a very different world view when it comes to naming things versus addressing things.

In the relational database culture that many XML data-heads emanated from, unique naming was of paramount importance. Within a record, *everything* has a unique name. It simply must be so for the relational model to work. Records themselves have unique identifiers. Again, it simply must be so for the cultural ceremonies of normalization and joins and so on to function.

As often happens when cultures collide, friction resulted in the XML world over this issue. Some doc-heads, such as I, felt no need for any new naming machinery other than that already supplied in XML 1.0. In XML 1.0 an element of a document has a name - a tag. It is a name in the sense that "Sean McGrath" is a name (i.e. it is merely the most qualified part of my address). My full address can be determined by going to the top of the document and working your way down, layer by layer, until you arrive at "Sean McGrath". This is the fully qualified address - a completely unique name for me in an XML document.
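The walk from the top of the document can be sketched in a few lines of Python. This is one possible way to derive a fully qualified address for each element, assuming a made-up document; the element names and the `walk` helper are hypothetical, and the positional `[n]` suffix is simply one convention for telling same-named siblings apart.

```python
import xml.etree.ElementTree as ET

# A hypothetical document in which the tag "person" is reused several times.
DOC = """
<earth>
  <ireland>
    <company name="Propylon Ltd.">
      <person>Sean McGrath</person>
      <person>A colleague</person>
    </company>
    <company name="Another firm">
      <person>Someone else</person>
    </company>
  </ireland>
</earth>
"""


def walk(elem, path):
    """Yield (address, element) pairs, working down layer by layer."""
    yield path, elem
    seen = {}  # how many siblings with each tag have been visited so far
    for child in elem:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        yield from walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")


root = ET.fromstring(DOC.strip())
addresses = {path: elem.text for path, elem in walk(root, "/" + root.tag)}

for path in addresses:
    print(path)
```

The tag `person` appears three times, but each of the three addresses is distinct, with no extra naming machinery involved.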

However, the data-heads felt the need for a direct naming convention that could be used to make individual element names unique in an XML document. This resulted in the creation of the XML namespaces recommendation[2].
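For readers who have not met namespaces, a short Python sketch shows what this direct naming looks like once parsed. The namespace URIs and element names below are placeholders invented for the example; `xml.etree.ElementTree` reports each qualified name as `{uri}localname`.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: both vocabularies define an element called "title".
DOC = """
<record xmlns:hr="http://example.com/hr" xmlns:bk="http://example.com/books">
  <hr:title>CTO</hr:title>
  <bk:title>A study in XML culture and evolution</bk:title>
</record>
"""

root = ET.fromstring(DOC.strip())

# Each element name is made unique by the namespace URI bound to its prefix.
qualified = [child.tag for child in root]
print(qualified)

# Lookups then need a prefix-to-URI mapping as well as the local name.
NS = {"hr": "http://example.com/hr"}
job_title = root.find("hr:title", NS).text
print(job_title)
```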

Now, we might look upon this as a quaint example of linguistic evolution. Or we might view the clash of approaches as a sure-fire way to invoke the evil spirits of bad design.

I believe the latter is true. Unique names are so much less useful than powerful addressing. Moreover, the pseudo-cascading effect of XML namespaces creates a brittle middle-ground between context-free and context-dependent naming that is best avoided in my opinion[3].
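One reading of the "pseudo-cascading" remark is that a default namespace declaration is inherited by descendant elements until another declaration overrides it. The Python sketch below illustrates that behaviour under this assumption; the URIs and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same tag text, <item/>, appears twice under different default namespaces.
DOC = """
<a xmlns="http://example.com/one">
  <item/>
  <b xmlns="http://example.com/two">
    <item/>
  </b>
</a>
"""

root = ET.fromstring(DOC.strip())

# iter() visits elements in document order: a, item, b, item.
items = [elem.tag for elem in root.iter() if elem.tag.endswith("}item")]
print(items)
```

The supposedly unique name of each `item` depends on which declarations happen to enclose it, so moving an element to another part of the document can silently change its name.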

I have a simple rule about namespaces in XML applications. I don't use them.

A simple rule that I commend to you for your consideration in your own XML applications.

[1] http://www.fortunecity.com/bally/dublin/180/dia.html
[2] http://www.w3.org/TR/REC-xml-names/
[3] http://www.itworld.com/nl/xml_prac/04112002/

http://seanmcgrath.blogspot.com