Published in IT World
March 21, 2007

XML and the document format mind bender

XML has been around now, in its final fully fledged form, for more years than I care to remember. Having played a small part in its original creation, I find that thinking back that far makes me feel old.

Explaining the whys and wherefores of XML to non-technologists and technologists alike has always been an interesting challenge. One could be forgiven for thinking that the value proposition has at this stage been fully thrashed out. Either you believe in the value proposition or you do not. Either you are applying XML sensibly in your business or you are not. Surely such matters would be well and truly baked at this point?

Not so. Not by a long shot, unfortunately. Here is the problem in a nutshell: it is really hard to explain to non-technical folk why keeping your information in XML is not - in itself - a guarantee that any sizable benefits will accrue.

As I have said before in this column [1] and elsewhere, any old nasty, crufty, effectively-proprietary-silo of information goo can be 100% XML compliant. Being in XML lifts information one small step up the information ladder - it is no longer completely opaque outside the four walls of the application that created it. However, it is quite a small step up what is quite a long ladder. Would I prefer to start with XML rather than a non-XML format for most document-centric IT tasks? Yes. For sure. Does XML - in and of itself - make it straightforward to move data from one application to another or to automate document processing? No. No it does not. It can, but it is not an automatic side effect of using XML.

Another nutshell (this is the mind bending one): information can be utterly, utterly application-specific and still 100% XML compliant. Your ability to work with data outside of the application that created it is an optional - highly desirable, but optional - attribute of an XML-based system. It can easily be the case that one proprietary application from one vendor is the only realistic tool for manipulating your XML data. It can easily be the case that without the application in question, the value of the data is significantly diminished and the rationale for using XML in the first place greatly reduced.
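The following minimal Python sketch makes the "100% XML compliant, yet application-specific" point concrete. It uses only the standard library. Both documents are invented for illustration. The element and attribute names (`report`, `para`, `doc`, `r`, `k`, `s`) and the use of base64 are hypothetical stand-ins for whatever a real application might do. Both documents are well-formed XML, but a generic tool can recover the text from only one of them.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical document in a descriptive, self-explanatory vocabulary.
TRANSPARENT = """<report>
  <para>Quarterly results are up.</para>
</report>"""

# Hypothetical application-specific document: equally well-formed XML, but the
# text is hidden in an encoded attribute whose meaning (like the "s" attribute)
# only the originating application knows.
OPAQUE = """<doc v="7">
  <r k="{payload}" s="3"/>
</doc>""".format(payload=base64.b64encode(b"Quarterly results are up.").decode("ascii"))


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML. This says nothing about meaning."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def generic_text(xml_text: str) -> str:
    """Return the character data a generic XML tool can see, with no app knowledge."""
    root = ET.fromstring(xml_text)
    return " ".join(t.strip() for t in root.itertext() if t.strip())


for name, doc in (("transparent", TRANSPARENT), ("opaque", OPAQUE)):
    # Prints: transparent True 'Quarterly results are up.'
    #         opaque True ''
    print(name, is_well_formed(doc), repr(generic_text(doc)))
```

The parser accepts both documents without complaint. Only knowledge of the originating application tells you that `k` holds base64-encoded text, and nothing in the file says what `s="3"` means.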

Many non-technical (and some technical) folk have difficulty understanding this fact. A common conversation goes something like this:

Slightly technical senior manager person who reads a lot of trade press: "We should move all our documents to XML because all sorts of great things will become possible...If you have time, I can walk you through the benefits..."

Non-technical senior decision maker: "That sounds great but according to the blurb I read, our new word processor/DTP/Web Editing tool stores all its information in XML and/or seamlessly imports/exports to XML. So we get all these good things you mention for free as part of our next application upgrade? Excellent!"

Why is this a mind bender now? Because we are on the verge of a world in which all mainstream document-centric tools do XML natively. Most of them will store their files natively in XML. So if the word processor/DTP tools you know and love all do XML natively, why do you need to do anything at all to benefit from XML?

Explaining the flaws here is left as an exercise for the reader. Figuring out how to explain the important issues raised to non-technical senior management in the course of an elevator ride is left as a Ph.D. thesis suggestion.

It is entirely possible to have all the benefits of XML and yet retain the ability to use user-friendly, commodity off-the-shelf tools. However, doing so - especially with complex document-oriented information - requires something more than just slapping an XML label on the file format.

Having your cake and eating it too is what initiatives like ODF and XHTML are all about. To understand their significance you need an understanding of what interoperability really is and how you go about creating it. You need to understand what it is realistic to expect a mere file format to do and what it is not. You need to understand the areas where XML's mantra of separating content from presentation has its practical limits [2], and how those limits are typically encountered at the boundaries where file format stops and application behavior starts. You need to understand that there is value in an XML file format even if application independence is not a goal. You need to understand that if application independence is a goal, it is really hard to write down in English what an application actually does to information - especially when WYSIWYG word processors author and edit that information. Sometimes meaning inheres in the running application code, not in the file format. You need to understand micro-formats and how semantics can be layered on top of information that is (erroneously) not considered "structured" by an entire generation of developers and IT architects.
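Micro-formats layer agreed meanings onto ordinary XHTML class attributes. The sketch below reads a deliberately simplified subset of the hCard micro-format, which uses the `vcard`, `fn`, and `org` class names. It uses Python's standard `xml.etree.ElementTree`. The sample contact is made up, and a real hCard parser follows many more rules than this one. The point is only that the same markup a browser renders as plain text can also carry machine-readable semantics.

```python
import xml.etree.ElementTree as ET

# Made-up XHTML fragment; the class names follow the hCard micro-format.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Contact:
      <span class="vcard">
        <span class="fn">Jane Example</span>,
        <span class="org">Example Ltd</span>
      </span>
    </p>
  </body>
</html>"""


def classes(elem: ET.Element) -> set:
    """Return the whitespace-separated class names on an element."""
    return set((elem.get("class") or "").split())


def extract_hcards(xhtml_text: str) -> list:
    """Collect fn/org values from each element marked class="vcard".

    Simplification: nested vcards and most hCard properties are ignored.
    """
    root = ET.fromstring(xhtml_text)
    cards = []
    for card in root.iter():
        if "vcard" not in classes(card):
            continue
        fields = {}
        for child in card.iter():
            for prop in ("fn", "org"):
                if prop in classes(child):
                    fields[prop] = "".join(child.itertext()).strip()
        cards.append(fields)
    return cards


print(extract_hcards(XHTML))  # [{'fn': 'Jane Example', 'org': 'Example Ltd'}]
```

The code never relies on element names, only on class values. This is why micro-formats can ride along inside documents that any off-the-shelf XHTML tool can already edit and display.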

If this stuff interests you, I recommend starting with the anatomy of interoperability [3]. If this stuff does not interest you, at least take this piece of advice away with you: saying "it is in XML" is essentially equivalent to saying nothing. Any statement of the form "It is in XML therefore..." is a non sequitur and needs to be questioned.

You need to look a level or two deeper if the real value proposition of XML (and it is real) is to be realized in your organization.

[1] http://open.itworld.com/nl/xml_prac/10042001/pf_index.html
[2] http://open.itworld.com/nl/xml_prac/07252002/
[3] http://www.robweir.com/blog/2007/02/anatomy-of-interoperability.html#links