XML and UML
By Sean McGrath
The only way to stay sane in this business is to be selective about which three- and four-letter acronyms you choose to develop expertise in. Over the past decade, the four-letter acronym SGML and the three-letter acronym XML have been high on my acronym shortlist. Guess I got lucky. Mind you, I have backed some duds in my time as well. I was into JSP back when it stood for Jackson Structured Programming, and I was convinced Awk would take over the scripting world. As for the Z80 processor... let's not go there, too many painful memories...
Anyway, in recent years I monitored the movements of the UML acronym purely through my peripheral vision. I guess I was hoping against hope that it would disappear in a poof of modeling logic and spare me having to shortlist it for my 20/20 attention. Recently, I concluded UML wasn't going to go away and I started studying it in earnest. The crossing of the Rubicon was triggered by the centrality of UML modeling in the ebXML initiative.
UML stands for Unified Modeling Language and it is the clear leader in an age-old field: object-oriented application modeling. UML, like so many three-letter acronyms, is actually a basket of different ideas joined together. Seven basic model (diagram) abstractions range from "class" to "state transition diagram". Using the standardized diagramming notation UML provides, analysts, programmers, and users can share a common conceptual understanding of a system through diagrams and narrative text.
XML is an increasingly important part of many applications. XML people argue that XML provides its own modeling paradigm -- originally DTDs and, more recently, W3C XML Schema. UML people argue that DTDs and W3C Schema are just notations and that organizations can and should keep their data models independent of any one syntax by using UML diagrams as the normative reference. From the UML diagrams, the story goes, you can *generate* DTDs, W3C Schemas, Relax NG Schemas, whatever you want.
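To make the "generate whatever you want" story concrete, here is a minimal Python sketch. It is an illustration only, not how any real UML tool works: the MODEL dictionary, the Person example, and the to_dtd and to_rnc functions are all hypothetical names invented for this sketch. It emits a DTD and a Relax NG compact-syntax pattern from one syntax-neutral description.

```python
# Hypothetical syntax-neutral model: class name -> list of property names.
# A real UML model carries far more (types, multiplicities, associations).
MODEL = {"Person": ["Name", "Age", "PhoneNumber"]}


def to_dtd(model):
    """Emit DTD element declarations for each class in the model."""
    lines = []
    for cls, props in model.items():
        # The list order silently becomes the required child order.
        lines.append(f"<!ELEMENT {cls} ({', '.join(props)})>")
        for prop in props:
            lines.append(f"<!ELEMENT {prop} (#PCDATA)>")
    return "\n".join(lines)


def to_rnc(model):
    """Emit one Relax NG compact-syntax pattern per class in the model."""
    patterns = []
    for cls, props in model.items():
        children = ", ".join(f"element {prop} {{ text }}" for prop in props)
        patterns.append(f"element {cls} {{ {children} }}")
    return "\n".join(patterns)


if __name__ == "__main__":
    print(to_dtd(MODEL))
    print(to_rnc(MODEL))
```

Note that even this toy generator has to make a syntax-level decision the model never stated: the order of the properties in the list ends up as the order of the child elements in both outputs.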
Let's pause for breath and think about what is going on here. Syntax (XML) versus semantics (model diagrams). Generating one from the other, keeping your options open by becoming more abstract in your models. There is a word for this phenomenon, actually an acronym. A four-letter acronym: MMTT -- More Meta Than Thou.
The MMTT aspect of this worries me, the reason being that we have heard it all before. Just one more level and all will be revealed: the ISO Seven Layer Model, Ada, Z Notation, Architectural Forms. Will UML go the same way?
Another worrying thing is the mismatch between the things UML is happy modeling and the things XML is happy modeling. In object-oriented modeling, we think of classes having attributes associated with them. Typically, we just say something like "A Person has a Name and an Age and a Phone Number". This can be directly translated into the syntax of umpteen object-oriented programming languages.
With an XML hat on, the process of modeling is subtly, but significantly, different. We say something like "A Person has a Name, followed by an Age, followed by a Phone Number". In other words, XML modeling presupposes that we are concerned about modeling the order things occur in, as well as the things that can occur.
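The difference can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real validator: the Person class, the PERSON_SEQUENCE list, the element names, and the matches_sequence function are hypothetical, and the check below only mimics what a sequence content model would enforce.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Person:
    """Object-oriented view: a Person simply *has* these attributes."""
    name: str
    age: int
    phone_number: str


# XML view (hypothetical content model): children must appear in this order.
PERSON_SEQUENCE = ["Name", "Age", "PhoneNumber"]


def matches_sequence(xml_text):
    """Return True if the root's children occur exactly in PERSON_SEQUENCE order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == PERSON_SEQUENCE


# The order in which attributes are supplied does not affect the object.
a = Person(name="Ann", age=30, phone_number="555-0100")
b = Person(phone_number="555-0100", age=30, name="Ann")

in_order = (
    "<Person><Name>Ann</Name><Age>30</Age>"
    "<PhoneNumber>555-0100</PhoneNumber></Person>"
)
out_of_order = (
    "<Person><Age>30</Age><Name>Ann</Name>"
    "<PhoneNumber>555-0100</PhoneNumber></Person>"
)

if __name__ == "__main__":
    print(a == b)
    print(matches_sequence(in_order))
    print(matches_sequence(out_of_order))
```

The two objects compare equal regardless of the order in which their attributes were given, while the second XML instance carries exactly the same information as the first yet fails the sequence check.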
The order used here is SEQUENCE order, but XML has numerous more complex orders you can specify. The fact is, these ordering concepts do not fit well into the UML paradigm and, arguably, owe more to document-oriented applications of XML than to data-oriented applications. I have seen a number of attempts (ironically using the UML term "tag"!) to get around this problem but the results are not very satisfactory. I suspect that this mismatch will further strengthen the growing schism between the data-oriented and document-oriented XML worlds. It may well be that UML-friendly XML models will dictate the boundaries of XML usage in object-oriented application design.
The final worrying thing is the uncomfortable (to me) relationship between what a UML model actually says and how it says it. UML is unashamedly visual in its approach. It is the visualization of system models that gives UML its power. But here is the rub -- when you save a UML model in its interchange format (known as XMI -- an XML-based format) you lose a lot of the visualization. The visualization is a part of the tool that did the rendering -- not part of the model itself. If your UML models can only properly be "understood" by your UML tool -- even when you read the XML interchange notation for your model -- who owns the model? You or the tool that created it?
The beauty of DTDs, and more recently Relax NG, as modeling paradigms is that the notation *is* the visualization. Tools to draw pretty pictures can be used but at the end of the day, they are just cognitive lubricant. The notation itself -- in pure text form -- is the normative reference as to what the model really means. Not so, I fear, with UML and certainly not so of the W3C XML Schemas I have seen generated from UML tools.
I am not against pretty pictures and GUI tools. I am, however, against any notion that any GUI tool owns my data models. I fear that UML-based approaches to XML modeling may be leading us in that direction. I am especially concerned that the rush to UML is the result of an understandable fear of the complexity of W3C XML Schema. Relax NG and Schematron both show that powerful models can be created using non-scary notations. Notations that can be supplemented by GUI tools but that keep the notation -- not the visualization -- as the normative reference of what it all means.
Sean is co-founder and Chief Technology Officer of Propylon and is an industry-recognised XML expert.