The Key to XML Modeling: Knowing When to Stop
By Sean Mc Grath
Let's start with an anecdote:
A teacher asks a child: "Can you spell the word banana?"
The child says: "Yes, I can spell banana but I'm never sure when to stop."
As I wrote that, the sun broke through my window for the first time in ages. Let's celebrate this happy event with another anecdote:
A scientist explains to an audience how the earth orbits the sun. An unimpressed member of the audience ripostes: "That is not how it works! The earth sits on the back of a giant turtle."
The scientist, with a supercilious timbre, intones, "But what is the turtle sitting on?"
The audience member replies: "Ah, you won't get me with that one. It is turtles all the way down...."
These two anecdotes help lighten the gloom that can descend over my Emacs environment when working with third-party XML schemas of any hue: DTDs, W3C XML Schemas, or RELAX NG schemas. Many of these schemas show that their creators did not know when to stop modeling. Like spelling "banana", getting started is easy but knowing when to stop is harder. Think of each element in a schema structure as a turtle. Is it sitting on other turtles? If so, what are they sitting on? Where does the turtle-on-turtle structure bottom out?
Knowing when to stop is a key XML data modeling skill. Every element in a schema costs money to create, but elements often split cleanly into two camps in terms of their ability to generate value. The semantic value of tagging is typically heavily skewed toward the top-level markup (i.e., the top-level turtles). The markup's value diminishes and becomes patchy the further down you go.
A good example of cost-centered markup is when a schema designer, working from the top turtle down, finds herself inventing element-type names for things like paragraph, table, list, and graphic. The world already has a bag of tags for these elements called XHTML. Is there value in creating another set? Probably not.
In defense of the schema designers, the turtle/banana syndrome is difficult to resist -- especially given the western world's genetic predisposition to a Cartesian Dualist approach to modeling the real world. Also, DTDs -- XML's original schema notation -- are notoriously unhelpful when it comes to taking a modular view of schemas and, ideally, "importing" things like XHTML Basic into high-value custom element containers.
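To make the idea of a high-value custom container wrapped around borrowed low-level markup a little more concrete, here is a minimal sketch in Python. It is only an illustration: the "policy" vocabulary, its namespace URI, and the helper function names are all invented for this example. Only the XHTML namespace URI is real. The sketch counts how many element names the designer had to invent versus how many were simply reused from XHTML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"    # the real XHTML namespace
POLICY_NS = "http://example.com/ns/policy"   # hypothetical custom vocabulary

# The custom elements (policy, clause, title, body) are the "top turtles".
# Below <body>, the markup bottoms out in XHTML rather than in new inventions.
DOCUMENT = f"""\
<policy xmlns="{POLICY_NS}" xmlns:h="{XHTML_NS}">
  <clause id="c1">
    <title>Cancellation</title>
    <body>
      <h:p>The policy may be cancelled:</h:p>
      <h:ul>
        <h:li>by the holder, in writing;</h:li>
        <h:li>by the insurer, with 30 days notice.</h:li>
      </h:ul>
    </body>
  </clause>
</policy>
"""


def split_tag(element):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def element_names_in(root, namespace):
    """Return the sorted, de-duplicated local names used from one namespace."""
    names = set()
    for element in root.iter():
        element_ns, local = split_tag(element)
        if element_ns == namespace:
            names.add(local)
    return sorted(names)


root = ET.fromstring(DOCUMENT)
invented = element_names_in(root, POLICY_NS)  # names the designer paid for
borrowed = element_names_in(root, XHTML_NS)   # names reused from XHTML

print("invented:", invented)
print("borrowed:", borrowed)
```

In this sketch the designer stops modeling at the body element. Everything beneath it is borrowed, so paragraphs and lists cost nothing to invent.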
The newer schema languages' improved support for mix-'n'-match schema design is changing the situation. Hopefully this componentization will help schema designers avoid the banana/turtle syndrome in the future.
Sean is co-founder and Chief Technology Officer of Propylon and is an industry-recognised XML expert.