XML and the Humble Paragraph Tag
By Sean McGrath
Nothing in the world of XML seems as harmless as the <p> tag: a universally accepted way of saying "here is a paragraph". A concept familiar to anyone with even a passing familiarity with the Web. It's as if the <p> tag has always been with us, a fundamental truth, part of the fabric of the universe. Discovered rather than invented. Simple, elegant, perfect...
Of course, the <p> tag is too simple for some pedants who insist on using <para>, or even <paragraph>, tags. Such pretensions! Plain country folk like me like our beer cold, our apple pie warm, and our paragraphs surrounded by good ol' <p> tags just like grandma used to make. However, peel off a layer or two and those little old <p> tags show their teeth, revealing a vista of complexity that is at the heart of the "XML for data" versus "XML for documents" debate.
There are two features that differentiate data-oriented XML from document-oriented XML. Firstly, the depth of tagging is irregular and unbounded in "documents". In data-oriented XML, the tagging is regular and bounded. All tags occur in the same order, record after record, and the same depth of tagging is used throughout. Secondly, plain text can be intermixed at the same level as tags to create what is called "mixed content" in "documents". In data-oriented XML, everything is tagged; there is no free-standing text and thus no mixed content.
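Both features can be checked mechanically. What follows is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The two sample documents and the helper names has_mixed_content and max_depth are invented here for illustration; they are assumptions, not part of any standard API.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples, made up for this illustration.
DATA_XML = """<orders>
  <order><id>1</id><qty>3</qty></order>
  <order><id>2</id><qty>5</qty></order>
</orders>"""

DOC_XML = "<doc><p>Plain text with <em>nested <b>markup</b></em> and more text.</p></doc>"


def has_mixed_content(elem):
    """Return True if any element holds non-whitespace text alongside child elements."""
    for node in elem.iter():
        children = list(node)
        if not children:
            continue  # a leaf holding only text is not mixed content
        # Text before the first child lives in node.text.
        if node.text and node.text.strip():
            return True
        # Text after each child lives in that child's .tail.
        if any(child.tail and child.tail.strip() for child in children):
            return True
    return False


def max_depth(elem):
    """Return the depth of the deepest element, counting the root as 1."""
    return 1 + max((max_depth(child) for child in elem), default=0)


data_root = ET.fromstring(DATA_XML)
doc_root = ET.fromstring(DOC_XML)

# In the data sample every record has the same tags in the same order.
record_shapes = {tuple(child.tag for child in order) for order in data_root}

print(record_shapes)
print(has_mixed_content(data_root), max_depth(data_root))
print(has_mixed_content(doc_root), max_depth(doc_root))
```

In the data sample, every record has the same shape and no element mixes text with child tags. In the document sample, the <p> element carries text and an <em> child side by side, and nothing limits how deep the inline tagging could go.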
In both cases, the <p> tag is center stage. In fact, if you see a <p> tag in an XML document, then you can infer a lot about the type of issues you are likely to face. If you find <para> or <paragraph> tags, then you know that the issues are the same but you are also dealing with a pedant.
On the Web, where sub-millimeter control over paragraph layout is neither practical nor desirable, an alternative paragraph-positioning model is needed. The answer, to date, has involved the aid of the single most abused element in the HTML tag bag -- the table. Much to the chagrin of typographers and XML data modelers alike, the border-less table has replaced pretty much every other geometry model for laying out paragraphs of text.
CSS2 has made it possible to exert fine control over paragraph positioning using pre-Web methods such as left indent, negative first line indent, and so on. However, until the likes of CSS2 become standard in all browsers, we are likely to see table trickery remain.
So, in summary, the <p> tag is not so simple. Its presence or absence tells you a lot about the type of XML you are dealing with, not to mention the world-view of whoever created it. If you work purely with data-oriented XML, you may never come across it, but if you work with document-oriented XML, then it will be a source of constant trouble and complexity, but also endless fascination for us easily amused doc-heads!
Sean is co-founder and Chief Technology Officer of Propylon and is an industry-recognised XML expert.