IT World
XML IN PRACTICE --- July 25 2002

Separating Content from Presentation: Easier Said Than Done

By Sean McGrath

We say it so often we don't even listen to the words any more. "Separate content from presentation" with XML. Information should not be inter-twingled with any one presentation as the presentations may (indeed will) change over time. New browsers, new devices of all sorts, oodles of document formats -- all can be targeted from a single source XML document. So sayeth the fundamental lore of XML.

Although this works fine for the most part, there are some situations where it is not really possible in practice. There are some cases where the content and the presentation are so deeply intertwined that separating one from the other is far from straightforward.

Four cases spring to mind: graphics, tables, mathematics, and advanced typography.

Let's deal with the graphics problem first. Graphics come in two main flavors -- vector and bitmap. Vector graphics are quite in keeping with the ethos of separating content from presentation espoused by XML. Vector formats concentrate on capturing the semantics of an image -- in terms of fundamental units such as lines, circles, fills and so on. The objects are laid out in a virtual space. At the point of rendering, a mapping between the virtual space and the physical space is made to achieve the best possible rendering given the limitations of the rendering device, e.g. area in pixels, color depth and so on.
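To make the virtual-to-physical mapping concrete, here is a minimal sketch in Python. The function name `fit_to_device` and the scale-and-centre policy are my own assumptions for illustration; real vector formats define their own viewport rules.

```python
def fit_to_device(points, virtual_size, device_size):
    """Map points from a virtual canvas to device pixels, keeping the aspect ratio."""
    vw, vh = virtual_size
    dw, dh = device_size
    # One uniform scale factor, so shapes are not stretched.
    scale = min(dw / vw, dh / vh)
    # Centre the scaled drawing inside the device area.
    dx = (dw - vw * scale) / 2
    dy = (dh - vh * scale) / 2
    return [(round(x * scale + dx), round(y * scale + dy)) for x, y in points]


# A line described once, on a 100x50 virtual canvas (hypothetical data).
line = [(0, 0), (100, 50)]

small = fit_to_device(line, (100, 50), (200, 100))  # [(0, 0), (200, 100)]
large = fit_to_device(line, (100, 50), (800, 600))  # [(0, 100), (800, 500)]
```

The same two points serve both devices; only the mapping changes. That is the sense in which a vector description stays independent of any one rendering.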

Bitmapped graphics, on the other hand, are intimately tied to a particular rendering in terms of pixel area and color depth. Bad things typically happen if you try to resize bitmapped images, as the pixels in the image do not encode any semantics about what the image represents. In short, they cannot be repurposed to different shapes, sizes or color schemes without significant loss in quality.

So, surprise, surprise, when it comes to graphics and XML, vector formats are much more in tune with the ethos of separating content from presentation. However, there are times when a bitmapped approach is the only one available. The images may represent facsimiles of documents that have a legal status and therefore must be reproducible *exactly*. No amount of philosophical twaddle about the benefits of generating renderings from logical representations of the information will cut it in a court of law.

We turn now to tables. If I had a penny for every second my cerebral cortex has pondered the mysteries of tables, I would be writing this from my hideaway island in the Pacific via a dedicated 1GB satellite link.

Tables have a quantum feel to them. The more you look, the more the act of looking seems to affect their very nature. At first glance, it would seem possible to dissect most tables into semantic structures but as you look closer, this possibility typically sails into the sunset leaving you with a model of rows and columns and alignments and tab stops and vertical offsets and spans and ... So much for separating the content from the presentation! With tables, the best you can do most of the time is intertwine the content with the presentation in reasonably well-delineated structures.

Mathematics? Ha! A real beauty this one. The physical form of mathematical equations is probably the most condensed blend of content and presentation ever invented. My advice? Embed TeX in your XML. Until such time as producers of mathematics use XML for markup, all attempts at reproducing the presentation of mathematics from an XML source are labors of love -- most likely unrequited.
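As a rough illustration of what "embed TeX in your XML" can look like, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The element names `para` and `math` and the `notation` attribute are hypothetical, invented for this example; they do not come from any particular schema.

```python
import xml.etree.ElementTree as ET

# Build a paragraph whose equation is carried as raw TeX inside a marker element.
para = ET.Element("para")
para.text = "The roots are given by "
math_el = ET.SubElement(para, "math", {"notation": "tex"})
math_el.text = r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}"
math_el.tail = "."

xml_text = ET.tostring(para, encoding="unicode")

# A downstream formatter can pull the TeX back out, untouched, and hand it to TeX.
recovered = ET.fromstring(xml_text).find("math").text
```

The XML layer only says "this span is mathematics, in TeX notation". It makes no attempt to model the equation itself, which is rather the point.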

And lastly we come to typography. It never ceases to amaze me how quickly users of Web browsers have become accustomed to what is, frankly, a 20-year retrograde step in the presentation of information in typographic form. I suspect it is quite exasperating for lovers of fine typography to see what passes for high quality presentation on the Web. Kerning, ligatures? Um, the Web doesn't even do tab stops!

The mismatch between the typographic effects used for paper production and those used for on-line production can hit hard when faithful replication of paper typography is required on the Web. Perhaps the classic case of this is legal material such as Bills and Acts produced by Governments. To date, these have always existed in paper form before they exist in electronic form, and the paper form is "normative" relative to the electronic, i.e. if in doubt, consult the paper. It is the paper sheets that are read and approved by legislators. It is the paper sheets that are installed by a suitably authorized person to become law. The paper is king.

In order to replicate the look and feel of legislation, some significant trickery is required in HTML markup. Take the concept of a section of legislation, for example. Sections have numbers. Section numbers are rendered to the left of the first paragraph of the section. In Word, FrameMaker, etc. this is achieved by creating a negative first-line indent in the paragraph. How about on the Web? Well, the concept of negative first-line indents is quite new to the Web, so the standard way of making it work across all browsers is to nest tables within tables. One table acts as the outer "shell" to hold the section number and the body of the section. The other table contains the body of the section, nested inside the outer table.
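The nested-table trick can be sketched as a small Python function that emits the markup. The function name `section_html` and the exact cell layout are assumptions made for illustration; production markup for legislation would carry a good deal more (widths, alignment, styling) than this.

```python
from html import escape


def section_html(number, paragraphs):
    """Return an outer 'shell' table: section number on the left, body table on the right."""
    # Inner table: one row per paragraph of the section body.
    inner_rows = "".join(f"<tr><td>{escape(p)}</td></tr>" for p in paragraphs)
    inner = f"<table>{inner_rows}</table>"
    # Outer table: the number sits in its own cell, top-aligned beside the body.
    return (
        "<table><tr>"
        f'<td valign="top">{escape(number)}</td>'
        f"<td>{inner}</td>"
        "</tr></table>"
    )


# Hypothetical section text, for illustration only.
html_out = section_html("12.", ["First paragraph of the section.", "Second paragraph."])
```

Note what has happened: a purely presentational device (a hanging section number) has become two levels of table structure in the markup.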

It is extraordinarily tricky to make the electronic version exactly the same as the paper version as a result of this fundamental difference in approach to paragraph layout. As for footnotes, side-headings, watermarks, 3-inch high integral signs spanning multiple cells in a table column... don't get me started on those.

Yes, it makes sense, for all sorts of reasons, to separate content from presentation. Yes, XML is a great technology for helping you achieve that.

However, sometimes, the medium is an inextricable part of the message. The next time someone tries to sell you a line like "just separate the content from the presentation with XML" be warned -- it is not necessarily that simple.

Sean is co-founder and Chief Technology Officer of Propylon and is an industry-recognised XML expert.