Validation is a Process, Not an Action
By Sean Mc Grath
Creating a schema in an XML application is always a fun experience, regardless of the schema language being used. It is best performed near a window because it helps to be able to stare out of one during the inevitable waits for inspiration to call on you.
Inspiration is required because dreaming up names and classification systems for names is, quite simply, a hard problem.
For each and every tag name in your schema, you will aspire to the taxonomic sui generis; the definitive, indisputable mot juste. Your tag names will oscillate from the laconic to the loquacious. No ontological stone will be left unturned in your search for the perfect execution of truth by tagging!
There! Doesn't that make you feel better?
As this ancient example beautifully illustrates, classifying and naming things is hard enough. XML modelers, however, must suffer a further level of complexity. This extra complexity is called *change*. How would animal classification work if, in the memetic analog of Stephen Jay Gould's evolution by punctuated equilibria, new forms suddenly emerged and demanded to be tagged in the data?
That would make things even more complex, right? The trouble is that this is exactly what happens in real-world applications of XML. Business requirements change and, with them, data changes shape and processes need to evolve, die, or be replaced.
In this fluid situation, the "hard-wired" nature of XML schemata can present a real stumbling block to successful system evolution.
The XML world, perhaps thanks to its SGML heritage, is predisposed to thinking of validation as something that happens *before* data processing really starts. A sort of "please wash your hands" prelude to the main business process.
This form of thinking about validation results in XML schemas that try to capture all the constraints on the data up front. A sort of all-or-nothing validation on which everything hangs. Two things follow from that. Firstly, the schemata get large and complex as they try to pack in as much validation as possible in one fell swoop. Secondly, the software to process data conforming to the schema gets more and more complex.
The only constant is change. Given that, does it make sense to try to treat validation as a once-off action? An action to be performed at a single point in time, prior to most of the data processing? Would it not make more sense to think of validation as a process? A sequence of actions, performed through time, kanban by kanban, as data flows from one form to another through a business process?
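One way to picture validation as a process is the minimal Python sketch below. Everything in it is an assumption made for illustration: the order document, its element names (`customer`, `item`, `address`), and the three stages are hypothetical, and plain functions stand in for whatever schema language each step might really use. The point is only that each stage checks what it needs, just before it needs it.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; the element names are illustrative only.
ORDER = "<order><customer>ACME</customer><item qty='3'>widget</item></order>"


def check_intake(doc):
    """Order intake only cares that a customer is named."""
    customer = doc.find("customer")
    if customer is None or not (customer.text or "").strip():
        return ["intake: missing customer"]
    return []


def check_picking(doc):
    """The warehouse only cares that every quantity is a positive integer."""
    errors = []
    for item in doc.findall("item"):
        qty = item.get("qty", "")
        if not qty.isdecimal() or int(qty) < 1:
            errors.append(f"picking: bad qty {qty!r}")
    return errors


def check_shipping(doc):
    """Dispatch only cares that there is somewhere to send the goods."""
    if doc.find("address") is None:
        return ["shipping: missing address"]
    return []


# Each stage carries its own small check; adding or changing one
# does not touch the others.
STAGES = [
    ("intake", check_intake),
    ("picking", check_picking),
    ("shipping", check_shipping),
]


def run(xml_text):
    """Validate stage by stage; return (stages passed, errors at the first failure)."""
    doc = ET.fromstring(xml_text)
    passed = []
    for name, check in STAGES:
        errors = check(doc)
        if errors:
            return passed, errors
        passed.append(name)
    return passed, []


print(run(ORDER))
```

Here the sample order gets through intake and picking and is only stopped at shipping, where the missing address finally matters. An all-or-nothing schema would have rejected the same document before any work was done.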
Thinking about it this way has a wonderful way of decomposing the validation into pieces, each of which stands alone, each of which is small, tidy, and easy to evolve. Then we can move on to worrying about how we classify the validation processes. Best get near a window for that one too.
Sean is co-founder and Chief Technology Officer of Propylon and is an industry-recognised XML expert.