IT World
Ebusiness in the Enterprise – January 6, 2003

Spreadsheets finally yield their buried treasure

By Sean McGrath

Back in the days when my home computer was a Sinclair ZX80 with a barnstorming 1 KB of RAM, I worked part-time for a computer company in Dublin.

I distinctly remember one day when a businessman arrived seeking a VisiCalc. I said something along the lines of "You mean an Apple II with CP/M and the VisiCalc spreadsheet?" He replied, "No. I want a VisiCalc. It has a screen and a keyboard and you put figures and formulas into it. Just give me one of those."

Ah, those were the days. Days of true killer applications, when all the nuances of hardware and software were secondary considerations.

Looking back on the evolution of the spreadsheet, we can see a steady path of increasing power and sophistication from VisiCalc to today's "category killer", which is, of course, Microsoft Excel.

I have no idea how much of the world's corpus of business information is housed in spreadsheets, but I'm pretty confident it is a significant share. Until quite recently, the spreadsheet file has been the most proprietary, most closed data format in common use on the planet.

Word processors - first thought of by many as the worst offenders for proprietary formats - are really quite open in comparison to spreadsheets. It is possible to export from word processor packages to other packages, to plain text, to HTML and so on. Although it is always a pain, and an expensive pain at that, there are numerous interchange options to try before you must fall back on "save as text", with all that that involves in terms of lost formatting.

Spreadsheets, on the other hand, seem to have no effective export capability other than "save as text" or, more commonly, "save as CSV". Comma-separated values (CSV) is just one step up, structurally speaking, from plain text and entails losing all of the formulae stored in the spreadsheet. Not good.
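
To make the loss concrete, here is a minimal Python sketch. The `cells` dictionary and the `export_csv` helper are hypothetical stand-ins for a spreadsheet and its CSV export, not any real package's API; the point is only that a CSV row has room for the computed value and nowhere to put the formula behind it.

```python
import csv
import io

# Hypothetical in-memory sheet: each cell keeps its formula (if any)
# alongside the value the spreadsheet engine computed for it.
cells = {
    "A1": {"formula": None, "value": 100},
    "A2": {"formula": None, "value": 250},
    "A3": {"formula": "=A1+A2", "value": 350},
}


def export_csv(sheet):
    """Write one column of computed values, which is all CSV can carry."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for ref in sorted(sheet):
        writer.writerow([sheet[ref]["value"]])
    return buf.getvalue()


csv_text = export_csv(cells)

# Reading the file back yields plain strings; the "=A1+A2" in A3 is gone.
rows = list(csv.reader(io.StringIO(csv_text)))
```

After the round trip, the third row is just the text `350`, indistinguishable from a number somebody typed in by hand.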

For quite some years now, the standard approach to reading/writing spreadsheets programmatically has involved using APIs - nasty APIs - to "drive" the spreadsheet engine underneath.

This approach has the benefit of allowing you to work directly with the key spreadsheet concept of a cell formula. However, as any document-centric engineer will tell you, it is a lot simpler and more robust to work directly with the file format - or an API for the file format - rather than programmatically driving a spreadsheet behemoth simply to read/write data in spreadsheet format.

I'm delighted to say that this is now changing. As far as I can tell, the Gnumeric[1] spreadsheet for Linux was the one that got the ball rolling. Its native file format is XML with a fully documented data model. You can read/write Gnumeric spreadsheets perfectly well with nothing more than XML-aware tools.
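
As a rough illustration of what "nothing more than XML-aware tools" can look like, the sketch below pulls cells out of a Gnumeric-style document using only Python's standard XML parser. The sample markup is a simplified fragment written for this example: the element names, the `Row`/`Col` attributes and the namespace URI are assumptions that should be checked against the Gnumeric documentation, and real files are often gzip-compressed and carry far more detail. The `local_name` and `read_cells` helpers are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed fragment in the spirit of a Gnumeric file.
# Element names and namespace URI are illustrative, not authoritative.
SAMPLE = """<gnm:Workbook xmlns:gnm="http://www.gnumeric.org/v10.dtd">
  <gnm:Sheets>
    <gnm:Sheet>
      <gnm:Name>Budget</gnm:Name>
      <gnm:Cells>
        <gnm:Cell Row="0" Col="0">100</gnm:Cell>
        <gnm:Cell Row="1" Col="0">250</gnm:Cell>
        <gnm:Cell Row="2" Col="0">=A1+A2</gnm:Cell>
      </gnm:Cells>
    </gnm:Sheet>
  </gnm:Sheets>
</gnm:Workbook>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def read_cells(xml_text):
    """Return {(row, col): text} for every Cell element, whatever its namespace."""
    root = ET.fromstring(xml_text)
    cells = {}
    for elem in root.iter():
        if local_name(elem.tag) == "Cell":
            key = (int(elem.get("Row")), int(elem.get("Col")))
            cells[key] = (elem.text or "").strip()
    return cells


cells = read_cells(SAMPLE)

# Unlike CSV, the formula text is still there to be found.
formulas = {pos: text for pos, text in cells.items() if text.startswith("=")}
```

Matching on the local element name rather than a fixed namespace keeps the sketch tolerant of namespace changes between versions, at the cost of being less strict than a schema-aware reader.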

Then there is OpenOffice[2], which has a very nice spreadsheet. Again, the file format is XML (zipped XML, to be precise).
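
"Zipped XML" means two standard tools are enough: a zip reader and an XML parser. The Python sketch below builds a tiny archive in memory and reads it back. The member name `content.xml`, the `table-cell` element, the `formula` and `value` attributes and the namespace URI are all assumptions made for illustration, and the markup is far simpler than a real OpenOffice file; `make_archive` and `read_formulas` are hypothetical helpers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TABLE_NS = "urn:example:table"  # placeholder namespace, not the real one

# Simplified stand-in for the XML stored inside the zipped spreadsheet.
CONTENT = f"""<doc xmlns:table="{TABLE_NS}">
  <table:table table:name="Sheet1">
    <table:table-row>
      <table:table-cell table:value="100"/>
      <table:table-cell table:value="250"/>
      <table:table-cell table:formula="=[.A1]+[.B1]" table:value="350"/>
    </table:table-row>
  </table:table>
</doc>"""


def local_name(name):
    """Strip the '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def make_archive(content_xml):
    """Build an in-memory zip holding one XML member (assumed name)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", content_xml)
    return buf.getvalue()


def read_formulas(archive_bytes):
    """Unzip, parse, and list (formula, value) for each cell with a formula."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    found = []
    for elem in root.iter():
        if local_name(elem.tag) == "table-cell":
            attrs = {local_name(k): v for k, v in elem.attrib.items()}
            if "formula" in attrs:
                found.append((attrs["formula"], attrs.get("value")))
    return found


archive = make_archive(CONTENT)
formulas = read_formulas(archive)
```

In this sketch each formula cell carries both the formula and its last computed value, so a reader that cannot evaluate formulas still has a number to work with.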

So what of the big daddy of them all - Excel? Well, I'm delighted to report that colleagues who attended XML 2002 in Baltimore in early December have come away convinced that XML support in Excel in Office 11 is real and powerful[3]. I hope so, as I have a couple of mini-killer applications in mind if this proves to be the case.

[1] http://www.gnome.org/projects/gnumeric/
[2] http://www.openoffice.org
[3] http://www.microsoft.com/office/developer/preview/default.asp