After describing what makes for good legislative XML, I feel I should bring up a favorite topic of mine — why word processors don’t make for good legislative drafting tools.
Lately, we’ve been implementing round-tripping tools to allow Akoma Ntoso documents to be exported to and imported from Microsoft Word. This is to facilitate migration from a largely office productivity-oriented system to an XML-based one, and to allow the exchange of documents with external clients that don’t have access to the internal systems being used to draft and manage legislation. It’s been quite a difficult process, though not because of the round-tripping itself, which has been quite straightforward. Exporting a document is relatively easy, and reimporting that exported document, unchanged, isn’t difficult.
What is very problematic is trying to ingest documents drafted or extensively edited using a word processor. The DOCX markup quickly becomes a tangled mess. Even when a document looks fine visually, there can be a lot going wrong on the inside, revealing the drafter’s struggle with the word processor to get a document that at least looks right. To avoid that mess, we tend to resort to interpreting the words and discarding the structure and internal metadata entirely. It’s not perfect, but it’s at least manageable.
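To make that fallback concrete, here is a minimal Python sketch of "keep the words, discard the structure" applied to WordprocessingML. It is not our importer. The sample markup and the function name `paragraph_texts` are hypothetical, and the sample is only a trimmed stand-in for the `word/document.xml` part of a DOCX package. Note how a single word can end up split across runs with different formatting, which is the kind of internal tangle described above.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical, heavily trimmed sample. The word "Definitions." is split
# across runs with inconsistent bolding, as often happens after heavy editing.
SAMPLE_DOCUMENT_XML = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading2"/></w:pPr>
      <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">SEC. 2. </w:t></w:r>
      <w:r><w:t>Defini</w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>tions.</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t xml:space="preserve">(a) In this Act, </w:t></w:r>
      <w:r><w:rPr><w:i/></w:rPr><w:t>agency</w:t></w:r>
      <w:r><w:t xml:space="preserve"> means a state agency.</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def paragraph_texts(document_xml: str) -> list[str]:
    """Return the plain text of each non-empty paragraph.

    Paragraph properties (w:pPr) and run properties (w:rPr) are ignored;
    only the text inside w:t elements is kept, joined in document order.
    """
    root = ET.fromstring(document_xml)
    paragraphs = []
    for p in root.iterfind(".//w:p", NS):
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    for line in paragraph_texts(SAMPLE_DOCUMENT_XML):
        print(line)
```

Everything after this step (recognizing "SEC. 2." as a section number, "(a)" as a subsection, and so on) then has to be inferred from the words themselves, since the styling can’t be trusted.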
I’m going to compare today’s most prominent word processing format, DOCX (well, at least the WordprocessingML part of it), with Akoma Ntoso to see how each stacks up against my list:
- Is it semantic?
DOCX: No, not at all. DOCX is a serialization of the inner workings of Microsoft Word. It makes no attempt to be anything else.
Akoma Ntoso: Yes, this is the fundamental approach Akoma Ntoso takes.
- Is the presentation separated from the semantics as much as possible?
DOCX: No, the presentation is tied directly into the document itself, and what’s more, is very proprietary.
Akoma Ntoso: Yes, although you can apply presentation directly inline where necessary, as with tables.
- Is all the text (excluding any metadata section) in the natural reading order?
DOCX: Yes, for the most part.
Akoma Ntoso: Yes, for the most part.
- Does it, to the fullest extent possible, avoid the use of generated text?
DOCX: No, and this is one of the most frustrating and infuriating parts of working with DOCX.
Akoma Ntoso: Mostly. It doesn’t enforce this rule, but it doesn’t preclude practices that ensure the rule is followed.
- Is every provision that needs data associated with it permanently identifiable?
Akoma Ntoso: Yes, via the @wId or the @GUID attributes.
- Is every provision that is referred to easily locatable?
DOCX: Not without extensive customization.
Akoma Ntoso: Yes, via a standardized locator mechanism using the @eId/@wId attributes.
- If the XML schema is for general use, is there an extensible way to add missing constructs?
DOCX: No, unless you regard styling as your constructs (a bad idea) or want a complex customization task.
Akoma Ntoso: Yes, via the seven elements found in the generic model.
- Is there an extensible metadata mechanism?
DOCX: Yes, but it’s complicated.
Akoma Ntoso: Yes, but it’s complicated.
- Does it provide the facilities necessary to automate according to modern expectations?
DOCX: No, the presentation-oriented structure of DOCX does little to enable downstream automation.
Akoma Ntoso: Yes, Akoma Ntoso encourages a hierarchical content structure that is ideal for downstream automation.
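To illustrate the locatability and automation points in the list above, here is a minimal Python sketch that indexes provisions by their @eId and walks the hierarchy to produce an outline. The fragment is hypothetical and trimmed for brevity (it omits the metadata block and other required parts, so it isn’t schema-valid), the @eId values only approximate the Akoma Ntoso naming convention, and the helper names are my own.

```python
import xml.etree.ElementTree as ET

# Akoma Ntoso 3.0 namespace.
AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"
NS = {"akn": AKN_NS}

# Hypothetical fragment: not schema-valid, just enough to show the idea.
SAMPLE_AKN = f"""<akomaNtoso xmlns="{AKN_NS}">
  <act name="act">
    <body>
      <section eId="sec_2">
        <num>SEC. 2.</num>
        <heading>Definitions.</heading>
        <subsection eId="sec_2__subsec_a">
          <num>(a)</num>
          <content><p>In this Act, agency means a state agency.</p></content>
        </subsection>
      </section>
    </body>
  </act>
</akomaNtoso>"""


def index_by_eid(root):
    """Map every eId in the document to its element."""
    return {el.get("eId"): el for el in root.iter() if el.get("eId")}


def provision_text(provision):
    """Return the whitespace-normalized text of a provision's content."""
    content = provision.find("akn:content", NS)
    if content is None:
        return ""
    return " ".join("".join(content.itertext()).split())


def outline(root):
    """Walk the hierarchy and list each identified provision, indented."""
    lines = []

    def visit(el, depth):
        if el.get("eId"):
            num = el.findtext("akn:num", default="", namespaces=NS)
            heading = el.findtext("akn:heading", default="", namespaces=NS)
            label = " ".join(part for part in (num, heading) if part)
            lines.append("  " * depth + label)
            depth += 1
        for child in el:
            visit(child, depth)

    visit(root, 0)
    return lines


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_AKN)
    index = index_by_eid(root)
    print(provision_text(index["sec_2__subsec_a"]))
    print("\n".join(outline(root)))
```

Nothing here depends on fonts, indents, or styles. The same few lines work on any document that uses the hierarchy and identifiers consistently, which is the property that makes downstream automation practical.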
Of course, Akoma Ntoso looks a lot better for legislative documents than DOCX does. That should be no surprise: Akoma Ntoso is purpose-built for legislation, while DOCX is a general-purpose document model intended for no single purpose. But the two are also fundamentally very different. While Akoma Ntoso is designed to be a modern, standards-based document information model for legislation, DOCX is a serialization of the archaic data structures that exist within Microsoft Word. It reflects the proprietary inner workings of Microsoft Word rather than the semantic meaning to be found within a document.
Akoma Ntoso has its drawbacks too. It’s complex and a bit academic, and having to span a very broad range of legal traditions makes it a good fit for most legislative traditions, but a perfect fit for none.