The (Supposed) Limitations of XML

It’s been a while since I updated my blog – a whole year in fact. The reason is that I’ve been hard at work finishing our web-based XML editor, LegisPro, supporting our projects with the U.S. House, while simultaneously developing an Akoma Ntoso-based implementation for the U.K. and Scottish Parliaments. The challenge has been all-consuming.

Next week I will be giving a couple of talks with Matt Lynch of the Scottish Parliament at the LEX Summer School 2016 in Ravenna, Italy and then, the following week, by myself at NALIT 2016 in Indianapolis. My company, Xcential, also intend to show glimpses at our booth the Data Transparency Conference in Washington D.C. on the 28th September. We’ve got a busy month ahead of us.

Recently the tired old question of whether a legislative drafting system is best built on a word processor or using true XML technology was raised yet again. (No, the Open Document Format (ODF) and  Office Open XML don’t make word processors into XML editors.) To me, and everyone I interact with, the answer is quite clear and was settled a decade ago – XML is the way to go. The reason is simple. XML provides a long-lasting data format that can be used to build a comprehensive solution that enables all of the required automation features for legislative drafting. On the other hand, shoe-horning a legislative drafting application into a word processor, never designed for this type of an application, results in too many compromises.

In reliving the debates from 10 years ago, I stumbled across a competitor’s white paper on the subject. While they still promote the white paper, its content is quite dated. It makes the case for the word processor approach rather than XML. I read, with some amusement (or was it irritation), all the perceived shortfalls of XML.

I thought it would be fun to take a look at each of these supposed problems with XML, and provide a counterpoint to each of them. To be fair, this paper was written several years ago and technology doesn’t stand still.  So, here goes:

Point 1: Legislative content and presentation cannot be separated

There is a thread of truth to this. Because so much of the amending process is based around the page and line number paradigm for referring to locations, it is essential that there be a robust and precise means for referring to any part of the document right down to a specific word. However, this is the entire requirement – there is no further need for the presentation to be tied directly to the content. Legislatures including California and the U.S. Congress, have used markup technologies for many decades now, long before the advent of XML (or even SGML). If the requirements mandated that content and presentation not be separated, none of these solutions would have been viable. So let’s consider the specific issue – how to tie page and line numbers with the content. Superficially, a word processor does this with an intrinsic page and line numbering capability. However, you quickly discover a problem – legislation requires page and line numbers fixed to locations in the last official publication. The dynamic recalculating nature of the intrinsic page and line number capability of a word processor renders the capability useless. Instead, the classic way to address this requirement is to produce a separate rendition of the document using a hidden tabular format, one row per each published line and a column for line numbers and a column for content. However, this creates a huge problem – you now have two copies of the legislation, one organized by the document structure and one organized by physical layout. Getting between these two representations precisely becomes a troubling problem to deal with. For XML, this was also a challenge until we came up with a very clean and workable solution almost a decade ago. Now, when we publish the document PDF, we back-annotate unobtrusive markers into the XML. These markers are used to arrange the editor presentation as well as to drive the amendment engine. This works out very nicely. We have implemented this technique several times now with great success.

Point 2: Temporal relationships must be preserved

This one made me laugh. For years, we’ve pointed to issues like this as reasons to go to XML rather than to avoid it. The argument made in the white paper is that XML provides no facilities to model the temporal relationships that are necessary when making citations or establishing other relationships that exist in legislation.  While this is true, it’s also quite misleading. To expect XML to intrinsically provide this facility is to completely fail to understand the role of XML. In fact, word processors have no intrinsic capability to solve this problem either – it’s something that has to be built.

We’ve been addressing this problem since our beginning in 2001 using web-based references or URIs and a clever middle-tier technology we call a resolver that interprets the temporal or versioning aspects of the citation or reference. This problem was solved long ago.

Point 3: Permanence is required

The argument here is confusing. It is totally true that there is an unbendable requirement that legislation be preserved in a form known to last forever (or at least for many centuries). It’s also totally true that there is no digital technology with the proven permanence of paper or vellum. However, this is an argument about the medium used to preserve the content. It’s not an argument about the type of format that should be used to create the content – unless there is an argument that we should give up on digital technologies and return to paper, scissors, and glue. A physical document can be produced for archival purposes regardless of the technology used to draft it.

Personally, it you were to ask me, the logical document to archive would be a vellum printout of the XML. That way, you would have a much easier task of restoring the document at some future date should some catastrophe result in the loss of the digital record. However, I don’t think that’s a likely decision that anyone will make anytime soon. John? 😉

Point 4: Work-in-progress is structurally broken by XML’s rules

This has long been the primary argument against XML editors. Their rigidity in enforcing the rules often gets in the way, especially in the early part of the drafting cycle when the ideas are still fluid. Most XML editors just aren’t designed for this type of work.

This limitation has been one that I’ve focused considerable effort at overcoming for years. Our most recent efforts have made tremendous strides. We’ve tackled this problem in two ways:

First, using the modern DOM Range constructs that are inherent in modern browsers, we’ve been able to loosen up the selection model considerably so that it closely matches selection in a word processor. Using sophisticated programming of the DOM and this range mechanism, we able to match much of the loose editing offered by a word processor.

Second, we go beyond the word processor by allowing the structure to be removed entirely and allowing the words to be rearranged entirely unencumbered by any structure at all (think text editor). Once the words are rearranged and the user is ready to move on, we automatically recreate the structure for the user. It’s a great way for drafters to get their ideas down without worrying about detail. Turns out, it’s also a great way to import foreign content – a nice bonus.

Point 5: XML document validation is insufficient

This is another one that made me laugh. As before, it’s totally true and entirely misleading. XML validation is not intended to be the be-all and end-all of verifying a document. It would be quite remarkable if an XML schema could do that. Curiously, a word processor offers nothing whatsoever in this regard – it must all be custom created.

When it comes to the subject of verifying a document, we use two terms – validation and verification. Validation is the process of ensuring that the document’s XML content adheres to the content model prescribed by the schema. We call this the “outer envelope” of checks. The “inner envelope” is to verify that the document adheres to the jurisdiction’s internal business rules. While off-the-shelf technologies exist to perform the outer XML validation, this inner verification step requires custom software. We’ve built a configuration mechanism that allows us to configure a “model” that our existing software can use for verification rather than building this from scratch each time.

Point 6: There is no common language around which to develop a standard

This one is perhaps the most annoying. To be fair, the white paper is not current and could do with an update – although doing that might undermine its use as a marketing tool. Again, as before, there is some truth to the assertion, but to argue that the differences between jurisdictions disqualifies XML is to see the glass as half empty rather than half full. I’ve been in this field for 15 years in November and have worked on legislative systems on four continents. What’s surprised me is how similar they are rather than how different. Fundamentally, the process of making law is the same almost everywhere. It’s in the details that the differences lie. Yes, in California a resolution is a type of document while in the UK, a resolution is a type of section in a specific type of statutory instrument, but that’s a detail that doesn’t get in the way at all.

There is one point in this area that the white paper makes that is particularly annoying. XML is characterized as a generic model for representing data. That’s only half the story. Everybody in this field knows that there are two very different models that XML serves well – representing data and representing documents. XML, as a derivative from SGML, has stronger origins in the representation of documents than of data. So why are XML’s strengths in representing a document so casually ignored?  Seems a little self-serving to me. JSON, on the other hand is an excellent generic model for representing moderate amounts of data, but a terrible model for representing documents.

The entire argument here can now be refuted as we now have Akoma Ntoso. It’s an XML schema that was initially designed by the University of Bologna at the request of the United Nations. Today, it’s on the verge of becoming an OASIS standard. Akoma Ntoso understands that there is no on-size-fits-all solution to legislative XML. It does this by providing a basic set of constructs that are generally found everywhere, a mechanism to create custom constructs, and an overarching design for how to model the hierarchy of legislation.

The implementation I will be showing in Ravenna, Indianapolis, and Washington D.C. in the coming weeks will demonstrate just how a general-purpose XML model such as Akoma Ntoso for legislation can be applied to the specific needs of a pair of jurisdictions – and a pretty challenging pair of jurisdictions at that.

For me, XML is the easy winner.  With XML you design the document to exactly fit the needs of a jurisdiction and then shape the tools to work with this model. With a word processor, you shoe-horn the needs of the jurisdiction into the limited flexibility offered by a word processor’s intrinsic model and then spend all your time trying to handle the mismatches between what the word processor was designed to do and that the customer wants. Either way it’s challenging, but with a word processor, much more so.

The (Supposed) Limitations of XML

Connected Information

As a proponent of XML for legislation, I’m often asked why an XML approach is better than a more traditional approach using a word processor. The answer is simple – it’s all about connected information.

The digital end point in a legislative system can no longer be publication of PDFs. PDFs are nothing but a kludgy way to digitize paper — a way to preserve the old traditions and avoid the future. Try reading a PDF on a cell phone and you see the problem. Try clicking on a citation in a PDF and you see the problem. Try and scrape the information out of a PDF to make it computer readable and you see the problem. The only useful function that PDFs serve is as a bridge to the past.

The future is all about connected information — breaking the physical bounds of what we think of as a document and allowing the nuggets of information found within them to be connected, interrelated, and acted upon. This is the real reason why the future lies with XML and its related technologies.

In my blog last week I provided a brief glimpse into how our future amending tools will work. I explored how legislation could be managed similar to how software is managed with GitHub. This is an example of how useful connected information becomes. Rather than producing bills and amendments as paper documents, the information is stored in a way that it can be efficiently and accurately automated — and made available to the public in a computer readable way.

At Xcential, we’re building our new web-based authoring system — LegisPro. If you take a close look at it, you’ll see that it has two main components. Of course, there is a robust XML editor. However, at the system’s very heart is a linking system — something we call a resolver. It’s this resolver where the true power lies. It’s an HTTP-based system for managing all the linkages that exist in the system. It connects XML repositories, external data sources, and even SQL databases together to form a seamless universe of connected information.

We’re working hard to transform how legislation, and indeed, all government information is viewed. It’s not just about connecting laws and legislation together through simple web links. We talking about providing rich connections between all government information — tying financial data to laws and legislation, connecting regulatory information together, associating people, places, and things to government data, and on and on. We have barely started to scratch the surface, but it’s clear that the future lies with connected information.

While we today position LegisPro as a bill authoring system — it’s much more than that. It’s some of the fundamental underpinnings necessary for a system to transform government documents of today into the connected information of tomorrow.

Connected Information

Can GitHub be used to manage legislation?

Every so often, someone suggests that GitHub would be a great way to manage legislation. Usually, we roll our eyes at the naïve suggestion and that is that.

However, there are a good many similarities that do deserve consideration. What if the amending process was supported by a tool that, while maybe not GitHub, worked on the same principles?

My company, Xcential, built the amending solution for the California Legislature, using a process we like to call Amendments in Context. With this process, a proposed revision of a bill is drafted and then the amendments necessary to produce that revision are extracted as an amendment document. That amendment document, which really becomes an enumeration of proposed changes in a report, is then submitted to the committee for approval. If approved, the revised document that was drafted earlier then becomes the next official version of the bill. This process differs from the traditional process in which an amendment document is drafted, itemizing changes to be made. When the committee approves the amendments, there is a mad rush, usually overnight, to implement (or execute) those amendments to the last version in order to produce the next version. Our Amendments in Context automated approach is more accurate and largely eliminates the overnight bottleneck of having to execute approved amendments before the start of business the following day.

Since implementing this system for California, we’ve been involved in a number of other jurisdictions and efforts that deal with the amending process. This has given us quite a good perspective on the various ways in which bill amendments get handled.

As software developers ourselves, we’ve often been struck by how similar the bill amendment process is to the software development process — the very thing that invariably leads to the suggestion that GitHub could be a great repository for legislation. With this all in mind, let’s compare and contrast the bill amending process with the software development process using GitHub.

(We’ll make suitable procedural simplifications to keep the example clear)

Begin a proposed amendment Begin a proposed enhancement
Create a copy of the last version of a bill. In the U.S. and other parts of the world that still use page and line numbers, cleverly annotated page and line number information from the last publication must be included. This copy will be modified to reflect the proposed changes. Create a new software branch. This branch will be modified to implement the proposed enhancement
Make the proposed changes using redlining, showing the changes as insertions and deletions. Carefully craft the changes to obey the drafting rules and any political sensitivities regarding how the changes are shown. Make the proposed changes to the software — testing and debugging as needed.
Redlining Software
Generate the amendment Prepare to commit
The amendment generator examines the redlining (insertions and deletions), carefully grouping changes together to produce a minimized set of amendments. These amendments are expressed in the familiar, at least in the U.S., “on page X, line Y, strike ‘this’ and replace with ‘that'” or something along those lines. (For jurisdictions that don’t use an amendment generator, a manually written amendment document, enumerating the amendments, is the starting point) A differencing engine compares the source code with the prior version, carefully grouping changes together to produce a minimized set of hunks. If you use a tool such as SourceTree by Atlassian, these hunks are shown as source code with lines to be removed and lines to be inserted.
Amendment Hunks
Save the amendment document alongside the revised bill with redlining Commit the changes to GitHub
Vote on the amendments Submit for review
The amendment document goes to committee where it is proposed and then either adopted or rejected. The procedures here may differ, depending on the jurisdiction. In California, multiple competing amendment documents (known as instruction amendments) may be proposed at any one time, but only one can be adopted and it is adopted in whole. Other jurisdictions allow multiple amendment documents to be adopted and individual amendments with any amendment document to be adopted or rejected. The review board considers the proposed enhancement and decides whether or not to incorporate them into the next release. They may choose to adopt the entire enhancement or they may choose to adopt only certain aspects of it.
Execute the amendment Merge into mainline
In California, because only single whole amendments can be adopted, executing an adopted amendment is quite easy — the redlined version of the bill simply becomes the next version. However, in most jurisdictions, this isn’t so easy. Instead, each amendment must be applied to a new copy of the bill, destined to become the next version. Conflicts that arise must be resolved following a prescribed set of procedures. Incorporating an enhancement into the mainline involves a merge of the enhancement branch into the mainline. If an enhancement is not adopted in whole, then approved changes may be cherry picked. When conflicts between different sets of approved enhancements occur, GitHub requires manual intervention to resolve the issues. This process is generally a lot less formal than resolving conflicts in legislation.

So, as you can see, there are a lot of similarities between amending a bill and implementing a software enhancement. The basic process is essentially identical. However, the differences lie in the details.

Git is designed specifically for the software development process. The legislative process has quite a different set of requirements and traditions which must be met. It simply isn’t possible to bend and distort the legislative process to fit the model prescribed by Git. However, that doesn’t mean that something like GitHub is out of the question. What if there was a GitHub for Legislation — a tool with an associated repository, modeled after Git and GitHub, specifically designed for managing legislation?

This example shows the power of adopting XML for drafting legislation. With properly designed XML, legislation becomes a vast store of machine-readable information that can meet the 21st century challenges of accuracy, efficiency, and transparency. We’re not just printing paper anymore — we’re managing digital information.

Can GitHub be used to manage legislation?

LegisPro edit will soon be ready for beta!

Our new rulemaking LegisProedit is coming along nicely. It’s a web-based XML drafting tool specifically designed for the rigors of rulemaking tasks such as legislative bill drafting. It supports both the Akoma Ntoso and the USLM legislative models and can be customized to support any other model if necessary.

This past week I gave a demonstration of it at the LEX US Summer School at George Mason University in Washington D.C. With trepidation, I allowed everyone to have a hands-on experience with it as I provided guidance. This was the first time the editor had been used by anyone outside of Xcential and the first time we had stressed server performance. While certainly not glitch free, the editor exceeded my expectations for this point in the development process and all went well. It worked!

This week we are talking about the editor at NCSL by way of a screenshot demo that I am sharing here:

The next opportunities to try the editor hands-on will be at the LEX Summer School in Ravenna, Italy next month and we will also be showing it later in the month at NALIT in Sacramento, California.

The QuickStarter beta program is still in the process of being finalized. We are currently envisioning different levels of participation, from basic beta testing to a full-fledged evaluation program for anyone looking to use it or a part of it in an upcoming project.

More information can be found at or you can contact us at

LegisPro edit will soon be ready for beta!

Data Transparency Breakfast, LEX US Summer School 2015, First International Akoma Ntoso Conference, and LegisPro Edit reveal.

Last week was a very good week for my company, Xcential.

We started the week hosting a breakfast put on by the Data Transparency Coalition at the Booz Allen Hamilton facility in Washington D.C.. The topic was Transforming Law and Regulation. Unfortunately, an issue at home kept me away but I was able to make a brief pre-recorded presentation and my moderating role was played by Mark Stodder, our company President. Thank you, Mark!

Next up was the first U.S. edition of the LEX Summer School from Italy. I have attended this summer school every year since 2010 in Italy and it’s great to see the same opportunity for an open dialog amongst the legal informatics community finally come to the U.S. Monica Palmirani (@MonicaPalmirani), Fabio Vitali, and Luca Cervone (@lucacervone) put on the event from the University of Bologna. The teachers also included Jim Mangiafico  (@mangiafico) (the LoC data challenge winner), Veronique Parisse (@VeroParisse) from the European Union, Andrew Weber (@atweber) from the Library of Congress, Kirsten Gullickson (@GullicksonK) from the Office of the Clerk at the U.S. House of Representatives, and myself from Xcential. I flew in for an abbreviated visit covering the last two days of the Summer School where I covered how the U.S. Code is modeled in Akoma Ntoso and gave the students an opportunity to try out our new bill drafting editor — LegisProedit.

After the Summer School concluded, it was followed by the first International Akoma Ntoso Conference on Saturday, where I spoke about the architecture of our new editor as well as how the USLM schema is a derivative of the Akoma Ntoso schema. We had good turnout, from around the world, and a number of interesting speakers.

This week is NCSL in Seattle where we will be discussing our new editor with potential customers and partners. Mark Stodder from Xcential will be in attendance.

In a month, I’ll be in Ravenna once more for the European LEX Summer School — where I’ll be able to show even more progress towards the goal of a full product line of Akoma Ntoso tools. It’s interesting times for me.

The editor is coming along nicely and we’re beginning to firm up our QuickStarter beta plans. I’ve already received a number of requests and will be getting in touch with everyone as soon as we’re ready to roll out the program. If you would like to participate as a beta tester — or if you would just like more information, please contact us at

I’m really excited about how far we’ve come. Akoma Ntoso is on the verge of being certified as an official OASIS standard, our Akoma Ntoso products are coming into place, and interest around the world is growing. I can’t wait to see where we will be this time next year.

Data Transparency Breakfast, LEX US Summer School 2015, First International Akoma Ntoso Conference, and LegisPro Edit reveal.

Coming soon!!! A new web-based editor for Akoma Ntoso

I’ve been working hard for a long time — building an all new web-based editor for Akoma Ntoso. We will be showing it for the first time at the upcoming Akoma Ntoso LEX Summer School in Washington D.C.

Unlike our earlier AKN/Editor, this editor is a pure XML editor designed from the ground up using the XML capabilities that modern browsers possess. This editor is much more robust, more precise,  and is very scalable.


Basic Features

  1. Configurable XML models — including Akoma Ntoso and USLM
  2. Edit full documents or portions of large documents
  3. Flexible selection and editing regardless of XML structure
  4. Built-in redlining (change tracking) supporting textual AND structural changes
  5. Browse document sources with drag-and-drop.
  6. Full undo & redo
  7. Customizable attribute editor
  8. Search and replace
  9. Modular architecture to allow for extensive customization

Underlying Technology

  1. XML-based editing component
    • DOM 4 support
    • XPath Support
    • CSS Styling
    • Sophisticated event model
  2. HTTP-based resolver architecture for retrieving documents
    • Interpret citations
    • Deference URLs
    • WebDAV adaptors to document repositories
    • Query repositories with XQuery or databases with SQL
  3. AngularJS-based User Interface using HTML5
    • Component modules for easy customization
  4. XML repository for storing documents
    • Integrate any XML repository
    • Built-in support for eXist-db
  5. Validation & Publishing
    • XML Schema validator
    • XSL-FO publishing

We’ll reveal a lot more at the LEX Summer School later this month! If you’re interested in our QuickStart beta program, drop me a note at

Coming soon!!! A new web-based editor for Akoma Ntoso

Upcoming U.S. and European events related to Akoma Ntoso

In my last blog post I covered the public review of the new proposed Akoma Ntoso (LegalDocML) standard for legal documents. Please keep the comments coming. In order to comment, please send email to If you wish to subscribe to this mailing list, please follow the instructions at

In addition, there are three upcoming events related to Akoma Ntoso which you may wish to participate in: (this list coming from Monica Palmirani, the chair of the OASIS LegalDocML technical committee)

1. Akoma Ntoso Summer School, 27-31 July, 2015, George Mason University, Fairfax, Virginia (USA):
Registration fee:
Application Form:
Deadline: end of June, 2015.

2. IANC2015 (First International Akoma Ntoso Conference): August 1st, 2015, George Mason University, Fairfax, Virginia (USA)
Call for contributions:
Deadline: June 19th, 2015.

3. Summer School LEX2015, 7-15 Sept. 2015, Ravenna, Italy:
Registration fee:
Application Form:
Deadline: July, 15th, 2015.

I have been participating in the European LEX Summer school every year since 2010 and find it to be both inspirational and very valuable. If you’re interested in understanding where the legal informatics field is headed, I encourage you to find a way to attend any of these events. I will be speaking/teaching at all three events.

Upcoming U.S. and European events related to Akoma Ntoso