Akoma Ntoso, Standards, Transparency

The U.S. Code in Akoma Ntoso

I’m on my way to Italy this week for my annual pilgrimage to Ravenna and the LEX Summer School put on by the University of Bologna. This is my fourth trip to the class. I always find it so inspirational to be a part of the class and the activities that surround it. This year I will be talking about the many ongoing projects we have underway as well as talking, in depth, about the HTML5 editor I built for Akoma Ntoso.

Before I get to Italy, I wanted to share something I’ve been working on. It should come as absolutely no surprise to anyone that I’ve been working on producing a version of the U.S. Code in Akoma Ntoso. A few weeks ago, the U.S. Office of the Law Revision Counsel released the full U.S. Code in XML. My company, Xcential, helped them to produce that release. Now I’ve taken the obvious next step and begun work on a transform to convert that XML into Akoma Ntoso – the format currently being standardized by the OASIS Legal Document ML technical committee. I am an active member of that TC.


About 18 months ago, I learned of a version of the U.S. Code that had been made available in XML. While that XML release was quite far from complete, I used it to produce a representation in Akoma Ntoso as it stood back then. My latest effort is a replacement and update of that work. The new version of XML released by the OLRC is far more accurate and complete and is a better basis for the transform than the earlier release was. And besides, I have a far better understanding of the new version – having had a role in its development.

My work is still very much a work-in-progress. I believe in openly sharing my work in the hope of inspiring others to dive into this subject – so I’m releasing a partial first step in order to get some feedback. Please note that this work is a personal effort – it is not a part of our work with the OLRC. At this point I’ve written a transform to produce Akoma Ntoso XML according to the most recent schema released a few weeks ago. The transform is not finished, but it gives a pretty good rendition of the U.S. Code in Akoma Ntoso. I’m using the transform as a vehicle to identify use cases and issues which I can bring up with the OASIS TC at our weekly meetings. As a result, there are a few open issues and the resulting XML does not fully validate.
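
To give a flavor of what the transform does, here is a simplified sketch of the kind of template involved. This is illustrative only, not the actual transform, and the uslm namespace URI and element names should be treated as my assumptions rather than gospel:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:uslm="http://xml.house.gov/schemas/uslm/1.0"
    xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03">

   <!-- Map a U.S. Code section onto an Akoma Ntoso section. The uslm
        namespace URI and element names are assumptions for this sketch. -->
   <xsl:template match="uslm:section">
      <section>
         <num><xsl:value-of select="uslm:num"/></num>
         <heading><xsl:value-of select="uslm:heading"/></heading>
         <content>
            <xsl:apply-templates select="uslm:content/node()"/>
         </content>
      </section>
   </xsl:template>

</xsl:stylesheet>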

I’m making 8 Titles available now. They’re smaller Titles which are easier for me to work with as I refine the transform. Actually, I do have the first 25 Titles converted into Akoma Ntoso, but I’ll need to address some performance and space issues with my tired old development server before I can release the full set. Hopefully, over the next few months, I’ll be able to complete this work.

When you look at the XML, you will notice a “proposed” namespace prefix. This simply shows proposed aspects of Akoma Ntoso that are not yet adopted. Keep in mind that this is all development work – do not assume that the transformation I am showing is the final end result.
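
For example, a proposed construct might appear something like this – a purely hypothetical illustration, where the proposed namespace URI and element name are placeholders and not what you will find in the actual files:

<section xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03"
         xmlns:proposed="http://example.org/akn/proposed"
         id="{GUID}">
   <!-- anything in the proposed: namespace is a construct under discussion,
        not yet part of the adopted schema -->
   <proposed:someNewElement>…</proposed:someNewElement>
   ...
</section>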

I’m looking for feedback. Monica, Fabio, Veronique, and anyone else – if you see anything I got wrong, could have modeled better, or find troubling in how I modeled something, please let me know. I’m doing this work to open up a conversation. Only by trying Akoma Ntoso out in different usage scenarios can we make it better.

Don’t forget the Library of Congress’ Legislative Data Challenge. Perhaps my transformation of the U.S. Code can inspire someone to participate in the challenge.

Standard
Akoma Ntoso, Hackathon, HTML5, LegisPro Web, Standards, Transparency, W3C

Web-Based XML Legislative Editor Update

It’s been quite a while since I gave an update on our web-based XML legislative editor – LegisProweb. But that doesn’t mean that nothing has been going on. Quite the contrary, this has been a busy year for the editor project.

Let me first recap what the editor is. It’s an XML editor, written entirely around HTML5 technologies. It was first developed last year as the centerpiece of a Hackathon that Ari Hershowitz and I staged in San Francisco and around the world. While it is designed as a general-purpose XML editor and can be configured to model any XML schema, it’s primarily configured to support Akoma Ntoso.


Since then, there has been a lot of continuing interest in the editor. If you attended the 2013 Legislative Data and Transparency Conference this past May in Washington DC, you may have noticed Jim Harper of the Cato Institute demonstrating their “Deepbills” project. The editor you saw is a heavily customized early version of LegisProweb, reconfigured to handle the XML format that the US Congress publishes legislation in.

And that’s not the only place where LegisProweb has been adopted. We’re in the finishing stages of a somewhat larger implementation for Chile. This is an Akoma Ntoso implementation – focused on debates and debate reports rather than on legislation. One interesting point worth noting – this implementation is done in Spanish. LegisProweb is quite easily localized.

The common thread between these two implementations is the use case – they’re both implementations focused on tagging metadata within pre-existing documents rather than on creating new documents from scratch. This was the focus of the Hackathon we staged back in 2012 – little did we know how much of a market would exist for an editor focused on annotation rather than document creation. And there’s still more to come – we’ve been quite surprised at the level of interest in this particular use case.

Of course, we’re not satisfied with an editor that can only annotate existing documents. We’ve been hard at work turning the editor into a full-featured legislative editor that works equally well at creating new documents as it does at annotating existing documents. In addition, we’ve made the editor very customizable and added capabilities to manage the comments and discussions that might revolve around a document as it is being created and annotated.

Most recently, the editor has been upgraded to the latest version of Akoma Ntoso coming out of the OASIS LegalDocumentML technical committee, where I am an active member. Along with that effort, the validator has been separated out to run as a standalone Akoma Ntoso validator. I talked about that in my blog last week. I’m busy using the validator as I work frantically to complete an Akoma Ntoso project this week. I’ll talk some more about that project next week.

So where do we go from here? Well, the first big effort is to modularize the technologies found within the editor. We now have a diverse set of customers, and they can all benefit from the various bits and pieces that make up LegisProweb. By modularizing the pieces, we’ll be able to pick and choose which parts we use when and how. Separating out the validator was the first step. We’ll also be pulling out the reference resolver, attaching it to a native XML database, and partitioning the client side to allow the editing component to be used without the full editing environment offered by LegisProweb.

One challenge that remains is handling redlining – managing insertions and deletions. This is a very difficult subject – and one I tackled in the work I did implementing the XML editor used by the California legislature. I took a very different approach in trying to solve the problem with LegisProweb, but I’m not happy with the result. So, I’ll be returning to the proven approach we used way back when we built the original LegisPro editor on XMetaL.

As you can tell, we’ve got our work for the next year cut out for us.

Standard
Akoma Ntoso, LegisPro Web, Standards, Transparency, W3C

Free Akoma Ntoso Validator

How are people doing with the Library of Congress’ Akoma Ntoso Challenge? Hopefully, you’re making good progress, having fun, and learning a valuable new skill with this important emerging technology.

I decided to make it easy for someone without an XML Editor to validate their Akoma Ntoso documents for free. We all know how expensive XML Editors tend to be. If you’re like me, you’ve used up all the free trials you could get. I’ve separated the validation part of our LegisProweb editor from the editing base to allow it to be used as a standalone validator. Now, all you need to do is either provide a URL to your document or, even easier, drop the text into the text area provided and then click on the “Validate” button. You don’t even need to go find a copy of the Akoma Ntoso schema or figure out how to hook it up to your document – I do all that for you.

To use the validator, simply draft your Akoma Ntoso XML document, specifying the appropriate namespace using the @xmlns namespace declaration, and then paste a copy into the validator. I’ll go and find the schema and then validate your document for you. The validation results will be shown to you conveniently inline within your XML source to help you in making fixes. Don’t worry, we don’t record anything when you use the validator – it’s completely anonymous and we keep no record of your document.

You can validate either the 2.0 version of Akoma Ntoso or the latest 3.0 version, which reflects the work of the OASIS LegalDocumentML committee. Actually, there are quite a few other formats that the validator will also work with out of the box and, by using xsi:schemaLocation, you can point to any XML schema you wish.
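
If you want something to start from, here is a bare-bones skeleton of the kind of document you could paste in – just a sketch, with the metadata the schema requires left out, so expect the validator to flag that:

<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03">
   <bill>
      <meta>
         <!-- the identification, publication, and other required metadata go here -->
      </meta>
      <body>
         <section id="s1">
            <num>§1.</num>
            <heading>Commencement</heading>
            <content><p>This act will go into effect on January 1, 2013.</p></content>
         </section>
      </body>
   </bill>
</akomaNtoso>
<!-- To validate against some other schema instead, declare
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" on the root element
     and add xsi:schemaLocation="your-namespace-URI your-schema-URL". -->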

Give the free Akoma Ntoso XML Validator a try. You can access it here. Please send me any feedback you might have.

[Screenshots: Input Form and Validation Results]
Standard
Transparency

U.S. House of Representatives Releases the U.S. Code in XML

This week marked a big milestone for us. The U.S. House of Representatives released the U.S. Code in XML. You can see the announcement by the Speaker of the House, John Boehner (R-Ohio), here. This is a big step forward towards a more transparent Congress. As many of you know, my company, Xcential, has worked closely with the Law Revision Counsel on this project. It has been an honor to provide our expertise as part of our on-going efforts with the U.S. House of Representatives.

This project has been a great opportunity for us to update the U.S. House of Representatives’ technology platform by introducing new XML schema techniques along with robust and high-performance conversion tools. Our eleven years in this field, working on an international scale, have given us valuable insights into XML techniques which we were able to bring to bear to ensure the success of this project.

The feedback has been very good.

As you might expect, members of the technical community have swiftly picked up on this release and are actively finding ways to use the data it provides. Josh Tauberer of GovTrack.us has already started – check out his work here. Why did I already know he would be the first to jump in? 🙂

Of course, if you know me, you’ll know that I also have something up my sleeve. I’ll be spending my weekends and evenings over the next few weeks working to release an Akoma Ntoso transform coincident with an upcoming OASIS LegalDocML announcement. Keep watching my blog for more info.

This has been one of numerous projects we are working on right now. We have a very similar project underway in Asia and an Akoma Ntoso project nearing completion using our HTML5-based editor, LegisProWeb, in South America. I’ll be providing an update on LegisProweb in the coming weeks.

Standard
Akoma Ntoso, LegisPro Web, Standards, Transparency

Akoma Ntoso Challenge by the Library of Congress

As many of you may have already read, the U.S. Library of Congress has announced a data challenge using Akoma Ntoso. The challenge lasts for three months and offers a $5,000 prize to the winner.

In this challenge, participants are asked to mark up four Congressional bills, provided as raw text, into Akoma Ntoso.

If you have the time to participate in this challenge and can fulfill all the eligibility rules, then I encourage you to step up to the challenge. This is a good opportunity to give Akoma Ntoso a try – both to learn the new model and to help us identify any changes or adaptations that must be made to make Akoma Ntoso suitable for use with Congressional legislation.

You are asked, as part of your submission, to identify gaps in Akoma Ntoso’s design along with documenting the methodology you used to construct your solution for the four bills. You’re also encouraged to use any of the open-source editors currently available for editing Akoma Ntoso and to provide feedback on their suitability to the task.

I would like to point out that I also provide an Akoma Ntoso editor at http://legisproweb.com. It is free to use on the web along with full access to all the information you need to customize the editor. However, while our customers do get an unrestricted internal license to the source code, our product is not open source. At the end of the day, I must still make a living. Nonetheless, I believe that you can use any editor you wish to create your four Akoma Ntoso documents – it’s just that the sponsors of the competition aren’t looking for feedback on commercial tools. If you do choose to use my editor, I’ll be there to provide any support you might need in terms of features and bug fixes to help speed you on your way.

Standard
Process, Transparency

Transparent legislation should be easy to read

Legislation is difficult to read and understand. So difficult that it largely goes unread. This is something I learned when I first started building bill drafting systems over a decade ago. It was quite a letdown. The people you would expect to read legislation don’t actually do that. Instead they must rely on analyses, sometimes biased, performed by others that omit many of the nuances found within the legislation itself.

Much of the problem is how legislation is written. Legislation is often written so as to concisely describe a set of changes to be made to existing law. The result is a document that is written to be executed by a law compilation team deep within the government rather than understood by lawmakers or the general public. This article, by Robert Potts, rather nicely sums up the problem.

Note: There is a technical error in the article by Robert Potts. The author states “These statutes are law, but since Congress has not written them directly to the Code, they are added to the Code as ‘notes,’ which are not law. So even when there is a positive law Title, because Congress has screwed it up, amendments must still be written to individual statutes.” This is not accurate. Statutory notes are law. This is explained in Part IV (E) of the DETAILED GUIDE TO THE CODE CONTENT AND FEATURES.

So how can legislation be made more readable and hence more transparent? The change must come in how amendments are written – with an intent to communicate the changes rather than just to describe them. Let’s start by looking at a few different ways that amendments can be written:

1) Cut-and-Bite Amendments

Many jurisdictions around the world use the cut-and-bite approach to amending, also known as amendments by reference. This includes Congress here in the U.S., but it is also common in most of the other jurisdictions I work with. Let’s take a look at a hypothetical cut-and-bite amendment:

SECTION 1. Section 1234 of the Labor Code is amended by repealing “$7.50” and substituting “$8.50”.

There is no context to this amendment. In order to understand this amendment, someone is going to have to go look up Section 1234 of the Labor Code and manually apply the change to see what it is all about. While this contrived example is simple, it already involves a fair amount of work. When you extrapolate this problem to a real bill and the sometimes convoluted state of the law, the effort to understand a piece of legislation quickly becomes mind-boggling. For a real bill, few people are going to have either the time or the resources to adequately research all the amendments to truly understand how they will affect the law.

2) Amendments Set Out in Full

I’ve come to appreciate the way the California Legislature handles this problem. The cut-and-bite style of amending, as described above, is simply disallowed. Instead, all amendments must be set out in full – by re-enacting the section in full as amended. This is mandated by Article 4, section 9 of the California Constitution. What this means is that the amendment above must instead be written as:

Section 1. Section 1234 of the Labor Code is amended to read:

1234. Notwithstanding any other provision of this part, the minimum wage for all industries shall be not less than $8.50 per hour.

This is somewhat better. Now we can see that we’re affecting the minimum wage – we have the context. The wording of the section, as amended, is set out in full. It’s clear and much more transparent.

However, it’s still not perfect. While we can see how the amended law will read when enacted, we don’t actually know what changed. Actually, in California, if you paid attention to the bill redlining through its various stages, you could have tracked the changes through the various versions to arrive at the net effect of the amendment. (See the note on redlining below.) Unfortunately, the redlining rules are a bit convoluted and not nearly as apparent as they might seem to be – they’re misleading to the uninitiated. What’s more, the resulting statute at the end of the process has no redlining, so the effect of the change is totally hidden in the enacted result.

Setting out amendments in full has been adopted by many states in addition to California. It is both more transparent and greatly eases the codification process. The codification process becomes simple because the new sections, set out in full, are essentially prefabricated blocks awaiting insertion into the law at enactment time. Any problems which may result from conflicting amendments are, by necessity, resolved earlier rather than later (although this does bring along its own challenges).

3) Amendments in Context

There is an even better approach – one which is adopted to varying degrees by a few legislatures. It builds on the approach of setting out sections in full, but adds a visible indication of what has changed using strike and insert notation. I’ll refer to this as Amendments in Context.

This problem is partially addressed, at the federal level, by the Ramseyer Rule, which requires that a separate document be published that essentially shows all amendments in context. The problem is that this second document isn’t generally available – and it’s yet another separate document.

Why not just write the legislation showing the amendments in context to begin with? I can think of no reason other than tradition why the law, as proposed and enacted, shouldn’t show all amendments in context. Let’s take a look at this approach:

Section 1. Section 1234 of the Labor Code is amended to read:

1234. Notwithstanding any other provision of this part, the minimum wage for all industries shall be not less than $7.50 $8.50 per hour. (“$7.50” is struck out; “$8.50” is inserted.)

Isn’t this much clearer? At a glance we can see that the minimum wage is being raised a dollar. It’s obvious – and much more transparent.
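
For what it’s worth, this strike-and-insert view is also easy to capture in markup. Here is a rough sketch using generic ins and del elements – illustrative only, not the markup of any particular schema or drafting system:

<section id="s1234">
   <num>1234.</num>
   <content>
      <p>Notwithstanding any other provision of this part, the minimum wage
      for all industries shall be not less than <del>$7.50</del> <ins>$8.50</ins>
      per hour.</p>
   </content>
</section>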

At Xcential, we address this problem in California by providing an amendments-in-context view for all state legislation within our LegisWeb bill tracking service. We call this feature As Amends the Law™ and it is computed on-the-fly.

Governments are spending a lot of time, energy, and money on legislative transparency. The progress we see today is in making the data more accessible to computer analysis. Amendments in context would make legislation not only more accessible to computer analysis but also more readable and understandable to people.

Redlining Note: If redlining is a new term to you, it is similar to, but subtly different from, track changes in a word processor.

Standard
Akoma Ntoso, Standards, Transparency

Legislative Data: The Book

Last week, as I was boarding the train at Admiralty station in Hong Kong to head back to the office, I learned that I am writing a book. +Ari made the announcement on his blog. It seems that Ari has found the key to getting me to commit to something – put me in a situation where not doing it is no longer an option. Oh well…

Nonetheless, there are many reasons why now is a good time to write a book. In the past year we have experienced a marked increase in interest in the subject of legislative data. I think that a number of factors are driving this. First, there is renewed interest in driving towards a worldwide standard – especially the work being done by the OASIS LegalDocumentML technical committee. Second, the push for greater transparency, especially in the USA, is driving governments to investigate opening up their databases to the outside world. Third, many first-generation XML systems are now coming due for replacement or modernization.

I find myself in the somewhat fortunate position of being able to view these developments from an excellent vantage point. From my base in San Diego, I get to work with and travel to legislatures around the world on a regular basis. This allows me to see the different ways people are solving the challenges of implementing modern legislative information management systems. What I also see is how many jurisdictions struggle to set aside obsolete paper-based models for how legislative data should be managed. In too many cases, the physical limitations of paper are used to define the criteria for how digital systems should work. Not only do these limitations hinder the implementation of modern designs, they also create barriers that will prevent fulfilling the expectations that come as people adapt to receiving their information online rather than on paper.

The purpose of our book will be to propose a vision for the future of legislative data. We will share some of our experiences around the world – focusing on the successes some legislatures have had as they’ve broken legacy models for how things must work. In some cases, the changes simply involve better separating the physical limitations of the published form from the underlying content and structure. In other cases, we’ll explain how different procedures and conventions can not only facilitate the legislative process, but also make it more open and transparent.

We hope that by producing a book on the subject, we can help clear the path for the development of a true industry to serve this somewhat laggard field. This will create the conditions that will allow a standard, such as Akoma Ntoso, to thrive, which, in turn, will allow interchangeable products to be built to serve legislatures around the world. Achieving this goal will reduce the costs and the risks of implementing legislative information management systems and will allow the IT departments of legislatures to meet both the internal and external requirements being placed upon them.

Ari extended an open invitation to everyone to propose suggestions for topics for us to cover. We’ve already received a lot of good interest. Please keep your ideas coming.

Standard
Akoma Ntoso, HTML5, LegisPro Web, Standards, Transparency

2013 Legislative Data and Transparency Conference

Last week I participated in the 2013 Legislative Data and Transparency Conference put on by the U.S. House of Representatives in Washington, D.C.

It was a one-day event that featured numerous speakers both within the U.S. government and in the surrounding transparency community around D.C. My role, at the end of the day, was to speak as a panelist along with Josh Tauberer of GovTrack.us and Anne Washington of The George Washington University on Under-Digitized Legislative Data. It was a fun experience for me and allowed me to have a friendly debate with Josh on APIs versus bulk downloads of XML data. In the end, while we both fundamentally agree, he favors bulk downloads while I favor APIs. It’s a simple matter of how we each use the data.

The morning sessions were all about the government reporting the progress it has made over the past year on its transparency initiatives. There has been substantial progress this year, and this was evident in the various talks. Particularly exciting was the progress that the Library of Congress is making in developing the new congress.gov website. Eventually this website will expand to replace THOMAS entirely.

The afternoon sessions were kicked off by Gherardo Casini of the UN-DESA Global Centre for ICT in Parliament in Rome, Italy. He gave an overview of the progress, or lack thereof, of XML in various parliaments and legislatures around the world. He also gave a brief mention of the progress in the LegalDocumentML Technical Committee at OASIS which is working towards the standardization of Akoma Ntoso. I am a member of that technical committee.

The next panel was a good discussion on extending XML. The first panelist was Eric Mill of the Sunlight Foundation, who, among other things, talked about the HTML transformation work he has been exploring in recent weeks. I mentioned his efforts in my blog last week. Following him was Jim Harper of the Cato Institute, who talked about the Cato Institute’s Deepbills project. Finally, Daniel Bennett gave a talk on HTML and microdata. His interest in this subject was also mentioned in my blog last week.

One particularly fun aspect of the conference was walking in and noticing the Cato Institute’s Deepbills editor running on a table at the entrance. The reason it was fun for me is that their editor is actually a customization of an early version of the HTML5-based LegisPro Web editor which I have spent much of the past year developing. We have developed this editor to be an open and customizable platform for legislative editing. The Cato Project is one of four different implementations which now exist – two are Akoma Ntoso based and two are not. More news will come on this development in the not-too-distant future. I had not expected the Cato Institute to be demonstrating anything, and it was quite a nice surprise to see software I had written up on display.

If there was any recurring theme throughout the day, it was the call for better linked data. While there has been significant progress over the past year towards getting the data out there, now it is time to start linking it all together. Luckily for me, this was the topic I had chosen to focus on in my talk at the end of the day. It will be interesting to see the progress that is made towards this objective this time next year.

All in all, it was a very successful and productive day. I didn’t have a single moment to myself all day. There were so many interesting people to meet that I didn’t get a chance to chat with nearly as many as I would have liked to.

For an amusing yet still informative take on the conference, check out Ari Hershowitz’s Tabulaw blog. He reveals a little bit more about some of the many projects we have been up to over the past year.

https://cha.house.gov/2013-legislative-data-and-transparency-conference

Standard
Akoma Ntoso, HTML5, Standards, Transparency, W3C

XML, HTML, JSON – Choosing the Right Format for Legislative Text

I find I’m often talking about an information model and XML as if they’re the same thing. However, there is no reason to tie these two things together as one. Instead, we should look at the information model in terms of the information it represents and let the manner in which we express that information be a separate concern. In the last few weeks I have found myself discussing alternative forms of representing legislative information with three people – chatting with Eric Mill at the Sunlight Foundation about HTML microformats (look for a blog from him on this topic soon), Daniel Bennett regarding microdata, and Ari Hershowitz regarding JSON.

I thought I would try and open up a discussion on this topic by shedding some light on it. If we can strip away the discussion of the information model and instead focus on the representation, perhaps we can agree on which formats are better for which applications. Is a format a good storage format, a good transport format, a good analysis/programming format, or a good all-around format?

1) XML:

I’ll start with a simple example of a bill section using Akoma Ntoso:

<section xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03" 
       id="{GUID}" evolvingId="s1">
    <num>§1.</num>
    <heading>Commencement </heading>
    <content> <p>This act will go into effect on 
       <date name="effectiveDate" date="2013-01-01">January 1, 2013</date>. 
    </p> </content>
</section> 

Of course, I am partial to XML. It’s a good all-around format. It’s clear, concise, and well supported. It works well as a good storage format, a good transport format, as well as being a good format of analysis and other uses. But it does bring with it a lot of complexity that is quite unnecessary for many uses.

2) HTML as Plain Text

For developers looking to parse out legislative text, plain text embedded in HTML using a <pre> element has long been the most useful format.

   <pre>
   §1. Commencement
   This act will go into effect on January 1, 2013.
   </pre>

It is a simple and flexible representation. Even when a more highly decorated HTML representation is provided, I have invariably removed the decorations to leave behind this format.

However, in recent years, as governments open up their internal XML formats as part of their transparency initiatives, it’s becoming less necessary to write your own parsers. Still, raw text is a very useful base format.

3) HTML/HTML5 using microformats:

<div class="section" id="{GUID}" data-evolvingId="s1">
   <div>
      <span class="num">§1.</span> 
      <span class="heading">Commencement </span>
   </div>
   <div class="content"><p>This act will go into effect on 
   <time name="effectiveDate" datetime="2013-01-01">January 1, 2013</time>. 
   </p></div>
</div>

As you can see, using HTML with microformats is a simple way of mapping XML into HTML. Currently, many legislative data sources that offer HTML content either offer bill text as plain text, as I showed in the previous example, or decorate it in a way that masks much of the semantic meaning. This is largely because web developers are building the output to an appearance specification rather than to an information specification. The result is class names that better describe the appearance of the text than the underlying semantics. Using microformats preserves much of the semantic meaning through the use of the class attribute and other key attributes.

I personally think that using HTML with microformats is a good way to transport legislative data to consumers that don’t need the full capabilities of the XML representation and are more interested in presenting the data than in analyzing or processing it. A simple transform could be used to take the stored XML and translate it into this form for delivery to a requestor seeking an easy-to-consume solution.
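
As a rough sketch of what such a transform might look like, using the Akoma Ntoso sample from the first example as input (illustrative only, not production code):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:akn="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03">

   <xsl:output method="html"/>

   <!-- Turn an Akoma Ntoso section into the microformat-style HTML shown above -->
   <xsl:template match="akn:section">
      <div class="section" id="{@id}" data-evolvingId="{@evolvingId}">
         <div>
            <span class="num"><xsl:value-of select="akn:num"/></span>
            <span class="heading"><xsl:value-of select="akn:heading"/></span>
         </div>
         <div class="content">
            <xsl:apply-templates select="akn:content/node()"/>
         </div>
      </div>
   </xsl:template>

   <!-- Preserve paragraphs and map dates to HTML5 time elements -->
   <xsl:template match="akn:p">
      <p><xsl:apply-templates/></p>
   </xsl:template>

   <xsl:template match="akn:date">
      <time name="{@name}" datetime="{@date}"><xsl:value-of select="."/></time>
   </xsl:template>

</xsl:stylesheet>

The same stored XML could be down-translated into the microdata or JSON forms that follow in much the same way.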

[Note: HTML5 now offers a <section> element as well as an <article> element. However, they’re not a perfect match to the legislative semantics of a section and an article so I prefer not to use them.]

4) HTML5 Microdata:

<div itemscope 
      itemtype="http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#section" 
      itemid="urn:xcential:guid:{GUID}">
   <data itemprop="evolvingId" value="s1"/>
   <div>
      <span itemprop="num">§1.</span>
      <span itemprop="heading">Commencement </span>
   </div>
   <div itemprop="content"> <p>This act will go into effect on 
      <time itemprop="effectiveDate" datetime="2013-01-01">January 1, 2013</time>.
   </p> </div>
</div>

Using microdata, we see more formalization of the annotation convention than microformats offers – which brings along additional complexity and requires some sort of naming authority, something I can’t say I really understand or see how it will come about. But it’s a more formalized approach and is part of the HTML5 umbrella. I doubt that microdata is a good way to represent a full document. Rather, I see microdata better fitting the role of annotating specific parts of a document with metadata. Much like microformats, microdata is a good solution as a transport format for a consumer not interested in dealing with the full XML representation. The result is a format that is rich in semantic information and is also easily rendered to the user. However, it strikes me that the effort to more robustly handle naming only reinvents one of XML’s more confusing aspects – namespaces – in just a different way.

5) JSON

{
   "type": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#section",
   "id": "{GUID}",
   "evolvingId": "s1",
   "num": {
      "type": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#num",
      "text": "§1."
   },
   "heading": {
      "type": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#heading",
      "text": "Commencement"
   },
   "content": {
      "type": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#content",
      "text1": "This act will go into effect on ",
      "date": {
         "type": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0/CSD03#date",
         "date": "2013-01-01",
         "text": "January 1, 2013"
      },
      "text2": "."
   }
}

Quite obviously, JSON is great if you’re looking to easily load the information into your programmatic data structures and aren’t looking to present the information as-is to the user. This is primarily a programmatic format. Representing the full document in JSON might be overkill. Perhaps the role of JSON is to carry key parts of extracted metadata rather than the full document.

There are still other formats I could have brought up, like RDFa, but I think my point has been made. There are many different ways of representing the same legislative model – each with its own strengths and weaknesses. Different consumers have different needs. While XML is a good all-around format, it also brings with it some degree of sophistication and complexity that many information consumers simply don’t need to tackle. It should be possible, as a consumer, to specify the form of the information that most closely fits my needs and have the legislative data source deliver it to me in that format.

[Note: In Akoma Ntoso, the format is called the “manifestation” and is specified as part of the referencing specification.]

What do you think?

Standard
Akoma Ntoso, Standards, Transparency

Legal Reference Resolvers

After my last blog post I received a lot of feedback. Thanks to everyone who contacted me with questions and comments. Given all the interest, I think I will devote a few more blog posts to the subject of legal references. It is quite possibly the most important subject that needs to be tackled anyway. (And yes, Harlan, I will try and blog more often.)

Many of the questions I received asked how I envision the resolver working. I thought I would dive into this aspect some more by defining the role of the resolver:

The role of a reference resolver is to receive a reference to a document or a fragment thereof and to do whatever it takes to resolve it, returning the requested data to the requestor.

That defines the role of a resolver in pretty broad terms. Let’s break the role down into some discrete functions:

  1. Simple Redirection – Perhaps the most basic service to provide will be that of a reference redirector. This service will convert a standardized virtual reference into a non-standard URL that is understood by a proprietary repository available elsewhere on the web that can supply the data for the request. The redirection service allows a legacy repository to provide access to documents following its own proprietary referencing mechanism without having to adopt the standard referencing nomenclature. In this case, the reference redirector will serve as a front to the legacy repository, mapping the standard references into non-standard ones.

  2. Reference Canonicalization – There are often a number of different ways in which a reference to a legal document can be composed. This is partly because the manner in which legal documents are typically structured sometimes encourages both a flat and a hierarchical view of the same data. For instance, one tends to think of sections in a flat model because sections are usually sequentially numbered. Often, however, those sections are arranged in a hierarchical structure which allows an alternate hierarchical model to also be valid. Another reason for alternate references is the simple fact that there are all sorts of different ways of abbreviating the same thing – and it is impossible to get everyone around the world to standardize on abbreviations. So “section1”, “sec1”, “s1”, and the even more exotic “§1” need to be treated synonymously. Also, let’s not forget about time. The requestor might be interested in the law as it existed on a particular date. The resulting reference is formulated in a manner that makes it more of a document query than a document identifier. For instance, imagine a version of a section that became operational on January 1, 2013. A request for the section that was in operation on February 1, 2013 will return that January 1 version if it was still in operation on February 1, even though the operational date of the version is not February 1. (Akoma Ntoso calls the query case a virtual expression and differentiates it from the case where the date is part of the identifier.)

    The canonicalization service will take any reference, perhaps vague or malformed, and will return one or more standardized references that precisely represent the documents that could be identified by the original reference – possibly along with a measure of confidence. (I’ll sketch what such a response might look like after this list.) I would imagine that official data services, providing authoritative legal documents, will most likely provide the canonicalization service.

  3. Repository Service – A legal library might provide both access to a document repository and an accompanying resolution service through which to access the repository. When this is the case, the resolver acts as an HTTP interface to the library, converting a virtual URL to an address of sorts in the document repository. This could simply involve converting the URL to a file path or it could involve something more exotic, requiring document extraction from a database or something similar.

    There are two separate use cases I can think of for the repository. The basic case is the repository as a read-only library, where references are simply resolved, returning documents or fragments as requested. The second case is somewhat more complex and will exist within organizations tasked with developing legal resources – such as the organizations that draft legislation within the government. Here, a more sophisticated read/write mechanism will require the resolver to work with technologies such as WebDAV, which front for the database. This is a more advanced version of the solution we developed for use internally by the State of California.

  4. Resolver Routing – The most complex aspect, and perhaps the most difficult to achieve, will be resolver routing. There is never going to be a single resolver that can resolve every legal reference in the world. There are simply too many jurisdictions to cover – in every country, state/province, county/parish, city/town, and every other body that produces legal documents. What if, instead, there were a way for resolvers to work together to return the document requested? While a resolver might handle some subset of the references it receives on its own, for the cases it doesn’t know about, it might have some means to negotiate or pass on the request to other resolvers it knows about in order to return the requested data.
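
As promised above, here is a purely hypothetical sketch of what a canonicalization response might look like – the element names, reference forms, and confidence values are all made up for illustration and come from no standard:

<canonicalizationResult request="Labor Code sec. 1234 as of February 1, 2013">
   <candidate confidence="0.95"
              ref="/us-ca/codes/labor/eng@2013-02-01/sec1234"/>
   <candidate confidence="0.40"
              ref="/us-ca/codes/labor/eng@2013-02-01/div2/part4/ch1/sec1234"/>
</canonicalizationResult>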

Not all resolvers will necessarily provide all the functions listed. How resolvers are discovered, how they reveal the functions they support, and how resolvers are tied together are all topics which will take efforts far larger than my simple blog to work out. But just imagine how many problems could be solved if we could implement a resolving protocol that would allow legal references around the world to be resolved in a uniform way.

In my next blog, I’m going to return to the reference itself and take a look at the various referencing mechanisms and services I have discovered in recent weeks. Some of the services implement some of the functions I have described above. I also want to discuss the difference between an absolute reference (including the domain name) and a relative reference (omitting the domain name) and why it is important that references stored in the document be relative.

Standard