I’m often asked what make on XML model better than another when it comes to representing laws and regulations. Just because a document is modeled in XML does not mean that it is useful in that form — the design of the schema matters in terms of what it enables or facilitates.
We have a few rules of thumb that we apply when either designing or adopting an XML schema:
- Is it semantic?
Reason: In order to process the information in a document, you have to understand what it is and what it means.
- Is the presentation separated from the semantics as much as possible?
Reason: We have moved beyond paper and nowadays it’s important to present information in form factors that just don’t suit the legacy constraints imposed by printing paper.
- Is all the text (excluding any metadata section) in the natural reading order?
Reason: The simplest way to present and process the text in a document is in the reading order of the text. This is particularly important is the presentation is to be added to the XML using simple CSS styling (as opposed to HTML transformation) and when the text is subject to complex amending instructions.
- Does it, to the fullest extent possible, avoid the use of generated text?
Reason: Similar to the last rule, it’s important for text to be displayed or amended when that text is represented. Generating text opens up a can of worms which can require sophisticated additional processing. Also, from a historical record of the text, which is essential for enacted law, having part of the text be generated by an external algorithm requires that the algorithm itself become part of the permanent record.
- Is every provision that needs data associated with it permanently identifiable?
Reason: With modern automation comes the need to not only manage the text of a provision but also state information. For example, is the current status of the provision pending, effective, repealed, or spent? While some of the metadata might be stored with the XML representation of the provision itself, sometimes it is better to store that metadata in a separate part of the document or in an external database. In these cases, it’s important to be able to permanently associate this external metadata with the provision — and this usually requires an immutable (permanent) identifier.
- Is every provision that is referred to easily locatable?
Reason: Laws are full of references (or citations). These are to provisions within the same document or to other documents or provisions within those documents. There needs to be a way to accurately and efficiently traverse and process these references. This need usually requires a locating identifier that an unambiguously identify the provision being referred to.
- If the XML schema is for general use, is there an extensible way to add missing constructs?
Reason: It is easy to claim to support all the legal traditions in the word, but extremely difficult to do so. While legal traditions are remarkably similar around the world, it’s impossible to predict every single construct that will arise — especially with documents data back hundreds of years. There has to be a way to implement constructs that don’t intrinsically exist within the base XML schema.
- Is there an extensible metadata mechanism?
Reason: A primary objective for representing a legislative or regulatory document in XML is for the processability it enables. This invariably means a need to record extensive metadata about the provisions found within the document. As the automation possibilities are endless, there needs to be a way to model and record the metadata that is generated.
- Does it provide the facilities necessary to automate according to modern expectations?
Reason: Some structure facilitate automation while others do not. For instance, flat structures can simplify the drafting process, but also make the automation process more difficult. It’s usually better to implement hierarchical structures and then hide the drafting complexity that creates with richer tools.