Changing the way the world is governed. Together.

I’ve been marveling lately at how much software development has changed in recent years. Our development processes are increasingly integrated with both our government customers and our commercial partners — using modern Agile methodologies. This largely fulfills a grand vision I was a part of very early in my career.

I started my career at the Boeing Company working on Internal Methods and Processes Development (IMPD). Out of that work soon came the vision of Concurrent Engineering, in which all aspects of the product development cycle, across all disciplines, all partners, and all customers, would be tightly integrated in a harmonious flow of information. Of course, making the vision a reality at Boeing’s scale has taken some time. Early on, Boeing had great success on the B777 program, where the slogan was “Working Together”. A bit later, on the B787 program, they went a few (or perhaps many) steps too far and stumbled for a while. This was all Agile thinking — before there was anything called Agile.

Boeing’s concurrent engineering efforts quickly inspired one of Boeing’s primary CAD suppliers, Mentor Graphics. Mentor was hard at work on its second-generation platform of software tools for designing electronic systems, and Concurrent Engineering was a great customer-focused story to wrap around those efforts. Mentor’s perhaps arrogant tagline was “Changing the way the world designs. Together.” Inspired, I quickly joined Mentor Graphics as the product manager for data management. I soon found that the magnitude of the development effort had turned the company sharply inward; it had become anything but Agile. Mentor’s struggle to build a product line that marketed Concurrent Engineering became the very antithesis of the concept it touted. I eventually left Mentor Graphics in frustration and drifted away from process automation.

Now, two decades later, a remarkable thing has happened. All those concepts we struggled with way back when have finally come of age. They have become the way we naturally work — and that way of working is called Agile. Our development processes are increasingly integrated with both our customers and our partners around the world. Time zones, while still a nuisance, have become far less of a barrier than they once were. Our rapid development cycles are quite transparent, with our customers and partners having almost complete visibility into our repositories and databases. Tools and services like GitHub, AWS, Slack, JIRA, and Trello allow us to coordinate the development of products shared among our customers, with bespoke layers built on top by ourselves and our partners.

[Figure: Concurrent Engineering]

It’s always fashionable for political rhetoric to bash the inefficiencies of big government, but down in the trenches where real work gets done, it’s quite amazing to see how modern Agile techniques are being adopted by governments and the benefits that are being reaped.

As we at Xcential strive to become great, it’s important for us to look to the future with open eyes so that we can understand how to excel. In the new world, as walls have crumbled, global integration of people and processes has become the norm. To stay relevant, we must continue to adapt to these rapidly evolving methodologies.

Our vision is to change the way the world is governed — through the application of modern automation technology. One thing is very clear: we’re going to do it together with our customers and our partners all around the world. This is how you work in the modern era.

In my next blog post, I will delve a little more into how we have been applying Agile/Scrum and the lessons we have learned.


Connected Information

As a proponent of XML for legislation, I’m often asked why an XML approach is better than a more traditional approach using a word processor. The answer is simple – it’s all about connected information.

The digital end point in a legislative system can no longer be publication of PDFs. PDFs are nothing but a kludgy way to digitize paper — a way to preserve the old traditions and avoid the future. Try reading a PDF on a cell phone and you see the problem. Try clicking on a citation in a PDF and you see the problem. Try scraping the information out of a PDF to make it computer readable and you see the problem. The only useful function that PDFs serve is as a bridge to the past.

The future is all about connected information — breaking the physical bounds of what we think of as a document and allowing the nuggets of information found within them to be connected, interrelated, and acted upon. This is the real reason why the future lies with XML and its related technologies.

In my blog last week, I provided a brief glimpse into how our future amending tools will work. I explored how legislation could be managed much as software is managed with GitHub. This is an example of how useful connected information becomes. Rather than producing bills and amendments as paper documents, the information is stored in a way that allows it to be processed efficiently and accurately — and made available to the public in a computer-readable form.

At Xcential, we’re building our new web-based authoring system — LegisPro. If you take a close look at it, you’ll see that it has two main components. Of course, there is a robust XML editor. However, at the system’s very heart is a linking system — something we call a resolver. It’s in this resolver that the true power lies. It’s an HTTP-based system for managing all the linkages that exist in the system. It connects XML repositories, external data sources, and even SQL databases together to form a seamless universe of connected information.
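To make the idea a bit more concrete, here is a minimal sketch of what an HTTP resolver can look like. This is purely illustrative (a toy Flask service with a made-up lookup table and made-up identifier paths), not the actual LegisPro resolver or its API:

    # Illustrative resolver sketch (hypothetical, not the LegisPro API).
    # It maps logical citation paths to the physical stores that hold the
    # content, so callers can follow a stable URL without knowing where
    # (or in what technology) the data actually lives.
    from flask import Flask, jsonify, abort

    app = Flask(__name__)

    # Hypothetical registry: logical identifiers -> physical locations.
    REGISTRY = {
        "/us/usc/t10/s1234": {"store": "xml-repo", "uri": "usc/title10.xml#s1234"},
        "/us/113/bills/hr1234": {"store": "sql-db", "uri": "bills WHERE congress=113 AND number=1234"},
    }

    @app.route("/resolve/<path:ref>")
    def resolve(ref):
        """Return the physical location for a logical reference, if known."""
        target = REGISTRY.get("/" + ref)
        if target is None:
            abort(404)
        return jsonify(target)

    if __name__ == "__main__":
        app.run(port=8080)

A request such as GET /resolve/us/usc/t10/s1234 would return the location of that provision, whether it lives in an XML repository or a SQL database.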

We’re working hard to transform how legislation, and indeed all government information, is viewed. It’s not just about connecting laws and legislation together through simple web links. We’re talking about providing rich connections between all government information — tying financial data to laws and legislation, connecting regulatory information together, associating people, places, and things with government data, and on and on. We have barely scratched the surface, but it’s clear that the future lies with connected information.

While we position LegisPro today as a bill authoring system, it’s much more than that. It provides some of the fundamental underpinnings necessary to transform the government documents of today into the connected information of tomorrow.


Data Transparency Breakfast, LEX US Summer School 2015, First International Akoma Ntoso Conference, and LegisPro Edit reveal.

Last week was a very good week for my company, Xcential.

We started the week hosting a breakfast put on by the Data Transparency Coalition at the Booz Allen Hamilton facility in Washington, D.C. The topic was Transforming Law and Regulation. Unfortunately, an issue at home kept me away, but I was able to make a brief pre-recorded presentation, and my moderating role was filled by Mark Stodder, our company President. Thank you, Mark!

Next up was the first U.S. edition of the LEX Summer School from Italy. I have attended this summer school in Italy every year since 2010, and it’s great to see the same opportunity for an open dialog amongst the legal informatics community finally come to the U.S. Monica Palmirani (@MonicaPalmirani), Fabio Vitali, and Luca Cervone (@lucacervone) of the University of Bologna put on the event. The teachers also included Jim Mangiafico (@mangiafico) (the LoC data challenge winner), Veronique Parisse (@VeroParisse) from the European Union, Andrew Weber (@atweber) from the Library of Congress, Kirsten Gullickson (@GullicksonK) from the Office of the Clerk at the U.S. House of Representatives, and myself from Xcential. I flew in for the last two days of the Summer School, where I covered how the U.S. Code is modeled in Akoma Ntoso and gave the students an opportunity to try out our new bill drafting editor — LegisProedit.

The Summer School was followed on Saturday by the first International Akoma Ntoso Conference, where I spoke about the architecture of our new editor as well as how the USLM schema is a derivative of the Akoma Ntoso schema. We had a good turnout from around the world and a number of interesting speakers.

This week is NCSL in Seattle where we will be discussing our new editor with potential customers and partners. Mark Stodder from Xcential will be in attendance.

In a month, I’ll be in Ravenna once more for the European LEX Summer School — where I’ll be able to show even more progress towards the goal of a full product line of Akoma Ntoso tools. It’s interesting times for me.

The editor is coming along nicely and we’re beginning to firm up our QuickStarter beta plans. I’ve already received a number of requests and will be getting in touch with everyone as soon as we’re ready to roll out the program. If you would like to participate as a beta tester — or if you would just like more information, please contact us at info@xcential.com.

I’m really excited about how far we’ve come. Akoma Ntoso is on the verge of being certified as an official OASIS standard, our Akoma Ntoso products are coming into place, and interest around the world is growing. I can’t wait to see where we will be this time next year.


Achieving Five Star Open Data

A couple of weeks ago, I was in Ravenna, Italy, at the LEX Summer School and the follow-on Developer’s Workshop. There, the topic of the semantic web came up a lot. Despite the idea having cooled in the popular press in recent years, I’m still a big believer in it. The problem with the semantic web is that few people actually get it. At this point, it’s such an abstract idea that people invariably jump to the closest analog available today and mistake it for that.

Tim Berners-Lee (@timberners_lee), the inventor of the web and a big proponent of linked data, has suggested a five star deployment scheme for achieving open data — and, ultimately, a semantic web. In brief: one star for putting data on the web, in any format, under an open license; two stars for structured, machine-readable data; three stars for a non-proprietary format; four stars for using URIs to identify things; and five stars for linking the data to other data. His chart can be thought of as a roadmap for how to get there.

Take a look at today’s Data.gov website. Everybody knows the problem with it — it’s a pretty wrapper around a dumping ground of open data. There are thousands and thousands of data sets available on a wide range of interesting topics. But, there is no unifying data model behind all these data dumps. Sometimes you’re directed to another pretty website that, while well-intentioned, hides the real information behind the decorations. Sometimes you can get a simple text file. If you’re lucky, you might even find the information in some more structured format such as a spreadsheet or XML file. Without any unifying model and with much of the data intended as downloads rather than as an information service, this is really still Tim’s first star of open data — even though some of the data is provided as spreadsheets or open data formats. It’s a good start, but there’s an awful long way to go.

So let’s imagine that a better solution is desired, one that provides information services but keeps things modest by using off-the-shelf technology that everyone is familiar with. Imagine that someone with the authority to do so takes the initiative to mandate that henceforth, all government data will be produced as Excel spreadsheets. Every memo, report, regulation, piece of legislation, form that citizens fill out, and even the U.S. Code will be kept in Excel spreadsheets. Yes, you need to suspend disbelief to imagine this — the complications that would result would be incredibly tough to solve. But, imagine that all those hurdles were magically overcome.

What would it mean if all government information was stored as spreadsheets? What would be possible if all that information was available throughout the government in predictable and permanent locations? Let’s call the system that would result the Government Information Storehouse – a giant repository of information regularized as Excel spreadsheets. (BTW, this would be the future of government publishing once paper and PDFs have become relics of the past.)

How would this information look? Think about a piece of legislation, for instance. Each section of the bill might be modeled as a single row in the spreadsheet. Every provision in that section would be its own spreadsheet cell (ignoring hierarchical considerations, etc.). Citations would turn into cell references or cell range references. Amending formulas, such as “Section 1234 of Title 10 is amended by…”, could be expressed as literal spreadsheet formulas. Such a formula would refer to the specific cell in the appropriate U.S. Code Title and contain programmatic instructions for how to perform the amendment. In short, lots of once complex operations could be automated very efficiently and very precisely. Having the power to turn all government information into a giant spreadsheet has a certain appeal — even if it requires quite a stretch of the imagination.
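To make the thought experiment concrete, here is a tiny, purely illustrative sketch (the cell layout, function, and citation keys are all invented for this imaginary spreadsheet world) of an amendment expressed as a formula that names its target cell and the edit to perform:

    # Illustrative only: a toy model of "amendments as spreadsheet formulas".
    # Provisions live in "cells" keyed by citation; an amending formula names
    # a target cell and the edit to apply. All names here are hypothetical.

    # The "spreadsheet": each provision of the U.S. Code is a cell.
    usc = {
        ("title10", "s1234"): "The Secretary shall submit a yearly report.",
    }

    def amend(cells, title, section, strike, insert):
        """Apply a strike-and-insert amendment to the provision in the named cell."""
        key = (title, section)
        cells[key] = cells[key].replace(strike, insert)
        return cells[key]

    # "Section 1234 of Title 10 is amended by striking 'yearly' and inserting 'quarterly'."
    print(amend(usc, "title10", "s1234", strike="yearly", insert="quarterly"))
    # -> The Secretary shall submit a quarterly report.

The point is not the code itself but that, once provisions are addressable, an amendment becomes data a computer can execute and verify rather than prose that must be interpreted by hand.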

Now imagine what it would mean if selected parts of this information were available to the public as these spreadsheets – in a regularized and permanent way — say Data.gov 2.0 or perhaps, more accurately, as Info.gov. Think of all the spreadsheet applications that would be built to tease out knowledge from the information that the government is providing through their information portal. Having the ability to programmatically monitor the government without having to resort to complex measures to extract the information would truly enable transparency.

At this point, the linkages and information services give us some of the attributes of Tim’s four and five star open data solutions, but our focus on spreadsheet technology has left us with a less than desirable two star system. Besides, we all know that having the government publish everything as Excel spreadsheets is absurd. Not everything fits conveniently into a spreadsheet table, to say nothing of the scalability problems that would result. I wouldn’t even want to try putting Title 42 of the U.S. Code into an Excel spreadsheet. So how do we really go about achieving this sort of open data and the efficiencies it enables — both inside and outside of government?

In order to realize true four and five star solutions, we need to quickly move on to fulfilling all the parts of Tim’s five star chart. In his chart, a three star solution replaces Excel spreadsheets with an open data format such as a comma-separated file. I don’t actually care for this ordering, because it sacrifices much to achieve the goal of having neutral file formats — so let’s move on to full four and five star solutions. To get there, we need to become proficient in the open standards that exist, and we must strive to create ones where they’re missing. That’s why we work so hard on the OASIS efforts to develop Akoma Ntoso and citations into standards for legal documents. And when we start producing real information services, we must ensure that the linkages in the information (those links and formulas I wrote about earlier) exist to the fullest extent possible. It shouldn’t be up to the consumer to figure out how a provision in a bill relates to a line item in some budget somewhere else — that linkage should be established from the get-go.

We’re working on a number of core pieces of technology to enable this vision and get to full five star open data. We’re integrating XML repositories and SQL databases into our architectures to give us the information storehouse I mentioned earlier. We’re building resolver technology that allows us to create and manage permanent linkages. These linkages can be as simple as citation references or as complex as instructions to extract from, or make modifications to, other information sources. Think of our resolver technology as akin to the engine in Excel that handles cell or range references, arithmetic formulas, and database lookups. And finally, we’re building editors that will resemble word processors in usage, but will allow complex sets of information to be authored and later modified. These editors will have many of the sophisticated capabilities, such as track changes, that you might see in a modern word processor, but underneath you will find a complex structured model rather than the ad hoc data structures of a word processor.
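As a rough sketch of that idea (the names, fields, and identifier paths below are illustrative, not our actual design), a managed linkage might be nothing more than a pointer, or it might carry an instruction that operates on its target, much like the difference between a plain cell reference and a formula:

    # Illustrative data model for "linkages" (hypothetical, not the LegisPro schema).
    # A linkage either simply points at a target, or carries an instruction that
    # operates on the target -- like a cell reference versus a formula.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Linkage:
        source: str                        # e.g. a provision in a bill
        target: str                        # e.g. a provision in the U.S. Code
        instruction: Optional[str] = None  # e.g. "extract", "strike-and-insert"
        arguments: dict = field(default_factory=dict)

    links = [
        # A plain citation: just a pointer.
        Linkage(source="/us/bill/hr1/sec2", target="/us/usc/t10/s1234"),
        # A formula-like linkage: pull text from the target for reuse.
        Linkage(source="/us/bill/hr1/sec3", target="/us/usc/t42/s300",
                instruction="extract"),
    ]

    for link in links:
        kind = "citation" if link.instruction is None else link.instruction
        print(f"{link.source} -> {link.target} ({kind})")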

Building truly open data is going to be a challenging but exciting journey. The solutions that are in place today are a very primitive first step. Many new standards and technologies still need to be developed. But, we’re well on our way.


Look how far legal informatics has come – in just a few years

Back in 2001 when I started in the legal informatics field, it seemed we were all alone. Certainly, we weren’t – there were many similar efforts underway around the country and around the world. But, we felt alone. All the efforts were working in isolation – making similar decisions and learning similar lessons. This was the state of the field, for the most part, for the next 6 to 8 years. Lots of isolated progress, but few opportunities to share what we had learned and build on what others knew.

In 2010, I visited the LEX Summer School, put on by the University of Bologna in Ravenna, Italy. What became apparent to me was just how isolated the various pockets of innovation around the world had become. There was lots of progress, especially in Europe, but legal informatics, as an industry, was still in a fledgling state – it was more of an academic field than a commercial industry. In fact, outside of academic circles, the term legal informatics was all but meaningless. When I wrote my first blog in 2011, I looked forward to the day when there might be a true Legal Informatics industry.

Now, just a few years later, it’s stunning how far we have come. Certainly, we still have far to travel, but now we’re all working together towards common goals rather than working alone towards the same, but isolated, goals. I thought I would spend this week’s blog reviewing just how far we have come.

  1. Working together
    We have come together in a number of important dimensions:

    • First of all, consider geography. This is a small field, but around the world we’re now all very much connected together. We routinely meet, share ideas, share lessons, and share expertise – no matter which continent we work and reside on.
    • Secondly, consider our viewpoints. There was once a real tension between the transparency camp, government, external industry, and academia. If you participated in the 2014 Legislative Data and Transparency conference a few weeks ago in Washington D.C., one of the striking things was how little tension remains between these various viewpoints. We’re all now working towards a common set of goals.
  2. Technology
    • I remember when we used to question whether XML was the right technology. The alternatives were to use Open Office or Microsoft Office, basing the legislative framework around office productivity tools. Others even proposed using relational database technology along with a forms-based interface. Those ideas have now generally faded away – XML is the clear choice. And faking XML by relying on the fact that the Open Document Format (ODF) and Office Open XML formats are based on XML just isn’t credible anymore. XML means more than just relying on an internal file format that your tools happen to use – it means designing information models specifically to solve the challenges of legal informatics.
    • I remember when we used to debate how references should be managed. Should we use file paths? Should we use URNs? Should we use URLs? Today the answer is clear – we’re all settling around logical URLs with resolvers, sometimes federated, to stitch together a web of interconnected references. Along with this decision has been the basic assumption that web-based solutions are the future – desktop applications no longer have a place in a modern solution.
    • Consider database technology. We used to have three choices – use the file system, try to adapt mature but ill-fitting relational databases, or take a risk with emerging XML databases. Clearly XML databases were the future – but was it too early? Not anymore! XML database technology, along with XQuery, has come a long way in the past few years.
  3. Standards
    Standards are what will create an industry. Without them, there is little opportunity for re-use – a necessary part of allowing cost-effective products to be built. After a few false starts over the years, we’re now on the cusp of having industry standards to work with. The OASIS LegalDocML (Akoma Ntoso) and LegalCiteM technical committees are hard at work on developing those standards. Certainly, it will be a number of years before we see all the benefits of these standards, but as they come to fruition, a real industry can emerge.
  4. Driving Forces
    Ten years ago, the motivation for using XML was to replace outdated drafting systems, often cobbled together on obsolete mainframes, that sorely needed to be retired. The needs were all internal. Now, that has all changed. The end result is no longer a paper document which can be ordered from the “Bill Room” found in the basement of the Capitol building. It’s often not even a PDF rendition of that document. The new end result is information which needs to be shared in a timely and open way in order to achieve the modern transparency objectives, like the DATA Act, that have been mandated. This change in expectations is going to revolutionize how the public works with their representatives to ensure fair and open government.
In the past dozen years, things sure have changed. Credit must be given to Monica Palmirani (@MonicaPalmirani) and Fabio Vitali at the University of Bologna – an awful lot of the progress pivots around their initiatives. However, we’ve all played a part in creating an open, creative, and cooperative environment for Legal Informatics to thrive as more than just an academic curiosity – as a true industry with many participants working collaboratively and competitively to innovate and solve the challenges ahead.


Imagining Government Data in the 21st Century

After the 2014 Legislative Data and Transparency conference, I came away both encouraged and a little worried. I’m encouraged by the vast amount of progress we have seen in the past year, but at the same time a little concerned by how disjointed some of the initiatives seem to be. I would rather see new mandates force existing systems to be rethought than cause additional systems to be created – which can get very costly over time. But, it’s all still the Wild Wild West of computing.

What I want to do with my blog this week is try to define what I believe transparency is all about:

  1. The data must be available. First and foremost, the data must be provided at the very least – somehow, anyhow.
  2. The data must be provided in such a way that it is accessible and understandable by the widest possible audience. This means providing data formats that can be read by ubiquitous tools and ensuring the coding necessary to support all types of readers, including those with disabilities.
  3. The data must be provided in such a way that it is easy for a computer to digest and analyze. This means using data formats that are easily parsed by a computer (not PDF, please!!!) and using data models that are comprehensible to the widest possible audience of data analysts. Data formats that are difficult to parse or complex to understand should be discouraged. A transparent data format should not limit the visibility of the data to only those with very specialized tools or expertise.
  4. The data provided must be useful. This means that the most important characteristics of the data must be described in ways that allow it to be interpreted by a computer without too much work. For instance, important entities described by the data should be marked in ways that are easily found and characterized – preferably using broadly accepted open standards.
  5. The data must be easy to find. This means that the location at which data resides should be predictable, understandable, permanent, and reliable. It should reflect the nature of the data rather than the implementation of the website serving the data. URLs should be designed rather than being simply fallout from the implementation.
  6. The data should be as raw as possible – but still comprehensible. This means that the data should have undergone as little processing as possible. The more that data is transformed, interpreted, or rearranged, the less like the original data it becomes. Processing data invariably damages its integrity – whether intentional or unintentional. There will always be some degree of healthy mistrust in data that has been over-processed.
  7. The data should be interactive. This means that it should be possible to search the data at its source – through both simple text search and more sophisticated data queries. It also means that whenever data is published, there should be an opportunity for the consumer to respond back – be it simple feedback, a formal request for change, or some other type of two way interaction.

How can this all be achieved for legislative data? This is the problem we are working to solve. We’re taking a holistic approach by designing data models that are both easy to understand and can be applied throughout the data life cycle. We’re striving to limit data transformations by designing our data models to present data in ways that are understandable to humans and computers alike. We are defining URL schemes that are well thought out and could last for as long as URLs are how we find data in the digital era. We’re defining database solutions that allow data to not only be downloaded, but also searched and queried in place. We’re building tools that will allow the data to not only be created but also interacted with later. And finally, we’re working with standards bodies such as the LegalDocML and LegalCiteM technical committees at OASIS to ensure well-thought-out worldwide standards such as Akoma Ntoso.
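To illustrate what a designed URL means in practice (the host and path patterns below are hypothetical, loosely modeled on the kind of logical identifiers used for legislative data, not an official scheme), such a URL is built from what the data is rather than from how the serving website happens to be implemented:

    # Illustrative only: building logical, implementation-independent URLs for
    # legislative data. The host and path patterns here are hypothetical.

    BASE = "https://resolver.example.gov"  # hypothetical resolver host

    def usc_url(title: int, section: str) -> str:
        """Return a stable, citation-shaped URL for a U.S. Code section."""
        return f"{BASE}/us/usc/t{title}/s{section}"

    def bill_url(congress: int, bill_type: str, number: int) -> str:
        """Return a stable URL for a bill, keyed by what it is, not where it is stored."""
        return f"{BASE}/us/{congress}/bills/{bill_type}{number}"

    print(usc_url(42, "1983"))        # .../us/usc/t42/s1983
    print(bill_url(113, "hr", 1234))  # .../us/113/bills/hr1234

Because nothing in these paths depends on a particular server technology or site layout, they can stay valid even as the systems behind them are replaced.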

Take a look at Title 1 of the U.S. Code. If you’re using a reasonably modern web browser, you will notice that this data is very readable and understandable – it’s meant to be read by a human. Right-click with the mouse and view the source. This is the USLM format that was released a year ago. If you’re familiar with the structure of the U.S. Code and you’re reasonably XML savvy, you should feel at ease with the data format. It’s meant to be understandable both to humans and to computer programs trying to analyze it. The objective here is to provide a single simple data model that is used from initial drafting all the way through publishing and beyond. Rather than transforming the XML into PDF and HTML forms, the XML format can be rendered into a readable form using Cascading Style Sheets (CSS). Modern XML repositories such as eXist allow documents such as this to be queried as easily as you would query a table in a relational database – using a query language called XQuery.
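As a small, hedged example of what “understandable to computer programs” means, the sketch below lists section numbers and headings from a locally saved copy of the Title 1 USLM file using only Python’s standard library. The file name is whatever you saved it as, and the namespace URI and element names are my assumptions about USLM; check the actual file, since they may differ:

    # Sketch: reading USLM XML with standard tools. Assumes Title 1 has been
    # saved locally as usc01.xml; the namespace URI and element names below are
    # assumptions about USLM -- verify them against the actual file.
    import xml.etree.ElementTree as ET

    NS = {"uslm": "http://xml.house.gov/schemas/uslm/1.0"}  # assumed namespace

    tree = ET.parse("usc01.xml")
    root = tree.getroot()

    # List each section's number and heading. The same markup a browser renders
    # for humans (via CSS) is directly usable by a program.
    for section in root.iter("{%s}section" % NS["uslm"]):
        num = section.find("uslm:num", NS)
        heading = section.find("uslm:heading", NS)
        if num is not None and heading is not None:
            print((num.text or "").strip(), (heading.text or "").strip())

The same document could instead be loaded into an XML repository such as eXist and queried with XQuery; the point is that no scraping or reverse engineering is needed to get at the structure.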

This is what we are doing – within the umbrella of legislative data. It’s a start, but ultimately there is a need for a broader solution. My hope is that government agencies will be able to come together under a common vision for how our information should be created, published, and disseminated – in order to fulfill their evolving transparency mandates efficiently. As government agencies replace old systems with new systems, they should design around a common open framework for transparent data rather than building new systems in the exact same footprint as the old systems that they demolish. The digital era, and the transparency mandates that have come with it, demand new thinking far different from the thinking of the paper era, which is now drawing to a close. If this can be achieved, then true data transparency can be achieved.


What is Transparency?

I’ve been thinking a lot about transparency lately. The disappearance of Malaysia Airlines Flight 370 (MH370) provided an interesting case to look at – and some important lessons. Releasing data that requires great expertise to decipher isn’t transparency.

My boss, when I worked on process research at the Boeing Company many years ago, used to drill into me the difference between information and data. To him, data was raw – and meaningless unless you knew how to interpret it. Information, on the other hand, had the meaning already applied – it was something you could actually understand.

Let’s recall some of the details of the MH370 incident. The plane disappeared without a trace – for reasons that remain a mystery. The only useful information, after radar contact was lost, was a series of pings received by Inmarsat’s satellite. Using some very clever mathematics involving Doppler shifts, Inmarsat was able to plot a probable course for the lost plane from that data. That course was revealed to the world and the search progressed. However, when that course failed to turn up the missing plane, there were increasingly angry calls for more transparency from Inmarsat – to reveal the raw data. Inmarsat’s response was that they had released the information, in the form of a plotted course, to the public and to the appropriate authorities. However, they chose to withhold the underlying data, claiming it wouldn’t be useful. The demands persisted, primarily from the press and the victims’ families. Eventually Inmarsat gave in and agreed to release the data. With great excitement, the press reported this as “Breaking News”. Then a bewildered look seemed to come across everyone, and the story quickly faded away. Inmarsat had provided the transparency in the form it was demanded, releasing the raw data along with a brief overview and the relevant data highlighted, but it still wasn’t particularly useful. We’re still waiting to hear whether anyone will ever be able to find any new insights into whatever happened to MH370 using this data. Most likely, though, that story has run its course – you simply need Inmarsat’s expertise to understand the data.

There is an important lesson to be learned – for better or worse. Raw data can be released, but without the tools and expertise necessary to interpret it, it’s meaningless. Is that transparency? Alternatively, raw data can be interpreted into meaningful information, but that opens up questions as to the honesty and accuracy of the interpretation. Is that transparency? It’s very easy to hide facts in plain sight – by delivering them in a convoluted and indecipherable data format, or by selectively interpreting them to tell an incomplete story. How do we manage transparency to achieve the objective of providing the public with an open, honest, and useful view of government activities?

Next week, I want to describe my vision for how government information should be made public. I want to tackle the conflicting needs of providing information that is unfiltered yet comprehensible. While I don’t have the answers, I do want to start the process of clarifying what better transparency is really going to achieve.
