What is a Semantic Web?

Tim Berners-Lee, inventor of the World Wide Web, defines a semantic web quite simply as “a web of data that can be processed directly and indirectly by machines”. In my experience, that simple definition quickly becomes confusing as people add their own wants and desires to it. Technologies like RDF, OWL, and SPARQL are considered key components of the semantic web. It seems, though, that these technologies add so much confusion through abstraction that non-academic people quickly steer as far away from the notion of a semantic web as they can get.

So let’s stick to the simple definition from Tim Berners-Lee. We will distinguish the semantic web from our existing web by saying that a semantic web is designed to be meaningful to machines as well as to people. So what does it mean for a web of information to be meaningful to machines? A simple answer is that a machine needs to understand two primary things about a web: first, what the pages are about, and second, what the relationships that connect the pages mean.
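To make those two things concrete, here is a minimal sketch in plain Python. It records what each document is and how documents relate to one another as simple subject-predicate-object statements. The document identifiers and the predicate names (`is_a`, `amends`, `cites`) are made up for illustration and do not come from any real vocabulary or system.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # the document the statement is about
    predicate: str  # the kind of fact or relationship
    obj: str        # a value, or the identifier of another document


# Hypothetical documents and relationships, for illustration only.
STATEMENTS = [
    Statement("doc:act-2010-12", "is_a", "Act"),
    Statement("doc:act-2010-12", "amends", "doc:code-tax-s100"),
    Statement("doc:opinion-77", "is_a", "CourtOpinion"),
    Statement("doc:opinion-77", "cites", "doc:code-tax-s100"),
]


def describe(subject: str, statements: list[Statement]) -> dict[str, str]:
    """What is this page about? Collect every fact stated about it."""
    return {s.predicate: s.obj for s in statements if s.subject == subject}


def incoming(target: str, statements: list[Statement]) -> list[tuple[str, str]]:
    """Which pages point at this one, and what does each link mean?"""
    return [(s.subject, s.predicate) for s in statements if s.obj == target]


print(describe("doc:opinion-77", STATEMENTS))
print(incoming("doc:code-tax-s100", STATEMENTS))
```

The point of the sketch is only that a link carries a meaning (`amends` versus `cites`) rather than being an anonymous hyperlink, which is what lets a machine answer questions about it.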

It turns out that making a machine capable of understanding even the most rudimentary aspects of pages and the links that connect them is quite challenging. Generally, you have to resort to fragile custom-built parsers or sophisticated algorithms that analyze the document pages and the references between them. Getting from pages full of words, connected somehow to other pages, to a meaningful information model is quite a chore.
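A small sketch shows why such parsers are fragile. The citation style below ("Section 100 of the Tax Code") and the function name are invented for this example. The pattern finds a citation written one way and silently misses the same citation written another way.

```python
import re

# Assumed citation style, for illustration only: "Section <number> of the <Name> Code".
CITATION = re.compile(r"Section (\d+) of the (\w+) Code")


def find_citations(text: str) -> list[tuple[str, int]]:
    """Return (code name, section number) pairs matching the one style we know."""
    return [(code, int(number)) for number, code in CITATION.findall(text)]


style_a = "See Section 100 of the Tax Code for details."
style_b = "See Tax Code § 100 for details."  # same citation, different style

print(find_citations(style_a))  # the citation is found
print(find_citations(style_b))  # nothing is found, and no error is raised
```

Every new drafting style means another pattern, and a missed citation produces no warning. That is the kind of maintenance burden that explicit, machine-readable references would remove.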

What we need to improve the situation are agreed-upon information formats and referencing schemes that machines can more readily interpret. Defining what those formats and schemes are is where the subject of semantic webs starts getting thorny. Before trying to tackle all of this, let’s first consider how it applies to us.

What could benefit more from a semantic web than legal publishing? The law is a very complex subject, and understanding it requires extensive analysis and know-how. A semantic web could simplify this problem substantially. Legal documents are an ideal fit for the notion of a semantic web. First of all, the documents are quite structured. Even though each jurisdiction might have its own presentation styles and procedural traditions, the underlying models are all quite similar around the world. Secondly, legal documents are rich with relationships, or citations, to other documents. Understanding these relationships and what they mean is quite important to understanding the meaning of the documents.

So let’s consider the current state of legal publishing and, from my perspective, legislative publishing. The good news is that the information is almost universally available online in a free and easily accessed format. We are, after all, subject to the law, and providing access to that law is the duty of the people who make the laws. However, providing readable access to the documents is often the only objective, and any way of accomplishing it will do. Documents are often published as PDFs, which are nice to read but really difficult for computers to understand. There is no uniformity between jurisdictions, analysis capability is minimal (typically word search), and links connecting references and citations between documents are most often missing. This is a less than ideal situation.

We live in an era where our legal institutions are expected to provide more transparency into their functions. At the same time, we expect more from computers than merely allowing us to read documents online. It is becoming more and more important to have machines interpret and analyze the information within documents – and without error. Today, if you want to provide useful access to legal information by providing value-added analysis capabilities, you must first tackle the task of interpreting all the variations in which laws are published online. This is a monumental task which then subjects you to a barrage of changes as the manner in which the documents are released to the public evolves.

So what if there were a uniform semantic web for legal documents? What standards would be required? What services would be required? Would we need to have uniform standards or could existing fragmented standards be accommodated? Would it all need to come from a single provider, from a group of cooperating providers, or would there be a looser way to federate all the documents being provided by all the sources of law around the world? Should the legal entities that are sources of law assume responsibility for publishing legal documents or should this be left to third-party providers? In my coming posts I want to explore these questions.

4 thoughts on “What is a Semantic Web?”

  1. It boggles the mind to think of the possibilities here, Grant. You’ve laid out what amounts to a fascinating schema for the coming revolution in law-making transparency. Can’t wait to read about what comes next!

    David T.

  2. On the legislative publishing side, I think the biggest obstacle is that most of the important relationships are temporal ones. It is rather easy to say law A relates to law B because A cites B, etc. but it is a whole different can of worms to get into how changes are made over time with the effectiveness of laws being triggered at different times in overlapping ways. For example, a 2010 statutory change that sets a 2011 effective date of a change in a codified law that in turn only applies to appeals filed for the 2008 tax year. If I want to know what the law was in 2008, will a machine understand that this law should be presented to me, even if it became effective years later? There are tons of overlaps and conflicts that must be allowed for. Having these relationships expressed in a machine-readable way would be extremely useful but the complicated nature of these problems and the manual effort needed to record the meaning of these relationships accurately makes me somewhat pessimistic.

    In the court opinion realm, the temporal issues are not as problematic but I see the main challenge is having automatic systems that can accurately distinguish between court decisions that apply a concept, invalidate a concept, use a concept as an illustration, or add further support to a concept. The differences in the text can be very subtle because they would involve many of the same terms and references but semantic markup is of marginal value if all it can do is say these documents all relate to concept A. I am more optimistic in this area, however.

    1. grantcv1 says:

      Hey Edward, thanks for the comment. I’m glad you brought up the temporal aspects of legislation. That is the very aspect that interested me in applying XML to legislation 10 years ago when I first entered this field. Back then I was influenced by Timothy Arnold-Moore and the EnAct system developed for Tasmania. I read his thesis and developed a small (and rather naive) prototype, which we (my two original business partners and I) sold as a concept to our first customer. I don’t think of the temporal aspects of legislation as the biggest obstacle. I think of them as the biggest opportunity to show how XML is really useful. I gave a talk in Austria earlier this year on the subject of a “legislative time machine”, which I plan to cover in a post in the weeks to come. Of course, there are difficulties. In California, the temporal information gets embedded within the text and is not formalized into well-structured language that can easily be deciphered by a machine. On top of that, some temporal aspects, which you mention and which we call operational conditions, are based on situations or events that cannot be encoded into the system in any convenient way. Nonetheless, understanding the limitations, it is still quite possible to manage the temporal aspects of legislation using XML.

  3. legalinformatics says:

    Thanks for this interesting post. In terms of practices, I like how John Sheridan & Jeni Tennison apply Semantic Web technology in Legislation.gov.uk http://blog.law.cornell.edu/voxpop/2010/08/15/legislationgovuk/ ; how Rinke Hoekstra applies it in MetaLex Document Server http://blog.law.cornell.edu/voxpop/2011/10/25/the-metalex-document-server/ ; and how João Lima uses Semantic Web technology in Brazil’s LexML system: http://blog.law.cornell.edu/voxpop/2010/10/15/lexml-brazil-project/ . It might also be useful to look at public administration systems that take legislative Semantic Web data as an input, such as Leibniz Center’s AGILE system (described here http://bit.ly/iDRIMi and here http://bit.ly/ch45Sc ) and Monica Palmirani’s ICT4Law project in Piemonte http://www.ict4law.org/ . Monica has a nice presentation on how to integrate Semantic Web technology with rule modeling language and XML in legislative systems: http://bit.ly/bCWWxy . (Note that Monica is now leading the development of LegalRuleML: http://bit.ly/uDwWOf .) In terms of standards, including legal ontologies that might be used in conjunction with a legislative system, I’d suggest looking at the posts cited above, as well as Núria Casellas’s VoxPopuLII post: http://bit.ly/eFDBEk
