And now for something completely different… Chinese!

Last week we saw how Akoma Ntoso can be applied to a very large consolidated Code – the United States Code. This week we take the challenge in a different direction – applying Akoma Ntoso to a bilingual implementation involving a totally different writing system. Our test document this week is the Hong Kong Basic Law. This document serves as the constitutional document of the Hong Kong Special Administrative Region of the People’s Republic of China. It was adopted on the 4 April 1990 and went into effect on July 1, 1997 when the United Kingdom handed over the region to the People’s Republic of China.

The Hong Kong Basic Law is available in English, Traditional Chinese, and Simplified Chinese. For our exercise, we are demonstrating the document in English and in Traditional Chinese. (Thank you to Patrick for doing the conversion for me.) Fortunately, using modern technologies, supporting Chinese characters alongside Latin characters is quite straightforward. Unicode provides a Hong Kong supplementary character set to handle characters unique to Hong Kong. The biggest challenge is ensuring that all the unicode declarations throughout the various XML and HTML files that the information must flow through are set correctly. With the number of accents we find in names in California as well as the rigorous nature of California’s publishing rules, getting Unicode right is something we have grown accustomed to.

While I hadn’t expected there to be any problems with Unicode, I was pleasently surprised to find that the fonts used in Legix simply worked with the Traditional Chinese characters without issue as well. (Well at least as far as I can tell without the ability to actually read Chinese)

The only issue we encountered was Internet Explorer’s support for CSS3. Apparently, IE still does not recognize “list-style-type” with a value of “cjk-ideographic”. So instead of getting Traditional Chinese numerals, we get Arabic numerals. The other browsers handled this much better.

So what other considerations were there? A big consideration was the referencing mechanism. To me, modeling how you refer to something in an information model can be more important than the information model itself. The referencing mechanism defines how the information is organized and allows you to address a specific piece of information in a very precise and accurate way. Done right, any piece of information can be accessed very quickly and easily. Done wrong and you get chaos.

Our referencing mechanism relies on the Functional Requirements for Bibliographical Records (FRBR). This mechanism is used by both SLIM and Akomantoso. Another interesting FRBR proposal for legislation can be found here.

FRBR defines an information model based on a hierarchical scheme of Work-Expression-Manifestion-Item. Think of the work as the overall document being addressed, the expression being the version desired, the manifestation the format you want to information presented in, and finally the item as a means for addressing a specific instance of the information. Typically we’re only concerend with Work-Expression-Manifestation.

For a bilingual or multilingual system, the “expression” part of the reference is used to specify which language you wish the document to be returned in. If you check out the references at Legix.info you will see that the two references the the Hong Kong Basic Law are:

The expressions are called out as “doc;en-uk” for the English version and “doc;zh-yue” for the Chinese version. Relatively straightforward. The manifestations are not shown and the result is the default manifestation of HTML.

Check the samples out and let me know what you think.

And now for something completely different… Chinese!