Category Archives: Multiple language vocabularies

Readable URIs

Over the years we’ve been engaged in a number of discussions in which the ‘readability’ of URIs was raised, either as an issue with non-readable URIs or as a requirement in new URI schemes.

At the Registry, we understand and are sensitive to the desire for human readability in URIs. However, embedding a language-specific label in a URI that identifies a concept in a multilingual vocabulary has the side effect of locking the concept into the language of the creator. It also unnecessarily formalizes the creator’s particular spelling variant, ‘colour’ vs. ‘color’ for instance.

When creating the URIs for the RDA vocabularies, we acceded to requests to make the URIs ‘readable’, specifically to make it easier for programmers to create software that could guess the URI from the prefLabel. We have come to regret that decision as the vocabularies have gained prefLabels in multiple languages. It also creates issues for people who extend the vocabulary with concepts that have no prefLabel in the chosen language of the vocabulary creator.
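To make the problem concrete, here is a minimal sketch of the kind of URI guessing that readable URIs invite. Everything in it is an assumption made for illustration: the `example.org` namespace, the `guess_uri` helper and the slug rule are hypothetical and are not the scheme used for the RDA vocabularies.

```python
import re

# Hypothetical namespace for illustration only; not a Registry URI.
BASE = "http://example.org/vocab/"


def guess_uri(pref_labels, lang="en"):
    """Derive a 'readable' URI from the prefLabel in one chosen language.

    pref_labels maps a language code to a label, e.g. {"en": "colour"}.
    Returns None when the concept has no label in the chosen language.
    """
    label = pref_labels.get(lang)
    if label is None:
        return None
    # Lowercase the label and collapse anything non-alphanumeric to a hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return BASE + slug


# Two spellings of the same word yield two different URIs.
british = guess_uri({"en": "colour"})
american = guess_uri({"en": "color"})

# A concept labeled only in German cannot be given a URI at all.
german_only = guess_uri({"de": "Farbe"})
```

The two spelling variants end up as different identifiers, and the concept that has only a German label gets no identifier, which are the two difficulties described above.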

That said, the case is much less clear for URIs identifying ‘things’, such as Classes and Properties, in RDFS and OWL, since these are less likely to need to be semantically ‘understood’ independently of their label and are less likely to be labeled and defined in multiple languages. In that case the semantics of the Class or Property are often best communicated by a language-specific, readable URI.

In the end I personally lean heavily toward non-readable identifiers because of the flexibility in altering the label in the future, especially in the fairly common case of someone wishing to change the label even though the semantics have not changed. This becomes much more problematic when the label applied to the thing at a particular point in time has been locked into the URI.
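A small sketch of the alternative may help. It assumes a hypothetical `Concept` record with an opaque local identifier and an `example.org` namespace; neither is taken from the Registry’s actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical namespace for illustration only; not a Registry URI.
BASE = "http://example.org/vocab/"


@dataclass
class Concept:
    """A concept identified by an opaque ID, with labels kept separately."""
    local_id: str                                     # opaque, carries no meaning
    pref_labels: dict = field(default_factory=dict)   # language code -> label

    @property
    def uri(self):
        # The URI depends only on the opaque ID, never on a label.
        return BASE + self.local_id


concept = Concept("1001", {"en": "colour"})
uri_before = concept.uri

# Relabel in English and add a German label; the semantics are unchanged.
concept.pref_labels["en"] = "color"
concept.pref_labels["de"] = "Farbe"
uri_after = concept.uri
```

Because the identifier is independent of every label, the group maintaining the vocabulary can change its mind about wording, or add languages, without touching the URI that others have already linked to.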

I’m not trying to start a non-readable URIs campaign, just pointing out that the Registry, in particular, is designed to support vocabulary development by groups of people who are creating and maintaining multilingual vocabularies, and whose collective agreement on labeling things may change over the course of the development cycle. Our non-literal-label URI default is designed to support the understanding we’ve developed of that environment over time.

The German National Library: translating and registering RDA elements and vocabularies

A prerequisite for the registering of our terms in the NSDL Registry and one of the greatest challenges for the German National Library at the moment is the translation of the RDA elements and vocabularies.  Since bibliographic description is executed with a highly specialised vocabulary, we are finding that the process of pinpointing the appropriate terms is interesting but also very involved. Although the existing German rules for bibliographic description (RAK) and the authority files for subject headings (Schlagwortnormdatei, or SWD) have plenty of vocabulary to offer as equivalents to Anglo-American cataloguing terminology, RDA does include concepts relatively new to bibliographic description.

Before resorting to “inventing” words, always a last resort, we launch comprehensive vocabulary mining efforts. Beyond checking already existing translations (FRBR, MARC 21), we consult the expertise of such institutions as art libraries and film institutes to get the most up-to-date descriptive terms available in the German language. If we deem a word previously used in a translation suboptimal, we may deviate from its use and in particular cases forgo the advantages of standardisation in the interest of our primary criteria: consistency, currency, usability, and precision. A quick and general Google search can also be helpful to learn how terms are being (in)formally circulated.

When we find it necessary to create a new term in German, as with the type unmediated, we have to weigh which etymological root to lean towards, Latin or Germanic. Translating it as unmediatisiert can ease communication around cataloguing between nations because of its morphological similarity to terms in many European languages. However, leaning on Germanic roots may sometimes be necessary in the interest of standardisation and of aligning with existing descriptive language or with the strengths and realities of the German language. In that case, we may be better off choosing nicht mediatisiert or ohne Hilfsmittel zu benutzende Medien, which seems awkward but conforms to usage already established in the subject headings.

The “new-proposed” status that the Registry offers for concepts therefore suits our needs perfectly: for the reasons just mentioned, and those outlined in Diane’s blog entry about multiple languages and RDA, none of the translations we have entered is official as yet.

Once our small team of librarians from the Office for Library Standards has followed these processes and developed a pool of equivalent German terms which we deem worthy of proposing initially for the Registry and subsequently for our official translation of RDA, we make them available to groups of colleagues specialised in bibliographic description or subject headings at the German National Library for comment in a Wiki and working meetings. Our experience with translation has shown us that the translations of descriptive bibliographic elements and vocabulary into German must be handled by librarians (professional translators can potentially pick up from there) and peer-reviewed through the above-mentioned process to ensure accuracy and acceptance in the library community.

Beyond motivating us to begin our RDA translations early, our participation in the Registry has also given us an opportunity to dabble in the semantic web through the process of assigning URIs to our German translations of the RDA element and value vocabularies. As a test run, it allows us to toy with the idea of linked data by setting descriptive bibliographic vocabulary up with its prerequisite domain. The lessons learned and questions raised through this experience put us in a better position for strategic planning regarding the nature of the presentation and sharing of bibliographic data in the future.

What has particularly attracted us about the Registry and its connection with the RDA tool is that, if we do decide as an institution to provide linked bibliographic data in the future, the Registry makes it possible to do so in our national language. This is a condition for its widespread usability and acceptance in the German-speaking library and internet community and therefore of primary importance to us, provided of course that the Committee for Library Standards takes the decision to introduce RDA as the official rules for description and access in Germany and Austria.

Multiple languages and RDA

We’ve been thinking for some time about how to implement multi-lingual (and multi-script) vocabularies in the Registry. Some Registry users have been experimenting with language and script capability for some time (see Daniel Lovins’ Sandbox Hebrew GMD’s). But it was really when we started working with the RDA vocabularies that we got serious about multi-linguality.

At DC-2008 in Berlin, we started talking to the librarians at the Deutsche Nationalbibliothek about adding German language versions of RDA vocabularies into the Registry. I knew how eager the German libraries were to participate more actively in the RDA development, and had been talking to German librarians for some time about their frustrations with the notion that they had to wait until “later” to become involved. Christine Frodl and Veronika Leibrecht have been our primary contacts at the Deutsche Nationalbibliothek on this work, and they’ve been a real pleasure to work with.

We decided collectively to start with some of the value vocabularies, in particular Content Type, Media Type and Carrier Type. We enabled Veronika to become a maintainer on those vocabularies, and she worked within her library and associated German-speaking libraries to translate and develop labels and definitions in German for the existing terms. As she describes the challenge:

“Because RDA was not developed simultaneously in various languages (that would be an even more daunting task!), we are looking for ways to adapt German to English language/cataloguing concepts and must get agreement on the terms in our community. The search for terminology to translate RDA will therefore be an ongoing process in the short term for us. … Now I am looking forward to seeing French and Spanish come along 😉 and would be happy to share a few resources I found which could help people in their search for terminology.”

Those of you who know German (or have an interest in multilingual vocabularies in general) might want to take a look at some of the work done already:

Content Type Vocabulary (you can see that for now, all concepts display in English)

Detail for concept of “computer program”: (the German translation for the label appears in the list of properties of the concept)
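As a rough illustration of how a display layer could eventually choose between these labels, here is a minimal sketch of language selection with an English fallback. It is not the Registry’s implementation: the `display_label` helper is hypothetical, and the German and placeholder labels are illustrative values rather than entries copied from the Registry.

```python
def display_label(pref_labels, preferred, fallback="en"):
    """Return the label in the preferred language, else the fallback language.

    pref_labels maps a language code to a label. Returns None if the
    concept has a label in neither language.
    """
    if preferred in pref_labels:
        return pref_labels[preferred]
    return pref_labels.get(fallback)


# Illustrative data only; the German label is not copied from the Registry.
computer_program = {"en": "computer program", "de": "Computerprogramm"}
untranslated = {"en": "example term"}  # hypothetical concept with no German label

german_view = display_label(computer_program, "de")
fallback_view = display_label(untranslated, "de")
```

A reader asking for German sees the German label where one exists and the English label otherwise, so a partly translated vocabulary remains usable while translation continues.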

Veronika points out that the process behind this effort is a complex one, but solidly based on existing relationships in the German-speaking world:

“[B]ecause of the federal system in Germany, the DNB works very closely with all library consortia in the country and Austria and decisions about cataloguing rules and data formats are reached through consensus with them. The reason for this is that the consortia include and represent libraries which existed long before the German state as such (or the DNB, for that matter) and therefore have traditionally and independently held the written cultural heritage of their individual counties, duchies, kingdoms etc.”

We have had some additional interest from other language communities in this effort, and Jon has added some detail on our wiki to describe how we plan to improve the software to make both building and maintaining other language versions simpler, and to make them easier to configure at the output end. Do note that this isn’t implemented yet, but is instead a blueprint for moving ahead in this critical area.