Category Archives: Registry Development

Multiple languages and RDA

We’ve been thinking for some time about how to implement multi-lingual (and multi-script) vocabularies in the Registry. Some Registry users have been experimenting with language and script capability for some time (see Daniel Lovins’ Sandbox Hebrew GMD’s). But it was really when we started working with the RDA vocabularies that we got serious about multi-linguality.

At DC-2008 in Berlin, we started talking to the librarians at the Deutsche Nationalbibliothek about adding German language versions of RDA vocabularies into the Registry. I knew how eager the German libraries were to participate more actively in the RDA development, and had been talking to German librarians for some time about their frustrations with the notion that they had to wait until “later” to become involved. Christine Frodl and Veronika Leibrecht have been our primary contacts at the Deutsche Nationalbibliothek on this work, and they’ve been a real pleasure to work with.

We decided collectively to start with some of the value vocabularies, in particular Content Type, Media Type and Carrier Type. We enabled Veronika to become a maintainer on those vocabularies, and she worked within her library and associated German-speaking libraries to translate and develop labels and definitions in German for the existing terms. As she describes the challenge:

“Because RDA was not developed simultaneously in various languages (that would be an even more daunting task!), we are looking for ways to adapt German to English language/cataloguing concepts and must get agreement on the terms in our community. The search for terminology to translate RDA will therefore be an ongoing process in the short term for us. … Now I am looking forward to seeing French and Spanish come along 😉 and would be happy to share a few resources I found which could help people in their search for terminology.”

Those of you who know German (or have an interest in multilingual vocabularies in general) might want to take a look at some of the work done already:

Content Type Vocabulary (you can see that for now, all concepts display in English)

Detail for concept of “computer program”: http://metadataregistry.org/concept/show/id/517.html (the German translation for the label appears in the list of properties of the concept)

Veronika points out that the process behind this effort is a complex one, but solidly based on existing relationships in the German-speaking world:

“[B]ecause of the federal system in Germany, the DNB works very closely with all library consortia in the country and Austria and decisions about cataloguing rules and data formats are reached through consensus with them. The reason for this is that the consortia include and represent libraries which existed long before the German state as such (or the DNB, for that matter) and therefore have traditionally and independently held the written cultural heritage of their individual counties, duchies, kingdoms etc.”

We have had some additional interest from other language communities in this effort, and Jon has added some detail on our wiki to describe how we plan to improve the software to make both building and maintenance of other language versions simpler, and easier to configure at the output end. Do note that this isn’t implemented yet, but is instead a blueprint for moving ahead in this critical area.

Updated Step-by-Step Instructions

Those of you who have actually discovered the Registry and tried to add stuff to it have (I hope) already realized that we had Step-by-step Instructions for doing so. They were old, and we’d added new things (mostly Jon added new things—I just rant, nag and test), so I finally re-did the instructions. They can be found here: http://wiki.metadataregistry.org/Step-By-Step_Instruction.

Looking at the old instructions was, for me at least, a reminder that we have made progress, much as it sometimes seems like we’re moving at a glacial pace. The interface has changed, we’ve added versioning and history, as well as schema registration (read Jon’s posts for more details). There’s still lots more to come, and believe me we have a seemingly endless list of what’s still missing. But writing documentation, even basic stuff like these instructions, is a humbling experience. Trying to do things more linearly than I usually do reminds me yet again where the gaps are.

One of the issues, which I’m not sure I’ve papered over very well in the instructions, is something I call the “eating our own dog food” problem. Those of you who know me personally have heard me use that phrase before—it’s a favorite. It basically means that, if you’re just preaching about how to do something, and not doing it, you’re not eating your own dog food. Not a good thing, and likely as not it will affect your credibility in ways that aren’t very comfortable, because SOMEBODY will call you on it.

Where we managed to step in it (the natural product created from said dog food, that is), was when we extended the registry from value vocabularies only to value vocabularies and schemas. Then, our model of concepts and properties of concepts started getting a little funky. When you’re registering schemas, you’ve got an aggregation of schema properties, and then, um, properties of properties? Uh oh. You can see the problem, I think—it’s about identifying and defining terms (among other things), and isn’t that what we’re supposed to be doing?

So, for the moment, until we’ve figured out how to hold our noses and eat that unappetizing dog food, we’re making a distinction in the schema instructions between “schema properties” and “specific properties.” Not elegant, but until inspiration strikes, somewhat helpful, I hope.

If any of you have occasion to use the instructions or stumble upon them and want to provide some helpful (or not) comments, just send them along to me: metadata.maven@gmail.com.

Heck of a job, Phippsy

It’s been a busy summer, but not on the Registry front.

We’re currently working on integrating the ARC library so we can handle RDF a bit more intelligently. This will give us import capability, a SPARQL endpoint, and the ability to express vocabularies in more RDF serializations. We’ve also made some improvements to our URI-building feature, adding support for ‘hash’ namespaces and tokenized identifiers (rather than simply numeric). This means that a URI like http://www.w3.org/2008/05/skos#Concept will be built for you properly instead of having to edit the current default http://www.w3.org/2008/05/skos/12345 to get what you want. None of this is even on the beta site yet, primarily because we haven’t had time to test it at all, and there are some things we know are still broken.
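
To make the difference between the two URI styles concrete, here is a minimal Python sketch. It is not the Registry’s own code; the function name `build_uri` and its `hash_namespace` flag are made up for illustration, and only the two example URIs come from the paragraph above.

```python
def build_uri(namespace: str, identifier: str, hash_namespace: bool = False) -> str:
    """Join a namespace and an identifier with '#' (hash style) or '/' (slash style).

    Hypothetical helper: the name and the flag are assumptions for this sketch.
    """
    # Drop any trailing separator so we never produce '//' or '/#'.
    base = namespace.rstrip("/#")
    separator = "#" if hash_namespace else "/"
    return f"{base}{separator}{identifier}"


# Slash namespace with a numeric identifier (the current default described above).
print(build_uri("http://www.w3.org/2008/05/skos", "12345"))

# Hash namespace with a tokenized identifier.
print(build_uri("http://www.w3.org/2008/05/skos", "Concept", hash_namespace=True))
```

The only real decision is which separator to use; the identifier itself can be a number or a token either way.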

There’s also now a fairly simple PHP script that accesses the new Registry API to retrieve data remotely. You can see this in action at http://rdvocab.info/roles.rdf — there’s no data actually maintained on rdvocab.info, the data is retrieved from the Registry. We’re not publishing the script yet or documenting the API because, like so many things, they’re not quite finished — the script needs to be even simpler, tested with PHP4, and less dependent on .htaccess. The API needs a few more methods and also needs to require a key for some operations.

Expect to see some of this stuff appear in early September.

The grant to work on the Registry runs out in September, but I’ll keep working on it and hope to have some collaborators. I’ve been pretty poor at creating a welcoming collaborative environment, networking, and promotion so that may be a vain hope.

There’s a fairly long list of things yet to do and some of them are major. Application profile management is the biggest, but we also need the ability to follow, Twitter-like, activity on a vocabulary, more extensive control over notifications, and integrated discussions to help support the vocabulary development features. The ability to import, export, edit, re-import, and have changes tracked throughout the process is also pretty critical. We want very much to integrate the sandbox into the main Registry, at least integrating user registration and making it possible to easily move a vocabulary from the sandbox to the registry. And there needs to be much more extensive help, better explanations of what’s going on, and a place to report bugs and make suggestions that integrates with Trac.

I’m off messing about in Canada on holiday for the next 2 weeks, so some of the things that I finished up this week will have to wait until I get back before they’re integrated into the site — I hate to potentially break things and then disappear.

Registry Installation Instructions

Jeepers, no posts for 3+ months and then two in one day! The truth is that I hadn’t realized the last post was still sitting in my drafts folder more than a month after I wrote it.

Moving on…

A number of folks have been interested in installing the Registry, especially since we’ve talked before about ‘easy installation’ being one of our design goals.

We’re pleased to announce that we have finally tweaked things to make a reasonably simple install from our subversion repository possible and provided some hopefully simple instructions detailing how to get the Registry up and running. We don’t provide enough tweaking or instructions (yet) to fully customize the interface, so once it’s installed it’ll still look exactly like the Registry, just running on your server instead of ours.

Whenever we update the production server, we’ll tag that code in subversion and update the link in the instructions (tying a string around my finger to help me remember as we speak), but there won’t be any other ‘release’ announcement unless we do something major.

Whenever we modify the database structure, we’ll provide a sql script to alter the database with each release. These scripts will always modify the database as it was after the previous release, so if you skip releases you’ll need to run the scripts sequentially. But this will all be on the instructions page.
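
If you do skip releases, working out which scripts to run is just a matter of walking the release list in order. Here is a rough Python sketch of that idea; the release names and script file names are invented for the example and are not the names we actually use in subversion.

```python
def scripts_to_run(releases, installed, target):
    """Return the upgrade scripts needed to go from `installed` to `target`.

    `releases` is a list of (release_name, script_name) pairs, oldest first.
    Each script upgrades the database left behind by the previous release,
    so the scripts must be applied in list order with none skipped.
    """
    names = [name for name, _ in releases]
    start = names.index(installed) + 1  # first release after the installed one
    end = names.index(target) + 1       # include the target release itself
    return [script for _, script in releases[start:end]]


# Hypothetical release history; the first release has no upgrade script.
RELEASES = [
    ("r1", None),
    ("r2", "upgrade_r1_to_r2.sql"),
    ("r3", "upgrade_r2_to_r3.sql"),
    ("r4", "upgrade_r3_to_r4.sql"),
]

for script in scripts_to_run(RELEASES, installed="r1", target="r3"):
    print(script)
```

Someone on `r1` who wants `r3` gets both intermediate scripts back, in the order they have to be run.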

We expect to update the production code quite often over the next few months.

Metadata Schema

If you’ve been watching the Registry closely (and we know you have), you’ll have noticed that a few weeks ago we started supporting the registration of metadata schemas. It’s not finished and far from perfect, but the perfect can often be the enemy of the good and at the moment it’s, well, good enough for now.

What makes it tough to get schema registration right is that our approach to what we’re calling registration attempts to be cross-cultural — trying to create a bridge from the technologies supporting the Semantic Web to the somewhat more ‘traditional’ data transfer technologies like XML.

We’re also trying to ‘eat our own dog food’ and are using an internally registered Application Profile to define the properties we’re using to describe metadata schemas and ultimately Application Profiles. This AP helps drive the schema registration user interface and we hope at some point we’ll be able to use a registered AP to generate many different interfaces, both human and application. It’s arguably too ambitious, but baby steps…

Vocabulary Management

The Registry is really more of a Vocabulary Management Application than a Registry at this point, since we’ve layered so many management services on top of the basic registry functions. It manages two types of vocabularies:

  • Value vocabularies — unordered lists of values (terms) that we express as skos concept schemes in RDF and a simple enumeration in XML Schema
  • Class/Property vocabularies — lists of classes, properties (or attributes depending on your mental model) that we currently express as rdf:properties and rdfs:classes
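
For anyone who hasn’t seen a value vocabulary expressed as a SKOS concept scheme, here is a small Python sketch that writes one out. It is only an illustration: it emits Turtle rather than whatever the Registry actually serves, the `example.org` URIs and the function name `concept_scheme_turtle` are made up, and the only thing taken as given is the standard SKOS core namespace.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def concept_scheme_turtle(scheme_uri, concepts):
    """Render a value vocabulary as a minimal SKOS concept scheme in Turtle.

    `concepts` is a list of (concept_uri, label, language_tag) tuples.
    Hypothetical helper for illustration only.
    """
    lines = [
        f"@prefix skos: <{SKOS}> .",
        "",
        f"<{scheme_uri}> a skos:ConceptScheme .",
    ]
    for uri, label, lang in concepts:
        # Escape backslashes and double quotes so the label is a valid Turtle string.
        safe = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append("")
        lines.append(f"<{uri}> a skos:Concept ;")
        lines.append(f"    skos:inScheme <{scheme_uri}> ;")
        lines.append(f'    skos:prefLabel "{safe}"@{lang} .')
    return "\n".join(lines)


print(concept_scheme_turtle(
    "http://example.org/vocab/ContentType",
    [("http://example.org/vocab/ContentType/1", "computer program", "en")],
))
```

Each term becomes a `skos:Concept` tied to its scheme with `skos:inScheme`, which is about as much structure as an unordered list of values needs.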

Much of our terminology (value vocabularies, metadata schema, application profile) stems from our work with the Dublin Core Community more than the Semantic Web Community and maybe we’ll refactor some of those names as we move forward. But we hope the semweb folks can translate and we hope that the DC folks won’t hold our ultimate departure from some of their terms against us.

In the meantime, feel free to play in the sandbox.

Makes my head hurt

I was talking with Diane this morning about building the schema portion of the Registry and I feel the need to write down some of what we discussed.

For purposes of discussion, we have a draft schema property interface that defines some basic metadata schema property properties. We started the conversation because I was trying to get away from the “property property” nomenclature and because I couldn’t quite figure out the best way to extend the too-simple model to incorporate repeatable, typed notes/annotations.

Over the course of the discussion we came to a few conclusions:

  • What we’re really discussing is an Application Profile in the old DC sense of that term (it has since been changed to “Description Set Profile” to reflect the more DCAM-centric viewpoint of the current DC Community) in which we’re defining schema property restrictions, namespaces, and usage requirements: There can be only one token, definition, label, type and they’re required; ‘Type’ utilizes a controlled vocabulary containing the concepts ‘property’ and ‘subproperty’; etc.
  • We have a Schema Properties Vocabulary registered that identifies these schema property description ‘terms’ as ‘concepts’, but this isn’t really correct because they’re actually properties of a metadata schema ‘property’ (and so we’re back to property properties <sigh>) and as such they should be registered as an Application Profile rather than a Vocabulary.
  • The properties of each schema we register should be based on its own Application Profile, since there will be many different requirements and we’d like to provide some flexibility. For instance the RDA schema may need to have an additional property property that declares a relationship between the property and a FRBR entity.
  • We can’t register a Metadata Schema Properties Application Profile until we can register a Schema
  • In order to register a metadata schema we need a generic Metadata Schema Properties Application Profile
  • We’re stuck with “property properties”
  • This stuff makes my head hurt

In the interest of moving forward, stopping the spinning, and headache relief we’re going to pretend that a generic Metadata Schema Property Application Profile (MSPAP — pronounced ‘ems-pap’) exists and slap something together and make the interface fairly inflexibly tied to it. At some point in the future we’ll (hopefully) make it flexible enough to be based on any registered MSPAP.

TimeSlices and Versions

You can now retrieve a snapshot in time of the RDF or XSD serialization of a Concept/Scheme/Vocabulary by appending a ‘TimeSlice’ to the URI. For example: http://metadataregistry.org/uri/NSDLEdLvl/ts/20060422200002.rdf or http://metadataregistry.org/uri/NSDLEdLvl/ts/20060422200002.xsd will always and forever retrieve the SKOS/RDF or XML Schema representation of the NSDL Ed Level Vocabulary as it appeared at 2 seconds after 8PM on April 22, 2006 (2006-04-22 20:00:02). If you follow the above .rdf link you’ll notice that the concept URIs that the TimeSliced Vocabulary references have also had a TimeSlice appended: http://metadataregistry.org/uri/NSDLEdLvl/1001/ts/20060422200002.rdf in order to lock them into that precise point in time on an individual basis as well. We hope the utility of being able to reference a vocabulary at a particular point in time regardless of subsequent changes will be, well, useful.

In order to make retrieving TimeSlices for specific events in the history of a vocabulary a bit handier, we added a TimeSlice link to every history event. You can specify a TimeSlice for any point in time regardless of its relationship to a history event, but the link just makes it simpler (it’s over on the right side of each line):

[Screenshot: The Registry! :: NSDL Education Level Vocabulary :: History of Changes]
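
If you would rather build TimeSlice URIs yourself than click the link, the pattern in the examples above is easy to reproduce. The Python sketch below is a guess at that pattern based only on those example URIs; the function name `timeslice_uri` is made up, and it assumes the timestamp is a plain `YYYYMMDDHHMMSS` string with no time zone handling.

```python
from datetime import datetime

BASE = "http://metadataregistry.org/uri"


def timeslice_uri(vocabulary, moment, extension="rdf", concept=None):
    """Build a TimeSlice URI for a vocabulary, or for one concept within it.

    Hypothetical helper inferred from the example URIs; `moment` is treated
    as a naive datetime and formatted as YYYYMMDDHHMMSS.
    """
    stamp = moment.strftime("%Y%m%d%H%M%S")
    path = vocabulary if concept is None else f"{vocabulary}/{concept}"
    return f"{BASE}/{path}/ts/{stamp}.{extension}"


when = datetime(2006, 4, 22, 20, 0, 2)  # 2006-04-22 20:00:02

print(timeslice_uri("NSDLEdLvl", when))                  # whole vocabulary, RDF
print(timeslice_uri("NSDLEdLvl", when, extension="xsd")) # whole vocabulary, XSD
print(timeslice_uri("NSDLEdLvl", when, concept="1001"))  # a single concept
```

The three calls should reproduce the three example URIs given above for the NSDL Ed Level Vocabulary.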

Named Versions

You may also have noticed that there’s a ‘Name’ link nestled to the right of the RDF and XSD links. If you’re a Vocabulary Administrator, then you now have the ability to label a TimeSlice with a distinct version name. That link, again, is there to make it easy to reference a point in historical time, and clicking on it pre-enters that TimeSlice into the Create new version form:

[Screenshot: The Registry! :: Creating new version]

There’s no limit to the number of versions you can create, and versions (unlike TimeSlices) can be deleted or edited by any Vocabulary Admin:

[Screenshot: The Registry! :: NSDL Education Level Vocabulary :: List Versions]

…although we think that either editing or deleting a version is likely to be a less-than-ideal practice. Still, we allow it; it’s your vocabulary after all.

Once a named version has been associated with a TimeSlice, it will appear in the event history list just above the point in time it references:

[Screenshot: The Registry! :: NSDL Education Level Vocabulary :: History of Changes]

The RDF and XSD links on the right side of the version line now reference the version name:

http://metadataregistry.org/uri/NSDLEdLvl/version/release+1.0.rdf

But this is where it gets a little incorrect… Since the named version URL is just a TimeSlice reference, it does a silent redirect to the referenced TimeSlice. It should probably do a 303 redirect instead. We’ll fix this later, unless it’s a show-stopper for one of our many users.

Improved user management

We’ve been promising for a while now that we’d make it easier, actually a better word is ‘possible’, for Vocabulary Owners to add ‘Members’ to Owner/Agents and ‘Maintainers’ to Vocabularies. We finally implemented it today! It’s unfortunate that it has taken us so long, since one of the primary goals of the Registry is to support multi-user vocabulary development, but it turned out to require more infrastructure twiddling than we thought it would.

If you’re a vocabulary owner and are logged in, you can add other registered folks as ‘members’ of your Owner/Agent and you can even make them administrators if you want:

[Screenshot: The Registry! :: NSDL Registry :: List Members]

[Screenshot: The Registry! :: Editing NSDL Registry permissions for Diane Hillmann]

We hope the process is pretty self-explanatory.

Once you’ve added a user as a member of your Owner/Agent group, you can add them to your vocabularies as Vocabulary Maintainers or Administrators.

[Screenshot: The Registry! :: NSDL Registry Agents Vocabulary :: List Maintainers]

[Screenshot: The Registry! :: Adding maintainer to NSDL Registry Agents Vocabulary]

We realize that this is still somewhat limited and of course the documentation is ummm, poor, but we’ll be doing more with user management shortly.

Major update

The registry database was updated last week, in both the sandbox and the registry, to support history tracking. This is in preparation for finally enabling timeslice retrieval and versioning.

We also made some significant changes to the site layout and CSS, so if things still look a little funky, try refreshing your browser; most browsers seem to be caching our CSS and not detecting the changed files.

In the process, we broke search (you may not have even noticed), but it’s fixed now.