Server move in progress

…from Rackspace to DigitalOcean and from one large-ish server to several smaller ones, from a relatively prehistoric version of Red Hat to the latest Ubuntu 14.04 LTS, from Zend Server+Apache to nginx (OpenResty, actually), and from PHP 5.2 to PHP 5.5. It really wasn’t as painful as it sounds like it should have been. As part of the move, we also made some improvements in error and uptime reporting, integrated into a Slack channel for near-real-time monitoring.

Actually, the move is pretty much complete, although we’re still tinkering a bit with the caching and load-balancing setup, since we couldn’t afford to completely replicate the entire server stack in a staging environment. We apologize for any brief outages you may experience while we do that.

Getting browser/proxy/server caching right is actually taking more time than we anticipated. We’ll let you know here as soon as we turn our attention to other things. In the meantime, if you see something, please say something — we’ve got issues!

Server issues

Normally the Registry just hums along quietly and doesn’t demand too much attention. But the last system update we performed seems to have altered our memory usage pretty dramatically and we’re quite suddenly having out-of-memory issues and some heavy swapping. We’ve expanded the server capacity twice already as a stopgap while we investigate, but before we move to an even larger server we’re testing some alternative configurations.

In the meantime there may continue to be periodic slowdowns, but you should see some improvement in a few days.

The last thing we want is for you to think we’re not seeing, or ignoring, the problem.

Still here, folks

When we first designed the NSDL Registry, one of the requirements was that it be able to run in Lynx (seriously) and be usable without JavaScript in all browsers, including all versions of IE. As you will probably have noticed, the world of web development has changed a bit in the last 8 years.

Lately it has come to our attention that the Open Metadata Registry looks a lot like abandonware. The reality is that we’re hard at work on a long-planned, often delayed, and absolutely necessary update.

Stay tuned.

Whoops

Until lately we’ve been pretty happy with our ISP, Dreamhost. But a few months ago, after several during-the-presentation meltdowns of the Registry, we determined that we needed to move to a higher-performing, more reliable server. We could have done the easy thing and moved to a Virtual Private Server at Dreamhost. Instead, we set up an entirely fresh server in the Rackspace Cloud and very carefully, with much testing, created a fresh instance of the Registry with greatly expanded data capacity, some updated code, and considerably more speed. So far, so good.

We had a self-imposed deadline of several weeks before the DCMI 2011 Conference in The Hague and completely missed it. This left us with the choice of waiting until after the conference to redirect our domain to the new server or taking the risky step of switching domains just a few days before the conference. Of course, we didn’t wait. At which point we discovered that we couldn’t simply redirect our main domain to the new server but needed to redirect the subdomains as well, breaking our wiki and blog. Which we had a great deal of difficulty restoring while on the road to DCMI.

But everything’s back to normal now, and even updated. We now resume our regular programming.

SPARQL queries

We’re using Benjamin Nowack’s excellent ARC libraries for a tiny bit of our RDF management. It may surprise you to know that we don’t use a triple store as our primary data store, but we do too many things with the data that are cumbersome at best when managed exclusively in a triple store (a subject for another post someday). Still, last year we started nightly imports of the full Registry into the ARC RDF store and enabled a SPARQL endpoint, thinking that it might be useful.

Lately, we’ve heard a few folks wishing for better searching in the Registry, and since we’re actively building an entirely new version of the Registry in Drupal (a subject for another post someday) we’re loath to spend time doing any serious upgrading of the current Registry. But we have SPARQL!

Yesterday, as part of another conversation, a colleague helped me figure out what I think is a fairly useful search query. If you follow the link, you’ll be taken to the Registry’s SPARQL endpoint which will display inline a list of all of the skos:Concepts in the RDA vocabularies which have no definitions. Well, 250 of them anyway since that’s the arbitrary limit we’ve set on the endpoint. They’re not hyperlinked (which would be really useful) but it’s still good info.

The SPARQL query used to create the list:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?s
WHERE {
  GRAPH ?g { ?s ?p ?o . }
  OPTIONAL { ?s skos:definition ?Thing . }
  FILTER ( !bound(?Thing) )
  FILTER regex(str(?s), "^http://RDVocab.info/termList")
}

…can be used to find any missing property (see ‘optional’) and the regex used in the filter can be modified to limit the search to any vocabulary, group of vocabularies, or a domain. I’m not enough of a SPARQL expert (meaning I’m completely clueless) to know how to filter by attribute, but it should be possible, if not easy, to find skos:Concepts that have an English definition, but no German definition (I look forward to your comments).
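For what it’s worth, here’s a sketch of that English-but-no-German variant, reusing the same OPTIONAL/!bound trick. The lang() filters are my assumption about what the endpoint will accept; we haven’t tested this against the ARC store:

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Concepts in the RDA term lists that have an English definition
# but no German one.
SELECT DISTINCT ?s
WHERE {
  GRAPH ?g { ?s skos:definition ?en . }
  FILTER ( lang(?en) = "en" )
  OPTIONAL { ?s skos:definition ?de . FILTER ( lang(?de) = "de" ) }
  FILTER ( !bound(?de) )
  FILTER regex(str(?s), "^http://RDVocab.info/termList")
}
```

Swapping the language tags (or the property) should adapt it to any other missing-translation question of the same shape.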

Check Your Bookmarks!

As of today, the link to the Step-by-Step instructions on the Registry front page has been updated (and not a moment too soon). The new link still goes to the Registry Wiki, but the link itself is different, so those of you eager users who bookmarked the old instructions might want to take a look at the new ones and update those bookmarks. The old instructions are still there and, at the moment, not necessarily wrong, but they show the old logo, are pretty sparse in places, and haven’t been updated in a while, so they lack a few important functional bits.

The instructions are a work in progress–still being rewritten, expanded, and primped. We plan to continue building them up, and in addition are working on some training screencasts (with the valuable help of Tom Johnson)–we’ll link those to the instructions as we complete them. We’ll try to pop up and let you know when new parts are completed (you can see some empty stubs marking where we’ll be working). As always, we’re happy to hear your suggestions, complaints, etc., and votes for where we should work next are welcome (though not necessarily something we’re guaranteed to pay attention to).

We’re still working on planning for some major changes in the fairly near future, at which time the documentation will again be updated (we hope on a more ambitious schedule). As we all know, the work of a documentarian never ends … (can I have some cheese with that w(h)ine?)

Announcing the New Open Metadata Registry

As of this week, the familiar NSDL Registry has a new name–the Open Metadata Registry–and a new logo. The name change reflects the fact that we’re no longer receiving funding from the National Science Foundation on behalf of the National Science Digital Library (NSDL), but also recognizes that the Registry has become one of the leaders in providing open, stable tools for those building infrastructure for the Semantic Web.

As part of this change, we’re joining our colleagues at JES & Co. as a project under their umbrella, as well as bringing current users and partners together in an Open Metadata Registry Consortium to build a sustainable plan for moving the Open Metadata Registry forward. Please watch for additional announcements and an expansion of the new look for our pages. If you’d like more detail on the Consortium, please contact Diane Hillmann at metadata dot maven at gmail dot com.

July 20, 2010

Readable URIs

Over the years we’ve been engaged in a number of discussions in which the ‘readability’ of URIs was raised, either as an issue with non-readable URIs or as a requirement in new URI schemes.

At the Registry, we understand and are sensitive to the desire for human readability in URIs. However, embedding a language-specific label in a URI identifying a concept in a multilingual vocabulary has the side effect of locking the concept into the language of the creator. It also unnecessarily formalizes the creator’s particular spelling variant of that language: ‘colour’ vs. ‘color’, for instance.

When creating the URIs for the RDA vocabularies we acceded to requests to make the URIs ‘readable’, specifically to make it easier for programmers to create software that could guess the URI from the prefLabel. We have come to regret that decision as the vocabularies gained prefLabels in multiple languages. It also creates issues for people extending a vocabulary and adding concepts that have no prefLabel in the chosen language of the vocabulary creator.

That said, the case is much less clear for URIs identifying ‘things’, such as Classes and Properties, in RDFS and OWL, since these are less likely to have a need to be semantically ‘understood’ independent of their label and are less likely to be labeled and defined in multiple languages. In that case the semantics of the Class or Property is often best communicated by a language-specific, readable URI.

In the end I personally lean heavily toward non-readable identifiers because of the flexibility in altering the label in the future, especially in the fairly common case of someone wishing to change the label even though the semantics have not changed. This becomes much more problematic when the label applied to the thing at a particular point in time has been locked into the URI.

I’m not trying to start a non-readable URIs campaign, just pointing out that the Registry, in particular, is designed to support vocabulary development by groups of people creating and maintaining multilingual vocabularies, whose collective agreement on labeling things may change over the course of the development cycle. Our non-literal-label URI default is designed to support the understanding of that environment we’ve developed over time.
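To make the trade-off concrete: with an opaque URI, software recovers the human-readable form at lookup time rather than from the identifier itself, so the label can change without breaking anything. A minimal sketch (the concept URI below is a made-up placeholder, not a real Registry identifier):

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Fetch the current German label for an opaque concept URI;
# if the label is later revised, the URI and every link to it stay valid.
SELECT ?label
WHERE {
  <http://example.org/termList/1001> skos:prefLabel ?label .
  FILTER ( lang(?label) = "de" )
}
```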

SKOS updated for Vocabularies

Just a quick note that today we updated the version of SKOS that we provide for describing value vocabularies. This deprecates the properties that were removed from the final SKOS release and adds many new ones. We’ve also restricted the non-mapping relation properties (skos:broader, skos:narrower, skos:related) to the ‘containing’ scheme, while providing cross-scheme mapping via the mapping relations.

We don’t yet provide a useful interface for building collections, but that’s coming real soon now.

Oh, and we added a SPARQL endpoint.
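Which means the scheme restriction above is easy to sanity-check. Here’s a sketch of a query that should come back empty if the non-mapping relations really do stay inside their containing scheme; it assumes concepts carry skos:inScheme statements, which may not hold for every vocabulary in the store:

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Look for skos:broader links that cross scheme boundaries;
# under the new restriction this should return no rows.
SELECT ?a ?b
WHERE {
  ?a skos:broader ?b .
  ?a skos:inScheme ?schemeA .
  ?b skos:inScheme ?schemeB .
  FILTER ( ?schemeA != ?schemeB )
}
```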

The German National Library: translating and registering RDA elements and vocabularies

A prerequisite for the registering of our terms in the NSDL Registry and one of the greatest challenges for the German National Library at the moment is the translation of the RDA elements and vocabularies.  Since bibliographic description is executed with a highly specialised vocabulary, we are finding that the process of pinpointing the appropriate terms is interesting but also very involved. Although the existing German rules for bibliographic description (RAK) and the authority files for subject headings (Schlagwortnormdatei, or SWD) have plenty of vocabulary to offer as equivalents to Anglo-American cataloguing terminology, RDA does include concepts relatively new to bibliographic description.

Before resorting to “inventing” words, always a last resort, we launch comprehensive vocabulary-mining efforts, in the course of which, beyond checking already existing translations (FRBR, MARC 21), we consult the expertise of such institutions as art libraries and film institutes to get the most up-to-date descriptive terms available in the German language. If we deem a word previously used in a translation suboptimal, we may deviate from its use and in particular cases forgo the advantages of standardisation in the interest of our primary criteria: consistency, currency, usability, and precision. A quick and general Google search can also be helpful to learn how terms are being (in)formally circulated. Should we find it necessary to create a new term in German, as we are experiencing with the type unmediated, we have to weigh up which etymological root we would like to lean towards, Latin or Germanic. If we translate it as unmediatisiert, its morphological similarity to many European languages can ease communication around cataloguing between nations. However, leaning on Germanic roots may sometimes be necessary in the interest of standardisation and of aligning with existing descriptive language or with the strengths and realities of the German language. In that case, we may be better off choosing nicht mediatisiert or ohne Hilfsmittel zu benutzende Medien, which seems awkward but conforms to usage already in existence in the subject headings. The option of the “new-proposed” status in the Registry therefore suits our needs perfectly, since for the reasons just mentioned, and as outlined in Diane’s blog entry about multiple languages and RDA, none of the translations we have entered are as yet official.

Once our small team of librarians from the Office for Library Standards has followed these processes and developed a pool of equivalent German terms which we deem worthy of proposing initially for the Registry and subsequently for our official translation of RDA, we make them available to groups of colleagues specialised in bibliographic description or subject headings at the German National Library for comment in a Wiki and working meetings. Our experience with translation has shown us that the translations of descriptive bibliographic elements and vocabulary into German must be handled by librarians (professional translators can potentially pick up from there) and peer-reviewed through the above-mentioned process to ensure accuracy and acceptance in the library community.

Beyond motivating us to begin our RDA translations early, our participation in the Registry really has also given us an opportunity to dabble in the semantic web through the process of assigning URIs to our German translations of RDA element and value vocabulary.  As a test run, it therefore allows us to toy with the idea of linked data by setting descriptive bibliographic vocabulary up with its prerequisite domain. The lessons learned and questions raised through this experience put us in a better position for strategic planning regarding the nature of the presentation and sharing of bibliographic data in the future.

What has particularly attracted us about the Registry and its connection with the RDA tool is that, provided we do decide to provide linked bibliographic data in the future as an institution, the Registry makes it possible to do so in our national language. This is a condition for its widespread usability and acceptance in the German-speaking library and internet community and therefore of primary importance to us, provided of course that the Committee for Library Standards takes the decision to introduce RDA as the official rules for description and access in Germany and Austria.