
LCSH, SKOS and subfields

This week, Karen Coyle wrote a post, LCSH as linked data: beyond “dash-dash”, which provoked a discussion on the discussion list.

It seems to me that there are several memes at play in this conversation:


As Karen points out, LCSH is more than just a simple thesaurus. It’s also a set of instructions for building structured strings in a way that’s highly meaningful for ordering physical cards in a physical catalog. In addition, each string component has specific semantics related to its position in the string, so it’s possible, if everyone knows and agrees on the rules, to parse the string and derive the semantics of each individual component. The result is a pre-coordinated index string.

These stand-alone pre-coordinated strings are perhaps much less meaningful in the context of LOD, but this certainly doesn’t apply to the components. I think what Karen is pointing out is that, while it’s wonderful to have a subset of all of the components that can be used to construct LC Subject Headings published as LOD, there’s enough missing information to reduce the overall value. As I read it, she’s wishing for the missing semantics to be published as part of the LCSH linked data, and hoping that LC doesn’t rest on its well-earned laurels and call it a day.

Structured Strings

Dublin Core calls the rules that define a structured string a "Syntax Encoding Scheme" (SES), and that’s basically what the rules defining the construction of LC Subject Headings seem to be. It’s structurally no different from saying that the string "05/10/09", if interpreted as a date using an encoding scheme/mask of "mm/dd/yy", ‘means’ day 10 of the month of May in the year 2009 in the Gregorian calendar. Fascinatingly, that same ‘date’ can be expressed as a Julian day number of "2454962", but I digress.
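To make the date analogy concrete, here is a minimal Python sketch of applying an encoding scheme to a string. The helper names `parse_with_mask` and `julian_day_number` are made up for illustration, and the mask handling only covers the "mm", "dd" and "yy" tokens used in the example above.

```python
from datetime import date, datetime


def parse_with_mask(value, mask):
    """Interpret `value` according to a simple 'mm/dd/yy'-style mask."""
    # Translate the mask tokens into strptime directives.
    fmt = mask.replace("mm", "%m").replace("dd", "%d").replace("yy", "%y")
    return datetime.strptime(value, fmt).date()


def julian_day_number(d):
    """Julian day number (the day count at noon) for a Gregorian date."""
    # date.toordinal() counts days with 0001-01-01 as day 1;
    # adding 1721425 shifts that count onto the Julian day scale.
    return d.toordinal() + 1721425


d = parse_with_mask("05/10/09", "mm/dd/yy")
print(d.isoformat())         # 2009-05-10
print(julian_day_number(d))  # 2454962
```

The string only ‘means’ a date once the mask is known. Read with "dd/mm/yy" instead, the same characters would come out as 5 October 2009.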

As far as I can tell, no one has figured out a universally accepted (or any) way to define the semantic structure of an SES in a way that can be used by common semantic inference engines, and I don’t think that anyone in this discussion is asking for that. What’s needed is a way to say "Here’s a pre-coordinated string expressed as a skos:prefLabel, it has an identity, and here are its semantic components."

Additional data



…is expressed as…

@prefix rdf: <> .
@prefix skos: <> .  
@prefix terms: <> .  
@prefix owl: <> .

    skos:prefLabel "Italy--History--1492-1559--Fiction"@en ; 
    rdf:type ns0:Concept ;    
    terms:modified "2008-03-15T08:10:27-04:00"^^<> ; 
    terms:created "2008-03-14T00:00:00-04:00"^^<> ; 
    owl:sameAs <info:lc/authorities/sh2008115565> ; 
        <> , 
        <> ; 
    terms:source "Work cat.: The family, 2001"@en . 

…and has a 151 field expressed in the authority file as…

151 __ |a Italy |x History |y 1492-1559 |v Fiction

…which has the additional minimal semantics of…

    loc_id:type "Geographic Name" ; #note that this is also expressed as a skos:inScheme property
    loc_id:topicalDivision "History" ;
    loc_id:chronologicalSubdivision "1492-1559" ;
    loc_id:formSubdivision "Fiction" ;
    loc_id:geographicName "Italy" .

…and this might also be expressed as…

   loc_id:type ;
   loc_id:topicalDivision ;
   loc_id:formSubdivision ;
   loc_id:geographicName ;
   dc:temporal "1492-1559" ;
   dc:spatial ;
   dc:spatial .

Making sure that those strings in the first example are expressed as resource identifiers is also something that I think Karen is asking for. (BTW, the ability to look up a label by URL is really useful.)

I should point out that Ed, Antoine, Clay, and Dan’s DC2008 paper detailing the conversion of LCSH to SKOS goes into some detail (see section 2.7) about the LCSH to SKOS mapping, but doesn’t directly address the issue that Karen is raising about mapping the explicit semantics of the subfields.
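As a rough illustration of what mapping the subfields could look like, here is a small Python sketch that takes the 151 field from the example, rebuilds the pre-coordinated label, and emits one statement per subfield. It assumes the field is written as plain `|code value` pairs, and the `loc_id:` property names are simply the illustrative ones used earlier in this post, not a published vocabulary. The function names are hypothetical.

```python
import re

# Subfield code -> property name. The property names are the illustrative
# loc_id: ones from this post, not terms from any published vocabulary.
SUBFIELD_PROPERTIES = {
    "a": "loc_id:geographicName",
    "x": "loc_id:topicalDivision",
    "y": "loc_id:chronologicalSubdivision",
    "v": "loc_id:formSubdivision",
}


def parse_subfields(field):
    """Split a '|a Italy |x History'-style field into (code, value) pairs."""
    pairs = re.findall(r"\|(\w)\s*([^|]*)", field)
    return [(code, value.strip()) for code, value in pairs]


def pref_label(subfields):
    """Rebuild the pre-coordinated 'dash-dash' string from the subfields."""
    return "--".join(value for _, value in subfields)


def semantic_statements(subfields):
    """Pair each known subfield with the property that names its role."""
    return [
        (SUBFIELD_PROPERTIES[code], value)
        for code, value in subfields
        if code in SUBFIELD_PROPERTIES
    ]


field = "151 __ |a Italy |x History |y 1492-1559 |v Fiction"
subfields = parse_subfields(field)

print(pref_label(subfields))
for prop, value in semantic_statements(subfields):
    print(f'    {prop} "{value}" ;')
```

The label and the per-component statements come from the same source field. The "dash-dash" string alone no longer says which component was the chronological subdivision and which was the form subdivision, and that is the information worth publishing alongside the skos:prefLabel.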

Knowledge Organization Systems

There was an interesting discussion at lunch about SKOS and I finally had something click about the taxonomy of terms for ontologies. Yeah, yeah I know, but give me a second… It seems to me that it goes like this:

1. In the beginning was a controlled vocabulary, and it was good, but it was a flat list of terms.

2. A taxonomy is a single controlled vocabulary that contains an explicit hierarchy of terms that are related solely by their relative position in the hierarchy.

3. A thesaurus is composed of one or more controlled vocabularies and the relationship between the terms is expressed using a controlled vocabulary of ‘thesaural’ relationship concepts.

4. An ontology is composed of one or more controlled vocabularies and the relationship between the terms is expressed using relationship concepts that do not necessarily belong to any specific controlled vocabulary of concepts.

This may not be entirely correct, but it sure helped me make sense of that whole landscape.
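For what it’s worth, the four levels in the list above can be sketched as toy data structures in Python. The terms, relation names and helper functions here are all invented for illustration, and the sketch is only meant to show where the constraint on relationships sits at each level.

```python
# 1. Controlled vocabulary: a flat set of terms, no relationships at all.
vocabulary = {"Rome", "Italy", "Europe", "History"}

# 2. Taxonomy: the only relationship is position in a hierarchy
#    (each term points at its parent; the root has no parent).
taxonomy = {"Rome": "Italy", "Italy": "Europe", "Europe": None}

# 3. Thesaurus: relationships are named, but the names themselves
#    come from a controlled list of thesaural relations.
THESAURAL_RELATIONS = {"broader", "narrower", "related"}
thesaurus = [
    ("Italy", "broader", "Europe"),
    ("Europe", "narrower", "Italy"),
    ("History", "related", "Historiography"),
]

# 4. Ontology: relationship names are not limited to any one list.
ontology = [
    ("Rome", "isCapitalOf", "Italy"),
    ("Italy", "isLocatedIn", "Europe"),
]


def ancestors(term, parents):
    """Walk up a taxonomy from `term` to the root."""
    chain = []
    while parents.get(term) is not None:
        term = parents[term]
        chain.append(term)
    return chain


def uses_only_thesaural_relations(triples):
    """True if every relationship name comes from the controlled list."""
    return all(relation in THESAURAL_RELATIONS for _, relation, _ in triples)


print(ancestors("Rome", taxonomy))
print(uses_only_thesaural_relations(thesaurus))
print(uses_only_thesaural_relations(ontology))
```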