CAT | Cataloging/Classification
23
2007
Modeling Things or Revealing Things
0 Comments | Posted by Steve in Cataloging/Classification, Technology
Karen Coyle has a great piece on Hierarchies vs. Relationships in bibliographic modeling. She argues that the value of the FRBR model lies not so much in the hierarchy you get to model as in the relationships you can reveal among things.
This is a keen insight, in my view, since it really begins to get at the fun stuff that the Googles, Amazons, etc. are doing with data and that libraries long to do with bibliographic data. Coyle starts to articulate something here that I have not been able to put my finger on: the way that FRBR is a huge step forward but still only has an eye toward an implementation rooted in the way libraries have traditionally done things.
My library has recently been discussing subject guides and how best to build and provide access to them. I have felt for some time now that it would be great to get out of a next-generation catalog a system that imparts the kind of knowledge our librarians and subject liaisons put into these projects. Coyle’s post renewed this thought by framing the new catalog model in terms of a “Knowledge Management system,” which to my mind is the true aim of a discovery system.
In the past when I have tried to express a hybrid of a next-generation catalog and a subject discovery tool, I have always framed it in terms of applying graph theory to bibliographic data. I think Coyle’s post helps me to understand this. It seems obvious to use subject terms and call number ranges as one type of edge between nodes, where the nodes are bibliographic items. However, her discussion raises the possibility of a whole set of different edge types: translations, abridgements, extensions, etc.
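To make the graph idea a little more concrete, here is a minimal sketch in Python. It assumes bibliographic items are plain strings and that every relationship is undirected, which is a simplification (a translation really has a direction). The class name `BibGraph`, the edge labels, and the sample titles are all made up for illustration.

```python
from collections import defaultdict


class BibGraph:
    """Bibliographic items as nodes, typed relationships as edges."""

    def __init__(self):
        # Maps each item to a list of (edge_type, related_item) pairs.
        self._edges = defaultdict(list)

    def relate(self, source, edge_type, target):
        """Record an undirected, typed relationship between two items."""
        self._edges[source].append((edge_type, target))
        self._edges[target].append((edge_type, source))

    def neighbors(self, item, edge_type=None):
        """Return items related to `item`, optionally limited to one edge type."""
        return sorted(
            other
            for kind, other in self._edges.get(item, [])
            if edge_type is None or kind == edge_type
        )


graph = BibGraph()
# Hypothetical items and relationships, for illustration only.
graph.relate("Hamlet (English)", "translation", "Hamlet (German)")
graph.relate("Hamlet (English)", "abridgement", "Hamlet (school edition)")
graph.relate("Hamlet (English)", "shared_subject", "Macbeth (English)")

print(graph.neighbors("Hamlet (English)", "translation"))
print(graph.neighbors("Hamlet (English)"))
```

Filtering by edge type is what would let an interface offer "other translations" separately from "other items on this subject".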
9
2007
What I want from a catalog
0 Comments | Posted by nate in Cataloging/Classification, Technology
It’s been a while since I’ve thought about what, in my mind, electronic catalogs are supposed to do. Today, Steve sent me a link to a test version of a very elegant catalog app built with a fraction of our catalog data. It really brings the cataloging data (you know, that stuff that librarians worked so hard to create) to the forefront, and has a great “shelf browse” view. This (plus this OPAC survey posted to code4lib) got me thinking: what should our catalog be, really?
It’s easy to get all Web 2.0 starry-eyed about this, perhaps partly because our catalog has been so ghastly for so long. People talk about social recommendations, comments, tags, structured blogging, and so on. There are a few problems with going down this road, though:
- Other people are already doing this, doing it well, and doing it for free.
- The Information Superhighway is littered with the charred-out husks of failed social networks. (Did you know Amazon added tagging a year ago? Have you ever used it?)
- Library catalogs, by definition, contain only your library’s stuff.
The first two points might be surmountable (and are really the same thing anyhow), but the third is the killing blow to any idea of catalog-as-research-tool. Amazon has more data than you. Google Books has more data than you. WorldCat has more data than you. The thing you need to do your research may be at someone else’s library; this is why we have ILL, after all. Using the OPAC to do research means you’ll miss out on everything that’s not local. We can’t fix that. The social networking, “More about this book,” “More books like this,” and so on all assume that the OPAC is a research tool. We just shouldn’t do that.
The place where our catalog can excel, the place where no one can compete, is in finding things already in our collection. Try using your Voyager-based catalog to find out where a particular book (or journal volume) is. Want extra credit? Try finding a NASA technical report. For some stuff, it’s nearly impossible to do, even for librarians. The number of times I’ve heard a librarian say “Well, I just know this is probably over here…” makes me want to scream. We’re using a catalog that indexes all of our millions of things so badly that our librarians often need to ask other librarians to help find things that are sitting on a shelf or in a file drawer.
It’s shameful.
So… I’m happy to wait on all of the Web 2.0 goodness until we’ve mastered the Web 1.0 thing.
6
2007
Strategic (cataloging) objectives
0 Comments | Posted by Steve in Cataloging/Classification, Culture, Technology
I have wondered lately whether the fundamental goals of cataloging are at odds with the 21st century digital environment. In a digital world, we build networks, and networks are for bringing together remote objects. Now it is important to note that this is not simply a transfer of files from one physical location to another, but more of an expression language for telling a narrative (think REST). Those remote objects are more appropriately understood as concepts than as physical files. The work is always more important than the document that gives it form.
Elaine Svenonius in The Intellectual Foundation of Information Organization traces the timeline of what I would call major missteps that we are only now beginning to recover from in library land. It goes something like this:
- Cutter states his cataloging objectives, which importantly include what Svenonius calls the collocating objective.
- Lubetzky states his objectives and formally introduces the work/document distinction. His formulation is essentially adopted as the Paris Principles.
- IFLA corrects the resulting omission of collocation only 36 years later.
Cutter’s original collocating principle is ignored in favor of a conceptualization that places a heavy emphasis on books/bibliographic items as uniquely identifiable things. It is only later, in 1997, that IFLA reformulates and modernizes the objectives, which leads us to the current state of debate over FRBR and the new (old) world of facets. This is the point at which Svenonius hits the nail right on the head:
The traditional finding objective specifies that what is to be found is a particular known document, while the traditional collocating objective specifies that what is to be found is a set of documents, defined by criteria such as author, work, and subject. The first IFLA objective integrates these into a single finding objective. While this is logical and introduces a certain elegance of expression, at the same time it diminishes the importance of the concept of collocation. This concept is well entrenched in bibliographic discourse. It is particularly useful for the emphasis it gives to what in the first instance is the primary act of information organization — bringing like things together. Both for its set-forming connotations and its ties to tradition it is too valuable to lose.
- Chapter 2, section on traditional objectives
The final emphasis is mine. While uniquely identifying an item is important, this is going to happen whether we like it or not, since it is an inherent feature of any system that functions at even the most basic level. Now, for an incredibly long time leading up to the current era, cataloging was rooted in identification of bibliographic items at the expense of collocation of bibliographic items. Adding to the significance of this, during the era of mass digitization of bibliographic records (we can call it the Gorman era), this is the model that was used: identify.
In the past few years we have seen library land, with more tradition, history and rich data than anyone else in the world, get trounced by corporations and mashups who understand that you can get miles farther with vastly simpler description of the physical item if you give people something more important: collocation. People who bought this also bought this.
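The difference between the finding objective and the collocating objective that Svenonius describes can be shown with a small sketch. It assumes a toy list of records held in memory; the record ids, edition notes, and function names are hypothetical and stand in for whatever a real system would store.

```python
from collections import defaultdict

# Hypothetical records: (record_id, author, work_title, edition_note)
records = [
    (101, "Shakespeare, William", "Romeo and Juliet", "quarto facsimile"),
    (102, "Shakespeare, William", "Romeo and Juliet", "annotated paperback"),
    (103, "Shakespeare, William", "Hamlet", "annotated paperback"),
    (104, "Austen, Jane", "Emma", "illustrated edition"),
]


def find(record_id):
    """Finding objective: return the one known record, or None."""
    for record in records:
        if record[0] == record_id:
            return record
    return None


def collocate(field_index):
    """Collocating objective: group record ids that share a field value."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field_index]].append(record[0])
    return dict(groups)


by_author = collocate(1)  # sets of records gathered by author
by_work = collocate(2)    # sets of records gathered by work

print(find(103))
print(by_author["Shakespeare, William"])
print(by_work["Romeo and Juliet"])
```

Identification answers one question about one record; collocation produces the sets ("everything by this author", "every edition of this work") that make browsing possible.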
It boggles my mind why more librarians don’t seem to understand this: we will never have to make the sales pitch to other bibliophiles. Those people already understand the value of a book or other bibliographic entity. Anyone who obsesses over the edition of a book and who it was that wrote the introduction or preface is probably already sold on the value of libraries. It is the people who don’t understand how rich a library collection is that we need to spend our efforts seducing. We can do that by weaving the web: description should be based on 21st century collocation, not that old sixties issue, identity.
Karen Coyle recently pointed to a paper by Allen Renear and Yunseon Choi [pdf] in which they claim that inheritance is a poor way of describing the hierarchy in Group 1 FRBR entities. Renear/Choi mistakenly claim that the work entity is a model of an abstract thing and therefore work entities have some kind of “is-abstract” attribute. There is no such thing as an “is-abstract” attribute for works. Rather, the attributes of works, expressions, manifestations and items are the pieces of bibliographic description such as “title”, “uniform title”, “copy number”, “call number” etc…
Karen Coyle has it right, and Renear/Choi are confused in their conception of FRBR Group 1 entities. Coyle states, “I tend to consider all aspects of metadata to be abstract in nature, since it is a representation of something else.” Renear/Choi state:
The argument is simple: FRBR describes works as abstract and items as concrete. If all properties of “higher” entities are inherited by “lower” entities then items inherit the property of being abstract, and therefore items will be both abstract and concrete. But nothing is both abstract and concrete – therefore there is no unlimited general property inheritance in FRBR.
They seem to think, mistakenly, that because works model something abstract, the model itself has some kind of “is-abstract” attribute or property, while items, which model concrete things, have an attribute on par with “is-concrete”.
There is no “is-abstract” attribute for works, expressions or manifestations and there is no “is-concrete” attribute for items. The attributes of an item might be things like “copy number” or “location code.” The attributes of works might include a unique identifier that serves as a reference to collocate related expressions. However, that ID is still a concrete thing (likely an integer, see below). Or a work might have an attribute like “uniform title”. Expressions might add an attribute like “transcribed title” or “language”. Manifestations … well, you get the point. (See, for example, page 32 of the FRBR document.)
There is nothing abstract about an implementation of FRBR. Even if the work were represented as nothing more than an arbitrary identifier in a system, it would still be a concrete thing. Works may be no more concrete than integers, but it is important to remember that in a modern system integers are the thread that binds together the fabric of content in an RDBMS. The work entity is simply the thread that binds together the more substantial entities one level down: the cloth-like expressions.
The hierarchy of the inheritance relationship still works because there is no logical conflict between any attributes of a work and any of an item. Renear/Choi come close to acknowledging this in their paper:
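The "integer as thread" point can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table and column names (`work`, `expression`, `work_id`, `uniform_title`, `language`) are invented here and are not taken from the FRBR document or from any real catalog schema.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical two-level slice of FRBR.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE work (
        id INTEGER PRIMARY KEY,      -- the arbitrary integer identifier
        uniform_title TEXT
    );
    CREATE TABLE expression (
        id INTEGER PRIMARY KEY,
        work_id INTEGER NOT NULL REFERENCES work(id),
        language TEXT
    );
    """
)

conn.execute("INSERT INTO work (id, uniform_title) VALUES (1, 'Romeo and Juliet')")
conn.executemany(
    "INSERT INTO expression (work_id, language) VALUES (?, ?)",
    [(1, "English"), (1, "German"), (1, "French")],
)

# The work's integer id is all that is needed to collocate its expressions.
rows = conn.execute(
    "SELECT language FROM expression WHERE work_id = ? ORDER BY language", (1,)
).fetchall()
languages = [row[0] for row in rows]
print(languages)
```

Nothing in the `work` row is abstract in any operational sense: it is an integer and a string, and the integer does the collocating.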
It may be objected that this argument is sound but irrelevant as the only properties ever at issue were the attributes (or attribute values, or attribute/value pairs) explicitly specified in FRBR. So for the work entity the only relevant properties are ones such as title, form, context, and so on; but not the property of being abstract. On this account only specified attributes (and/or attribute values) are inherited and the argument given does not apply as “abstract” and “concrete” are neither attributes nor attribute pairs. We believe that even this limited version of inheritance is misconceived, but before presenting our arguments against it we explore it further.
But what I think they miss is that works do not have a property of abstraction at all. I am not sure that Romeo and Juliet the work has any property of abstractness. As Merriam-Webster points out, “the word poem is concrete, poetry is abstract.” However, investigating the properties of poetry (being the productions of a poet, being poetic), we do not find a property corresponding to abstractness. It is the same with FRBR entities. “A work is an abstract entity” (FRBR document), but that does not mean that Romeo and Juliet the work has the property of abstractness.
Is this moot anyway?
The conflict they raise may actually be desirable. I think there is a bit of a tension between a theoretical model here and a practical model. I live in a practical realm and think about ways to implement a FRBR-based application. In the practical world there is at least one important solution to the conflict that Renear/Choi raise: overriding an attribute (or behavior).
Take a programming language like Java as an example. In Java it is possible to override the behavior of an inherited class in a subclass (and, through accessor methods, the attribute values it reports). In implementing a system in one of the prominent object-oriented programming languages, you might be able to define works as abstract and items as concrete. This could then have useful implications for the display of items in your collection. A user interface could browse works and items differently. When looking at a list of items, you see things (such as a shelf location) that you do not see in the work view of a bibliographic entity.
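Here is a minimal sketch of the overriding idea, written in Python for brevity; the same mechanism is available in Java through method overriding. It makes two loud simplifications: `Item` inherits directly from `Work`, skipping the expression and manifestation levels, and the `is_abstract` flag exists only to show that a subclass can override an inherited value. The class names, attributes, and shelf location are all hypothetical.

```python
class Work:
    """Work-level view: only the attributes that make sense for a work."""

    is_abstract = True  # illustrative flag, not an attribute defined by FRBR

    def __init__(self, uniform_title):
        self.uniform_title = uniform_title

    def display(self):
        return self.uniform_title


class Item(Work):
    """Item-level view: inherits the title, overrides what differs."""

    is_abstract = False  # the subclass overrides the inherited value

    def __init__(self, uniform_title, shelf_location):
        super().__init__(uniform_title)
        self.shelf_location = shelf_location

    def display(self):
        # Overridden behavior: the item view also shows where the copy sits.
        return f"{super().display()} [{self.shelf_location}]"


work = Work("Romeo and Juliet")
item = Item("Romeo and Juliet", "Main stacks, 3rd floor")

print(work.display(), work.is_abstract)
print(item.display(), item.is_abstract)
```

Because the item overrides the value rather than inheriting it, nothing ends up being both abstract and concrete, and the two `display` methods give the differing work and item views described above.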
7
2005
Library technology woes
2 Comments | Posted by Steve in Cataloging/Classification, Technology
Maybe if I set it to music, people will do it:
The bane of my existence for the past couple of weeks has been the lack of a universal identifier for electronic resources like databases and full text aggregators. When trying to match records between one system and another, e.g., a link resolver and catalog, an ISSN is a blessed thing. Why, oh, why can’t I have one for my database resources? Sure, there are OCLC numbers, but there is no incentive to add such a thing to a system that does not participate in cooperative cataloging.
-Sad Librarian
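The matching problem looks roughly like this in code. The sketch assumes each system can export a list of records as dictionaries; the field names, titles, and the ISSN value are placeholders invented for illustration, not real data from any link resolver or catalog.

```python
# Hypothetical exports from two systems; None means "no identifier available".
catalog = [
    {"title": "Journal of Examples", "issn": "1234-5679"},
    {"title": "Example Database", "issn": None},
]
resolver = [
    {"name": "J. of Examples", "issn": "1234-5679"},
    {"name": "Example Database (Full Text)", "issn": None},
]


def match_on_issn(left, right):
    """Pair records that share an ISSN; report left-side records that cannot be paired."""
    right_by_issn = {rec["issn"]: rec for rec in right if rec["issn"]}
    matched, unmatched = [], []
    for rec in left:
        partner = right_by_issn.get(rec["issn"]) if rec["issn"] else None
        if partner is not None:
            matched.append((rec["title"], partner["name"]))
        else:
            unmatched.append(rec["title"])
    return matched, unmatched


matched, unmatched = match_on_issn(catalog, resolver)
print(matched)
print(unmatched)
```

The journal pairs up on its identifier even though the two systems spell its title differently; the database falls into the unmatched pile because there is nothing reliable to join on.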
19
2005
Popular bibliographic description
2 Comments | Posted by Steve in Cataloging/Classification, Technology
I subscribe to the Web4Lib listserv, and for the last 8 or 9 months there has been a steady stream of comparisons between popular online web services (namely Google) and what we do in libraries. So my mind has lately been trained to notice the differences between a librarian’s notion of what is important about information and what the rest of the world thinks is important.
So I stumbled upon something that left me a little bit shocked: unless I am missing something, the iTunes music program does not provide a display for the record label of a given song. Does it strike anyone else as odd that this little bit of bibliographic description does not make the cut for the popular music catalog of the iPod era? Am I just a librarian for thinking this is odd or is “record label” beyond the 4/4 time signature of what is needed to identify and browse a music collection?
(I did check, and it turns out that the Gracenote CDDB, which provides the service for getting track names into iTunes, does contain a much richer set of metadata for an album than Apple uses in its program. So it appears to be a conscious decision to use a weak set of descriptive elements.)