11 2010
MARC 856 Fields Should Change
0 Comments | Posted by Steve in Cataloging/Classification, Code
MARC URL cataloging for online resources should be changed to accommodate a critical, but missing, piece of descriptive information: an indication of open access. Having thought through the issues with MARC URL cataloging and worked through a pragmatic solution in my professional life, I believe there is a simple but powerful change that should be made to the 856 field.
As a library web developer with practical experience processing MARC data, I believe the first indicator of the 856 field is useless. It is useless for two reasons. First, the access method for a given URI is usually built into the URI itself. The vast majority of the URIs found in subfield $u of an 856 field will tell you the access method is, for example, the File Transfer Protocol (FTP) or the Hypertext Transfer Protocol (HTTP). That is why our URIs have prefixes like ftp:// or http://. Second, there is already a subfield $2 that can be used if you need to be explicit about what the access method is.
When I have MARC data that looks like the following (in pretty-print form, of course):
856 40 $u http://digital.library.wisc.edu/1711.dl/AldoLeopold $z Available through UW Digital Collections
That first indicator is redundant. I know from the URI itself that the access method is HTTP. Furthermore, if it were absolutely necessary, the access method could be coded explicitly like this:
856 40 $2 HTTP $u http://digital.library.wisc.edu/1711.dl/AldoLeopold $z Available through UW Digital Collections
This would free up a valuable indicator to be used to say whether the electronic location is for a resource with open access. A value of 0 could indicate not open access and a value of 1 could indicate open access. This could be taken a step further in the case where the resource is not open access. There could be a recommendation for local copies of the catalog record to use a subfield which says for whom access is restricted, such as, “Faculty, Staff and Students.”
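As a rough illustration of both points, the sketch below reads the access method off the URI scheme using Ruby's standard `uri` library, and then interprets a first indicator under the proposal above. The `Field856` struct and the 0/1 open access mapping are hypothetical; nothing like this mapping exists in the current MARC format.

```ruby
require 'uri'

# Hypothetical, simplified stand-in for a parsed 856 field.
Field856 = Struct.new(:ind1, :ind2, :subfields)

# The access method is already present in subfield $u as the URI scheme.
def access_method(field)
  scheme = URI.parse(field.subfields['u']).scheme
  scheme ? scheme.upcase : nil
end

# Proposed (not current MARC) meaning of the first indicator.
PROPOSED_IND1 = { '0' => :not_open_access, '1' => :open_access }.freeze

def open_access?(field)
  PROPOSED_IND1[field.ind1] == :open_access
end

leopold = Field856.new(
  '1', '0',
  { 'u' => 'http://digital.library.wisc.edu/1711.dl/AldoLeopold',
    'z' => 'Available through UW Digital Collections' }
)

puts access_method(leopold)
puts open_access?(leopold)
```

With the indicator repurposed this way, display code would no longer need to guess at openness from the free-text note in $z.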
20 2010
Woe is MARC (& URLs)
1 Comment | Posted by Steve in Cataloging/Classification, Code, Technology
Background
At present I am working on a project that we have given the code name UW · Forward. Forward is a union index over the library collections within the UW System. Currently the UW System Libraries have 14 different ILS catalogs, one for each of the 13 four-year campuses and one for all of the two-year campuses. Forward searches the data across all libraries.
We deduplicate all of our MARC records so that we can have a single record for items held at multiple locations. As part of the process of deduplication, for certain MARC fields, we add all instances of a given field into the combined record. We do this for holdings represented in 852 fields and for URLs in 856 fields.
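A minimal sketch of that merge step, using plain Ruby hashes rather than a MARC library. The `dedup_key` value, the record layout and the sample data are hypothetical; how Forward actually matches duplicate records is not described here.

```ruby
# Fields whose every instance is carried into the combined record.
ACCUMULATED_TAGS = %w[852 856].freeze

# Each record is a hash: { dedup_key:, campus:, fields: { tag => [values] } }
def combine(records)
  records.group_by { |r| r[:dedup_key] }.map do |key, dupes|
    # Start from the first record's fields, minus the accumulated tags...
    fields = dupes.first[:fields].reject { |tag, _| ACCUMULATED_TAGS.include?(tag) }
    # ...then add every 852 and 856 instance from every duplicate.
    ACCUMULATED_TAGS.each do |tag|
      all = dupes.flat_map { |r| r[:fields].fetch(tag, []) }
      fields[tag] = all unless all.empty?
    end
    { dedup_key: key, fields: fields }
  end
end

records = [
  { dedup_key: 'rec-1', campus: 'Campus A',
    fields: { '245' => ['Sample title'], '852' => ['Campus A stacks'],
              '856' => ['http://example.edu/a'] } },
  { dedup_key: 'rec-1', campus: 'Campus B',
    fields: { '245' => ['Sample title'], '852' => ['Campus B stacks'],
              '856' => ['http://example.edu/a'] } }
]

combined = combine(records)
puts combined.length
puts combined.first[:fields]['856'].inspect
```

Note that the same URL survives once per contributing campus, which is how a single combined record can end up listing one link many times.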
The Problem
MARC capture of URL description has proven to be inadequate in our context. We simply do not have the kinds of information coded in 856 fields to clearly present access to online representations of library items.
Licensed Content
When a particular campus has access to a licensed online resource, the local version of the URL used is incredibly important. The library in question usually needs to proxy the URL through a campus authentication & authorization mechanism to provide off-campus access to databases and e-journals. These are arguably the most important online resources in our catalogs – they are not only carefully selected, like all online resources, they are also some of our patrons’ preferred library resources and the ones taking up greater and greater percentages of the library budget.
This is where the first problem lies: the MARC 856 field does not provide any indication that a given URL is for a licensed resource that has restricted access. The problem is complicated by the fact that some UW System campuses catalog a URL in its proxied form while others insert a proxy string at runtime. In the latter case the cataloged form of the URL is not proxied, but the OPAC software inserts the proxy prefix when it outputs a web page. A “restricted access” or “deep web” indicator would be great here. Alas, the indicators are already reserved.
What makes this more complex here at the UW System is the fact that the Forward project is a union index and provides a single view for all campuses. So we have the added problem of not just telling the end user what the correct URL is, but for whom that URL is valid. Traditionally this kind of information is coded into the free text public note subfields (usually $z). Because the field is free text, the way any given library indicates that a URL is available only to certain users is all over the place. Across MARC records that come from multiple sources, there is nothing consistent enough for me to address in code.
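For campuses that take the runtime approach, the prefixing step might look like the following sketch. The proxy prefix is a made-up login URL; each campus would have its own, and the check for an already-proxied URL is only a simple prefix match.

```ruby
# Hypothetical campus proxy prefix; real values differ per campus.
PROXY_PREFIX = 'https://proxy.example.edu/login?url='.freeze

# Add the proxy prefix at display time unless the cataloged URL already has it.
def proxied(url, prefix = PROXY_PREFIX)
  url.start_with?(prefix) ? url : "#{prefix}#{url}"
end

puts proxied('http://www.example.com/journal')
puts proxied('https://proxy.example.edu/login?url=http://www.example.com/journal')
```

Both calls print the same proxied URL. The catch is that the code has to be told which URLs are licensed in the first place, and that is the information the 856 field does not carry.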
Free Online Resources
The free online resources that have been cataloged in our OPACs are also extremely problematic when introduced into a consortial/union context. The following scenario represents a huge missed opportunity. There are certain resources that have been cataloged by one or a few of the campuses within the UW System. A great example would be our own UW Digital Collection Center’s site for The Aldo Leopold Archives. Forward currently has a record for this archival collection, for which the data comes from a few schools within the UW System. A close inspection of the MARC data in question in the staff view raises a few problems:
There are two URLs, each cataloged 6 times by different schools. However, the digital collection that has been cataloged should clearly indicate that it is available to the entire patron base of faculty, staff & students from the entire UW System, not just the six schools that contributed records. We are missing an opportunity to let the schools that did not catalog the resource themselves benefit from the cataloging provided by the 6 schools that did. Again, it would be nice to have an indicator which states this is an “unrestricted free online resource”. This is the same indicator needed for licensed online resources, with the opposite value.
Furthermore, there is no agreement among the schools who did catalog the resource as to whether the URLs in question are for the thing itself or a related resource. Notice the way that the second MARC indicator is applied inconsistently in the 856 fields. Some campuses consider both URLs to be related to this resource, while one school thinks one of the URLs is for the resource itself and another thinks that both URLs are the resource itself. The inconsistencies here make it impossible to write code that efficiently displays to an end user how to use the resource in question, which is to say that the descriptive function of the cataloging in question is failing in our unified context.
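To make the disagreement concrete, the sketch below groups links by URL and reports every URL whose catalogers chose different second indicator values. The labels follow the second indicator values defined for the 856 field; the input layout, campus names and URLs are hypothetical.

```ruby
# Second-indicator values defined for the 856 field.
RELATIONSHIP = {
  ' ' => 'no information provided',
  '0' => 'resource',
  '1' => 'version of resource',
  '2' => 'related resource',
  '8' => 'no display constant generated'
}.freeze

# links: array of { campus:, ind2:, url: } hashes (a hypothetical layout).
# Returns the URLs whose catalogers disagree, with the labels used for each.
def disagreements(links)
  links.group_by { |l| l[:url] }
       .transform_values { |ls| ls.map { |l| RELATIONSHIP.fetch(l[:ind2], 'unknown') }.uniq }
       .select { |_, labels| labels.size > 1 }
end

links = [
  { campus: 'Campus A', ind2: '2', url: 'http://example.edu/collection' },
  { campus: 'Campus B', ind2: '0', url: 'http://example.edu/collection' },
  { campus: 'Campus C', ind2: '2', url: 'http://example.edu/finding-aid' }
]

disagreements(links).each do |url, labels|
  puts "#{url}: #{labels.join(' vs. ')}"
end
```

Detecting the conflict is easy; deciding which campus is right is not something code can do.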
Steel cage grudge match!
In this corner we have Licensed Resources, weighing in at a hulking 60% of your library budget. In the opposite corner are Free Resources, nimble, agile and known to fell a giant or two. Get your tickets. The fight will be phenomenal and sensational.
Solutions that make sense for free online resources are proving to be the opposite of the solutions that work for licensed content. For example, for free content, the best thing to do is drop any affiliation with the school that cataloged the resource in question, deduplicate the set of URLs and display a small set of links to everyone. However, for licensed content the affiliation between school and URL is important so that the end user can authenticate and prove she has the credentials to use the resource in question. And all the while there is nothing in the MARC data that tells me a given URL is restricted to a particular subset of the population we intend to serve with our union catalog.
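If the data did say which links were restricted, the two strategies could sit side by side, roughly as in this sketch. The `:restricted` flag, the labels and the sample links are all hypothetical; the flag in particular is exactly what the 856 field fails to provide.

```ruby
# links: array of { campus:, url:, restricted: } hashes. The :restricted flag
# is hypothetical; it is the information missing from the 856 field.
def links_for_display(links)
  free, licensed = links.partition { |l| !l[:restricted] }

  # Free content: drop campus affiliation and collapse duplicate URLs.
  free_links = free.map { |l| l[:url] }.uniq
                   .map { |url| { url: url, label: 'Available to everyone' } }

  # Licensed content: keep one link per campus so patrons can authenticate locally.
  licensed_links = licensed.uniq { |l| [l[:campus], l[:url]] }
                           .map { |l| { url: l[:url], label: "#{l[:campus]} users" } }

  free_links + licensed_links
end

links = [
  { campus: 'Campus A', url: 'http://example.edu/free', restricted: false },
  { campus: 'Campus B', url: 'http://example.edu/free', restricted: false },
  { campus: 'Campus A', url: 'http://proxy-a.example.edu/db', restricted: true },
  { campus: 'Campus B', url: 'http://proxy-b.example.edu/db', restricted: true }
]

links_for_display(links).each { |l| puts "#{l[:label]}: #{l[:url]}" }
```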
I am spending most of my time at work these days on the UW · Forward project. One of the big features we are trying to bake into the application is the ability to place requests from one library to another or from campus to campus in the UW System. To accomplish this we are using the Voyager XML Over HTTP Web Services API. Using these web services feels right, like a nice sweet spot in the relationship between libraries and their ILS vendors.
The API works well, though my institution is only using Voyager 7.1 and so we are using one of the early versions of the APIs and it has parts that are not documented well or that are a bit sloppy. There are seeming mismatches between some of the XML values returned and the data the element names represent. For example, when placing a request there is an expiration date that is returned. However, the timestamp that is returned corresponds to the time the request was placed.
Additionally, it is difficult to find complete documentation from Ex Libris on all the different cases for which data is returned. When staff process a request made by a patron, there are many stages that the request steps through. The documentation is a little light on which numeric codes correspond to meaningful stage names. Certain elements also appear and disappear as a request is put through different stages, but it is difficult to know this without nearly reverse engineering the process.
These are places where the API could be improved (and yes, documentation is just as important a piece of an API as the API code and requests/responses themselves). But overall, I am quite glad that our ILS vendor is putting these services in place. It enables us to embed ILS functionality in places where it would be unreasonable to expect the vendors themselves to put it. The work we are doing on our Forward project can also be used in places like the campus portal, where people manage their other accounts associated with university life.
We are writing a Ruby plugin for the Voyager API. We are starting with the basic functions we need:
- authenticate a patron
- return his/her ILS account
- renew items
- place a request for an item
- cancel requests for an item
Our code is not fully baked itself and not documented yet, so there is nothing to share at this time. We do intend to share it more broadly in the future.
Last week a colleague introduced me to Sinatra, a lightweight web app framework for Ruby. The Sinatra website describes it as, “a Domain Specific Language (DSL) for quickly creating web-applications in Ruby.”
Just as David Berman urges residents to “leave Kentucky, come to Tennessee,” I can urge my shop to finally ♫ leave PHP, come to Ruby ♫. I have no problem with PHP; I would just like to move to Ruby for the small stuff as well as full-blown Rails apps. I have been looking for something to write simple one-off Ruby apps with, the kind of project that does not require a full Rails application because, for example, it requires no ORM as it has no database. Usually these one-offs were the kind of things I would punt over to PHP.
The particular case in which I am employing Sinatra is a one page web form for paying library fines with a credit card. Our campus has a central credit card payment vendor, so all we need to do is log someone in and figure out how much he owes according to our ILS. The form action submits somewhere else so we don’t need a full web app. We do need to take the logged in user’s ID and query our ILS through its API to get the fine amount. So we will wrap this web form in a campus login and before writing the form, make an HTTP call to the ILS to prepopulate the fine field.
The Sinatra app looks like the following:
./:
-rw-r--r--@ 1 myuser admin 336 Dec 2 16:21 config.ru
-rw-r--r--@ 1 myuser mygroup 623 Dec 3 07:54 application.rb
drwxr-xr-x 5 myuser mygroup 170 Dec 2 13:10 lib
drwxr-xr-x 6 myuser mygroup 204 Dec 2 16:22 public
drwxr-xr-x 4 myuser mygroup 136 Dec 2 16:13 tmp
drwxr-xr-x 5 myuser mygroup 170 Dec 2 13:10 views

./lib:
-rw-r--r--@ 1 myuser mygroup 585 Dec 2 13:10 authenticate_patron.rb
-rw-r--r-- 1 myuser mygroup 441 Dec 2 13:10 my_account_service.rb

./public:
-rw-r--r--@ 1 myuser mygroup 19 Dec 2 13:10 index.html

./tmp:

./views:
-rw-r--r--@ 1 myuser mygroup 3552 Dec 2 13:10 layout.haml
-rw-r--r--@ 1 myuser mygroup 2761 Dec 2 13:10 payfines.haml
Here is a breakdown of the files involved, getting the simple stuff out of the way first.
Rack & the ./tmp and ./public directories
Sinatra can be deployed as a Rack-based app. On our servers we will run this application through Passenger/ModRails, so the tmp directory exists primarily for bouncing the app via
$ touch tmp/restart.txt
The public directory is what Passenger uses as the application root for the Sinatra app. We are deploying to a sub-URI on the server, so we have the following in the Apache configuration:
RackBaseURI /fees
And in the document root for the server a symbolic link to point to the public directory:
$ ls -l /path/to/apache/docroot
lrwxr-xr-x myuser mygroup somedate fees -> /path/to/sinatra/webapp/public
The ./views directory
I place the HTML views in this location. I use a layout file for the full template for my website with a yield just like I would in a Rails app. The other file, payfines, is a view that corresponds to a matching route defined in ./application.rb. This view renders the HTML form.
The ./lib directory
I am using this location to keep files that map the XML responses I expect to get back from my ILS into objects that will be available in my payfines view. I am using HappyMapper. The great thing about Sinatra is that you can require 'rubygems' and then use any Gem in your application.
The app itself and deploying
application.rb
The following is my first version of the application file itself. It is in need of error handling and refactoring, but I include it to show just how simple an app can be.
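# application.rb
require 'rubygems'
require 'sinatra'
require 'haml'
require 'happymapper'
require 'net/http'
# require_relative resolves against this file; a bare 'lib/...' only works
# on Rubies that still include the current directory in the load path.
require_relative 'lib/authenticate_patron'
require_relative 'lib/my_account_service'

get '/payfines' do
  h = Net::HTTP.new('localhost')

  # Fetch the patron authentication response and map it to an object.
  # Ruby 1.8's Net::HTTP#get returned [response, body]; later versions
  # return only the response, so read the body from it explicitly.
  @auth_patron_xml = h.get('/fees/auth-patron-response.xml').body
  @patron = AuthenticatePatron::ServiceData.parse(@auth_patron_xml, :single => true)

  # Use the patron ID to fetch the account, which carries the fine amount.
  account_xml = h.get("/fees/my-account-response.xml?patronId=#{@patron.patron_identifier.patron_id}&patronHomeUbId=YYYY").body
  @account = MyAccountService::ServiceData.parse(account_xml, :single => true)

  haml :payfines
end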
config.ru
The final piece is to create a configuration file for Rack. The Sinatra Book has a section on deploying to Passenger. You can also Google for examples like the following:
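A minimal sketch of such a file, assuming the classic-style Sinatra app defined in application.rb above and a config.ru that sits in the same directory. It illustrates the usual pattern and is not necessarily the exact file deployed here.

```ruby
# config.ru
require 'rubygems'
require 'sinatra'

# Load the classic-style app defined in application.rb (same directory).
require File.expand_path('application', File.dirname(__FILE__))

# Classic-style Sinatra apps are served through Sinatra::Application.
run Sinatra::Application
```

Passenger picks this file up from the directory above public, which is why the symbolic link in the document root points at public rather than at the application directory itself.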
This is just an old fashioned link log post. The following is a good series on functional loops implemented in Ruby by Rails Spikes:
- Functional programming and looping
- Understanding map and reduce
- MapReduce, with inspiration from functional programming