Circulatable: a Librarian’s Group

Because sometimes you need to trammel the editor and exorcise the rules of grammar…

CAT | Culture

My former boss and colleague Andrew Pace recently commented on the nature of the network and how he was rebuffed by a colleague for overlooking the fact that people make up the network and that they are its most significant piece. I would like to respectfully disagree with his post. Andrew used to boast that he is 100% right 50% of the time, and in this case I believe he was right during the initial part of his musings on this topic.

What is the significance of the network in the 21st century? What we understand as the network is a contemporary realization, or maybe the automated reality, of the old adage that the total is greater than the sum of its parts. And quite frankly this realization was made possible by the amazing things that computers are doing with data.

PageRank is arguably the shot heard throughout the Web. With their PageRank algorithm Google was able to solve a problem that was plaguing relevancy in Internet search results: we’re all a bunch of dirty rotten liars. Back in the Yahoo/AltaVista early days of search engines, people were figuring out ways to game the system by lying through their metadata. In order to have their crappy cover band’s web page show up when a user searched for the Rolling Stones, the cover band simply needed to put ‘rolling stones’ into its metadata.

PageRank came along and solved the problem by saying, OK, we will let the network sort out the relevancy, and if the network can prove that your website is a good one, you will be rewarded in search results rankings. This is the significance of the network. For better or for worse, the network can prove whether or not the data byproduct of the people is in fact worth what those people claim it is worth.
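To make "let the network sort out the relevancy" a little more concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production system; the page names, the damping value of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank along its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nobody links to the cover band, whatever its metadata says.
web = {
    "fan_site": ["rolling_stones"],
    "music_blog": ["rolling_stones"],
    "news": ["rolling_stones", "music_blog"],
    "cover_band": ["rolling_stones"],
    "rolling_stones": [],
}
ranks = pagerank(web)
```

In this toy graph the cover band can claim anything it likes about itself, but because no other page links to it, its score stays near the baseline while the page everyone points at rises to the top.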

As Ian Ayres points out in his book Super Crunchers, the world is now using data to make better predictions than traditional experts. What is more, the statistical models being used by doctors, corporations, governments and non-profits are able to leverage the network effects of large data sets to verify how well those predictions are performing and improve those predictions instantly as new data becomes available.

I believe that my issue here is all semantics and I may simply be quibbling over something petty. However, I am splitting hairs over this point because this is a troubling area for libraries in my view. If we get caught up in the mushy people narrative over one of the most significant cultural shifts that is occurring right now, we will miss the point and consequently we will miss the opportunities to maintain the cultural relevancy of libraries in the future. The danger, in my opinion, is similar to the paralogism that because I know the structure of a MARC record I understand how it is stored in a modern RDBMS.

It is imperative that we know how Lucene/Solr works so that we can make better resource discovery systems. It is similarly imperative that we understand how to get in the super crunching game. As Andrew and his colleague Lorcan Dempsey have noted on numerous occasions, we need to do much more with our data, because it’s the network effect, stupid.

(For the record, I do not intend to call either Andrew or his colleagues stupid, I am just leveraging a theme that he and I have been riffing on for a couple of years.)

The New York Times has a short piece on a new Google service called Knol that sounds like it could have been conceived by librarians:

“We believe that knowing who wrote what will significantly help users make better use of web content,” wrote Udi Manber, vice president of engineering, on the official Google blog.

The service appears to be a wiki-style hosting service that puts a premium on identifying authorship.

Oct 17, 2007

I’ve been busted!

Unless Karen Coombs is writing about some other reference statistics tracking package that has an (until recently) undocumented dependency on Pear::DB, her blog post calls out one of the (numerous) failings of Libstats: Installation is difficult for a lot of people. I get a lot of questions from people who have trouble with mod_rewrite or don’t know DB is required or various other things.

I’ve had similar negative experiences with open-source software, and actually releasing something gave me a much better understanding of why things wind up like this.

A few years ago, our library decided to write a reference tracking system and pilot it at a few libraries across campus. Since I was, then, the only developer at our library, the task fell to me. Once the system had proven successful at Madison, I thought, “Hey, maybe other people would like this, too.” I got the OK from my boss to release the code under an open-source license.

This, it turns out, is trickier than it might seem. All of those steps I’d fumbled through to make the software run, I had to eliminate, or at least explain, to people installing this software on the servers they have on hand. Databases need to be created and populated with initial data. Web servers need to be configured. Did I want to provide a demo? Screenshots? Big software projects provide installation wizards, but writing those is a bunch of work, and from my boss’s perspective, the software was written and done, and I had other projects to work on.

Then, there were concerns over the quality of the code. There’s some ugly shit in there. Did I really want people looking at that, and pointing and laughing? What if there’s a security bug in the code that could compromise someone’s server? Even if it relies on server misconfiguration, I’d feel pretty lousy if my code got someone hacked. How will people find out about, obtain, and install patches? Seriously, I wondered, is it even worth the work it’s gonna take to release this code?

Finally, I decided that it was worth the work, and that I’d release it, warts and all, in the hopes that it would be useful to some people. In the time since then, I’ve realized that the motivations of an open-source developer are different from those of a commercial project manager. I don’t get any reward from wide adoption, except a warm fuzzy feeling inside and possibly bragging rights if I make something exceptionally neat.

The bottom line: There’s a large cost and a limited benefit to making an open-source project into an open-source product, and that work will never ever happen as long as the project is only used internally — it’s not needed.

Here’s the question, then: Is it better to release something half-baked, in the hopes that it will be useful, or to keep it purely internal and let someone else solve the problem?

(On the particular topic of not documenting the Pear::DB requirement: when Libstats was released, DB was part of the standard PHP install, so this wasn’t a common issue. Reworking the code to use Pear::MDB is the right option, but that’s nontrivial.)

Jun 13, 2007

Libstats 1.0.4 – Security Release

Hi there,

Y’all will want to download & install the latest version of Libstats. This version fixes a security bug that will affect anyone running with register_globals on in their PHP setup.

May 1, 2007

Libstats 1.0.3 released

Yup, it’s time for that very occasional release of Libstats. Changes in this version:

  • On the report form, times are honored in the range fields
  • asked_at is now on the data dump report
  • question_time, question_weekday, etc now reflect when a question was asked, not when it was entered
  • ‘All Libraries’ is now an option for reports.

Enjoy!

Mar 15, 2007

Generation G

In the past couple of weeks I have had casual discussions with colleagues about the surge of Google in the university sphere. For example, our library is involved with the Google book scanning project. There have been other discussions around one of Google’s latest service offerings to universities: email.

Getting out of the email business is a very attractive proposition. It is a costly piece of infrastructure to support and it requires talented people to do well. This diverts some of our best human capital and technology resources away from other areas that are more specific to the university’s teaching, research and outreach domains. If the privacy issues surrounding financial aid decisions and other sensitive data can be resolved when storing this information on privately owned servers, it is very tempting for a university to get out of the email game.

However, it is important to ask why a company like Google would want to get into the email business for universities. Here is a guess: having access to the email of university students offers a solution to the Generation X problem. To my mind, the lasting significance of the phrase “Generation X,” as well as its newer spinoffs “Generation Y” and “Millennials,” is its implications for business, advertising and marketing.

While the image of Kurt Cobain may constitute the tragic poster of Generation X both in his life and in his death, from the ‘follow the money’ perspective the phrase refers to a black hole in the advertising business. Advertising revenue goes in and yet no sales come out. Remember that Generation X, in part, was meant to describe a segment of the population, “twentysomethings,” that was unpredictable and therefore unreachable by marketers.

Children and teenagers are easy – just give them candy or some outlandish rebellious style. Middle aged folks are similarly easy – sell them big expensive things like sports cars or retirement packages. But people in their twenties have a lot of discretionary income from their first jobs and no major cash drains like mortgages and offspring. And yet marketers were failing to connect.

So let us take this discussion back to the original point: why is Google eager to take email over for universities? There are 2 significant pieces surrounding Google’s way of doing email:

  1. Google and similar companies do not make their money off of the immediate services. You don’t pay Google for searches you do at its website. Likewise, Google probably could not make a significant amount of money on selling an email service. How does Google make money? Advertising. They don’t make money off the service provided to an end user, email in this case. Instead they make money on the data they collect about end users during the service’s use.
  2. Google has a model for email vastly different from everyone else until everyone else began to ape Google’s model for email. The motto for their email service is, don’t delete, archive. Interviews I have seen and read with the Google founders also discuss their amazement that advertising has not caught up to the technology available in the 20th and 21st centuries. By harvesting large quantities of data, be it web pages or 2500 megabytes of email, you can deliver advertising to a person that is smarter and more likely to produce results.

Imagine that you could learn about and understand the personal associations and cultural references of an educated young adult just as s/he was heading out into the world to collect those first paychecks. What would you need to do? Well, one strategy would be to harvest their preprofessional communications that relate to their studies. Then you might combine that with harvesting personal and social communications among their peer groups. And doing this for, say, 4 to 5 years would provide a nice robust data set.

Now I do believe that Google is not going to sell any personal identities to advertisers or anyone else because losing their customer base as a result of what would essentially be corporate identity theft would be detrimental to their bottom line. However, you could build some rather impressive anonymous marketing personas that would be worth their digital weight in gold to advertisers.

I don’t think that if universities get out of the email business it is necessarily a bad thing, but I do think we need to be cognizant of what future we are contributing to. This is a technology decision that should be discussed with some of our best and brightest minds not only in the IT departments on our campuses, but also in our philosophy/ethics, business and sociology departments.

Karen Schneider had an energizing warning for the conference attendees — for years now, libraries have given up ownership, control, and expertise in information management. We don’t own or build our catalogs and supporting — we rent embarrassingly poor systems from unresponsive vendors. We don’t catalog our own data (or when we do, we’ve brilliantly decided to pay OCLC for the privilege of doing this work) — again, we rent. We don’t even own the materials our customers need; this, too, is rented.

This made me think: I’m all in support of Google’s book digitization project, but… um… do we have any plan whatsoever for when our physical collection completely loses relevance? More to the point, if the entire function of libraries becomes that of collection managers (read: people who sign great huge checks to cartels of publishers)… well, how many librarians do we really need on campus, then?

Research is getting easier all the time — I know serious researchers that use Google Scholar almost exclusively… to very good effect. Teaching information literacy will be relevant only until incoming students have better information literacy skills than our instructional staff.

Anyone want to take bets on when that’ll happen?

Part of the answer, as demonstrated at this conference, is to take the power back. We need to start building things; we need to find new, better ways for customers to find information. Remember — librarians did this for centuries, until computers came along and scared everyone. Google isn’t the only company that can build a good search engine. And a concerted effort by a few institutions could take any of the commercial ILS vendors sitting down, as the Evergreen folk have shown.

We can do this. We need to totally change what we’re up to.

I have wondered lately whether the fundamental goals of cataloging are at odds with the 21st century digital environment. In a digital world, we build networks, and networks are for bringing together remote objects. Now it is important to note that this is not simply a transfer of files from one physical location to another, but more of an expression language for telling a narrative (think REST). Those remote objects are more appropriately understood as concepts than physical files. The work is always more important than the document that gives it form.

Elaine Svenonius in The Intellectual Foundation of Information Organization traces the timeline of what I would call major missteps that we are only now beginning to recover from in library land. It goes something like this:

  1. Cutter states his cataloging objectives, which importantly include what Svenonius calls the collocating objective
  2. Lubetzky states his objectives and formally introduces the work/document distinction. His formulation was essentially adopted as the Paris principles.
  3. IFLA only corrects this mistake of omission 36 years later

Cutter’s original collocating principle is ignored in favor of a conceptualization that places a heavy emphasis on books/bibliographic items as uniquely identifiable things. It is only later in 1997 that IFLA reformulates and modernizes the objectives, which leads us to the current state of debate over FRBR and the new (old) world of facets. This is the point at which Svenonius hits the nail right on the head:

The traditional finding objective specifies that what is to be found is a particular known document, while the traditional collocating objective specifies that what is to be found is a set of documents, defined by criteria such as author, work, and subject. The first IFLA objective integrates these into a single finding objective. While this is logical and introduces a certain elegance of expression, at the same time it diminishes the importance of the concept of collocation. This concept is well entrenched in bibliographic discourse. It is particularly useful for the emphasis it gives to what in the first instance is the primary act of information organization — bringing like things together. Both for its set-forming connotations and its ties to tradition it is too valuable to lose.

- Chapter 2, section on traditional objectives

The final emphasis is mine. While uniquely identifying an item is important, this is going to happen whether we like it or not, since it is an inherent feature of any system that functions at the most simplistic level. Now for an incredibly long time leading up to the current era, cataloging was rooted in identification of bibliographic items at the expense of collocation of bibliographic items. Adding to the significance of this, during the era of mass digitization of bibliographic records (we can call it the Gorman era), this is the model that was used: identify.

In the past few years we have seen library land, with more tradition, history and rich data than anyone else in the world, get trounced by corporations and mashups who understand that you can get miles farther with vastly simpler description of the physical item if you give people something more important: collocation. People who bought this also bought this.
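As a rough illustration of how little description this kind of collocation needs, here is a minimal sketch that counts which items turn up together in the same borrowing history. The function names, the item labels and the sample histories are all hypothetical; a real system would need real (and suitably anonymized) circulation data.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(histories):
    """Count how often each pair of items appears in the same history."""
    together = defaultdict(Counter)
    for items in histories:
        # Use a set so an item repeated in one history counts only once.
        for a, b in combinations(sorted(set(items)), 2):
            together[a][b] += 1
            together[b][a] += 1
    return together


def also_borrowed(together, item, limit=3):
    """Return up to `limit` items most often seen alongside `item`."""
    return [other for other, _ in together[item].most_common(limit)]


# Hypothetical borrowing histories, one list per (anonymous) patron.
histories = [
    ["mockingbird", "in_cold_blood", "tulips"],
    ["mockingbird", "in_cold_blood"],
    ["mockingbird", "dickinson_poems"],
    ["tulips", "garden_design"],
]
together = build_cooccurrence(histories)
suggestions = also_borrowed(together, "mockingbird")
```

Nothing here knows the edition, the publisher or who wrote the preface. The only input is which things were used together, and that alone is enough to bring like things together.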

It boggles my mind why more librarians don’t seem to understand this: we will never have to make the sales pitch to other bibliophiles; those people already understand the value of a book or other bibliographic entity. Anyone who obsesses over the edition of a book and who it was that wrote the introduction or preface is probably already sold on the value of libraries. It is the people who don’t understand how rich a library collection is that we need to spend our efforts seducing. We can do that by weaving the web: description should be based on 21st century collocation, not that old sixties issue, identity.

Jan 6, 2007

A book is not forever

A recent article from the Washington Post exposes the sometimes “ruthless” practice of weeding in our public libraries.

I find myself more and more wondering if I’m a conservative man.

It strikes me that there is a very real danger in the glibness of this comment: “We’re being very ruthless,” said Sam Clay, director of the 21-branch system since 1982. “A book is not forever. If you have 40 feet of shelf space taken up by books on tulips and you find that only one is checked out, that’s a cost.”

Of course the article isn’t concerned with forty feet of books about tulips. It calls attention to Abraham Lincoln, Emily Dickinson, and To Kill a Mockingbird. And that last one, in particular, is interesting, if only because there’s been a real resurgence in interest in that book, and in Harper Lee in general.

If librarians were the first wave of professionals to have their careers “threatened” by the advent of the Internet, 2006 proved to be the year when journalists came under similar fire. I just watched a News Hour piece titled “New Media, New Year” (mp3 version) in which journalists, educators of journalists and journalism analysts debated the pros, cons and significance of the year in which Time Magazine declared “You” to be the person of the year.

Somehow, it all sounded familiar: the fretting, the fear that the greater public does not understand the added value that a group of professionals brings to the information environment, the bated enthusiasm that one has for a new medium which has massive amounts of potential but which also severely disrupts all that you hold dear.

Journalists, welcome to the 21st century. As your gentle 5 minute ambassador from LibraryLand, I would like to assure you that things will work out. See these promising numbers from a forward thinking (and public!) institution. They are proof that relevance is, well, relevant to the effort a profession makes:

Last year we announced that items circulated during the 2003-04 year passed the 2 million mark. This year, we circulated just over 3 million items. This new circulation record represents a 33% increase and the highest annual percentage of increase in the Library’s history.

We are also experiencing growth in other key areas. Our buildings were visited 1.3 million times, an 8% increase. Attendance at programs increased 14%, over 51,000, and more individuals used library computers than ever before…223,000 logins represent a 37% increase over last year.

The need to expand our space and adapt to the needs and interests of the community is clear. Let us know what you need from the Library.

(source)
