Circulatable: a Librarian’s Group

Because sometimes you need to trammel the editor and exorcise the rules of grammar…

Last week a colleague introduced me to Sinatra, a lightweight web app framework for Ruby. The Sinatra website describes it as “a Domain Specific Language (DSL) for quickly creating web-applications in Ruby.”

Just as David Berman urges residents to “leave Kentucky, come to Tennessee,” I can urge my shop to finally ♫ leave PHP, come to Ruby ♫. I have no problem with PHP; I would just like to move to Ruby for the small stuff as well as for full-blown Rails apps. I have been looking for something to write simple one-off Ruby apps with: the kind of project that does not require a full Rails application because, for example, it has no database and so needs no ORM. Usually these one-offs were the kind of thing I would punt over to PHP.

The particular case in which I am employing Sinatra is a one-page web form for paying library fines with a credit card. Our campus has a central credit card payment vendor, so all we need to do is log someone in and figure out how much they owe according to our ILS. The form action submits somewhere else, so we don’t need a full web app. We do need to take the logged-in user’s ID and query our ILS through its API to get the fine amount. So we will wrap this web form in a campus login and, before rendering the form, make an HTTP call to the ILS to prepopulate the fine field.

The Sinatra app looks like the following:

./:
-rw-r--r--@ 1 myuser  admin   336 Dec  2 16:21 config.ru
-rw-r--r--@ 1 myuser  mygroup   623 Dec  3 07:54 application.rb
drwxr-xr-x  5 myuser  mygroup   170 Dec  2 13:10 lib
drwxr-xr-x  6 myuser  mygroup   204 Dec  2 16:22 public
drwxr-xr-x  4 myuser  mygroup   136 Dec  2 16:13 tmp
drwxr-xr-x  5 myuser  mygroup   170 Dec  2 13:10 views

./lib:
-rw-r--r--@ 1 myuser  mygroup  585 Dec  2 13:10 authenticate_patron.rb
-rw-r--r--  1 myuser  mygroup  441 Dec  2 13:10 my_account_service.rb

./public:
-rw-r--r--@ 1 myuser  mygroup    19 Dec  2 13:10 index.html

./tmp:

./views:
-rw-r--r--@ 1 myuser  mygroup  3552 Dec  2 13:10 layout.haml
-rw-r--r--@ 1 myuser  mygroup  2761 Dec  2 13:10 payfines.haml

Here is a breakdown of the files involved, getting the simple stuff out of the way first.

Rack & the ./tmp and ./public directories

Sinatra can be deployed as a Rack-based app. On our servers we will run this application through Passenger/ModRails, so the tmp directory exists primarily for bouncing the app via

  $ touch tmp/restart.txt

The public directory is what Passenger uses as the application root for the Sinatra app. We are deploying to a sub-URI on the server, so we have the following in the Apache conf:

  RackBaseURI /fees

And in the server’s document root, a symbolic link points to the public directory:

  $ ls -l /path/to/apache/docroot
  lrwxr-xr-x myuser mygroup somedate fees -> /path/to/sinatra/webapp/public

The ./views directory

I place the HTML views in this location. I use a layout file for my website’s full template, with a yield just like I would in a Rails app. The other file, payfines, is a view that corresponds to a matching route defined in ./application.rb. This view renders the HTML form.

The ./lib directory

I am using this location to keep files that map the XML responses I expect to get back from my ILS into objects that will be available in my payfines view. I am using HappyMapper. The great thing about Sinatra is that you can require 'rubygems' and then use any Gem in your application.
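As a rough illustration of the kind of mapping these files do, here is a minimal sketch. It uses REXML from Ruby's standard library rather than HappyMapper so that it runs without extra gems. The XML shape (serviceData, patronIdentifier, patronId) and the helper name parse_auth_patron are assumptions for illustration, not the real ILS response or the contents of my lib files.

```ruby
require 'rexml/document'

# Hypothetical response shape; the real ILS XML will differ.
SAMPLE_AUTH_XML = <<~XML
  <serviceData>
    <patronIdentifier>
      <patronId>12345</patronId>
    </patronIdentifier>
  </serviceData>
XML

# Plain objects exposing the accessors a view would call.
PatronIdentifier = Struct.new(:patron_id)
ServiceData = Struct.new(:patron_identifier)

# Parse the XML and return a ServiceData, or nil if the expected
# elements are missing.
def parse_auth_patron(xml)
  doc = REXML::Document.new(xml)
  id_node = doc.elements['serviceData/patronIdentifier/patronId']
  return nil if id_node.nil?

  ServiceData.new(PatronIdentifier.new(id_node.text))
end

patron = parse_auth_patron(SAMPLE_AUTH_XML)
puts patron.patron_identifier.patron_id
```

Returning nil for an unexpected document keeps the "did the ILS answer sensibly?" question in one place, so the route can decide what to show.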

The app itself and deploying

application.rb

The following is my first version of the application file itself. It is in need of error handling and refactoring, but I include it to show just how simple an app can be.

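# application.rb
require 'rubygems'
require 'sinatra'
require 'haml'
require 'happymapper'
require 'net/http'
# Load the mappers relative to this file; a bare 'lib/...' path only
# resolves when the app's directory is on the load path.
require File.expand_path('lib/authenticate_patron', File.dirname(__FILE__))
require File.expand_path('lib/my_account_service', File.dirname(__FILE__))

get '/payfines' do
  h = Net::HTTP.new('localhost')

  # Fetch the patron authentication response and map it to an object.
  @auth_patron_xml = h.get('/fees/auth-patron-response.xml').body
  @patron = AuthenticatePatron::ServiceData.parse(@auth_patron_xml, :single => true)

  # Fetch the patron's account so the view can prepopulate the fine field.
  patron_id = @patron.patron_identifier.patron_id
  account_xml = h.get("/fees/my-account-response.xml?patronId=#{patron_id}&patronHomeUbId=YYYY").body
  @account = MyAccountService::ServiceData.parse(account_xml, :single => true)

  haml :payfines
end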
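One small refactoring the route could use is building the account query string with escaping rather than string interpolation. A minimal sketch, assuming the same parameter names as the route; the helper name account_path is hypothetical:

```ruby
require 'uri'

# Build the account lookup path with properly escaped query parameters.
# The parameter names come from the route; the helper name is hypothetical.
def account_path(patron_id, home_ub_id)
  query = URI.encode_www_form('patronId' => patron_id,
                              'patronHomeUbId' => home_ub_id)
  "/fees/my-account-response.xml?#{query}"
end

puts account_path('12345', 'YYYY')
```

With this in place, a patron ID containing a space or an ampersand cannot break the request URL.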

config.ru

The final piece is to create a configuration file for Rack. The Sinatra Book has a section on deploying to Passenger, and you can also Google for examples.
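For orientation, a minimal sketch of what such a file commonly looks like. This is not the config.ru from this app. It assumes a classic-style Sinatra app whose routes live in application.rb next to it, and older Sinatra releases may need extra settings, so check the Sinatra Book for your version.

```ruby
# config.ru (a sketch, not the file from this app)
require 'rubygems'
require 'sinatra'

# Load the routes defined in application.rb, relative to this file.
require File.expand_path('application', File.dirname(__FILE__))

# Classic-style Sinatra apps are served through Sinatra::Application.
run Sinatra::Application
```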


This is just an old-fashioned link log post. Rails Spikes has a good series on functional loops implemented in Ruby.


Jul 6, 2009

Lifelike, messy

I wrote an article for the Code4Lib Journal, “Using a Web Services Architecture with Me, Myself and I,” and I keep realizing all of the things it is missing. But that is what a blog is good for, right?

There is something that just feels right about creating three applications all working in concert to do the job of a single application: it feels a little bit messy, but good messy. It is not that the code is sloppy or carelessly composed. And while I wouldn’t necessarily go so far as to use cliches about the whole being greater than the sum of the parts, the messiness is what makes the application lifelike. In other words, it is like a library. It is as if each individual application comprises a different department making a contribution to the entire teaching and research mission of the library.

Part of this line of thinking is influenced by an excellent article my colleague Allan forwarded, “Design in the Age of Biology.” In it, the author discusses what he calls the rise of service design. He characterizes service in the following way:

Robert Lusch [14] wrote about changes in marketing, describing a service-dominant logic in which “value is defined by and co-created with the consumer rather than embedded in output.” The “make-and-sell” strategy of linear value chains gives way to the “sense-and-respond” strategy of self-reinforcing “value cycles.” Lusch described traditional goods-centered dominant logic as focused on “operand resources,” tangible assets with inherent value. He contrasted that logic with emerging service-centered dominant logic focused on “operant resources,” intangible assets, which create value in their use, such as skills, technologies, and knowledge.

In our case, looking at the way our applications operate, the value comes from continuing to develop their service orientation. Their value is initially based on their service-providing behavior: they expose data that is reused and repurposed by other applications. But now I am finding that a self-reinforcing cycle is beginning to emerge as we discover other ways to put that data to work.

Which is to say, these applications are beginning to take on a life of their own.


My former boss and colleague Andrew Pace recently commented on the nature of the network and how he was rebuffed by a colleague for overlooking the fact that people make up the network and that they are its most significant piece. I would like to respectfully disagree with his post. Andrew used to boast that he is 100% right 50% of the time, and in this case I believe he was right during the initial part of his musings on this topic.

What is the significance of the network in the 21st century? What we understand as the network is a contemporary realization, or maybe the automated reality, of the old adage that the total is greater than the sum of its parts. And quite frankly this realization was made possible by the amazing things that computers are doing with data.

PageRank is arguably the shot heard throughout the Web. With its PageRank algorithm, Google was able to solve a problem that was plaguing relevancy in Internet search results: we’re all a bunch of dirty rotten liars. Back in the Yahoo/AltaVista early days of search engines, people were figuring out ways to game the system by lying through their metadata. To have its crappy web page show up when a user searched for the Rolling Stones, a cover band simply needed to put ‘rolling stones’ into its metadata.

PageRank came along and solved the problem by saying, OK, we will let the network sort out the relevancy, and if the network can prove that your website is a good one, you will be rewarded in search results rankings. This is the significance of the network. For better or for worse, the network can prove whether or not the data byproduct of the people is in fact worth what those people claim it is worth.

As Ian Ayres points out in his book Super Crunchers, the world is now using data to make better predictions than traditional experts. What is more, the statistical models being used by doctors, corporations, governments and non-profits are able to leverage the network effects of large data sets to verify how well those predictions are performing and improve those predictions instantly as new data becomes available.

I believe that my issue here is all semantics and I may simply be quibbling over something petty. However, I am splitting hairs over this point because this is a troubling area for libraries in my view. If we get caught up in the mushy people narrative over one of the most significant cultural shifts that is occurring right now, we will miss the point and consequently we will miss the opportunities to maintain the cultural relevancy of libraries in the future. The danger, in my opinion, is similar to the paralogism that because I know the structure of a MARC record I understand how it is stored in a modern RDBMS.

It is imperative that we know how Lucene/Solr works so that we can make better resource discovery systems. It is similarly imperative that we understand how to get in the super crunching game. As Andrew and his colleague Lorcan Dempsey have noted on numerous occasions, we need to do much more with our data, because it’s the network effect, stupid.

(For the record, I do not intend to call either Andrew or his colleagues stupid, I am just leveraging a theme that he and I have been riffing on for a couple of years.)


May 12, 2008

Is search != search

Here is a simple question with profound implications: is library search the same thing as “search” as the population at large understands it, that is, as Googling?

The question is very simple and one that I think has been in the back of my mind for quite some time, but I just read an excerpt on statelessness on the Web from RESTful Web Services that provided me with a new way to frame the question. Richardson and Ruby write:

When you ask for a directory of resources about mice or jellyfish, you don’t get the whole directory. You get a single page of the directory: a list of the 10 or so items the search engine considers the best matches for your query. To get more of the directory you must make more HTTP requests. The second and subsequent pages are distinct states of the application, and they need to have their own URIs: something like http://www.google.com/search?q=jellyfish&start=10. As with any addressable resource, you can transmit that state of the application to someone else, cache it, or bookmark it and come back to it later. (emphasis added)
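To make the quoted point concrete: each page of results is a distinct state with its own address. A minimal sketch, assuming the q and start parameters from the example URI in the quote and a page size of 10; the helper name results_page_uri is hypothetical:

```ruby
require 'uri'

# Return the URI for a given zero-based page of search results.
# Page size and parameter names are assumptions taken from the example URI.
def results_page_uri(query, page, per_page = 10)
  params = { 'q' => query }
  params['start'] = page * per_page if page > 0
  "http://www.google.com/search?#{URI.encode_www_form(params)}"
end

puts results_page_uri('jellyfish', 0)
puts results_page_uri('jellyfish', 1)
```

Because all of the state lives in the URI, either address can be bookmarked or sent to someone else without any session behind it.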

Here the user behavior seems to be: “Hey, Google, show me whatcha got for jellyfish.”

When I go to my library’s catalog and search for the word jellyfish I think my behavior is different because my expectations are different. I am not expecting the top 10 items on the topic. I am instead doing two different things:

  1. First, determining whether anything exists on the topic at my library
  2. Second, retrieving and evaluating a list of these items if they do in fact exist

The difference is that of course Google will have information on a topic because Google aggregates everything (or so it goes in the popular consciousness). The library on the other hand should have something on your topic if your topic serves one of the known collection areas of the library. Understanding the stateless nature of the Web seems to bring this out. The following URIs do not reveal the same state:

  1. http://www.google.com/search?q=jellyfish: what are the ten best resources about jellyfish according to Google
  2. madcat.library.wisc.edu…Search_Arg=jellyfish…: how many, if any, resources about jellyfish does my library have

In designing the interfaces for a library catalog front-end, it would be important to be mindful of this distinction since you are answering two very different questions.


Feb 3, 2008

Know Yourself First

It does not matter that Microsoft may buy Yahoo–the acquisition is based on a flawed premise. Technology companies cannot operate like the GEs and General Motors of the world and serve as the be-all-end-all of technology. The New York Times today put the acquisition in the right context. Describing the business culture of Silicon Valley, they write:

The economist Joseph Alois Schumpeter had a name for this principle of capitalism: creative destruction. Perhaps nowhere does it play out more dramatically — and more rapidly — than in Silicon Valley, where innovation unleashes a force that creates and destroys, over and over.

Technology companies are susceptible to creatively destructive forces when they try to expand too far beyond their original mission. Technologies like computer programming can only be successful if they break problems into smaller pieces that individually solve only a single component of the larger goal. At the time of writing, a computer programming function is defined by the masses (Wikipedia) as “a portion of code within a larger program, which performs a specific task and can be relatively independent of the remaining code” (my emphasis). This principle of modularization at the most basic level of contemporary information technology is important to a technology organization’s business model.

Microsoft and Yahoo both fail so horribly at the world of search and Internet advertising because those problem domains lie at the heart of neither company’s core service: the operating system/desktop platform and the Internet portal. The reason Google so thoroughly dominates the world of search and Internet advertising is that search is its only core. Everything it does revolves around this core service and all of its activities support this model. The moral of the story is that you must choose your core, your identity and your raison d’être and you must choose it wisely because trying to be all things to all people is a futile exercise.

What does this mean for libraries? In the techie realm of libraries, an institution needs to determine what its core mission is and decide how it will define itself in a world of creative destruction. It will need to be able to clearly and succinctly articulate what those goals are to its affiliate institutions: universities or local governments. The library must not try to do everything; as the current computing paradigm of APIs and web services demonstrates, technology works when it is implemented singularly and exceptionally, but in a manner that is open and unafraid of sharing its data and services.

And finally, the modern library must not be afraid to get in the game and take a turn at trying to creatively destroy the old guard, lest it fall prey to the fate of the Yahoos of the world.


The New York Times has a short piece on a new Google service called Knol that sounds like it could have been conceived by librarians:

“We believe that knowing who wrote what will significantly help users make better use of web content,” wrote Udi Manber, vice president of engineering, on the official Google blog.

The service appears to be a wiki-style hosting service that puts a premium on identifying authorship.


Karen Coyle has a great piece on Hierarchies vs. Relationships in bibliographic modeling. She argues that the point of the FRBR model is not so much the hierarchy that you get to model, but the relationships that you can reveal among things.

This is a keen insight in my view since it really begins to get at the fun stuff that the Googles, Amazons, etc. are doing with data that libraries long to do with bibliographic data. Coyle starts to articulate something here that I have not been able to put my finger on: the way that FRBR is a huge step forward but still only has an eye toward an implementation rooted in the way libraries have traditionally done things.

My library right now has been in discussions about subject guides and how to best build and provide access to them. I have felt for some time now that it would be great to get out of a next-generation catalog a system that imparts the kind of knowledge our librarians and subject liaisons put into these projects. Coyle’s post renewed this thought by framing the new catalog model in terms of a “Knowledge Management system,” which to my mind is the true aim of a discovery system.

In the past when I have tried to express a hybrid of a next-generation catalog and a subject discovery tool, I have always framed it in terms of applying graph theory to bibliographic data. I think Coyle’s post helps me to understand this. It seems obvious to use subject terms and call number ranges as one type of edge between nodes, which are bibliographic items. However, her discussion raises the possibility of a new set of edge types: translations, abridgements, extensions, etc.
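To make the graph framing a little more concrete, here is a minimal sketch of bibliographic items as nodes joined by typed edges. The class name, item labels and relationship names are all made up for illustration; nothing here comes from FRBR itself or from Coyle's post.

```ruby
# A tiny graph of bibliographic items with typed edges.
# Item labels and relationship names are illustrative only.
class BibGraph
  def initialize
    # node => list of [relationship type, other node]
    @edges = Hash.new { |hash, key| hash[key] = [] }
  end

  # Record a directed, typed relationship between two items.
  def relate(from, type, to)
    @edges[from] << [type, to]
    self
  end

  # Items reachable from `item`, optionally limited to one edge type.
  def related(item, type = nil)
    @edges.fetch(item, [])
          .select { |edge_type, _| type.nil? || edge_type == type }
          .map { |_, other| other }
  end
end

graph = BibGraph.new
graph.relate('Work A', :translation, 'Work A (French translation)')
graph.relate('Work A', :abridgement, 'Work A (abridged)')
graph.relate('Work A', :shares_subject, 'Work B')

p graph.related('Work A', :translation)
```

Keeping the edge type alongside each link is what lets a discovery tool ask different questions of the same data, such as "show me translations" versus "show me everything on this subject".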

More on this later…


Oct 17, 2007

I’ve been busted!

Unless Karen Coombs is writing about some other reference statistics tracking package that has an (until recently) undocumented dependency on Pear::DB, her blog post calls out one of the (numerous) failings of Libstats: Installation is difficult for a lot of people. I get a lot of questions from people who have trouble with mod_rewrite or don’t know DB is required or various other things.

I’ve had similar negative experiences with open-source software, and actually releasing something gave me a much better understanding of why things wind up like this.

A few years ago, our library decided to write a reference tracking system and pilot it at a few libraries across campus. Since I was, then, the only developer at our library, the task fell to me. Once the system had proven successful at Madison, I thought, “Hey, maybe other people would like this, too.” I got the OK from my boss to release the code under an open-source license.

This, it turns out, is trickier than it might seem. All of those steps I’d fumbled through to make the software run, I had to eliminate, or at least explain, to people installing this software on the servers they have on hand. Databases need to be created and populated with initial data. Web servers need to be configured. Did I want to provide a demo? Screenshots? Big software projects provide installation wizards, but writing those is a bunch of work, and from my boss’s perspective, the software was written and done, and I had other projects to work on.

Then, there were concerns over the quality of the code. There’s some ugly shit in there. Did I really want people looking at that, and pointing and laughing? What if there’s a security bug in the code that could compromise someone’s server? Even if it relies on server misconfiguration, I’d feel pretty lousy if my code got someone hacked. How will people find out about, obtain, and install patches? Seriously, I wondered, is it even worth the work it’s gonna take to release this code?

Finally, I decided that it was worth the work, and that I’d release it, warts and all, in the hopes that it would be useful to some people. In the time since then, I’ve realized that the motivations of an open-source developer are different from those of a commercial project manager. I don’t get any reward from wide adoption, except a warm fuzzy feeling inside and possibly bragging rights if I make something exceptionally neat.

The bottom line: There’s a large cost and a limited benefit to making an open-source project into an open-source product, and that work will never ever happen as long as the project is only used internally — it’s not needed.

Here’s the question, then: Is it better to release something half-baked, in the hopes that it will be useful, or to keep it purely internal and let someone else solve the problem?

(On the particular topic of not documenting the Pear::DB requirement: when Libstats was released, DB was part of the standard PHP install, so this wasn’t a common issue. Reworking the code to use Pear::MDB is the right option, but that’s nontrivial.)


Pubcookie is pretty neat. It lets you authenticate against a login server without ever personally seeing the user’s password — it’s all handled via clever web server modules, redirects, and the REMOTE_USER variable. But, when you go to build a web app with it, you’ll likely find yourself pining for session-based logins. Fortunately, it’s easy to build an OpenID service that’s backed by Pubcookie. Here’s how:

What you need

  1. A web server with working Pubcookie authentication.
  2. An OpenID server. I had good luck with PHP-OpenID, and I’ll be using their example server in this post.

Set up your identity URLs

OpenID identity URLs are what people enter in OpenID login boxes around the net. The pages they point to aren’t anything special — in the simplest case, they just need to have a link to your OpenID server (also called a ‘provider’). It’ll look like:

<link rel="openid.server" href="http://example.edu/op/server.php" />

I used Apache’s mod_rewrite such that all URLs of the format:

http://example.edu/id/<username>

would be valid identity URLs, each linking to the identity provider service.

Note: Your identity URLs don’t need to be served over HTTPS, and they must not be protected behind Pubcookie.

Set up the OpenID provider

Follow your package’s installation notes, and get one statically-defined identity URL working. Also test to make sure the other OpenID identity URLs you’re providing don’t work yet.

If you’re looking for a place to test URLs, try this OpenID test service. Your provider URL can’t be behind a firewall or protected by Pubcookie, because other web servers need to talk to it.

Make note of the name of the session key your OpenID library is using. By default, PHP-OpenID uses openid_server. You’ll need it in the next step.

Make Pubcookie set a session variable

Here’s the magic step. You need a script, protected by Pubcookie, that puts the value of REMOTE_USER into your session (remember, your provider can’t be behind Pubcookie) and redirects you to your OpenID provider’s login URL. Since no one can view this script without authenticating via Pubcookie, and this script is the only place this session variable can be set, you need to go through Pubcookie to set this variable.

I put this script in http://example.edu/op/pubcookie/index.php:

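<?php
// Share the session with the OpenID provider by using its session name.
session_name('openid_server');
session_start();
// Pubcookie has already authenticated the visitor by the time this runs.
$_SESSION['pubcookie_user'] = $_SERVER['REMOTE_USER'];
header("Location: http://example.edu/op/server.php/login");
exit;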
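For readers who think in Ruby rather than PHP, the same flow can be restated as a small Rack-style callable. This is a sketch rather than a drop-in replacement: it assumes the web server puts REMOTE_USER into the Rack environment and that session middleware exposes rack.session, and neither is shown here. The names PubcookieBridge and LOGIN_URL are hypothetical.

```ruby
LOGIN_URL = 'http://example.edu/op/server.php/login'

# Rack-style app: copy the authenticated username into the session,
# then redirect to the OpenID provider's login URL.
PubcookieBridge = lambda do |env|
  user = env['REMOTE_USER'].to_s
  if user.empty?
    # Pubcookie should make this unreachable; refuse rather than guess.
    return [403, { 'content-type' => 'text/plain' }, ['Not authenticated']]
  end

  env['rack.session']['pubcookie_user'] = user
  [302, { 'location' => LOGIN_URL }, []]
end
```

The empty-user branch is an addition the PHP version does not have; it simply fails closed if the script is ever reached without Pubcookie in front of it.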

Hack your OpenID provider to respect the session

Here, you want to find the code in which authentication is checked, and replace it with a check for the session variable you set above. In this example, I replaced action_login() in actions.php with:


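function action_login() {
    // Only proceed if the Pubcookie-protected script has set the session variable.
    if (isset($_SESSION['pubcookie_user'])) {
        $info = getRequestInfo();
        $openid_url = "http://example.edu/id/".$_SESSION['pubcookie_user'];
        setLoggedInUser($openid_url);
        return doAuth($info);
    }
    else {
        // Otherwise send the visitor through Pubcookie first.
        return login_pubcookie_render();
    }
}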

I also added login_pubcookie_render() to render/login.php — it simply uses redirect_render() to send visitors to the pubcookie-protected page. Anywhere else in the code you’re showing the login page, use login_pubcookie_render() instead.

Finally, you’ll want to do a check in the method that actually does the authentication to make sure the identity URL matches with the Pubcookie username — you don’t want people to use their own credentials to log in as someone else. In common.php, I added a check to the start of doAuth():

if ($req_url != $user) {
    return login_pubcookie_mismatch($user, $req_url);
}

I also added a login_pubcookie_mismatch() method to login.php, which warns visitors that their username and URL don’t match and that they should fix that.
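The reasoning behind the mismatch check can be sketched in a few lines of Ruby: derive the one identity URL that belongs to the authenticated username and accept the request only if it asks for exactly that URL. The helper names are hypothetical, and the base URL is the example one used throughout this post.

```ruby
ID_BASE = 'http://example.edu/id/'

# Identity URL that belongs to a Pubcookie username.
def identity_url_for(username)
  "#{ID_BASE}#{username}"
end

# True only when the requested identity URL is the one derived from
# the authenticated username.
def identity_matches?(requested_url, username)
  !username.to_s.empty? && requested_url == identity_url_for(username)
end

puts identity_matches?('http://example.edu/id/alice', 'alice')
puts identity_matches?('http://example.edu/id/alice', 'bob')
```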

Log out of everything and give the OpenID test a try. It should redirect you to your Pubcookie login system, and from there, to a working ID.

