Fiddling while the Library Burns

John Hubbard
Jan 18, 2017

“There is nothing quite so useless as doing with great efficiency something that should not be done at all.”
— Peter Drucker

Aside from an integrated resource discovery layer on the front end (Primo), our new library platform (Alma) offers a wealth of new options for managing how we acquire and describe materials, especially in electronic formats. The circulation functions have many enhanced features as well. Both our Archives and Special Collections departments have migrated from triplicate paper forms to Alma's capabilities for handling paging requests, walk-in patrons, and in-house use. Through a built-in analytics module, this lets them gather use statistics much more easily.

One reason for this conversion is that our consortium decided to stop purging patron records, namely the complete history of everything you've ever checked out, and those units had previously tracked that information permanently and separately. This was a contentious decision. I wish we could offer patrons a choice to opt out of, or maybe even opt in to, that sort of data retention. Barring that, I will say that the few times I've dealt with a user unhappy with our privacy practices, it was because they couldn't see what they had checked out years ago, due to our previous policy of scrubbing those transaction histories.

Collecting this sort of data is necessary if we want to offer customized service, a feature more and more patrons expect. Primo and similar products are starting to follow commercial trends: they now offer personalization features and tailored recommendation engines, and are otherwise moving away from presenting objective search results.

The choice to store this data should not be made lightly. Any personally identifying information an organization retains is essentially a toxic asset, and not just because it can be hacked. Given that our President-elect is a fan of waterboarding and wants to kill the families of terrorists, I don't see our concerns over the government snooping on library patrons as baseless hysteria.

Librarians' commitment to our readers' privacy is a basic tenet of librarianship. Along with the concept of "The Freedom to Read," it's heavily instilled in library school, where we learn not only about the FBI's Library Awareness Program and Section 215 of the USA PATRIOT Act, but also about what fundamentally distinguishes us from advertising companies like Facebook, which deliberately and increasingly collect and profit from consumer data. Recent examples of this dedication in action are the push to deliver library website traffic over HTTPS and our active promotion of privacy safeguards such as Tor.

In a utopian world, a library would need to collect zero data on its patrons and their use of the library. Several recent thefts (well, technically, the immoral emptying) of those community bookshelves known as "Little Free Libraries" demonstrate the folly of such an approach. We need to keep track of who has which books checked out to ensure their return; without the ability to report you to a collection agency, withhold your diploma, and maybe even get an arrest warrant issued for theft of public property, libraries couldn't control their losses.

Then there's a greyer area. Think of the delayed-egress locks you see on fire exit doors: they blur the usual priority of liberty and safety over security, and library surveillance involves an analogous trade-off. My library, like many others, has surveillance cameras set up in its public spaces. There's certainly no need for them in order for us to provide services; rather, as our privacy policy explains, they serve security: "For the security of our users, our staff, and our collections, activity in the UWM Libraries public areas may be covered by security video surveillance cameras. Cameras are monitored from stations within the staff offices. The activity captured on the cameras is also recorded and archived."

The cloud is another reason your library records are becoming slipperier to manage. In theory, our patron data in Alma could be accessed from the vendor's servers without our ever knowing about it. This sort of issue has come up with Canadian and other foreign libraries' subscriptions to US-hosted products such as RefWorks, and it's also reflected in concerns over how library search engines call upon external sites (namely Google and Amazon) to serve up cover images within their results display.

Pretty much every library nowadays faces budget constraints and a mandate to justify its continued existence as traditional library services become, maybe not obsolete, but clearly less in demand. Case in point: the research and discovery process is increasingly happening outside the library. As we look for more ways to measure our impact and communicate our value, there's also pressure, or at least a contrived desire, to measure library use right down to the individual level, which requires tracking personal data.

Looking at actual use data and other patron behavior is a great way to settle debates about what your library should be doing. Usability testing is a prime example, especially given how seemingly everyone who works in a library imagines themselves a UX expert when it comes to interface design.

Another best practice is objectively measuring collection use to inform purchasing decisions. We recently completed the latest in a series of journal cancellation projects, and to the credit of our assessment librarian, what to cut was determined by cost-per-use calculations and not the time-honored method of measuring who could whine the loudest.
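Cost per use is simply a title's annual price divided by its recorded uses over the same period. As a minimal sketch (the titles, prices, download counts, and the $25 cutoff below are all made up for illustration, not our actual data or criteria), ranking subscriptions this way surfaces the weakest values first:

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    title: str
    annual_cost: float  # price in dollars for one subscription year
    uses: int           # e.g., full-text downloads reported for the same year


def cost_per_use(sub: Subscription) -> float:
    """Annual cost divided by uses; a title with zero uses gets infinity."""
    return sub.annual_cost / sub.uses if sub.uses else float("inf")


def cancellation_candidates(subs, threshold):
    """Return titles whose cost per use exceeds threshold, worst value first."""
    flagged = [s for s in subs if cost_per_use(s) > threshold]
    return sorted(flagged, key=cost_per_use, reverse=True)


# Hypothetical data, for illustration only.
subs = [
    Subscription("Journal A", 5000.0, 2500),  # $2.00 per use
    Subscription("Journal B", 3000.0, 60),    # $50.00 per use
    Subscription("Journal C", 1200.0, 0),     # never used
]

for s in cancellation_candidates(subs, threshold=25.0):
    print(f"{s.title}: {cost_per_use(s):.2f}")
```

Note that nothing here needs to know who did the using; aggregate counts are enough for this kind of decision.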

Implicit in the gathering of and reliance upon aggregated use statistics is the assumption that all users (and non-users, for that matter) are of equal importance. While we may be curious to discover who’s reading what, finding out that physics students are accessing physics journals isn’t terribly actionable data. Continual assessment is valuable if and only if the data collected and analyzed makes a difference in decision-making. Demographic information could identify where outreach is most needed, for instance. I suppose individual library use could be correlated with academic performance, and if significant, would constitute strong evidence for funding specific resources. But is pairing library use with better grades really going to win over state legislators for increasing our budgets?

Vendors would surely love to track personal use of subscription sites, if only to block access by certain individuals (or "foreign thieves," according to one publisher) conducting massive or systematic downloads in violation of our licensed terms of use. Whether through phishing or through patrons more knowingly sharing their credentials, we deal with a steady stream of compromised accounts that need to be deactivated. A big tool for aiding these investigations is our remote authentication system, EZproxy.

The original purpose of EZproxy, and a function it continues to serve quite well, is to let our patrons access institutional subscriptions from outside the campus network. Rather than having to configure proxy server settings in a browser (which, as anyone who remembers doing it can attest, was a substantial support commitment), a user trying to reach a library database or other electronic subscription from off campus is seamlessly redirected to a login page. There are occasional configuration and troubleshooting issues involving patron accounts, cached cookies, and moved or secure sites, but otherwise it usually just works.

As recently posted on the EZproxy e-mail discussion list, several academic libraries are now using the system as a gateway for all traffic, effectively treating everyone as if they were off campus and thereby requiring authentication. This is done, as one librarian put it, “so that we could garner accurate usage stats for our resources.” Presumably, this has to do with either vendors who provide inadequate data or a desire to track individual use patterns.

Most library subscription sites, although not all, already offer libraries rather detailed use statistics. They are not, however, tied to individual users (except where users create personal accounts). So this new "off label" use of EZproxy would catch and log additional usage, but not all of it: assuming a vendor retains IP-based access for the entire campus, old unproxied bookmarks, web search results, e-mailed links, and the like would still work on campus and avoid such tracking.

If people had to log in to access any library resource regardless of their location, one ancillary benefit would be a bit of user education that such licensed content isn't free. Patrons would also have a uniform access procedure wherever they are. One sticking point is that, as a public research university, we should at minimum retain access to licensed sites from walk-in workstations for community users and other unaffiliated patrons.

Aside from those concerns, there is the usability impediment of that extra, not strictly required sign-in screen that every on-campus user would face. Explaining and justifying it to people who have never had to log in while on campus is one potential hurdle. We don't want them feeling the library is being needlessly invasive. Remember how Radio Shack used to ask all customers for their phone number?

While pretty much the entire reason for Bitcoin's success is that some of us don't want to be tracked, citizens today are probably, on average, becoming less unnerved by how invasive corporations have become. Maybe they just don't realize how much information is being collected about them. I'll bet the average student is unaware that their instructor can see and scrutinize a full and precise record of every time they've accessed the course site, for example.

If we really want to track personal use, I'm also wondering whether, given the amount of data we already collect (namely the existing ability to log all remote users), statistical sampling of on- and off-campus traffic could adequately extrapolate any desired information without adding the end-user barrier. Offering library services to vulnerable populations and patrons who wish to remain anonymous (which, as with the aforementioned Tor example, has some very real applications these days) may also substantially promote their use, much in the way that virtual reference alleviates the so-called fear of reference effect.
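To make the sampling idea a bit more concrete, here is a minimal sketch of one textbook approach, not a worked-out proposal: estimate the share of sessions with some attribute from a simple random sample, then scale it to a total session count such as the aggregate figure a vendor already reports. The function name, the 400-session sample, the graduate-student attribute, and the 50,000 total are all hypothetical; the interval uses a normal approximation and ignores the finite population correction.

```python
import math


def extrapolate(sample_flags, population_total, z=1.96):
    """Estimate how many of population_total sessions have some attribute.

    sample_flags: one boolean per randomly sampled session
    (True if that session has the attribute of interest).
    Returns (estimate, low, high) using a normal-approximation
    confidence interval (z=1.96 gives roughly 95%).
    """
    n = len(sample_flags)
    p = sum(sample_flags) / n                # sample proportion
    se = math.sqrt(p * (1 - p) / n)          # standard error of the proportion
    low = max(0.0, p - z * se)               # clamp interval to [0, 1]
    high = min(1.0, p + z * se)
    return p * population_total, low * population_total, high * population_total


# Hypothetical: 120 of 400 sampled sessions were from graduate students,
# scaled to 50,000 sessions in a vendor's aggregate (non-personal) totals.
flags = [True] * 120 + [False] * 280
est, low, high = extrapolate(flags, 50_000)
print(f"~{est:,.0f} sessions (95% CI {low:,.0f} to {high:,.0f})")
```

Only the sampled sessions ever need a label, which is the point: the rest of the traffic can stay anonymous.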

Maybe that’s why, as the ALA puts it, “Databases and other digital resources provided by the library should allow anonymous searching and should not require users to reveal personally identifiable information.”

All of these baby steps libraries are taking toward collecting more and more data about their users are eroding one of the last differences that separate us from corporate entities. As our commercial competitors have so aptly demonstrated, we have to create the future or else others will do it for us. I'm therefore not sold on our desire to track individual use as justification for abandoning, or at least cutting corners with, our commitment to reader privacy. Libraries should tread carefully when it comes to collecting patron data, particularly if it is done solely for assessment purposes. Bad things start happening when we view patrons as a means to an end.
