GPO and the LOCKSS Alliance

The U.S. Government Printing Office (GPO) is a member of the LOCKSS (Lots of Copies Keep Stuff Safe) Alliance. LOCKSS provides libraries with digital preservation tools and support so they can collect and preserve their own copies of authorized electronic content.

GPO’s work in this area is in keeping with our mission to provide permanent public access to official Federal Government publications in print and electronic formats through the Federal Depository Library Program (FDLP). The FDLP was established by Congress to ensure that the American public has access to its Government’s information. Since 1813, depository libraries have safeguarded the public’s right to know by collecting, organizing, maintaining, preserving, and assisting users with information from the Federal Government.

In a July 2010 interview with Library Journal (LJ), Ric Davis, GPO's acting Superintendent of Documents, said that libraries working with GPO have expressed interest both in having FDsys (GPO's Federal Digital System) established as a trusted digital repository and in the possibility of keeping their own local copies of GPO content.

"The foundation of the Federal Depository Library Program was built on a distributed model, with tangible publications being held throughout the country," Davis said. "We're looking at how we can continue that activity in the electronic world."

GPO's participation in the LOCKSS Alliance is a step toward addressing the interests expressed by the Federal depository library community.

What is LOCKSS?

LOCKSS ("Lots of Copies Keep Stuff Safe") is open source software that provides institutions with a way to collect, store, and preserve access to their own local copy of content. LOCKSS was developed by Stanford University, and it is currently maintained by the Stanford University LOCKSS Program Management Office with support from the LOCKSS Alliance. LOCKSS runs on standard desktop hardware and requires minimal technical administration. Once installed, the LOCKSS software converts a personal computer into a digital preservation box that creates low-cost, persistent, accessible copies of e-journal content as it is published. The accuracy and completeness of content stored in a LOCKSS box are assured through a robust, secure, peer-to-peer polling and reputation system. A LOCKSS box performs the following four functions:

  • It collects newly published content from the target e-journals using a Web crawler similar to those used by search engines.
  • It continually compares the content it has collected with the same content collected by other boxes, and repairs any differences.
  • It acts as a Web proxy or cache, providing browsers in the institution's community with access to the publisher's content or the preserved content as appropriate.
  • It provides a Web-based administrative interface that allows the institution's staff to target new journals for preservation, monitor the state of the journals being preserved, and control access to the preserved journals.

Collecting

Before LOCKSS boxes can preserve a journal, two things have to happen:

  • The publisher has to give permission for the LOCKSS system to collect and preserve the journal. They do this by adding a page to the journal's Web site containing a permission statement and links to the issues of the journal as they are published.
  • The LOCKSS box has to know where to find this page, how far to follow chains of Web links (so that it doesn't crawl off the edge of the journal and try to collect the whole Web), some bibliographic information, and so on. To support new publishing platforms, the LOCKSS system provides a fill-in-the-blanks tool that a librarian or administrator can use to collect this information and test that it is correct. The information is then saved in a file (the LOCKSS plug-in) and added to the publisher's Web site or to another plug-in repository, so that it is available to all LOCKSS systems.
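The two preconditions above can be modeled as a permission check followed by a scoped crawl. The following is a minimal Python sketch under stated assumptions, not the LOCKSS crawler or plug-in format: the permission wording, the in-memory `SAMPLE_WEB` dictionary standing in for a publisher's site, the `example.org` URLs, and the `scope_prefix` and `max_depth` parameters are all hypothetical stand-ins for the start page and crawl limits that, per the description above, a plug-in supplies.

```python
from collections import deque

# Hypothetical permission wording; the exact text a real LOCKSS
# plug-in looks for is an assumption in this sketch.
PERMISSION_TEXT = "LOCKSS system has permission to collect, preserve, and serve"

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
SAMPLE_WEB = {
    "https://journal.example.org/lockss.html": (
        "LOCKSS system has permission to collect, preserve, and serve this journal.",
        ["https://journal.example.org/2010/issue1.html",
         "https://journal.example.org/2010/issue2.html"],
    ),
    "https://journal.example.org/2010/issue1.html": (
        "Issue 1",
        ["https://journal.example.org/2010/article1.pdf",
         "https://www.example.com/unrelated.html"],  # off-site link
    ),
    "https://journal.example.org/2010/issue2.html": ("Issue 2", []),
    "https://journal.example.org/2010/article1.pdf": ("Article 1", []),
    "https://www.example.com/unrelated.html": ("Not part of the journal", []),
}


def has_permission(start_url, web):
    """True if the start page exists and carries the permission statement."""
    page = web.get(start_url)
    return page is not None and PERMISSION_TEXT in page[0]


def crawl(start_url, scope_prefix, web, max_depth=3):
    """Breadth-first crawl that only follows links inside scope_prefix.

    Returns the URLs collected, in crawl order. Nothing is collected
    unless the start page grants permission.
    """
    if not has_permission(start_url, web):
        return []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    collected = []
    while queue:
        url, depth = queue.popleft()
        collected.append(url)
        if depth >= max_depth:
            continue  # don't follow links any deeper
        _text, links = web[url]
        for link in links:
            # Staying inside the prefix keeps the crawler from
            # "crawling off the edge" of the journal.
            if link.startswith(scope_prefix) and link in web and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return collected


if __name__ == "__main__":
    for url in crawl("https://journal.example.org/lockss.html",
                     "https://journal.example.org/", SAMPLE_WEB):
        print(url)
```

In this sketch the off-site link is never followed, and a start page without the permission statement yields nothing at all. Lowering `max_depth` or narrowing `scope_prefix` limits how far the crawl reaches, which is the kind of limit the text above says the plug-in records.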

Preserving and Auditing

The LOCKSS boxes at libraries around the world use the Internet to audit, continually but very slowly, the content they are preserving. At intervals, boxes take part in polls, voting on the digest of some part of the content they have in common. If the content in one box is damaged or incomplete, that box will lose the poll and can repair its content from other boxes. This cooperation among the boxes avoids the need to back each one up individually. It also provides unambiguous reassurance that the system is performing its function and that the correct content will be available to readers when they try to access it. The more organizations that preserve a given piece of content, the stronger the guarantee each of them gets of continued access.
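The digest-voting idea can be illustrated with a toy model. This minimal Python sketch assumes a single poll in which every box votes, SHA-256 as the digest, and a strict-majority rule; the function names `digest` and `run_poll` and the library names are hypothetical. The real LOCKSS protocol is considerably more elaborate and includes the reputation system mentioned earlier, which this sketch omits.

```python
import hashlib
from collections import Counter


def digest(content):
    """SHA-256 digest of one box's copy of a piece of content (bytes)."""
    return hashlib.sha256(content).hexdigest()


def run_poll(boxes):
    """Run one simplified poll over {box_name: content_bytes}.

    Every box votes with the digest of its copy. If one digest has a
    strict majority, boxes that voted otherwise have lost the poll and
    repair their copy from a box that voted with the majority. Without
    a strict majority, this sketch changes nothing.
    Returns (repaired_boxes, names_of_boxes_that_lost).
    """
    if not boxes:
        return {}, []
    votes = {name: digest(copy) for name, copy in boxes.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(boxes):
        return dict(boxes), []  # no strict majority: leave copies alone
    good_copy = next(boxes[name] for name, d in votes.items() if d == winner)
    losers = [name for name, d in votes.items() if d != winner]
    repaired = {name: (good_copy if name in losers else copy)
                for name, copy in boxes.items()}
    return repaired, losers


if __name__ == "__main__":
    boxes = {
        "library_a": b"Issue 1, article 1",
        "library_b": b"Issue 1, article 1",
        "library_c": b"Issue 1, articl",  # damaged copy
    }
    repaired, losers = run_poll(boxes)
    print(losers)  # ['library_c']
    print(repaired["library_c"] == boxes["library_a"])  # True
```

Because repair comes from the other participants, adding a box that holds the same content makes it harder for any single damaged copy to win a vote, which mirrors the point above that more preserving organizations give a stronger guarantee.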

Providing Access

LOCKSS boxes provide transparent access to the content they preserve. Institutions often run Web proxies to allow off-campus users to access their journal subscriptions, and Web caches to reduce the bandwidth cost of providing Web access to their community. An institution's LOCKSS box integrates with these systems, intercepting requests from the community's browsers to the journals being preserved. When a request for a page from a preserved journal arrives, it is first forwarded to the publisher. If the publisher returns the content, the browser receives it; otherwise, the browser receives the preserved copy.
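The publisher-first rule described above amounts to a simple fallback. The following minimal Python sketch models only that decision, not proxying or caching; the `serve` function, the `fetch_from_publisher` callable, and the `preserved` dictionary are hypothetical, and treating a network error the same as a missing response is an assumption of this sketch.

```python
def serve(url, fetch_from_publisher, preserved):
    """Decide what a browser receives for a request to a preserved journal.

    fetch_from_publisher(url) is a hypothetical callable that returns the
    page content, or None if the publisher did not return it; it may also
    raise OSError (for example, ConnectionError) if the publisher is
    unreachable. preserved maps URLs to the box's preserved copies.
    Returns (source, content).
    """
    try:
        content = fetch_from_publisher(url)
    except OSError:
        content = None  # treat a network failure like "no content"
    if content is not None:
        return "publisher", content
    # Publisher did not supply the page: fall back to the preserved copy
    # (None if this box never collected that URL).
    return "preserved", preserved.get(url)
```

Passing the publisher fetch in as a function keeps the decision separate from HTTP handling, so the same logic can be exercised with stand-ins for a live, empty, or unreachable publisher.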

Administering

Staff administer their LOCKSS box through a Web-based user interface. The interface supports targeting the box to preserve new journals, monitoring the preservation of existing journals, controlling access to the box, and other functions.

Additional information about LOCKSS is available from the Stanford University LOCKSS Web site.

Pilot Project

GPO received numerous requests from research institutions, universities, depository libraries, and other Federal Government agencies to investigate using LOCKSS as a means to manage, disseminate, and preserve access to Web-based Federal Government e-journals that are within the scope of the FDLP and the International Exchange Service (IES). As a result, GPO conducted a 12-month pilot to make Federal Government e-journals available to selected pilot libraries operating LOCKSS boxes. The following documents summarize GPO's evaluation of the pilot project.

Executive Summary (PDF, 40 KB)

Report (PDF, 282 KB)