Web Harvesting
Documents related to GPO's Web Harvesting pilot project to capture official Environmental Protection Agency (EPA) publications in scope of GPO's information dissemination programs.
Categories
Publications from the sample pilot are available here and will be cataloged in the CGP in the future.
Blue Angel applied two tests to determine whether a document was in scope (i.e., considered to be an EPA publication). The first determined whether the document was considered an in-scope publication; the second determined whether it was an EPA publication.
Outlines criteria specifying the characteristics of publications within scope of GPO’s information dissemination programs and the pilot project to harvest publications from the U.S. Environmental Protection Agency (EPA) Web site.
Table representing the final rules used by Information International Associates (IIA).
Blue Angel Technologies and Information International Associates simultaneously but separately crawled the EPA Web site for official EPA publications from March to September 2006. This document contains the results of those crawls.
Statement of work (SOW) for providing a number of different products and/or services related to the discovery, harvesting, and assessment of documents and publications from Web sites using Web crawler and other appropriate technologies (to be specified by vendor).
Guidelines for creating brief metadata records for publications in the special materials category.
Timeline for the special materials cataloging demonstration project.
Reports on the specific context of the pilot results, including a summary of the analysis of the work performed, an assessment of lessons learned, and the planned direction and next steps for further developing the harvesting function, which is to be implemented during Release 2 of GPO's Future Digital System (FDsys), currently scheduled for mid-2008.