Web Harvesting

Documents related to GPO's Web Harvesting pilot project to capture official Environmental Protection Agency (EPA) publications within the scope of GPO's information dissemination programs.


Publications from the sample pilot are available here and will be cataloged in the CGP in the future.


Blue Angel applied two tests to determine whether a document was in scope (i.e., considered to be an EPA publication). The first determined whether a document was considered an in-scope publication; the second determined whether the document was an EPA publication.
Statement of Work (SOW) for providing a number of different products and/or services related to the discovery, harvesting, and assessment of documents and publications from Web sites, using Web crawler and other appropriate technologies (to be specified by the vendor).
Reports on the specific context of the results of the pilot, including a summary of the analysis of the work performed, an assessment of lessons learned, and the planned future direction and next steps for further development of the harvesting function, which is to be implemented during Release 2 of GPO's Future Digital System (FDsys), currently scheduled for mid-2008.