FDLP Web Archive

  • Last Updated: June 23, 2025
  • Published: June 23, 2025

About

The FDLP Web Archive provides point-in-time captures of U.S. Federal agency websites. Unlike archiving and hosting individual documents, a web archive preserves the functionality of the entire website to the extent possible. The aim is to provide permanent public access to content found on Federal agency websites. GPO harvests and archives the websites with Archive-It, a subscription-based web harvesting and archiving service offered by the Internet Archive.

Ways to Access the Archived Sites

Archive-It Website

Search ‘GPO’ or ‘FDLP’ on the Internet Archive’s Archive-It page to get to the FDLP Web Archive collection. This is the most direct way to search for and access archived websites in the FDLP Web Archive collection. All archived content in the collections is full-text searchable.

Catalog of U.S. Government Publications (CGP)

Bibliographic records that describe the archived websites and link to them via PURL (Persistent URL) are available. They are searchable and accessible through the Catalog of U.S. Government Publications (CGP) FDLP Web Archive page. A list of all FDLP Web Archive records is also available in the CGP.

Internet Archive’s Wayback Machine

FDLP Web Archive content is discoverable when a URL is searched in the Internet Archive’s Wayback Machine.
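
As an illustration of looking up a URL in the Wayback Machine, the following minimal sketch builds a playback link and an availability query. It assumes the publicly documented patterns web.archive.org/web/<timestamp>/<url> and archive.org/wayback/available; the function names are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Publicly documented Internet Archive endpoints (assumed stable).
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def _check_timestamp(timestamp: str) -> None:
    """Timestamps are digits in YYYYMMDDhhmmss order; any prefix is allowed."""
    if timestamp and (not timestamp.isdigit() or len(timestamp) > 14):
        raise ValueError("timestamp must be up to 14 digits (YYYYMMDDhhmmss)")


def wayback_url(original_url: str, timestamp: str = "") -> str:
    """Build a playback link; with no timestamp, link to the list of all captures."""
    _check_timestamp(timestamp)
    return f"{WAYBACK_PREFIX}/{timestamp or '*'}/{original_url}"


def availability_query(original_url: str, timestamp: str = "") -> str:
    """Build a query asking for the capture closest to the timestamp."""
    _check_timestamp(timestamp)
    params = {"url": original_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(wayback_url("cio.gov", "2017"))
    print(availability_query("cio.gov", "20170101"))
```

The Wayback Machine generally redirects a partial timestamp, such as a year, to the nearest available capture.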

Frequently Asked Questions

Federal websites have become an important way that agencies communicate information to the public. However, web content often appears or disappears without warning. Archiving these websites is part of fulfilling GPO’s mission to provide permanent public access to Government information. The content is made available in accordance with Title 44 of the U.S. Code.

Archive-It uses a combination of crawling tools it has developed, including Heritrix and Umbra, to gather content. The crawler searches and captures an entire content-rich website, creating a working facsimile of the site as it appeared when it was crawled. This preserves the website content as it appeared at a particular point in time. After the first crawl, the website is re-crawled at a scheduled frequency. In that process, the crawler searches and captures the entire website again, creating a new working facsimile of the website as it appeared at the time of the re-crawl.

While other web archiving software is available that could potentially meet some needs, the Archive-It service provides technical support, training, storage, and a user interface in one package.

  1. Determine if the website is in the scope of the FDLP.
  2. Determine if Archive-It is the best tool for harvesting the website (its archivability).
  3. For new collections, notify the agency of the intent to harvest data from its website.
  4. Review the website to create or edit a seed list of domains that instruct the crawl.
  5. Run a test crawl and then perform Quality Assurance (QA), looking for any out-of-scope or missing content.
  6. Run any additional test crawls as needed to ensure the crawl will be effective and efficient. Multiple test crawls are typically done before a crawl is saved.
  7. Save successful test crawl(s) and QA the saved crawl(s).
  8. Run patch crawl(s).
  9. For new collections, create a record for the website in the CGP.

Steps 3-7 are repeated for each re-crawl of the website.
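
The QA check for out-of-scope content in the steps above can be pictured with a short sketch. This is an illustration only, not Archive-It's actual scoping logic: the function name is hypothetical, and it assumes that a captured URL is in scope when its host matches a seed domain or one of its subdomains.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def out_of_scope(seed_hosts: Iterable[str], captured_urls: Iterable[str]) -> List[str]:
    """Return captured URLs whose host is neither a seed domain nor a subdomain of one.

    Assumption: scope is decided by host name alone. Real crawl scoping
    rules can be considerably more detailed.
    """
    seeds = {host.lower() for host in seed_hosts}
    flagged = []
    for url in captured_urls:
        host = (urlsplit(url).hostname or "").lower()
        in_scope = any(host == seed or host.endswith("." + seed) for seed in seeds)
        if not in_scope:
            flagged.append(url)
    return flagged


if __name__ == "__main__":
    captured = [
        "https://www.cio.gov/about/",
        "https://cio.gov/",
        "https://example.com/widget.js",
    ]
    # Only the third URL falls outside the cio.gov seed.
    print(out_of_scope(["cio.gov"], captured))
```

A reviewer could use a list like this after a test crawl to decide whether the seed list needs to be modified before the crawl is saved.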

All the harvested data is stored on Archive-It’s servers.

GPO owns all the harvested data. The data is in the public domain.

Archive-It uses the WARC (Web ARChive) file format which conforms to ISO 28500:2009.
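
To give a sense of what a WARC record looks like, the sketch below parses the header of a single hand-written record using only the Python standard library. The sample record, its example.gov address, and the function name are made up for illustration; production tools handle compression, continuation lines, and multiple records per file, which this sketch does not.

```python
from typing import Dict, Tuple

# A hand-written sample record (hypothetical content, not harvested data).
SAMPLE_RECORD = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: https://www.example.gov/\r\n"
    b"WARC-Date: 2025-06-23T00:00:00Z\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"      # blank line ends the header
    b"\r\n\r\n"  # two line breaks end the record
)


def parse_warc_header(record: bytes) -> Tuple[str, Dict[str, str]]:
    """Split one uncompressed WARC record into its version line and named fields."""
    head, _, _ = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    if not version.startswith("WARC/"):
        raise ValueError("not a WARC record")
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return version, fields


if __name__ == "__main__":
    version, fields = parse_warc_header(SAMPLE_RECORD)
    print(version, fields["WARC-Type"], fields["WARC-Target-URI"])
```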

The Internet Archive (the parent organization of Archive-It) keeps two copies of all data, including one copy on the Wayback Machine. Read more about Internet Archive Storage and Preservation policy.

Videos in simpler formats, such as WMV or MPEG4, can easily be captured and played back; results vary with more complex formats. Archive-It crawling technology can capture videos in other formats and on other platforms, such as Flash or Vimeo, but playback can vary depending on how the video is constructed or embedded on a page. Archive-It is continuously working to improve video playback for all formats and platforms, and regular enhancements are made.

The initial collection development strategy was to harvest all websites in the Y3 class of the Superintendent of Documents (SuDocs) classification scheme, which includes commissions, committees, and independent agencies.

From there, the focus moved to a curated selection of non-standard Government sites, such as cio.gov. Topical collections were also introduced, the first being Federal Native American resources on the web. This has been expanded to other topics of interest to the FDLP community, in collaboration with GPO Collection Development Librarians. Nominations also come from the FDLP community through askGPO.

To avoid duplication of effort, GPO refers to the Federal Web Archiving Interest Group for information about other existing or planned Federal Government web archive collections.

In an attempt to avoid duplicative effort, content found in GPO’s GovInfo is not harvested or archived, nor is anything already archived by other Archive-It partners, or anything already archived by our FDLP partners who are digitizing specific content from their FDLP collections (FDLP Partnerships). Nothing outside the scope of the FDLP is harvested.

Additionally, some websites, such as those with extensive databases or datasets, are difficult to capture and play back. In these instances, partnerships with the providing agencies are sought to ensure permanent public access to their web content.

Yes, please log in to askGPO and submit an inquiry. Under Select a Category, choose FDLP Web Archive.

After a website is harvested and archived, how frequently is it re-crawled for new content?

Initially, the focus was on building the web archive; it then shifted to maintaining and enhancing what had been built. After a crawl is complete, the site is analyzed to set a re-crawl frequency of annual, biannual, or quarterly, according to how often the site is updated. Re-crawls do not run automatically; they follow a workflow very much like the one for a new site. Before each re-crawl, the site is fully analyzed to evaluate whether it has changed, whether the seed list needs to be updated, and whether any modifications are needed before the new crawls are run.
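
As a rough illustration of the annual, biannual, and quarterly frequencies, the sketch below computes the date on which a site would next come up for review. It is not GPO's scheduling tool: the function name is hypothetical, and "biannual" is assumed here to mean every six months. Because re-crawls are not run automatically, the result is only a reminder date for starting the review workflow.

```python
import calendar
from datetime import date

# Assumption: "biannual" is read as twice a year (every six months).
MONTHS_BETWEEN_CRAWLS = {"annual": 12, "biannual": 6, "quarterly": 3}


def next_review_date(last_crawl: date, frequency: str) -> date:
    """Return the date a site is next due for re-crawl review."""
    if frequency not in MONTHS_BETWEEN_CRAWLS:
        raise ValueError(f"unknown frequency: {frequency!r}")
    # Count months from zero so the year rollover is a simple division.
    month_index = last_crawl.month - 1 + MONTHS_BETWEEN_CRAWLS[frequency]
    year = last_crawl.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day for shorter months (for example, the 30th in February).
    day = min(last_crawl.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


if __name__ == "__main__":
    print(next_review_date(date(2025, 6, 23), "quarterly"))
```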

The test crawl function of Archive-It helps prevent the capture of unwanted material. If a test crawl brings back undesired results, the seed list is modified accordingly to make sure resources aren’t wasted capturing unwanted material during the actual Archive-It crawl.

The main collection development practice is to archive content that would traditionally be included in the FDLP. As such, only content that is publicly available is sought. GPO never intends to harvest material that is copyrighted or proprietary, or that contains personally identifiable information (PII). If you suspect that such content has been harvested, please contact us through askGPO and provide the relevant information, including the Wayback Machine URL. The content will be reviewed for possible removal following Superintendent of Documents policy.

The granularity of the cataloging depends on the content of the website. Some websites warrant individual catalog records for individual seeds of the website, while other websites only need one record that links to the full site.

Yes, please log in to askGPO, and submit an inquiry. Under Select a Category, choose FDLP Web Archive.

After a re-crawl, the website is reviewed to see if there were any major changes to the content. If there were major changes, the record is updated.

Yes. The websites are classified under the agency’s general publications category from the List of Classes, and then INTERNET is added to the end of the class. An archived website is assigned the regular item number that accompanies the general publications class for each agency.

For example, the SuDocs class for “NARAtions: the blog of the United States National Archives” is AE 1.102:INTERNET, and the Item Number is 0569-B-02 (online).
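
The classification rule described above can be expressed as a short sketch. The function name is hypothetical, and the colon separator is inferred from the NARAtions example rather than from a published specification.

```python
def web_archive_class(general_publications_class: str) -> str:
    """Append INTERNET to an agency's general publications class stem.

    Assumption: the stem and INTERNET are joined with a colon, as in
    the example AE 1.102:INTERNET.
    """
    stem = general_publications_class.strip().rstrip(":")
    if not stem:
        raise ValueError("a general publications class stem is required")
    return f"{stem}:INTERNET"


if __name__ == "__main__":
    print(web_archive_class("AE 1.102"))
```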

Because records for the websites are created in the normal workflow, libraries can obtain these records the same way they obtain all their other CGP records. See Sources of GPO Cataloging Records for ideas.

These websites are cataloged in the CGP because they are in scope of GPO’s Cataloging and Indexing Program, which aims to develop a comprehensive and authoritative national bibliography of U.S. Government publications, to increase the visibility and use of Government information products, and to develop a premier destination for information searchers.

There are two kinds of error messages that users might encounter:

  1. “Not in Archive” – This means the content underlying the selected link was never captured.
  2. “Error” – This most likely suggests a problem with the local media player software, and the user should check to see if their software needs updating.

The FDLP Web Archive is used for permanent access to entire Federal agency websites. The web archive was created using a variety of crawling technologies used by Archive-It. The harvested sites are then stored on Archive-It’s servers. The archived sites are full-text searchable and accessible through the Archive-It User Interface.

The CGP provides MARC bibliographic records. Records for digital format resources include PURLs, which are links to the digital content stored in repositories or hosted online. GPO began archiving or storing a copy of some web-based resources in 1998. Publications are saved to GPO’s Permanent server, and a PURL to that content is added to the bibliographic record. As needed, a variety of tools are used to capture monographs, serials, and some video and audio recordings.

GovInfo is a searchable content repository comprised of deposited content ingested by agreement with Federal agencies. GovInfo includes resources from all three branches of Government and includes most of the Congressional publications that are cataloged. GovInfo users can search within the repository across the full text and content metadata, as well as browse for content, view it, and download it.

Policies Related to the FDLP Web Archive

  • Harvesting Digital Federal Government Information Dissemination Products for GPO’s Superintendent of Documents Programs, SOD-PP-2016-5 (effective 12/19/2016)
  • Withdrawal of Federal information products from the National Collection of U.S. Government Public Information and GPO’s online U.S. Government Bookstore, SOD-PPS-8-2024 (effective 7/8/2024)

Training

  • Web Archiving for the FDLP (Video, 60 minutes, recorded in 2014)
  • Archiving & Cataloging Federal Agency Web Sites - GPO's Web Archiving Project (Video, 54 minutes, recorded in 2014)
  • A Time Machine for Federal Information - Using Web Archive content in government information reference work (Video, 59 minutes, recorded in 2017)
  • Tangible and Digital Preservation: Bridging the Divide by Preserving Government Information in All Formats (Video, 57 minutes, update on FDLP Web Archive begins 30 minutes in, slides are available, recorded in 2017)

Link

FDLP Web Archive

Login information

No login or account is required.

Help

For questions or suggestions, log in to askGPO to submit an inquiry. Select the category ‘FDLP Web Archive.’

Federal Depository Library Program (FDLP) • 732 N Capitol Street, NW, Washington, DC 20401
