New Set of All CGP Records Available on GitHub
Library Services and Content Management (LSCM) has posted a new set of all MARC bibliographic records (1,022,882) in the Catalog of U.S. Government Publications (CGP) as of March 30, 2022, on GitHub. Brief bibliographic records, also known as brief bibs, are not included in the files. The records are available in UTF-8 in the cataloging-records-all-cgp-utf8 repository and in MARCXML in the cataloging-records-all-cgp-marcxml repository.
Combined with the monthly files in the CGP_MARC_Records collection, these files represent essentially the entire CGP. GPO will periodically post new snapshots of the entire CGP.
The total size of the UTF-8 files is 1.87 GB, and the total size of the MARCXML files is 5.4 GB. The records within the files are in no particular order. The UTF-8 repository has 26 files, each of which contains approximately 40,000 records. The MARCXML repository has 102 files, each of which contains approximately 10,000 records.
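For readers who want to confirm the record count in a downloaded file, the following is a minimal Python sketch. It assumes the UTF-8 files use the standard MARC 21 transmission format (ISO 2709), in which the first five bytes of each record give its total length and each record ends with the byte 0x1D; the announcement does not state this explicitly. The function names are illustrative only and are not part of any GPO tooling.

```python
RECORD_TERMINATOR = b"\x1d"   # MARC 21 record terminator
LEADER_LENGTH = 24            # every MARC 21 record starts with a 24-byte leader


def iter_marc_records(stream):
    """Yield each raw MARC 21 record (as bytes) from a binary stream.

    Assumes ISO 2709 framing: leader positions 00-04 hold the record
    length as five ASCII digits.
    """
    while True:
        length_field = stream.read(5)
        if not length_field:
            return  # clean end of file
        if len(length_field) < 5 or not length_field.isdigit():
            raise ValueError("malformed record length field")
        length = int(length_field)
        if length < LEADER_LENGTH:
            raise ValueError("record length shorter than the leader")
        rest = stream.read(length - 5)
        if len(rest) != length - 5:
            raise ValueError("truncated record")
        record = length_field + rest
        if not record.endswith(RECORD_TERMINATOR):
            raise ValueError("record terminator not found")
        yield record


def count_records(path):
    """Count the records in one MARC file (path is supplied by the caller)."""
    with open(path, "rb") as handle:
        return sum(1 for _ in iter_marc_records(handle))
```

Calling count_records on each downloaded file and summing the results should come to the 1,022,882 records stated above, if the framing assumption holds. A dedicated MARC library would be the better choice for actually reading field contents; this sketch only walks the record boundaries.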
Please submit questions about the files via askGPO in the “Cataloging/Metadata (Policy and Records)” category.