Last edited by Anand Chitipothu
December 14, 2011

Developers / Data


Open Library holds a large catalog: over 20 million edition records and some 6 million author records. We're always looking for new records too, in addition to those created one-by-one by our patrons.

We always encourage new sources and are eager to explore new partnership opportunities. Since the project's inception, we have developed a process for working with new data feeds and merging them into the Open Library system.

This page is mainly about sending data into Open Library. If you're curious about getting data back out in something other than a dump of the entire dataset, check out our API.

Bulk Upload

If you want to send your bibliographic records to the Open Library, please follow these few suggestions:

  • Please do not include acquisitions records, or records for items that have not yet gone through the cataloging process.
  • Use an available bibliographic standard. Examples of such standards are MARC21, UNIMARC and ONIX.
  • Each record must have a unique identifier. This is generally the local system record number plus an identifier for the system. In MARC21, the local system identifier is placed in the 001 field, and the library or system identifier, from the LC organization code, is in the 003. Another source of codes is the National Bibliographic Number, or NBN.
  • When sending records in a MARC or UNIMARC format, use that format's character set. When sending records in an XML format, use Unicode.
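
As a rough illustration of the unique-identifier suggestion above, the sketch below joins the 003 (system identifier) and 001 (local record number) control fields into one key. The function name `record_key`, the tag-to-value dictionary layout, and the sample values are all hypothetical; this is not an Open Library API, and you would extract the fields with whatever MARC parser you already use.

```python
def record_key(control_fields):
    """Build a unique key from MARC21 control fields 003 and 001.

    `control_fields` is a hypothetical mapping of tag -> value.
    """
    system_id = control_fields.get("003", "").strip()
    local_id = control_fields.get("001", "").strip()
    if not system_id or not local_id:
        # Without both parts the record cannot be told apart from
        # a record with the same number in another system.
        raise ValueError("record needs both 001 and 003 to be uniquely identified")
    return f"{system_id}:{local_id}"


# Sample values are made up for illustration.
key = record_key({"001": "000123456", "003": "DLC"})
print(key)
```

Keeping the system identifier in the key means two libraries can reuse the same local record number without colliding.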

How to upload

  • Create an archive.org account: http://www.archive.org/account/login.createaccount.php
  • FTP the files to catalog-upload.archive.org using the username (most likely your email address) and password you just created.
  • Contact us and let us know if you have uploaded something.

Processing Catalog Records
Aside from some special cases (e.g. lists of ISBNs, book covers, holdings data), we take each data source, write a processor for it, and output Python dictionaries.
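
To give a feel for what such a processor might look like, here is a minimal sketch that turns a CSV feed into Python dictionaries. The column names (`title`, `authors`, `isbn`), the output keys, and the sample row are assumptions made for illustration; the real processors and their output schema are not described on this page.

```python
import csv
import io


def process_feed(text):
    """Yield one dictionary per row of a hypothetical CSV feed."""
    for row in csv.DictReader(io.StringIO(text)):
        isbn = row["isbn"].strip()
        yield {
            "title": row["title"].strip(),
            # Authors are assumed to be separated by semicolons.
            "authors": [a.strip() for a in row["authors"].split(";") if a.strip()],
            # Store ISBNs as a list so later merges can append to it.
            "isbn": [isbn] if isbn else [],
        }


# Made-up sample feed with a single record.
sample = "title,authors,isbn\nThe Adventures of Tom Sawyer,Mark Twain,0306406152\n"
records = list(process_feed(sample))
print(records)
```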

As records are added, an algorithm detects whether the book is already represented in the database. In that case, some new fields from the incoming record may be added to the record in the database, such as additional identifiers, new subjects, or a table of contents. The success of determining duplicates depends on the quality and accuracy of the data in the records.
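
The merge step described above could be sketched roughly as follows. This covers only the enrichment of an already-matched record, not the duplicate detection itself, and the field names (`identifiers`, `subjects`, `table_of_contents`) and the function `merge_record` are hypothetical rather than taken from the Open Library code.

```python
def merge_record(existing, incoming):
    """Return a copy of `existing` enriched with fields from `incoming`."""
    merged = dict(existing)
    # List-valued fields: append values not already present, keeping order.
    for field in ("identifiers", "subjects"):
        current = list(existing.get(field, []))
        for value in incoming.get(field, []):
            if value not in current:
                current.append(value)
        if current:
            merged[field] = current
    # Take a table of contents only when the stored record has none.
    if not existing.get("table_of_contents") and incoming.get("table_of_contents"):
        merged["table_of_contents"] = list(incoming["table_of_contents"])
    return merged
```

Returning a new dictionary instead of changing the stored record in place makes it easy to compare before and after when a merge looks wrong.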

We hope to make it easy to merge duplicates manually through the user interface so that Open Library patrons can do what the algorithm cannot.

We are also analyzing relationships between works (example: all of these editions of Tom Sawyer are editions of the same conceptual work). From this we can add relationships to each object and create new objects (like works). This process is known in the library world as "FRBRization". See http://frbr.org for more information.
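
As a deliberately crude illustration of grouping editions into works, the sketch below keys each edition on a normalized title and first author. This is an assumption made for the example, not the analysis Open Library actually performs, and the names `work_key` and `group_into_works` are hypothetical. Real FRBRization has to cope with translations, subtitles, and variant author names, which this ignores.

```python
import re
from collections import defaultdict


def work_key(edition):
    """Normalize title and first author into a simple grouping key."""
    def norm(text):
        # Lowercase and collapse punctuation and whitespace.
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

    authors = edition.get("authors") or [""]
    return (norm(edition.get("title", "")), norm(authors[0]))


def group_into_works(editions):
    """Map each grouping key to the list of editions that share it."""
    works = defaultdict(list)
    for edition in editions:
        works[work_key(edition)].append(edition)
    return dict(works)
```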


Bulk Download

Open Library provides dumps of all its records in JSON format. If you want to download everything, please use the dumps instead of our API.

See Open Library Data Dumps documentation for details.
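
For orientation, a dump could be read one record at a time along these lines. The sketch assumes each line of the dump carries one JSON document in its last tab-separated column; that layout, the function name `iter_dump`, and the sample line are assumptions here, so check the Data Dumps documentation for the actual format before relying on it.

```python
import io
import json


def iter_dump(lines):
    """Yield one dict per dump line, reading JSON from the last tab-separated column."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        yield json.loads(line.split("\t")[-1])


# Made-up sample line standing in for a real dump file.
sample = io.StringIO('/type/edition\t/books/EXAMPLE\t{"title": "Tom Sawyer"}\n')
for record in iter_dump(sample):
    print(record["title"])
```

Because `iter_dump` is a generator, it can be pointed at an open file object and will process a large dump without loading it all into memory.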

We're looking forward to making the API more flexible so you don't have to download the whole thing in chunks.

BookServer

BookServer is an Internet Archive initiative intended to enable content creators and distributors to distribute digital books via a simple catalog format. At Open Library, we are excited about BookServer as it enables anyone to hang out their own shingle and bring attention to a subset of books they specialize in. The Internet Archive is providing an open source OPDS aggregator as part of the BookServer project.

For More Information

Example Catalogs

BookServer is a useful mechanism for aggregating feeds and identifying books that are classified with different identifiers. If you create a catalog, there are a few rules-of-thumb to follow to ensure your catalogs will be included and your books properly identified.

  1. Provide crawlable feeds
  2. Provide identifiers that can be used for de-duping (ISBN, etc.)
  3. Provide additional metadata. Aggregators can often work out a 'fuzzy' match, so throw those obscure identifiers out there!
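
To show why identifiers matter for de-duping, here is a minimal sketch of how an aggregator might collapse entries that carry the same ISBN written in different forms. It is not the BookServer aggregator's code; the entry layout and the names `to_isbn13` and `dedupe` are hypothetical. The ISBN-10 to ISBN-13 conversion uses the standard "978" prefix and check-digit rule.

```python
def to_isbn13(isbn):
    """Normalize an ISBN-10 or ISBN-13 string to a bare 13-digit form."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13:
        return digits
    if len(digits) != 10:
        raise ValueError(f"not an ISBN: {isbn!r}")
    # Drop the ISBN-10 check digit, add the 978 prefix, recompute the check digit.
    core = "978" + digits[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def dedupe(entries):
    """Keep the first catalog entry seen for each normalized ISBN."""
    seen = {}
    for entry in entries:
        seen.setdefault(to_isbn13(entry["isbn"]), entry)
    return list(seen.values())
```

An entry with no usable identifier cannot be matched this way at all, which is where the additional metadata from rule 3 comes in.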

BookServer is a work in progress. We invite you to read the spec and get involved on the mailing list.

History Created November 21, 2009 · 35 revisions

April 16, 2013 Edited by Anand Chitipothu strike off ftp upload to catalog-upload.archive.org.
December 30, 2011 Edited by Edward Betts add Lending library MARC records
December 14, 2011 Edited by Anand Chitipothu Linked Open Library Data Dumps page.
September 9, 2010 Edited by Anand Chitipothu Edited without comment.
November 21, 2009 Created by George Added new Data page