Open Library provides dumps of all its data, generated every month. All of the dumps are formatted as tab-separated files with the following columns:
- type: type of record (/type/edition, /type/work, etc.)
- key: unique key of the record (/books/OL1M, etc.)
- revision: revision number of the record
- last_modified: last modified timestamp
- JSON: the complete record in JSON format
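As an illustration of the column layout above, here is a minimal Python sketch that splits one dump line into its five columns and decodes the JSON column. The `DumpRecord` type and `parse_dump_line` function are hypothetical helpers, not part of any Open Library tooling, and the sample line (including its timestamp format and title) is made up for the example rather than copied from a real dump.

```python
import json
from typing import NamedTuple


class DumpRecord(NamedTuple):
    """One row of a dump file (hypothetical helper type, not an Open Library API)."""
    record_type: str
    key: str
    revision: int
    last_modified: str
    data: dict


def parse_dump_line(line: str) -> DumpRecord:
    """Split one tab-separated dump line into its five columns."""
    # maxsplit=4 yields at most five fields, so the JSON column stays in one piece.
    record_type, key, revision, last_modified, raw_json = (
        line.rstrip("\n").split("\t", 4)
    )
    return DumpRecord(record_type, key, int(revision), last_modified,
                      json.loads(raw_json))


# Made-up sample line for illustration only; the timestamp format is an assumption.
sample = "\t".join([
    "/type/edition",
    "/books/OL1M",
    "3",
    "2011-12-14T00:00:00",
    json.dumps({"key": "/books/OL1M", "title": "Example title"}),
])

record = parse_dump_line(sample)
print(record.record_type, record.key, record.revision, record.data["title"])
```

Keeping the revision as an integer and the JSON as a decoded dictionary makes later filtering simpler, but storing the raw strings would work just as well if only a few columns are needed.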
Dumps:
- editions dump (~ 6.6G)
- works dump (~ 2.0G)
- authors dump (~ 0.4G)
- all types dump (~ 9.2G): includes editions, works, authors, redirects, etc.
- complete dump (~ 23.3G): also includes past revisions of all the records in Open Library
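Given the sizes listed above, it is usually more practical to stream a dump line by line than to load it into memory. The sketch below counts records per type this way. It assumes the downloaded file is gzip-compressed text, which this page does not state, so adjust the opener if your file is stored differently. The function name, the file name, and the demo rows (keys such as /books/OL2M and /authors/OL1A) are made up for illustration.

```python
import gzip
import os
import tempfile
from collections import Counter


def count_record_types(path: str) -> Counter:
    """Count records per type by streaming a gzip-compressed dump line by line."""
    counts: Counter = Counter()
    # Assumption: the dump is gzip-compressed UTF-8 text.
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            if not line.strip():
                continue  # skip blank lines, if any
            # The record type is the first tab-separated column.
            record_type = line.split("\t", 1)[0]
            counts[record_type] += 1
    return counts


# Demo on a tiny made-up file so the sketch runs without downloading anything.
demo_path = os.path.join(tempfile.mkdtemp(), "tiny_dump.txt.gz")  # hypothetical name
with gzip.open(demo_path, "wt", encoding="utf-8") as out:
    out.write("/type/edition\t/books/OL1M\t1\t2011-12-14T00:00:00\t{}\n")
    out.write("/type/edition\t/books/OL2M\t1\t2011-12-14T00:00:00\t{}\n")
    out.write("/type/author\t/authors/OL1A\t1\t2011-12-14T00:00:00\t{}\n")

counts = count_record_types(demo_path)
print(counts.most_common())
```

To run it against a real dump, pass the path of the downloaded file to `count_record_types` in place of `demo_path`.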
For past dumps, see: https://archive.org/details/ol_exports?sort=-publicdate
Format of JSON records
A JSON schema for the various types is located at https://github.com/internetarchive/openlibrary-client/tree/master/olclient/schemata
- Author Records: JSON serialization of a /type/author
- Edition Records: JSON serialization of a /type/edition
- Work Records: JSON serialization of a /type/work
OL Covers Dump
:TODO:
History
- Created December 14, 2011
- 34 revisions
August 7, 2024 | Edited by Drini | Fix dump sizes / instructions |
August 7, 2024 | Edited by Drini | New dumps are now available! |
December 14, 2011 | Created by Anand Chitipothu | Documented Open Library Data Dumps |