Last edited by Jessamyn West, February 14, 2015

Internet Archive BookReader

The BookReader was developed by the Internet Archive and open source contributors to provide online access to scanned books. The Internet Archive has more than 1,000,000 scanned books available to read online. The BookReader is also used to provide access to materials from many other organizations.

Accessing the BookReader on Open Library

Look for the "Read Online" links or the open book icon to open a book in the BookReader. To find available books, check the "Only show eBooks" checkbox in the Open Library search.

Accessing the BookReader on the Internet Archive

To use the BookReader, click on the "Read Online" link on the left side of an archive.org details page.

Download source code

The BookReader source code is available in the openlibrary.org GitHub repository.

Report a bug or request a feature!

Developer Info

Contributors to the Internet Archive Bookreader (2010-2011)

  • Michael Ang (mang) - Bookreader Core - archive.org
  • Lance Arthur (lance.arthur) - Markup and CSS - archive.org
  • Edward Betts (edward) - Full-text search - archive.org
  • Jeff Kaplan (kaplan) - QA - archive.org
  • Raj Kumar (raj) - Bookreader Core - archive.org
  • Mike McCabe (mccabe) - Table of Contents - archive.org
  • George Oates (glo) - Design - archive.org
  • Alexis Rossi (alexis) - QA - archive.org
  • Yankl - Right-to-left patch - github commit
  • shenzhuxi - encapsulation of $ and IE6 fixes

Contributors to older versions

Contributors to version 2 (gnubook, 2008-2010)

  • Raj Kumar (raj) - Bookreader Core - archive.org
  • Michael Ang (mang) - Bookreader Core - archive.org
  • Rebecca Malamud (webchick) - OLPC interface - invisible.net
  • Anand Chitipothu (anand) - OLPC interface - archive.org
  • Alex Osborne (aosborne) - back button fix - nla.gov.au
  • Jeffrey Ventrella (jeffrey) - icons - ventrella.com
  • Lance Arthur (lance.arthur) - icons - archive.org
  • Stephanie Collett (scollett) - thumbnail view mode - California Digital Library

Contributors to version 1 (flippy, 2005-2008)

  • Jesse Crossen (jesse) - original author - tackledesign.com
  • Brad Neuberg (bkn3) - IA integration - columbia.edu

Examples in the Wild

Attribution

The BookReader is licensed under the GNU Affero General Public License v3.0. It is built upon these open source tools:

BookReader demo

Once you've downloaded the source code, open BookReaderDemo/index.html in your web browser and you should see an example book. To use your own images, modify BookReaderJSSimple.js to connect the BookReader with your book's page images and metadata.

Embed example

For books hosted on the Internet Archive, the BookReader can be embedded on any site that allows you to add an iframe, for example using the code below.

<iframe src="http://www.archive.org/stream/abroadcranethoma00craniala?ui=embed" width="480px" height="480px"></iframe>

You can also link to a specific page and specify that two-page mode should be used:

<iframe src="http://www.archive.org/stream/abroadcranethoma00craniala?ui=embed#mode/2up/page/18" width="450px" height="400px"></iframe>
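
The two embed snippets above follow the same URL pattern, so they can be generated from a book identifier. The sketch below is illustrative only: buildEmbedUrl and buildEmbedIframe are hypothetical helper names, not part of the BookReader, and "2up" is the only mode value confirmed by the examples on this page.

```javascript
// Hypothetical helper: builds an embed URL in the shape shown in the examples above.
function buildEmbedUrl(bookId, options) {
  var opts = options || {};
  var url = "http://www.archive.org/stream/" + encodeURIComponent(bookId) + "?ui=embed";
  var fragment = [];
  // The fragment is optional; "mode" and "page" mirror the second example above.
  if (opts.mode) {
    fragment.push("mode", opts.mode);
  }
  if (opts.page !== undefined) {
    fragment.push("page", String(opts.page));
  }
  return fragment.length ? url + "#" + fragment.join("/") : url;
}

// Hypothetical helper: wraps the URL in an iframe tag (sizes default to 480px).
function buildEmbedIframe(bookId, options) {
  var opts = options || {};
  var width = opts.width || 480;
  var height = opts.height || 480;
  return '<iframe src="' + buildEmbedUrl(bookId, opts) + '" width="' + width +
    'px" height="' + height + 'px"></iframe>';
}
```

Calling buildEmbedIframe("abroadcranethoma00craniala", { mode: "2up", page: 18, width: 450, height: 400 }) reproduces the second iframe shown above.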

Features

  • Single-Page, Two-page, and Thumbnail view
  • Zoom
  • Right-to-left page progression (e.g. for Yiddish and Chinese)
  • Full-text search with highlighting of search results
  • Support for foldouts and variable page size
  • In-Browser Text-To-Speech
  • Embeddable
  • Bookmark-friendly URLs
  • Works with a variety of image servers, or a simple directory of images
  • Simple access control

Serving Images

In the case of the Standalone Demo, operation is fairly simple. Images are numbered sequentially and stored in a directory called "StandAloneImages". The images are all the same size, and two functions in BookReaderJSSimple.js, getPageWidth() and getPageHeight(), return the page size. Scaling is done in the web browser.
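
As a rough illustration of the standalone case, the sketch below models a book whose images all share one size. It is not the contents of BookReaderJSSimple.js: makeSimpleBook, displaySize, the file naming pattern (page1.jpg, page2.jpg, ...), and the dimensions are all assumptions made for the example.

```javascript
// Hypothetical factory for a book whose page images all have the same size.
function makeSimpleBook(config) {
  return {
    getPageCount: function () {
      return config.pageCount;
    },
    // Every page reports the same dimensions, so the index is ignored.
    getPageWidth: function (index) {
      return config.pageWidth;
    },
    getPageHeight: function (index) {
      return config.pageHeight;
    },
    // Assumed naming scheme: sequentially numbered files, index is 0-based.
    getPageURI: function (index) {
      return config.imageDir + "/page" + (index + 1) + ".jpg";
    }
  };
}

// Browser-side scaling: compute the on-screen size for a given scale factor.
function displaySize(book, index, scale) {
  return {
    width: Math.round(book.getPageWidth(index) * scale),
    height: Math.round(book.getPageHeight(index) * scale)
  };
}

var simpleBook = makeSimpleBook({
  imageDir: "StandAloneImages",
  pageCount: 15,   // assumed value
  pageWidth: 800,  // assumed value, in pixels
  pageHeight: 1200 // assumed value, in pixels
});
```

The values returned by displaySize would be applied to the image element in the browser, since no server-side scaling is involved in this setup.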

For books scanned by the Internet Archive and stored on archive.org, bookreader operation is a bit more complex. During the book scanning process, each page is imaged using a high-resolution digital camera, and then each page is cropped and deskewed. The size of the cropped image is stored in a file called scandata.xml. During this process, some images, such as color cards, white cards, and tissue paper pages, are marked as pages that should not be displayed. This information is also stored in scandata.xml. The cropped and deskewed images are stored in JPEG 2000 format in a zip file called bookid_jp2.zip. The raw images, the cropped and deskewed images, and the scandata.xml file are available for each book on archive.org.

Because crop boxes can vary between pages for Internet Archive books, the getPageWidth() and getPageHeight() functions can return a different size for each image in the book. Also, since some pages are not supposed to be displayed, the getPageURI() function maps an "index number" that the BookReader uses to a "leaf number" that corresponds to an image in the jp2.zip file. These functions use information from scandata.xml to determine the size and URL for each page.
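
The index-to-leaf mapping described above can be sketched as follows. This is a simplified model, not the archive.org implementation: the leaf records are a hypothetical stand-in for data parsed from scandata.xml (the real schema is not reproduced here), and the URL returned by getPageURI is a placeholder rather than the actual image server address.

```javascript
// Hypothetical per-leaf records standing in for data parsed from scandata.xml.
var exampleLeaves = [
  { leafNum: 0, width: 2400, height: 3600, display: false }, // e.g. a color card
  { leafNum: 1, width: 2210, height: 3405, display: true },
  { leafNum: 2, width: 2198, height: 3390, display: true },
  { leafNum: 3, width: 2400, height: 3600, display: false }, // e.g. a tissue page
  { leafNum: 4, width: 2205, height: 3398, display: true }
];

// Hypothetical factory: exposes only displayable leaves, addressed by index.
function makeScannedBook(bookId, leaves) {
  var displayed = leaves.filter(function (leaf) {
    return leaf.display;
  });
  return {
    getPageCount: function () {
      return displayed.length;
    },
    // Sizes vary per page because each leaf has its own crop box.
    getPageWidth: function (index) {
      return displayed[index].width;
    },
    getPageHeight: function (index) {
      return displayed[index].height;
    },
    // Maps the reader's index number to the leaf number in the image archive.
    leafForIndex: function (index) {
      return displayed[index].leafNum;
    },
    // Placeholder URL shape, assumed for illustration only.
    getPageURI: function (index) {
      return "/images/" + bookId + "/leaf/" + displayed[index].leafNum;
    }
  };
}

var scannedBook = makeScannedBook("examplebook", exampleLeaves);
```

With the sample data, five leaves yield three displayable pages, and index 2 resolves to leaf 4 because leaves 0 and 3 are skipped.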

Because web browsers generally cannot display JPEG2000 images, a piece of code called BookReaderImages.php performs on-the-fly JPEG2000 to JPEG conversion on the archive.org cluster. For efficiency, this code also provides server-side image scaling and other image processing.

More information on how BookReaderImages.php works.

Testing and Release Process

Bugs and feature requests against the BookReader can be filed in the BookReader Launchpad Bug Tracker. During development, features and bugs in progress are targeted to the next milestone. Once all the bugs for the milestone are in the "Triaged" status and ready to test, the code is given to QA for testing. Since Launchpad does not have separate QA bug statuses, we use the tags "needs-qa", "qa-verified", and "qa-reopened" to designate bugs that are ready for testing, verified by QA, or have problems with the fix, respectively.

Code in progress on a development milestone branch may be pushed to our openlibrary/BookReader GitHub account on that branch. Once the release candidate has been approved by QA, it is pushed out to GitHub on the branch and also merged into master. The branch point is tagged with the release name, which automatically makes it appear on our BookReader GitHub downloads page.

Unit Testing

The BookReader is starting to have unit tests, written in QUnit. We are setting up a BookReader TestSwarm server.

Extending bookreader functionality

To make it easier to work with other sources, the BookReader can expect a book interface from the client with the following methods.

getPageCount()
getPage(index)
getPageWidth(index)
getPageHeight(index)

The BookReader can offer optional functionality that is enabled only when the book implementation defines a specific method.
For example, bookmarks can be enabled only when the book interface defines a getBookmarks method.

getBookmarks(index)

Typical usage:

<script src="/js/bookreader.js" type="text/javascript"></script>
<script src="/js/archivebook.js" type="text/javascript"></script>
<div id="book1"></div>
<script>
    // Wait for the page to load before attaching the reader to the #book1 container.
    window.onload = function() {
        var b = bookreader("book1", ArchiveBook("tomsawyer"));
        b.showPage(42);
    };
</script>
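
The book interface and optional-method detection described above can be sketched as below. This is an illustration under assumptions, not BookReader code: makeArrayBook and describeFeatures are hypothetical names, getPage is assumed to return the page image URL, and getBookmarks(index) is assumed to return the bookmarks for the page at that index.

```javascript
// Hypothetical in-memory book implementing the four required methods.
// "pages" is an array of { uri, width, height }; "bookmarks" is optional and
// maps a page index to an array of bookmark labels (assumed shape).
function makeArrayBook(pages, bookmarks) {
  var book = {
    getPageCount: function () {
      return pages.length;
    },
    getPage: function (index) {
      return pages[index].uri;
    },
    getPageWidth: function (index) {
      return pages[index].width;
    },
    getPageHeight: function (index) {
      return pages[index].height;
    }
  };
  // The optional method is defined only when bookmark data is supplied.
  if (bookmarks) {
    book.getBookmarks = function (index) {
      return bookmarks[index] || [];
    };
  }
  return book;
}

// Checks the required methods and reports which optional features are available.
function describeFeatures(book) {
  var required = ["getPageCount", "getPage", "getPageWidth", "getPageHeight"];
  required.forEach(function (name) {
    if (typeof book[name] !== "function") {
      throw new Error("book is missing required method " + name);
    }
  });
  return { bookmarks: typeof book.getBookmarks === "function" };
}
```

A reader built this way would call describeFeatures once at startup and show bookmark controls only when the returned bookmarks flag is true.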

History

Created September 15, 2008 · 154 revisions

February 14, 2015 - Edited by Jessamyn West - added new example!
November 21, 2013 - Edited by Anand Chitipothu - Fixed the link to download source code.
January 3, 2012 - Edited by mangtronix - Edited without comment.
January 3, 2012 - Edited by mangtronix - Edited without comment.
September 15, 2008 - Created by Anand Chitipothu - book reader suggestions