The Internet Archive BookReader
To view the new bookreader on archive.org, click the "Read Online" link on the left side of a book's details page, for example the details page for the Bird Book.
The GnuBook Bookreader is licensed under the GNU Affero General Public License v3.0. It is built upon these open source tools:
The GnuBook source code is available in the openlibrary.org GitHub repository, where it can also be browsed online.
Once you've downloaded the source code you can open GnuBookDemo/index.html in your web browser and you should see an example book. This is a good starting point when making modifications to GnuBook since it uses the latest code from GitHub.
The standalone demo shows how to use the bookreader with your own images and does not require an Internet connection. It uses an older version of GnuBook and is not recommended as a starting point for modifications (download from GitHub instead). Download the standalone demo here (6.9MB)
The book reader can be embedded on any site that allows you to add an iframe, for example using the code below.
<iframe src='http://www.archive.org/bookreader/id/abroadcranethoma00craniala.html'
width='450px' height='450px'></iframe>
In the case of the Standalone Demo, operation is fairly simple. Images are numbered sequentially and stored in a directory called "StandAloneImages". The images are all the same size, and two functions in GnuBookJSSimple.js, getPageWidth() and getPageHeight(), return the page size. Scaling is done in the web browser.
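The fixed-size case can be sketched as follows. This is a minimal illustration, not the contents of GnuBookJSSimple.js: the `reader` object, the 800x1200 page size, the `numLeafs` count, and the file-name pattern are all assumptions made for the example.

```javascript
// Hypothetical stand-in for the reader object configured in GnuBookJSSimple.js.
// The page size, page count, and file-name pattern are illustrative assumptions.
const reader = {
  numLeafs: 15,

  // Every image in the standalone demo has the same size,
  // so the index argument does not affect the result.
  getPageWidth(index) {
    return 800;
  },

  getPageHeight(index) {
    return 1200;
  },

  // Images are numbered sequentially inside the "StandAloneImages" directory.
  // The "pageNNN.jpg" naming is assumed; adapt it to your own files.
  getPageURI(index) {
    const pageNumber = String(index + 1).padStart(3, "0");
    return "StandAloneImages/page" + pageNumber + ".jpg";
  },
};
```

Because the functions return the same size for every page, the browser can scale all images uniformly without any per-page metadata.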
For books scanned by the Internet Archive and stored on archive.org, bookreader operation is a bit more complex. During the book scanning process, each page is imaged using a high-resolution digital camera, and then each page is cropped and deskewed. The size of the cropped image is stored in a file called scandata.xml. During this process, some images, such as color cards, white cards, and tissue paper pages, are marked as pages that should not be displayed. This information is also stored in scandata.xml. The cropped and deskewed images are stored in JPEG 2000 format in a zip file called bookid_jp2.zip. The raw images, the cropped and deskewed images, and the scandata.xml file are available for each book on archive.org.
Because crop boxes can vary between pages for Internet Archive books, the getPageWidth() and getPageHeight() functions can return a different size for each image in the book. Also, since some pages are not supposed to be displayed, the getPageURI() function maps the "index number" that the book reader uses to a "leaf number" that corresponds to an image in the jp2.zip file. These functions use information from scandata.xml to determine the size and URL of each page. To use BookReader with your own book system, you should implement the functions found in GnuBookJSIA.php.
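The index-to-leaf mapping might look roughly like this. It is a sketch under stated assumptions, not the code in GnuBookJSIA.php: the `scannedPages` records, their field names (`leafNum`, `width`, `height`, `display`), the helper `leafForIndex()`, and the URL layout are all hypothetical stand-ins for data that would come from scandata.xml.

```javascript
// Hypothetical page records, as they might look after parsing scandata.xml.
// Field names are assumptions made for this sketch.
const scannedPages = [
  { leafNum: 0, width: 2500, height: 3400, display: false }, // e.g. a color card
  { leafNum: 1, width: 2012, height: 3105, display: true },
  { leafNum: 2, width: 2020, height: 3098, display: true },
  { leafNum: 3, width: 2500, height: 3400, display: false }, // e.g. tissue paper
  { leafNum: 4, width: 1998, height: 3110, display: true },
];

// Keep only displayable pages. A page's position in this array is the
// "index number" used by the reader; leafNum is its "leaf number".
const displayedPages = scannedPages.filter((page) => page.display);

function leafForIndex(index) {
  return displayedPages[index].leafNum;
}

// Sizes come from the per-page crop box, so they can differ between pages.
function getPageWidth(index) {
  return displayedPages[index].width;
}

function getPageHeight(index) {
  return displayedPages[index].height;
}

// Hypothetical URL layout; the real image URL depends on your server.
function getPageURI(index) {
  return "/images/leaf" + leafForIndex(index) + ".jpg";
}
```

Filtering once up front keeps every lookup a simple array access, and the hidden leaves never receive an index number at all.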
Because web browsers cannot display JPEG 2000 images, a piece of code called GnuBookImages.php performs on-the-fly JPEG 2000 to JPEG conversion on the archive.org cluster. For efficiency, this code also provides server-side image scaling.
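One way a server could scale efficiently is to decode the JPEG 2000 image at a reduced resolution, since JPEG 2000 decoders can generally produce power-of-two reductions directly. The sketch below only illustrates that idea; whether GnuBookImages.php works this way is an assumption, and `chooseReduction()` and its `maxReduce` parameter are hypothetical names.

```javascript
// Pick the largest number of halvings that still yields an image at least
// as wide as requested. maxReduce caps the number of halvings (assumed limit).
function chooseReduction(sourceWidth, requestedWidth, maxReduce = 5) {
  let reduce = 0;
  while (
    reduce < maxReduce &&
    sourceWidth / Math.pow(2, reduce + 1) >= requestedWidth
  ) {
    reduce += 1;
  }
  return reduce;
}

// Example: a 2000-pixel-wide scan requested at 500 pixels wide
// can be halved twice (2000 -> 1000 -> 500).
const reduction = chooseReduction(2000, 500);
```

Choosing a reduction that never undershoots the requested width means the browser only ever scales down, which avoids visibly blurry upscaling.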
The search implementation is inherited from the old version of the Flipbook. It requires OCR data with word coordinate information in DjVu XML format. An example of a DjVu XML file is here.
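Word coordinates make it possible to draw a highlight box over each search hit. The following sketch shows the arithmetic only. It assumes a word's coordinates arrive as a comma-separated string in the order left, bottom, right, top, measured in pixels of the original page image; verify that order against your own OCR files. The function names `parseWordCoords()` and `scaleBox()` are hypothetical.

```javascript
// Turn a coordinate string into a box with a top-left origin.
// Assumed order: left, bottom, right, top (original-image pixels).
function parseWordCoords(coords) {
  const [left, bottom, right, top] = coords.split(",").map(Number);
  return { left, top, width: right - left, height: bottom - top };
}

// Scale a box from original-image pixels to the width at which
// the page is currently displayed in the reader.
function scaleBox(box, originalWidth, displayedWidth) {
  const scale = displayedWidth / originalWidth;
  return {
    left: Math.round(box.left * scale),
    top: Math.round(box.top * scale),
    width: Math.round(box.width * scale),
    height: Math.round(box.height * scale),
  };
}

// Example: a word on a 2000-pixel-wide scan shown at 500 pixels wide.
const wordBox = parseWordCoords("400,1000,600,960");
const highlight = scaleBox(wordBox, 2000, 500);
```

The resulting `highlight` values could be applied as the position and size of an absolutely positioned element layered over the page image.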