Last edited by tracey pooh
May 13, 2011 | History

Book URLs

Goals

Bookreader URLs have the following goals:

Key-Value

Bookreader URLs are composed of key-value pairs. The keys and values are separated by '/'. We specify a canonical order in which the key-value pairs should occur, but accept the key-value pairs in whatever order the user-agent specifies. A user-supplied URL will be remapped to the canonical order when given back to the user (e.g. by redirecting, so it appears in the address bar). The purpose of the remapping is to reduce the number of URLs "out there" on the net that point to the same resource.

If a reader implementation does not understand a given key-value pair, it should be ignored.

Functionality

Bookreader URLs support the following functionality:

Example URLs

http://www.archive.org/stream/aliceinwonderlan00carriala#15
http://www.archive.org/stream/aliceinwonderlan00carriala#page/23
http://www.archive.org/download/aliceinwonderlan00carriala#page/15/region/10,212,256,256

These two are equivalent and would be remapped to the canonical order:

http://www.archive.org/stream/aliceinwonderlan00carriala#highlight/20,20,30,500/mode/2up/page/23
http://www.archive.org/stream/aliceinwonderlan00carriala#mode/2up/page/23/highlight/20,20,30,500

Canonical order:

http://www.archive.org/stream/aliceinwonderlan00carriala#page/23/mode/2up/highlight/20,20,30,500
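The remapping to canonical order can be sketched as follows. This is a minimal illustration, not the bookreader's actual code. The function name canonicalize_fragment is hypothetical, and only the relative order page, mode, highlight is confirmed by the example above; the positions given to region and search are assumptions. The sketch ignores unknown keys by dropping them, which is one possible reading of "should be ignored".

```python
# Assumed canonical order: only page < mode < highlight is confirmed by the
# example URLs; the positions of "region" and "search" are guesses.
CANONICAL_ORDER = ["page", "mode", "region", "search", "highlight"]


def canonicalize_fragment(fragment: str) -> str:
    """Reorder the key/value pairs of a bookreader URL fragment."""
    parts = [p for p in fragment.split("/") if p]
    # Pair up keys and values; a trailing key with no value is dropped,
    # and if a key repeats, the rightmost occurrence wins.
    pairs = dict(zip(parts[0::2], parts[1::2]))
    # Keys the reader does not understand are ignored (here: left out).
    ordered = [f"{key}/{pairs[key]}" for key in CANONICAL_ORDER if key in pairs]
    return "/".join(ordered)


if __name__ == "__main__":
    print(canonicalize_fragment("highlight/20,20,30,500/mode/2up/page/23"))
    print(canonicalize_fragment("mode/2up/page/23/highlight/20,20,30,500"))
```

Both calls produce the same fragment, which is the point of canonicalization: two user-supplied orderings collapse to one URL.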

Referring to pages, leafs, indices

For a book with a set of numbered camera images we do not always have a mapping between these images and the page numbers (as printed in the book). In addition, certain pages are not numbered at all (e.g. a completely blank page may face a figure page, both of which are inserted between consecutively numbered pages). The image stack can also contain images which should not be considered for access (e.g. colour calibration cards).

When the page numbers are available they may be referenced with:

page/{page number}

The page numbers may be either numeric or a string (e.g. 'iii'). Our earlier Scribe 1 books may have Roman numeral pages marked. Books scanned with Scribe 2 do not. String-based page numbers should be compared in lowercase. Named pages (such as the title page) may be referred to using the page name.

Examples:

page/2
page/iv
page/title

The following named pages are supported:
title, cover, first (corresponds to the first page 1, if marked), last

Question: There exist books (e.g. compilations of articles) which may have more than one page with the same number. How do we handle these?

An external site or embedding should not assume that the page numbers are available or monotonically increasing. There may be foldouts, pages missing (e.g. damaged) or other reasons page numbers are not continuous.

Note: "Leaf" is a concept from the Archive's Scribe scanning software. It corresponds to the image sequence taken during the scanning process. The Archive.org scandata.xml refers to leafs. At the level of the bookreader and user-visible URLs the underlying leaf numbers should not be exposed unless necessary.

"Accessible page index" (n). Each page that should be included in the access formats (bookreader, PDF, etc) is given a monotonically increasing number starting from 0. For the Archive this corresponds to pages with addToAccessFormat true in the scandata.xml.

Examples:

page/n0
page/n23

For books where there are multiple leafs with the same page number the index form can be used to uniquely identify the page.
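The different forms of the page value can be told apart with a small classifier. This is a sketch under assumptions: the function name parse_page_value and the returned tags are hypothetical, and the precedence used (named page, then index form, then numeric, then string) is a guess, since the text does not say how a printed page label such as 'n5' would be distinguished from an index.

```python
import re
from typing import Tuple, Union

# Named pages listed in the text above.
NAMED_PAGES = {"title", "cover", "first", "last"}


def parse_page_value(value: str) -> Tuple[str, Union[int, str]]:
    """Classify the value that follows 'page/' in a bookreader URL."""
    value = value.lower()  # string page numbers are compared in lowercase
    if value in NAMED_PAGES:
        return ("named", value)
    match = re.fullmatch(r"n([0-9]+)", value)
    if match:
        # Accessible page index, counted from 0.
        return ("index", int(match.group(1)))
    if re.fullmatch(r"[0-9]+", value):
        return ("number", int(value))
    # Anything else is treated as a string page number such as 'iv'.
    return ("string", value)


if __name__ == "__main__":
    for example in ["2", "iv", "title", "n0", "n23"]:
        print(example, parse_page_value(example))
```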

Display options

Display options inform the bookreader how the book should be displayed to the user. The GnuBook reader supports the following options:

Note: If we force region to only use percentages that will provide some future-proofing against resolution changes. (mang)

Searching

Search terms can be highlighted by using search/ followed by the search string. The search string should be URL escaped and the slash character ('/') is not allowed. Spaces in the search query should be escaped to the '+' character and the '+' character in a search string should be escaped to '%2b'.

Examples:

Searching for "cats":

 search/cats

Searching for "cheshire cat":

 search/cheshire+cat

Searching for "cheshire+cat":

 search/cheshire%2bcat
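The escaping rules above match what Python's urllib.parse.quote_plus does, so a minimal sketch can lean on it. The function names encode_search and decode_search are hypothetical. Note that quote_plus emits uppercase hex ('%2B' rather than '%2b'); percent-encoded octets are case-insensitive, so both decode to the same character.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_search(term: str) -> str:
    """Build the search/ key-value pair for a search string."""
    if "/" in term:
        # The text above disallows the slash character in search strings.
        raise ValueError("the slash character is not allowed in a search string")
    # quote_plus turns spaces into '+' and a literal '+' into '%2B'.
    return "search/" + quote_plus(term)


def decode_search(value: str) -> str:
    """Recover the search string from the value after 'search/'."""
    return unquote_plus(value)


if __name__ == "__main__":
    print(encode_search("cats"))
    print(encode_search("cheshire cat"))
    print(encode_search("cheshire+cat"))
```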

Highlighting (not yet implemented)

A region of a page can be highlighted using syntax similar to region. When highlight is specified, a page or index should also be specified.

Examples:

Full list of stream key-value pairs in canonical order

Downloading / Linking Page Images

The stream URL provides access to books in a format designed for online reading. The download URLs allow a book or portion of a book to be downloaded.

Images of the individual pages will be accessible at the following URLs (not yet implemented):

http://www.archive.org/download/{itemId}(/{path_to_book})/page/{page_specifier}({image_options}).jpg

The page specifier must be one of the following:

The following image options are supported:

The following image options are not yet implemented:


If multiple size specifiers are used simultaneously, the result is not defined (e.g. don't use page_w200_thumb.jpg).

Note: In general the returned image will not exactly match the requested size because of server-side image processing constraints (the closest size that is efficient to process will be returned). The client requesting the image should do the final scaling to the needed size. In general we return the next larger power of 2 reduction (2x, 4x, 8x, etc) compared to the requested size, since it can be done efficiently when processing our JP2 source images.
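The choice of reduction can be illustrated with a little arithmetic. This sketch assumes one reading of the note: the server picks the largest power of 2 reduction whose result is still at least as large as the requested size, so the client only ever scales down. The function name pow2_reduction is hypothetical and the actual server logic may differ.

```python
def pow2_reduction(source_size: int, requested_size: int) -> int:
    """Return a power-of-2 reduction factor (1, 2, 4, 8, ...).

    Assumption: the reduced image should be no smaller than the request.
    """
    if source_size <= 0 or requested_size <= 0:
        raise ValueError("sizes must be positive")
    factor = 1
    # Keep doubling while the next reduction still covers the request.
    while source_size // (factor * 2) >= requested_size:
        factor *= 2
    return factor


if __name__ == "__main__":
    source_width = 2000  # hypothetical JP2 source width in pixels
    factor = pow2_reduction(source_width, 200)
    print(factor, source_width // factor)
```

For a 2000 pixel wide source and a 200 pixel request, this picks an 8x reduction and returns a 250 pixel wide image, which the client then scales to 200.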

Examples:

http://www.archive.org/download/coloritsapplicat00andriala/page/cover.jpg

http://www.archive.org/download/coloritsapplicat00andriala/page/title.jpg

http://www.archive.org/download/coloritsapplicat00andriala/page/cover_thumb.jpg

http://www.archive.org/download/coloritsapplicat00andriala/page/cover_w200.jpg

http://www.archive.org/download/coloritsapplicat00andriala/page/page35.jpg

http://www.archive.org/download/coloritsapplicat00andriala/page/n25_s4.jpg

Books inside sub-directories and multi-book items

For books inside a subdirectory in the item the "sub-prefix" can be specified to indicate which
book is being requested. Here's an example book in a sub-directory:
http://www.archive.org/download/BozorSobirkhonavodaParokandaShud/Bozor_Sobir_Khonavoda_Parokanda_Shud/page/n0.jpg

If the sub-prefix is not specified the first book found (alphabetically, by prefix) is returned.
For items containing multiple books, the sub-prefix must be specified to access books other than the first book
inside the item.

http://www.archive.org/download/SubBookTest/subdir/subsubdir/book3/Rfp008011ResponseInternetArchive-without-resume/page/cover.jpg

http://www.archive.org/download/SubBookTest/subdir/book2/brewster_kahle_internet_archive/page/cover.jpg

http://www.archive.org/download/SubBookTest/book1/GPORFP/page/cover.jpg

http://www.archive.org/download/SubBookTest/page/cover.jpg
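The URL template for page images, including the optional sub-prefix, can be assembled as in this sketch. The function name page_image_url and its parameter names are hypothetical, and the image options are passed through as an opaque suffix (such as '_thumb' or '_w200' from the examples) rather than validated, since this is only an illustration of the template.

```python
def page_image_url(item_id: str, page_specifier: str,
                   image_options: str = "", path_to_book: str = "") -> str:
    """Fill in the download URL template for a single page image."""
    base = "http://www.archive.org/download"
    segments = [item_id]
    if path_to_book:
        # Optional sub-prefix for books inside a sub-directory of the item.
        segments.append(path_to_book.strip("/"))
    return f"{base}/{'/'.join(segments)}/page/{page_specifier}{image_options}.jpg"


if __name__ == "__main__":
    print(page_image_url("coloritsapplicat00andriala", "cover", "_w200"))
    print(page_image_url("SubBookTest", "cover", path_to_book="book1/GPORFP"))
```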

Future Work

Fine-grained image scaling

We currently only support power of 2 reductions for download image sub-regions. We make this restriction since power of 2 reductions are efficient with our JPEG2000 image backend. More fine-grained image resolution requests could be supported if the backend was powerful enough to allow it.

Zoom to fit

Originally we considered having the URL encode whether to zoom to fit the page. If region is insufficient for this purpose we could allow a zoom key.

Existing Archive.org book URLs

The following example URLs also exist on Archive.org (as of March 26, 2009). These URLs should continue to be supported. This is not an exhaustive list.

Streaming full text (show the text file inside a basic online viewer):

http://www.archive.org/stream/happyhearts00isleiala/happyhearts00isleiala_djvu.txt

Streaming DJVU using a viewing applet:

http://www.archive.org/stream/happyhearts00isleiala/happyhearts00isleiala.djvu

Downloading files using /download:

http://www.archive.org/download/happyhearts00isleiala/happyhearts00isleiala.djvu

The old flipbook reader, opening at a specific page. In this case we should open the new flipbook reader if possible (the new reader should support #{pagenumber} as legacy).

http://www.archive.org/stream/happyhearts00isleiala#56

Questions / Issues for Existing URLs

We need some mechanism to distinguish when streaming or downloading a specific file inside the item is requested. Should trailing key-value pairs be allowed after an individual file has been specified following stream or download? It seems we want to support that behaviour. What happens when a directory or file inside the item has the same name as one of the bookreader key names?

Do files inside an item always start with the item identifier in the name? Answer: Not generally true, but may be true for Scribe scanned books (unverified).

Other Documents

Ideas from meeting 2009-02-29

Concatenating multiple books (or sections of books):

stream/alice#pageRange/22-25/id/tomsawyer/pageRange/15-20

For canonical order, it should be possible to chop key-value pairs off from the right and have the URL still work (but be less precise).

If a page and index are given and conflict, we take the one to the left when the URL is put into canonical order.

Named pages for beginning and end.

For zoom give fit values (width, height). Do small, medium, large, original like Flickr?

New functionality - croprect/region. Returns an image which could be used in, for example, <img src='download/alice/page/15/region/10,212,256,256'>

All lowercase in URLs.

Use region instead of zoomrect. For region use something similar to djatoka (y,x,height,width)

Key-value pairs:

Drill on keywords and order them.

Title page detection

Feedback from Brewster 2009-03-19

Related documents:

Image tiling

Transclusion

History

May 17, 2018 Edited by tracey pooh use newer https://archive.org urls
December 5, 2011 Edited by mangtronix Edited without comment.
May 27, 2011 Edited by mangtronix Edited without comment.
May 27, 2011 Edited by mangtronix Live examples
April 2, 2009 Created by mangtronix Edited without comment.