Last edited by mangtronix

March 3, 2009 | History

Books URL - Proposal at 2009-01-29

Note: This document is a work in progress. If you are interested, please send feedback to mang at archive org.

Goals

Bookreader URLs have the following goals:

permanency - should be stable over time
compactness - short enough to be printed on the cover of a book or included in an academic paper
"translucency" - while not being fully descriptive, bookreader URLs should give some indication to a human what they point to
resilience - display and other options should be accepted in any order

Key-Value

Bookreader URLs are composed of key-value pairs. Keys and values are separated by '/'. The key-value pairs can occur in any order (open question: is this really what we want?), but we will specify a canonical order. A user-supplied URL will be remapped to this canonical order when it is given back to the user (e.g. by redirecting, so that the canonical form appears in the address bar). The purpose of remapping to a canonical order is to reduce the number of distinct URLs "out there" on the net that point to the same resource.

If a reader implementation does not understand a given key-value pair, it should ignore that pair.
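
As a rough illustration of the two rules above, the following Python sketch splits the part of the path after the book identifier into key-value pairs and skips any key it does not recognise. The function name parse_options and the KNOWN_KEYS set are assumptions made for this example (the keys are simply those mentioned in this proposal); this is not the actual reader code.

```python
# Keys mentioned in this proposal; a real reader may recognise a different set.
KNOWN_KEYS = {"page", "pindex", "mode", "zoom", "zoomrect", "search"}


def parse_options(fragment):
    """Split 'key/value/key/value' into a dict, skipping unknown keys."""
    parts = [p for p in fragment.split("/") if p]
    options = {}
    # Walk the parts two at a time; a trailing key with no value is dropped.
    for key, value in zip(parts[0::2], parts[1::2]):
        if key in KNOWN_KEYS:
            options[key] = value
    return options


print(parse_options("page/23/colour/sepia/zoom/50"))
```

Here the made-up colour/sepia pair is silently dropped and the remaining pairs are kept regardless of where they appear.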

We decided to drop the distinction between "display options" and other options, since it is not always clear whether a given option is a "display" option or a "location" option.

Functionality

Bookreader URLs support the following functionality:

referencing a specific page
highlighting search terms
specifying display options (zoom level, 2-page view)

Example URLs

http://www.archive.org/stream/aliceinwonderlan00carriala/
http://www.archive.org/stream/aliceinwonderlan00carriala/page/23
http://www.archive.org/stream/aliceinwonderlan00carriala/page/23/zoom/50

These two are equivalent and would be remapped to a single URL:

http://www.archive.org/stream/aliceinwonderlan00carriala/page/23/zoom/50/mode/2up
http://www.archive.org/stream/aliceinwonderlan00carriala/zoom/50/mode/2up/page/23
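
One way to check that equivalence is to reorder the pairs and compare the results. The sketch below is only illustrative: the proposal has not yet fixed a canonical order, so CANONICAL_ORDER is a hypothetical choice, and the code assumes the path always begins with stream/{identifier}.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical order; the proposal does not define one yet.
CANONICAL_ORDER = ["page", "pindex", "zoom", "zoomrect", "mode", "search"]


def rank(key):
    """Known keys sort by their canonical position, unknown keys go last."""
    if key in CANONICAL_ORDER:
        return (CANONICAL_ORDER.index(key), key)
    return (len(CANONICAL_ORDER), key)


def canonicalize(url):
    """Return the URL with its key-value pairs in canonical order."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: the first two segments are 'stream' and the identifier.
    prefix, rest = segments[:2], segments[2:]
    pairs = dict(zip(rest[0::2], rest[1::2]))
    ordered = []
    for key in sorted(pairs, key=rank):
        ordered += [key, pairs[key]]
    path = "/" + "/".join(prefix + ordered)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


base = "http://www.archive.org/stream/aliceinwonderlan00carriala"
a = canonicalize(base + "/page/23/zoom/50/mode/2up")
b = canonicalize(base + "/zoom/50/mode/2up/page/23")
print(a == b)
```

A server could redirect to the canonicalized form whenever it differs from the requested URL.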

Referring to pages, leafs, indices

For a book with a set of numbered camera images we do not always have a mapping between these images and the page numbers (as printed in the book). In addition, certain pages are not numbered at all (e.g. a completely blank page may face a figure page, both of which are inserted between consecutively numbered pages). The image stack can also contain images which should not be considered for access (e.g. colour calibration cards).

When the page numbers are available they may be referenced with:

page/{page number}

A page number may be either numeric or a string (e.g. 'iii'). Our earlier Scribe 1 books may have Roman numeral pages marked; books scanned with Scribe 2 do not. String-based page numbers should be compared in lowercase. Named pages (such as the title page) may be referred to by page name.

Examples:

page/2
page/iv
page/title
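
The lowercase comparison could look something like the sketch below. It assumes the printed page labels are available as a simple list indexed by accessible page, with None for unnumbered pages; that list and the function name find_page are inventions for this example. The function returns a list of matches rather than a single index because a label is not guaranteed to be unique.

```python
def find_page(page_labels, requested):
    """Return the indices whose printed label matches `requested`.

    Labels are compared in lowercase; unnumbered pages (None) never match.
    """
    wanted = requested.lower()
    return [
        i
        for i, label in enumerate(page_labels)
        if label is not None and label.lower() == wanted
    ]


# Hypothetical labels for the first few pages of a book.
labels = ["title", None, "i", "ii", "III", "iv", "1", "2"]
print(find_page(labels, "IV"))   # matches the label "iv"
print(find_page(labels, "99"))   # no such page
```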

Question: There exist books (e.g. compilations of articles) which may have more than one page with the same number. How do we handle these?
Question: What named pages should we support?

An external site or embedding should not assume that page numbers are available or monotonically increasing. There may be foldouts, missing (e.g. damaged) pages, or other reasons why page numbers are not continuous.

"Leaf" is a concept from the Archive's Scribe scanning software. It corresponds to the image sequence taken during the scanning process. The Archive.org scandata.xml refers to leafs. At the level of the bookreader and user-visible URLs the underlying leaf numbers should not be exposed unless necessary.

"Accessible page index" (pindex). Each page that should be included in the access formats (bookreader, PDF, etc.) is given a monotonically increasing number starting from 0. For the Archive this corresponds to pages with addToAccessFormat set to true in the scandata.xml.

Examples:

pindex/0
pindex/23
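
The numbering described above can be sketched as follows. The leaves are represented here as plain dictionaries rather than parsed from a real scandata.xml, and the leafNum field name is an assumption for this example; only addToAccessFormat comes from the text above.

```python
def assign_pindex(leaves):
    """Map leaf number -> pindex for leaves included in access formats."""
    mapping = {}
    next_index = 0
    for leaf in leaves:
        # Leaves excluded from access formats (e.g. calibration cards)
        # get no pindex at all.
        if leaf["addToAccessFormat"]:
            mapping[leaf["leafNum"]] = next_index
            next_index += 1
    return mapping


# Hypothetical image stack: leaf 0 is a calibration card, leaf 3 is skipped.
leaves = [
    {"leafNum": 0, "addToAccessFormat": False},
    {"leafNum": 1, "addToAccessFormat": True},
    {"leafNum": 2, "addToAccessFormat": True},
    {"leafNum": 3, "addToAccessFormat": False},
    {"leafNum": 4, "addToAccessFormat": True},
]
print(assign_pindex(leaves))
```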

For books where multiple leafs have the same page number, both the pindex and the page number should be specified.

Example:

pindex/23/page/4
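
A link generator following this rule might choose between the two forms as in the sketch below. The page_labels list (indexed by pindex, None for unnumbered pages) and the function name are hypothetical, and falling back to pindex alone for unnumbered pages is an assumption of this example rather than something the proposal states.

```python
from collections import Counter


def page_fragment(page_labels, pindex):
    """Build the URL fragment for the accessible page at `pindex`."""
    label = page_labels[pindex]
    if label is None:
        # Assumption: with no printed number, use the index alone.
        return f"pindex/{pindex}"
    # Count labels in lowercase, matching the comparison rule for pages.
    counts = Counter(l.lower() for l in page_labels if l is not None)
    if counts[label.lower()] > 1:
        # The label is ambiguous, so give both pindex and page.
        return f"pindex/{pindex}/page/{label}"
    return f"page/{label}"


# Hypothetical compilation in which pages 1 and 2 each occur twice.
labels = ["title", None, "1", "2", "1", "2"]
print(page_fragment(labels, 0))
print(page_fragment(labels, 4))
```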

Display options

Display options inform the bookreader how the book should be displayed to the user. The GnuBook reader supports the following options:

mode - can be 1up for single page display or 2up for two page display
zoom - the zoom level
zoomrect - the reader should attempt to show the given rectangle in source image coordinates. The rectangle is specified as leftX,topY,rightX,bottomY with the image origin at 0,0 in the top-left corner.
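
For illustration, a zoomrect value could be parsed as shown below. The sketch assumes integer pixel coordinates and returns None for malformed input, in keeping with the rule that a reader ignores pairs it does not understand; neither choice is specified by this proposal, and the Rect type and function name are made up for the example.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def parse_zoomrect(value):
    """Parse 'leftX,topY,rightX,bottomY'; return None if malformed."""
    fields = value.split(",")
    if len(fields) != 4:
        return None
    try:
        rect = Rect(*(int(f) for f in fields))
    except ValueError:
        return None
    # The origin is the top-left corner, so coordinates are non-negative
    # and right/bottom must be larger than left/top.
    if rect.left < 0 or rect.top < 0:
        return None
    if rect.right <= rect.left or rect.bottom <= rect.top:
        return None
    return rect


print(parse_zoomrect("100,200,500,800"))
```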

Searching

Search terms can be highlighted by using search/ followed by the search string. The search string should be URL-escaped, and it must not contain the slash character ('/').

Examples:

Searching for "cats":
search/cats

Searching for "cheshire cat":
search/cheshire%20cat
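
The escaping can be done with Python's standard urllib.parse module, as in this sketch. The function names are made up for the example, and rejecting a slash with an exception is just one possible way to enforce the restriction above.

```python
from urllib.parse import quote, unquote


def search_fragment(terms):
    """Build the 'search/...' fragment for a search string."""
    if "/" in terms:
        raise ValueError("search string must not contain '/'")
    # safe="" escapes every reserved character; spaces become %20.
    return "search/" + quote(terms, safe="")


def search_terms(value):
    """Recover the search string from the escaped URL value."""
    return unquote(value)


print(search_fragment("cheshire cat"))
print(search_terms("cheshire%20cat"))
```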

Ideas from meeting 2009-02-29

Concatenating multiple books (or sections of books):

stream/alice/pageRange/22-25/id/tomsawyer/pageRange/15-20
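
Since this is only an idea from the meeting, the following is a speculative sketch of how such a path might be read. It assumes that stream/ or id/ starts a new book and that a following pageRange applies to the most recent book; the range ends are kept as strings because page numbers need not be numeric. Note that in this reading the order of the pairs is significant, unlike the other options.

```python
def parse_concatenation(path):
    """Split a concatenated path into a list of books with page ranges."""
    parts = [p for p in path.split("/") if p]
    books = []
    for key, value in zip(parts[0::2], parts[1::2]):
        if key in ("stream", "id"):
            # Assumption: either key introduces a new book identifier.
            books.append({"id": value, "pages": None})
        elif key == "pageRange" and books:
            first, _, last = value.partition("-")
            books[-1]["pages"] = (first, last)
    return books


example = "stream/alice/pageRange/22-25/id/tomsawyer/pageRange/15-20"
print(parse_concatenation(example))
```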

History

Created January 22, 2009
13 revisions

March 3, 2009	Edited by mangtronix	Edited without comment.
March 3, 2009	Edited by mangtronix	Edited without comment.
March 3, 2009	Edited by mangtronix	Edited without comment.
February 25, 2009	Edited by mangtronix	Edited without comment.
January 22, 2009	Created by mangtronix	Edited without comment.
