Note: This document is a work in progress. If you are interested, please send feedback to mang at archive org
Goals
Bookreader URLs have the following goals:
- permanency - should be stable over time
- compactness - short enough to be printed on the cover of a book or included in an academic paper
- "translucency" - while not being fully descriptive, bookreader URLs should give some indication to a human what they point to
- resilience - display and other options should be accepted in any order
Key-Value
Bookreader URLs are composed of key-value pairs, with keys and values separated by '/'. The key-value pairs can occur in any order (is this really what we want?), but we will specify a canonical order. A user-supplied URL will be remapped to this canonical order when given back to the user (e.g. by redirecting, so that the canonical form appears in the address bar). The purpose of the remapping is to reduce the number of URLs "out there" on the net that point to the same resource.
If a reader implementation does not understand a given key-value pair, it should ignore that pair.
We decided to drop the distinction between "display options" and other options, since there could be confusion over which options are "display" options and which are "location" options.
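The key-value rule can be sketched as a small parser. This is only an illustration: the function name `parse_options` and the set `KNOWN_KEYS` are hypothetical, and the set simply collects the keys mentioned elsewhere in this document. Values are left exactly as they appear in the URL (no unescaping).

```python
# Keys mentioned in this document; a real reader may know more or fewer.
KNOWN_KEYS = {"page", "pindex", "mode", "zoom", "zoomrect", "search"}


def parse_options(path: str) -> dict[str, str]:
    """Split 'key/value/key/value' into a dict, ignoring unknown keys."""
    parts = [p for p in path.split("/") if p]
    options = {}
    # Walk the segments two at a time: (key, value), (key, value), ...
    # A trailing key with no value is silently dropped by zip().
    for key, value in zip(parts[0::2], parts[1::2]):
        if key in KNOWN_KEYS:
            options[key] = value
    return options
```

Because the result is a dictionary, the order in which the pairs appeared in the URL no longer matters to the caller.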
Functionality
Bookreader URLs support the following functionality:
- referencing a specific page
- highlighting search terms
- specifying display options (zoom level, 2-page view)
Example URLs
http://www.archive.org/stream/aliceinwonderlan00carriala/
http://www.archive.org/stream/aliceinwonderlan00carriala/page/23
http://www.archive.org/stream/aliceinwonderlan00carriala/page/23/zoom/50
These two are equivalent and would be remapped to a single URL:
http://www.archive.org/stream/aliceinwonderlan00carriala/page/23/zoom/50/mode/2up
http://www.archive.org/stream/aliceinwonderlan00carriala/zoom/50/mode/2up/page/23
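As a rough sketch of how two such URLs might be remapped to one, the function below reorders the key-value pairs of a `/stream/{book id}/...` URL. The canonical order is not yet specified in this document, so `ASSUMED_ORDER` is purely an assumption for illustration, as are the function name and the choice to drop unrecognised pairs and any trailing slash.

```python
from urllib.parse import urlsplit, urlunsplit

# ASSUMPTION: the real canonical order has not been decided yet.
ASSUMED_ORDER = ["pindex", "page", "zoom", "zoomrect", "mode", "search"]


def canonical_url(url: str) -> str:
    """Rewrite a bookreader URL so its key-value pairs follow ASSUMED_ORDER."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Assumes the path looks like /stream/{book id}/key/value/...
    head, rest = segments[:2], segments[2:]
    pairs = dict(zip(rest[0::2], rest[1::2]))
    ordered = []
    for key in ASSUMED_ORDER:
        if key in pairs:
            ordered += [key, pairs[key]]
    path = "/" + "/".join(head + ordered)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

A server could compare `canonical_url(requested)` with the requested URL and redirect when they differ.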
Referring to pages, leafs, indices
For a book with a set of numbered camera images we do not always have a mapping between these images and the page numbers (as printed in the book). In addition, certain pages are not numbered at all (e.g. a completely blank page may face a figure page, both of which are inserted between consecutively numbered pages). The image stack can also contain images which should not be considered for access (e.g. colour calibration cards).
When the page numbers are available they may be referenced with:
page/{page number}
Page numbers may be either numeric or strings (e.g. 'iii'). Books scanned with the earlier Scribe 1 may have Roman numeral pages marked; books scanned with Scribe 2 do not. String-based page numbers should be compared in lowercase. Named pages (such as the title page) may be referred to using the page name.
Examples:
page/2
page/iv
page/title
Question: There exist books (e.g. compilations of articles) which may have more than one page with the same number. How do we handle these?
Question: What named pages should we support?
An external site or embedding should not assume that the page numbers are available or monotonically increasing. There may be foldouts, pages missing (e.g. damaged) or other reasons page numbers are not continuous.
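One way to look up a page label under these constraints is sketched below. It assumes, hypothetically, that the reader holds a list of page labels in image order, with `None` for pages that have no number. Labels are compared in lowercase, and the function returns every match rather than assuming labels are unique or increasing.

```python
from typing import Optional


def find_pages(labels: list[Optional[str]], wanted: str) -> list[int]:
    """Return the positions of all pages whose label matches `wanted`."""
    wanted = wanted.lower()
    return [
        index
        for index, label in enumerate(labels)
        # Unnumbered pages (None) can never match a page/ reference.
        if label is not None and label.lower() == wanted
    ]
```

An empty result means the page number is not available, and more than one result means the label alone is ambiguous.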
"Leaf" is a concept from the Archive's Scribe scanning software. It corresponds to the image sequence taken during the scanning process. The Archive.org scandata.xml refers to leafs. At the level of the bookreader and user-visible URLs the underlying leaf numbers should not be exposed unless necessary.
"Accessible page index" (pindex): each page that should be included in the access formats (bookreader, PDF, etc.) is given a monotonically increasing number starting from 0. For the Archive this corresponds to pages with addToAccessFormat true in the scandata.xml.
Examples:
pindex/0
pindex/23
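The numbering can be sketched as follows. The sketch assumes the addToAccessFormat flags have already been read from scandata.xml into a list ordered by leaf, with leaf numbers taken to be list positions starting at 0; both the function name and that input shape are assumptions made for illustration.

```python
def assign_pindex(add_to_access_format: list[bool]) -> dict[int, int]:
    """Map leaf number -> pindex for the leafs flagged for access."""
    mapping = {}
    next_index = 0
    for leaf, included in enumerate(add_to_access_format):
        if included:
            # Only accessible leafs consume a pindex, so the numbers
            # stay contiguous even when leafs are skipped.
            mapping[leaf] = next_index
            next_index += 1
    return mapping
```

For example, if leafs 0 and 3 are calibration cards, `assign_pindex([False, True, True, False, True])` maps leafs 1, 2 and 4 to pindex 0, 1 and 2.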
For books where there are multiple leafs with the same page number both the pindex and the page number should be specified.
Example:
pindex/23/page/4
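A reader generating links could choose between these forms roughly as follows. This is a sketch under assumptions: `labels` is a hypothetical list of page labels indexed by pindex (`None` where no page number is known), and falling back to `pindex/` alone for unnumbered pages is a guess at the intended behaviour rather than something stated above.

```python
from typing import Optional


def page_fragment(pindex: int, labels: list[Optional[str]]) -> str:
    """Build the page part of a URL for the page at `pindex`."""
    label = labels[pindex]
    if label is None:
        # ASSUMPTION: with no page number, refer to the page by pindex only.
        return f"pindex/{pindex}"
    same = sum(1 for other in labels if other is not None and other.lower() == label.lower())
    if same > 1:
        # The page number is ambiguous, so give both pindex and page.
        return f"pindex/{pindex}/page/{label}"
    return f"page/{label}"
```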
Display options
Display options inform the bookreader how the book should be displayed to the user. The GnuBook reader supports the following options:
- mode - can be 1up for single page display or 2up for two page display
- zoom - the zoom level
- zoomrect - the reader should attempt to show the given rectangle in source image coordinates. The rectangle is specified as leftX,topY,rightX,bottomY with the image origin at 0,0 in the top-left corner.
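As an illustration of the zoomrect format, the sketch below parses a value such as `10,20,300,400` (a made-up example) into a named rectangle. The names `ZoomRect` and `parse_zoomrect` are hypothetical, and both accepting non-integer coordinates and rejecting empty or inverted rectangles are assumptions, since the text above does not say how a reader should treat them.

```python
from typing import NamedTuple


class ZoomRect(NamedTuple):
    left_x: float
    top_y: float
    right_x: float
    bottom_y: float


def parse_zoomrect(value: str) -> ZoomRect:
    """Parse 'leftX,topY,rightX,bottomY' in source image coordinates."""
    parts = value.split(",")
    if len(parts) != 4:
        raise ValueError("zoomrect needs exactly four comma-separated numbers")
    left, top, right, bottom = (float(p) for p in parts)
    # With the origin at the top-left, x grows rightwards and y grows downwards.
    if right <= left or bottom <= top:
        raise ValueError("zoomrect is empty or inverted")
    return ZoomRect(left, top, right, bottom)
```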
Searching
Search terms can be highlighted by using search/ followed by the search string. The search string should be URL escaped and the slash character ('/') is not allowed.
Examples:
Searching for "cats":
search/cats
Searching for "cheshire cat":
search/cheshire%20cat
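The escaping can be sketched with Python's standard `urllib.parse.quote`. The function name `search_fragment` is hypothetical, and raising an error for a slash is one possible reading of "not allowed".

```python
from urllib.parse import quote


def search_fragment(term: str) -> str:
    """Build the search/ part of a bookreader URL for `term`."""
    if "/" in term:
        raise ValueError("the slash character is not allowed in search strings")
    # safe="" makes quote() escape every reserved character.
    return "search/" + quote(term, safe="")
```

With this sketch, `search_fragment("cheshire cat")` gives `search/cheshire%20cat`, matching the example above.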
More background reading
Image tiling
Transclusion
- Fine-Grained Transclusion in the Hypertext Markup Language - 1997
- Methods for implementing transclusion of text into HTML pages - 1996
- purple-include - client-side JS library for transclusion using xpath
Ideas from meeting 2009-02-29
Concatenating multiple books (or sections of books):
stream/alice/pageRange/22-25/id/tomsawyer/pageRange/15-20
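Since this syntax is only an idea, the following is a speculative sketch of how such a path could be read. It assumes that `stream` and `id` both introduce a book identifier, that each `pageRange` applies to the most recent book, and that ranges are numeric `first-last` pairs; none of this has been decided.

```python
def parse_concatenation(path: str) -> list[tuple[str, int, int]]:
    """Read 'stream/{id}/pageRange/a-b/id/{id}/pageRange/c-d' into sections."""
    parts = [p for p in path.split("/") if p]
    sections = []
    book = None
    for key, value in zip(parts[0::2], parts[1::2]):
        if key in ("stream", "id"):
            # ASSUMPTION: both keys name the book that following ranges refer to.
            book = value
        elif key == "pageRange" and book is not None:
            first, last = value.split("-")
            sections.append((book, int(first), int(last)))
    return sections
```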
History
- Created January 22, 2009
- 13 revisions
March 3, 2009 | Edited by mangtronix | Edited without comment. |
February 25, 2009 | Edited by mangtronix | Edited without comment. |
January 22, 2009 | Created by mangtronix | Edited without comment. |