Last edited by Brenton Cheng
October 26, 2016 | History

Search inside individual book API

WARNING: This is an experimental API and may change in the future.

Here is an example of searching inside a book using the API:

http://ia700401.us.archive.org/fulltext/inside.php?item_id=designevaluation25clin&doc=designevaluation25clin&path=/28/items/designevaluation25clin&q=%22library%20science%22&callback=reply

The information you need to search inside a book, with values taken from the example above:

  • hostname: ia700401.us.archive.org (host where the book is stored)
  • item_id: designevaluation25clin (archive.org item ID)
  • doc: designevaluation25clin (usually the same as the item_id)
  • path: /28/items/designevaluation25clin (path of the book on this host)
  • q: "library science" (phrase to search for)
  • callback: reply (optional callback for JSONP)
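
The parameters above can be assembled into a request URL programmatically. The following is a minimal sketch in Python; the function name build_search_url is hypothetical, and it assumes the hostname and path have already been obtained for the item.

```python
from urllib.parse import quote, urlencode


def build_search_url(hostname, item_id, path, q, doc=None, callback=None):
    """Assemble an inside.php search URL from the parameters listed above.

    build_search_url is a hypothetical helper, not part of any archive.org library.
    """
    params = {
        "item_id": item_id,
        # doc is usually the same as item_id, so default to it
        "doc": doc if doc is not None else item_id,
        "path": path,
        "q": q,
    }
    if callback is not None:
        params["callback"] = callback  # optional JSONP callback name
    # quote_via=quote encodes spaces as %20; safe="/" leaves the path slashes as-is
    query = urlencode(params, quote_via=quote, safe="/")
    return f"http://{hostname}/fulltext/inside.php?{query}"


url = build_search_url(
    hostname="ia700401.us.archive.org",
    item_id="designevaluation25clin",
    path="/28/items/designevaluation25clin",
    q='"library science"',
    callback="reply",
)
print(url)
```

With the values from the example, this reproduces the sample URL shown earlier on this page.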

You can find the hostname and path using the archive.org locator service.

Example of output from API call:

reply( {
    "ia": "designevaluation25clin",
    "q": "\"library science\"",
    "page_count": 224,
    "body_length": 475677,
    "leaf0_missing": true,
    "matches": [
       ...
    ]
} )

The reply includes page_count, which is the number of pages that were passed to the OCR.
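
When a callback is supplied, the response is wrapped in a JSONP call, as in the example output above. A minimal Python sketch for unwrapping it follows; parse_jsonp is a hypothetical helper, the sample body is a shortened copy of the example with an empty matches list, and the fallback to plain JSON when no wrapper is present is an assumption.

```python
import json


def parse_jsonp(body, callback="reply"):
    """Strip a 'callback( ... )' wrapper, if present, and decode the JSON inside.

    parse_jsonp is a hypothetical helper. Treating an unwrapped body as
    plain JSON is an assumption about how the API behaves without a callback.
    """
    body = body.strip()
    prefix = callback + "("
    if body.startswith(prefix) and body.endswith(")"):
        body = body[len(prefix):-1]
    return json.loads(body)


# Shortened copy of the example reply, with the matches left empty
sample = (
    'reply( {"ia": "designevaluation25clin", '
    '"q": "\\"library science\\"", '
    '"page_count": 224, "body_length": 475677, '
    '"leaf0_missing": true, "matches": []} )'
)

result = parse_jsonp(sample)
print(result["ia"], result["page_count"], len(result["matches"]))
```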

Example of a match:

{
    "text": "The first Clinic on Library Applications of Data Processing was held at the Illini Union on the Urbana-Champaign campus of the University of Illinois, April 28 - May 1, 1963 under the sponsorship of the University of Illinois Graduate School of {{{Library}}} {{{Science}}}. Writing in the Foreword to the Clinic proceedings, Herbert Goldhor (1964) provides the rationale for sponsoring such a Clinic:",
    "par": [
        {
            "page": 14, "page_width": 2134, "page_height": 3328,
            "b": 1090, "t": 700, "r": 2024, "l": 192,
            "boxes": [
                { "r": 1560, "b": 957, "t": 899, "l": 1378 },
                { "r": 1767, "b": 957, "t": 899, "l": 1587 }
            ]
        }
    ]
}

Each match contains a 'text' field. This is usually a complete paragraph. The matched words are surrounded by three braces on either side, like {{{this}}}.
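
The triple-brace markers can be handled with a regular expression. The sketch below, in Python, pulls out the matched words and also produces a copy of the text without markers; the function names are hypothetical and the snippet is a shortened piece of the example match.

```python
import re

# Non-greedy capture of whatever sits between {{{ and }}}
HIGHLIGHT = re.compile(r"\{\{\{(.*?)\}\}\}")


def matched_words(text):
    """Return the words (or word fragments) marked as matches."""
    return HIGHLIGHT.findall(text)


def strip_markers(text):
    """Return the text with the triple-brace markers removed."""
    return HIGHLIGHT.sub(r"\1", text)


snippet = "University of Illinois Graduate School of {{{Library}}} {{{Science}}}."
print(matched_words(snippet))
print(strip_markers(snippet))
```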

The other field is called par. It contains details of every page that is part of this match, since paragraphs can cross pages. Each par object provides a page number, the page width and height, and the coordinates of the paragraph on the page. The boxes field lists the coordinates of a box to draw around each word, or part of a word, in the match.
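
To draw the boxes on a page image rendered at a different size, the coordinates need to be scaled. The Python sketch below assumes that l, t, r and b are pixel offsets from the top-left corner of a page measuring page_width by page_height; that interpretation is an assumption drawn from the example values, and scale_boxes is a hypothetical helper.

```python
def scale_boxes(par, target_width):
    """Scale the highlight boxes of one par entry to a rendered page width.

    Assumes l/t/r/b are pixel offsets from the top-left corner of a page
    that is page_width pixels wide (an assumption, not stated by the API).
    """
    scale = target_width / par["page_width"]
    return [
        {side: box[side] * scale for side in ("l", "t", "r", "b")}
        for box in par["boxes"]
    ]


# The par entry from the example match
par = {
    "page": 14, "page_width": 2134, "page_height": 3328,
    "b": 1090, "t": 700, "r": 2024, "l": 192,
    "boxes": [
        {"r": 1560, "b": 957, "t": 899, "l": 1378},
        {"r": 1767, "b": 957, "t": 899, "l": 1587},
    ],
}

# Render the page at half its original width
for box in scale_boxes(par, target_width=1067):
    print(box)
```

Using a single scale factor derived from the width keeps the aspect ratio of the page, so the same factor applies to vertical coordinates.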

Because of hyphenation, a word can break across lines and across pages, which is why a box may cover only part of a word.

History Created October 22, 2010 · 10 revisions

October 26, 2016 Edited by Brenton Cheng Edited without comment.
January 7, 2011 Edited by Edward Betts host and path of sample book changed
October 23, 2010 Edited by Anand Chitipothu improved the response example.
October 22, 2010 Edited by Lance Arthur reverted to revision 5
October 22, 2010 Created by Edward Betts started page