Last edited by Agent Sapphire
May 28, 2021 | History

Librarianship

The Open Library provides an exciting opportunity for anyone interested in libraries and their future in cyberspace. It is a new kind of library that is not limited to any point in space or stretch of time. You can linger, sprint, and return to the contents of your choice, and no one will tell you time's up as long as you wish to virtually remain. There are well-organized categories with a catalog (aka search bar) for easy access to desired content. The Open Library shares many elements with brick-and-mortar libraries, culminating in the shared mission to connect people with resources for learning, research, and entertainment, and to make the journey as interesting and rewarding as possible.

Furthermore, the Open Library strives to provide quality bibliographic information for its resources. Much of the information contained herein was created by library professionals, often in the form of cataloging records in MARC format. Other information comes from sources such as online bookstores, publishers, and open online sources such as Wikipedia. Because of its "wiki nature", Open Library can mix and match data from different sources, thus enhancing the traditional library record to include information that helps users find interesting resources and make an informed selection from the large number of published materials that are part of our cultural heritage.

Open Library users themselves can add information to the catalog and even add entire works from another source that the Library has not yet received. Collectively, we can create a new kind of library catalog: one that not only links users to materials, but also allows users to engage in the dialog that a community library represents.

The "Things" of the Open Library

The international library community has developed a model of library data (Functional Requirements for Bibliographic Records, or FRBR) that defines the essential things that a library catalog describes. The Open Library embraces the identification of basic things, which in Open Library are called "Types." These Types are entities like Works, Authors, Editions, and Subjects in Open Library, but in fact anything that can be described can be a Type. These Types provide a focus for the user view of the Open Library data. For example, a search on an author results in an author "page" that links to works by the author, subjects the author has written on, and related authors. This is similar to the linked data concept that forms much of the current semantic web activity.

Works

FRBR defines the Work as the abstract creative product. For texts, the Work stays the same even when translated. In Open Library, where possible, Works have used the work titles provided in library uniform titles. To the extent possible, all of the various editions of a work (in FRBR called Manifestations) are linked to the Work so the user can retrieve them with a single command and view them together. Merging works still remains a manual process.

Authors

One of the more difficult problems, not only in library catalogs, but in all information systems, is that of uniquely identifying individuals. Libraries achieve this with authority control, a process by which a single unique form is devised for each person's name. The Open Library does not use the library forms of the name (which tend to be in the format "Smith, John, 1926- "), instead turning the names to natural order ("John Smith") and placing birth and death dates in their own fields. Algorithms are used to compare names and dates, and to the extent possible each author is identified and given a unique entry in the Open Library. The author information can include name variants, as well as biographical information that can help users understand which author's information they are viewing. Links to the viaf.org identifier or Library of Congress authority record are helpful.
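
As a rough illustration of the name handling described above, the sketch below turns a library-style heading into natural order and pulls the dates into their own fields. `AuthorName` and `to_natural_order` are hypothetical names invented for this example; this is not Open Library's actual import code, and real headings have more variation (titles, "b." and "d." dates, and so on) than this handles.

```python
import re
from typing import NamedTuple, Optional


class AuthorName(NamedTuple):
    """Hypothetical container: natural-order name plus separate date fields."""
    name: str
    birth_date: Optional[str]
    death_date: Optional[str]


# Matches a trailing date element such as "1926-" or "1835-1910".
_DATES = re.compile(r"^(\d{4})?\s*-\s*(\d{4})?$")


def to_natural_order(heading: str) -> AuthorName:
    """Convert 'Surname, Forename, 1926-' into 'Forename Surname' plus dates."""
    parts = [p.strip() for p in heading.split(",") if p.strip()]
    birth = death = None
    if parts:
        match = _DATES.match(parts[-1])
        if match:
            birth, death = match.group(1), match.group(2)
            parts = parts[:-1]  # drop the date element from the name
    if len(parts) >= 2:
        # Move the surname (first element) to the end.
        name = " ".join(parts[1:] + parts[:1])
    else:
        name = " ".join(parts)
    return AuthorName(name, birth, death)


print(to_natural_order("Smith, John, 1926- "))
```

Keeping the dates in separate fields means a later comparison step can match on name and dates independently, which is the kind of comparison the paragraph above refers to.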

Subjects

Most libraries in the United States, and some in other countries, use Library of Congress Subject Headings (LCSH) in their catalogs. These subject headings have evolved over 100 years and attempt to reflect the full complexity of the contents of libraries. There is acknowledgment in the library community, however, that these headings are rarely fully understood by library catalog users. One attempt to simplify these headings was undertaken by the OCLC Research Division in its development of a modified view of LCSH called Faceted Application of Subject Terminology, or FAST. FAST emphasizes the faceted nature of LCSH. Many "next generation" user interfaces for library catalogs have also experimented with separating the long LCSH strings into facets that are presented to users as a way to further refine their searches.

The Open Library took a similar approach and stores each of the LCSH facets as a separate subject entry. The facet types are: subject, person, time, place. Facets that appear related to the same Work can be shown as related in search results, and users can follow the topical train of related facets throughout the web of data that these subject facets create.
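
To make the faceting concrete, the sketch below splits one MARC subject field into the four facet types listed above. The mapping from MARC tags and subfield codes to facet types is an assumption made for this example (based on the usual meaning of the 600, 650, and 651 fields and the $x, $y, $z subdivisions), not Open Library's actual import logic, and `split_into_facets` is a hypothetical helper.

```python
from collections import defaultdict

# Assumed mapping, for illustration only.
# Main heading ($a) facet depends on the MARC tag of the field.
MAIN_FACETS = {"600": "person", "650": "subject", "651": "place"}
# Subdivision subfields: general ($x), chronological ($y), geographic ($z).
SUBDIVISION_FACETS = {"x": "subject", "y": "time", "z": "place"}


def split_into_facets(tag, subfields):
    """Split one subject field into separate facet entries.

    `subfields` is a list of (code, value) pairs in record order.
    Subfields with no assumed facet type (e.g. form, $v) are skipped.
    """
    facets = defaultdict(list)
    for code, value in subfields:
        value = value.strip().rstrip(".").strip()
        if code == "a":
            facet = MAIN_FACETS.get(tag)
        else:
            facet = SUBDIVISION_FACETS.get(code)
        if facet and value and value not in facets[facet]:
            facets[facet].append(value)
    return dict(facets)


heading = [
    ("a", "Women"),
    ("z", "England"),
    ("x", "Social conditions"),
    ("y", "19th century"),
    ("v", "Fiction."),
]
print(split_into_facets("650", heading))
```

Each value in the result would become its own subject entry, so "England" can link to every Work that carries it regardless of which longer heading it came from.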

Identification

As a web-based information resource, it is important that all things are given an identifying URL. It is therefore possible to link to any one of the Open Library things from anywhere on the Internet. These identifiers do not change, although when entries are merged, as does happen, care is taken so that the old URL points to the new one, thus avoiding broken links.
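
A minimal sketch of how an identifier maps to its URL, based on the URL shapes that appear elsewhere on this page (`/works/OL...W`, `/books/OL...M`, `/authors/OL...A`). `olid_to_url` is a hypothetical helper, and the assumption that an identifier is always "OL", digits, then a one-letter suffix is made for this example only.

```python
import re

# Assumed identifier shape: "OL" + digits + type suffix.
OLID_PATTERN = re.compile(r"^OL\d+([AMW])$")
PATH_BY_SUFFIX = {"A": "authors", "M": "books", "W": "works"}


def olid_to_url(olid: str, base: str = "https://openlibrary.org") -> str:
    """Build the canonical-looking URL for an author, edition, or work ID."""
    match = OLID_PATTERN.match(olid)
    if not match:
        raise ValueError(f"not a recognised identifier: {olid!r}")
    return f"{base}/{PATH_BY_SUFFIX[match.group(1)]}/{olid}"


print(olid_to_url("OL66534W"))  # https://openlibrary.org/works/OL66534W
```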

Merging

The world of publishing produces many copies of each book. Like all catalogs with multiple sources of input, Open Library frequently obtains metadata for a book more than once. One of the ongoing efforts of a catalog is to refine algorithms for merging duplicate entries. Open Library continues to work in this area, and is also working on a way for users to merge editions, authors, and works where the algorithm has not done so. As more web-based catalogs are developed, it is hoped that we can all share this information and therefore improve the quality of bibliographic data on the web as a whole.

Works

Merge Process (in development)

(This feature is in development and is currently available on the testing site only.)

Search

Conduct a standard search.
Screenshot of search results.

Select

Select the works you want to merge by clicking anywhere inside the outlined area. The selections will turn blue.
Screenshot displaying selected works.

Merge Works

Once all are selected, click “Merge Works…” at the bottom right corner of the screen.
Screenshot indicating position of Merge Works button.

Review

You will now see a table view of each selected work and its data. The records are displayed from oldest to newest.
Screenshot of records table.
By default, the oldest (top) record is selected as the merge target (indicated by the radio button). The editions and data from other works will be merged into it and the merged work records will be redirected to the default work. You can de-select anything you do not wish to merge by un-checking it.

Note: Only one description field will be kept. If the default work is already populated with a description, the descriptions of the duplicate works will be lost; otherwise, the description from the oldest populated record will be merged into the default.
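
The default-target and description rules above can be sketched as follows. This is only an illustration of the rules as stated on this page, not the merge tool's code; `WorkRecord` and `merged_description` are hypothetical names, and the work IDs and dates in the example are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class WorkRecord:
    """Hypothetical stand-in for one row of the review table."""
    olid: str
    created: str  # ISO date, so plain string sorting puts the oldest first
    description: Optional[str] = None


def merged_description(
    works: Iterable[WorkRecord], target: Optional[WorkRecord] = None
) -> Tuple[WorkRecord, Optional[str]]:
    """Return the merge target and the single description that survives."""
    ordered = sorted(works, key=lambda w: w.created)
    target = target or ordered[0]  # default target: the oldest record
    if target.description:
        return target, target.description  # duplicates' descriptions are lost
    for work in ordered:
        if work.description:
            return target, work.description  # oldest populated record wins
    return target, None


works = [
    WorkRecord("OL3W", "2012-05-01", "Description on the newest record"),
    WorkRecord("OL1W", "2008-04-01"),
    WorkRecord("OL2W", "2010-01-01", "Description on an older record"),
]
target, description = merged_description(works)
print(target.olid, "->", description)
```

Because only one description survives, it is worth copying any better text into the target by hand before committing the merge.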

Commit Merge

Once you have finalized your selections, double check that the details look correct on the bottom row, and click “Do Merge”.

Confirm

You will then see a “saving” message, followed by a list of merged works and affected editions.
Screenshot of merge confirmation.

Shortcut
Sometimes it can be difficult to do a search that includes all the duplicates of a work, for example if there are titles in multiple languages. You can use a comma-separated list of work IDs to bypass the Search, Select, and Merge Works steps and go straight to the table view in the Review step. Use the following URL,
https://testing.openlibrary.org/works/merge?records=OL...W,OL...W
replacing the work IDs in this example with the list of IDs to merge.
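
If you have the work IDs in a list or spreadsheet, a small helper can assemble the shortcut URL. This is a convenience sketch only: `work_merge_url` is a hypothetical function, the ID check assumes work IDs look like "OL", digits, then "W", and the IDs in the example are placeholders.

```python
import re

MERGE_ENDPOINT = "https://testing.openlibrary.org/works/merge"
WORK_ID = re.compile(r"^OL\d+W$")  # assumed shape of a work ID


def work_merge_url(work_ids):
    """Build the shortcut URL from a list of work IDs."""
    unique = list(dict.fromkeys(work_ids))  # drop repeats, keep order
    bad = [w for w in unique if not WORK_ID.match(w)]
    if bad:
        raise ValueError(f"not work IDs: {bad}")
    if len(unique) < 2:
        raise ValueError("need at least two different works to merge")
    return f"{MERGE_ENDPOINT}?records={','.join(unique)}"


# Placeholder IDs, for illustration only.
print(work_merge_url(["OL123W", "OL456W", "OL123W"]))
```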

**Gotchas to look out for:**

Check the number of editions for all works being merged. Only 50 editions will migrate during the merge process. If you are merging a work with over 50 editions, you will first need to run [this](https://colab.research.google.com/drive/1BO0c8aDpfENA8Qsg0-fv_7st6j4AmeAq?usp=sharing) migration script with a bot account until fewer than 50 remain.

If only the target work has over 50, this step is not needed.
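
The edition-count check can be expressed as a short pre-merge test. The sketch below assumes you have already looked up the number of editions for each work by hand; `works_needing_migration` is a hypothetical helper and the IDs and counts are made up.

```python
EDITION_LIMIT = 50  # only this many editions migrate during a merge


def works_needing_migration(edition_counts, target):
    """Return the non-target works whose editions must be migrated first.

    `edition_counts` maps work ID -> number of editions.
    The target work is exempt: its editions stay where they are.
    """
    return sorted(
        olid
        for olid, count in edition_counts.items()
        if olid != target and count > EDITION_LIMIT
    )


counts = {"OL1W": 120, "OL2W": 50, "OL3W": 73}
print(works_needing_migration(counts, target="OL1W"))  # ['OL3W']
```

Choosing the work with the most editions as the merge target, where that is otherwise appropriate, reduces how often the migration script is needed.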

**Conflated works**: Look at the editions listed for each work. Make sure the titles are all correct for the work they are associated with. Conflation is common for works that belong to a series or for collected works, e.g. short stories, poetry, anything with the name "Works of". Before merging, you'll want to check that the title of the work they are associated with has not been changed over time. If you look at the [edit history](https://openlibrary.org/works/OL66534W/Northanger_Abbey?m=history) of Northanger Abbey, you can see how things went awry here; many edits were made to "correct" the work details based on the edition the editor was looking at. It's better to resolve conflation issues before merging, sorting out any editions that belong elsewhere.

**Works that aren't really the work**: adaptations, dramatisations, book notes, study guides, criticism, etc. frequently look like they should be merged and are sometimes incorrectly attributed to the author of the original work. Look for titles that include the author's name, for example Mark Twain's Huckleberry Finn, or any mention of reading levels. Examples: [https://openlibrary.org/works/OL1405785W](https://openlibrary.org/works/OL1405785W), [https://openlibrary.org/works/OL15395922W](https://openlibrary.org/works/OL15395922W), and [https://openlibrary.org/works/OL8256625W](https://openlibrary.org/works/OL8256625W).

Partial works should not be merged with the larger whole. For example *The Works of Author A in Twelve Volumes* should not be merged with *The Works of Author A Volume 1 of 12*.

Editions

Merging Editions

Current OL admin functionality.

https://openlibrary.org/books/merge
takes parameters: ?key=OL..M&key=OL..M etc

Authors

Changing an Author of a work

If a work is listed with the wrong Author, a different or new Author can be set via the edit page at https://openlibrary.org/works/OL...W/.../edit (e.g. https://openlibrary.org/works/OL1066521W/Caesar_and_Cleopatral/edit). Click on the Author field, type the name of the author, and then either select the correct author from the dropdown (to associate the work with an existing author in the system) or click the last entry in the list, which should be an option to create a new record for this author. Please be very sure that there's not a suitable existing record before creating a new Author record.

Merging Authors

https://openlibrary.org/authors/merge takes parameters ?key=OL..A&key=OL..A etc
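
The edition and author merge pages take the same shape of URL, with one `key` parameter repeated per record. A minimal sketch for building either, assuming the one-letter ID suffix (M for editions, A for authors) identifies the record type; `merge_url` is a hypothetical helper and the IDs in the example are placeholders.

```python
from urllib.parse import urlencode

# Merge pages as given on this page, keyed by the ID suffix they accept.
MERGE_PAGES = {
    "M": "https://openlibrary.org/books/merge",
    "A": "https://openlibrary.org/authors/merge",
}


def merge_url(olids):
    """Build an edition or author merge URL with repeated key parameters."""
    if len(olids) < 2:
        raise ValueError("need at least two records to merge")
    suffixes = {olid[-1:] for olid in olids}
    if len(suffixes) != 1:
        raise ValueError("all IDs must be the same type of record")
    suffix = suffixes.pop()
    if suffix not in MERGE_PAGES:
        raise ValueError("expected edition (OL..M) or author (OL..A) IDs")
    query = urlencode([("key", olid) for olid in olids])
    return f"{MERGE_PAGES[suffix]}?{query}"


# Placeholder IDs, for illustration only.
print(merge_url(["OL10A", "OL11A"]))
```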

Splitting a conflated Author record

For authors with common names and no other distinguishing info (e.g. birth/death dates), it is common for many different authors to be listed under a single record. To rectify this, move the works for each individual author to the appropriate author record (creating a new record if necessary). When they've all been moved, delete the original record. This is one of the few cases in which we delete records, because there's no reasonable place to redirect it to.

Gotchas to watch out for:
* Newly created author records aren't available in search, so can't immediately be used for a second book. Resist the temptation to create multiple new records and instead wait the minutes or hours necessary for the search index to be updated.
* Distinguishing newly created authors can be difficult since they won't have the top subject listed. As a workaround, you can use the OLID (i.e. OLnnnnA) directly to select the correct author (but it still needs to have been indexed).

Publishers

Merging Publishers (TODO)

This feature is not yet implemented. A request for this feature is documented here: https://github.com/internetarchive/openlibrary/issues/372

Note: Publisher is a string attribute, not a full entity, so this should be relatively straightforward to implement, using the openlibrary-client as a starting point.

History

May 28, 2021 Edited by Agent Sapphire Edited without comment.
May 28, 2021 Edited by Agent Sapphire Edited without comment.
May 28, 2021 Edited by Agent Sapphire add formatting
May 28, 2021 Edited by Agent Sapphire Add "gotchas" section to work merging.
March 4, 2009 Created by webchick creating .en /about/lib page