{"body": {"type": "/type/text", "value": "

Work Sets in the Open Library (FRBR-ization)


What is called "FRBR-izing" (after FRBR, Functional Requirements for Bibliographic Records) is really the creation of a set of records that all represent the same work. So rather than having a display that shows all of the different editions of the work separately:


... you have a single display for the work that links to all of its editions. Different systems display this differently. Here is the OCLC FictionFinder display:


And here is the beginning of the display of all of the many editions:


This creates a two-tiered database with Works and Editions. (Note: Editions are called "Manifestations" in library lingo.) The two big questions are:

  1. What is a work?
  2. How is this hierarchy represented in the database in a way that is efficient for searching and for display?

Note that FRBR-ization affects only a small percentage of bibliographic records. OCLC's statistics show that 78% of the items in WorldCat are unique Works. Only 1% of Works have up to 7 Editions, and only 30,000 Works in their database have more than 20 Editions.


What is a work?


There is no definitive answer to the question of what a work is, especially when the format changes, such as a book that has become a screenplay and then been made into a movie. But since we only have books in the OL database at the moment, the task is somewhat simpler: bring together books that are essentially the same text. Basically, the elements that define a work are:


This isn't quite as simple as it seems because ideally one would also bring together different translations of the same work, and of course those do not have the same title. In some records that we receive from libraries there will be a special "work title" that contains the original title of the work regardless of the language of the translation.

  Mann, Thomas
    [Zauberberg]
    Magic Mountain.

  Mann, Thomas
    [Zauberberg]
    La Montagna incantata.
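A small Python sketch shows how the work title affects grouping. It is not the Open Library matching code. The record fields (`id`, `author`, `title`, `uniform_title`) and the normalization rules (lowercasing, stripping accents and punctuation) are assumptions made for illustration. A record that has a uniform title is grouped under that title. Any other record falls back to its own title, so a translation without a uniform title still lands in a separate work.

```python
from __future__ import annotations

import re
import unicodedata
from collections import defaultdict
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def work_key(author: str, title: str, uniform_title: Optional[str] = None) -> tuple:
    """Prefer the uniform ("work") title when the record has one (assumed rule)."""
    chosen = uniform_title if uniform_title else title
    return (normalize(author), normalize(chosen))


def group_into_works(records: list[dict]) -> dict[tuple, list[str]]:
    """Collect edition IDs under a shared (author, title) work key."""
    works: dict[tuple, list[str]] = defaultdict(list)
    for rec in records:
        key = work_key(rec["author"], rec["title"], rec.get("uniform_title"))
        works[key].append(rec["id"])
    return dict(works)
```

Fed the two Mann records above (with `uniform_title="Zauberberg"`) plus a hypothetical record for another Mann title without a uniform title, both Zauberberg editions come out under the key `("mann thomas", "zauberberg")`, and the other title forms its own group.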

There are also Works that are the same but have been printed with different titles at different times or in different countries, such as the works of Shakespeare and Harry Potter. The work titles (called "uniform titles" in library lingo) are unfortunately not used consistently even in library records, and don't exist at all in records from our other sources. At some point we will have to rely on users to bring together works that do not get identified algorithmically. We also have a set of ISBNs from LibraryThing to use, and could probably make some use of the xISBN service from OCLC. This, however, only helps us with works that have an ISBN.
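Merging editions through ISBN equivalence sets is a disjoint-set (union-find) problem. If the LibraryThing data or an xISBN lookup says two ISBNs belong to the same work, every edition carrying either ISBN joins the same set. The sketch below rests on a few hypothetical assumptions. It expects the ISBN data to arrive as lists of equivalent ISBNs that have already been normalized to a single form, so ISBN-10 and ISBN-13 variants of the same number must be reconciled first. The function names are illustrative only and are not part of any Open Library or OCLC API. Editions with no ISBN come out as singletons, which reflects the limitation noted above.

```python
from __future__ import annotations

from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}

    def find(self, x: str) -> str:
        # Unseen nodes are registered lazily as their own root.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]


def cluster_editions(
    editions: dict[str, list[str]], isbn_sets: list[list[str]]
) -> list[set[str]]:
    """Group edition IDs that share an ISBN or an ISBN equivalence set.

    editions:  edition ID -> ISBNs on that edition (hypothetical input shape)
    isbn_sets: lists of ISBNs believed to be one work (hypothetical input shape)
    """
    uf = UnionFind()
    # Prefixes keep edition IDs and ISBNs from colliding in one namespace.
    for isbns in isbn_sets:
        for other in isbns[1:]:
            uf.union("isbn:" + isbns[0], "isbn:" + other)
    for ed_id, isbns in editions.items():
        uf.find("ed:" + ed_id)  # editions without ISBNs stay as singletons
        for isbn in isbns:
            uf.union("ed:" + ed_id, "isbn:" + isbn)
    clusters: dict[str, set[str]] = defaultdict(set)
    for ed_id in editions:
        clusters[uf.find("ed:" + ed_id)].add(ed_id)
    return list(clusters.values())
```

With union by size and path halving, each operation runs in nearly constant amortized time. That matters when the equivalence lists cover millions of ISBNs.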


In terms of an algorithm, OCLC's work set algorithm is available. However, it makes use of some data elements that we will not have, in particular those that OCLC derived from LC Authority records.


The Work-set display and the Edition display will make use of different fields; these are described on a separate page.

It is quite possible that the current edition matching algorithm that we use can be adapted to determine works in a way that approximates the OCLC results. This won't be as accurate as the OCLC algorithm, but we can use OCLC's FictionFinder database as a test set against which we can measure our results.
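Pairwise precision and recall is one way to do that measurement. The text does not name a metric, so this choice is ours. The sketch assumes the FictionFinder work sets can be turned into groups of the same edition identifiers we use, which is also an assumption. Each clustering is reduced to the set of edition pairs it places in the same work. Precision asks how many of our pairs OCLC agrees with, and recall asks how many of OCLC's pairs we found.

```python
from __future__ import annotations

from itertools import combinations


def same_work_pairs(clusters: list[set[str]]) -> set[frozenset[str]]:
    """Every unordered pair of editions that a clustering places in one work."""
    pairs: set[frozenset[str]] = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def pairwise_scores(
    predicted: list[set[str]], reference: list[set[str]]
) -> tuple[float, float]:
    """Pairwise precision and recall of `predicted` against `reference`.

    Both arguments should cover the same edition IDs. Singleton works
    contribute no pairs, so they affect neither score.
    """
    pred = same_work_pairs(predicted)
    ref = same_work_pairs(reference)
    agreed = len(pred & ref)
    precision = agreed / len(pred) if pred else 1.0
    recall = agreed / len(ref) if ref else 1.0
    return precision, recall
```

Suppose the reference puts editions a, b and c in one work, and our run produces {a, b} and {c, d}. Precision is then 1/2, because only the (a, b) pair out of our two pairs is right. Recall is 1/3, because we found one of the three true pairs.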


The Database Design


There are undoubtedly many different ways that we could design a database to support FRBR. Some possible designs are:

Work-centric access and display
In this scenario, there is a work record that contains the primary author(s), the title, and subject information. General searching goes against this Work record, which contains links (probably identifiers) to all of the related Edition records. This is how FictionFinder appears to work. In FictionFinder, searches on elements specific to a particular edition (e.g. the name of an illustrator) do not return results. Individual editions can, however, be displayed in detail from the work display. This design is very different from what we have today in infogami and may not be feasible. It also requires a Work record for each Work in the database, including those that consist of only one Edition.
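The record shapes for this scenario can be sketched with Python dataclasses. The class and field names are hypothetical and are not infogami types. The substring search is a naive stand-in for a real index. Because only Work fields are indexed, a search on an edition-specific field such as an illustrator finds nothing, which mirrors the FictionFinder behavior described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Edition:
    id: str
    title: str
    publisher: str
    illustrator: Optional[str] = None  # edition-specific, not copied to Work


@dataclass
class Work:
    id: str
    title: str
    authors: list[str]
    subjects: list[str]
    edition_ids: list[str] = field(default_factory=list)  # links by identifier


def search_works(works: list[Work], term: str) -> list[Work]:
    """Naive stand-in for a search index built from Work fields only."""
    needle = term.lower()
    return [
        w for w in works
        if needle in " ".join([w.title, *w.authors, *w.subjects]).lower()
    ]


def editions_for(work: Work, editions: dict[str, Edition]) -> list[Edition]:
    """Follow the Work's identifiers to its Edition records for detail display."""
    return [editions[e] for e in work.edition_ids if e in editions]
```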
Edition-centric access, Work-centric display
This design would index edition records much as they are indexed today, but would bring together records for the same work in a tiered display. There could be a minimal Work record with roughly the same functionality that the Author record has today. When any record in a Work set is retrieved, the Work record would be displayed with all of its editions subordinate to it on the page. This is usually done with a table that contains an entry for each edition with its equivalent Work, so that the Work is displayed in place of the Edition. This requires de-duplicating the entire retrieved set so that the Work record is displayed only once even though multiple editions of the same work are retrieved. The Work record can then be used to retrieve all of the edition records that share its Work ID. This design probably requires a Work record for each Work in the database, including those that consist of only one Edition. What this design does not easily provide is a display of the number of editions on the Work page. It also presents some potential performance problems, which would have to be studied.
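The edition-to-work table and the de-duplication step can be sketched in a few lines. The sketch assumes the table is an in-memory dict from edition ID to Work ID, and the function names are hypothetical. Each Work takes the position of its best-ranked edition in the hit list.

```python
from __future__ import annotations


def collapse_to_works(
    hit_edition_ids: list[str], edition_to_work: dict[str, str]
) -> list[str]:
    """Show each retrieved edition's Work in its place, each Work only once.

    Keeps the rank of the best-ranked edition for each Work.
    """
    seen: set[str] = set()
    work_ids: list[str] = []
    for ed_id in hit_edition_ids:
        work_id = edition_to_work[ed_id]  # every edition has a Work in this design
        if work_id not in seen:
            seen.add(work_id)
            work_ids.append(work_id)
    return work_ids


def editions_of(work_id: str, edition_to_work: dict[str, str]) -> list[str]:
    """All editions sharing a Work ID (a real database would index this column)."""
    return [ed for ed, w in edition_to_work.items() if w == work_id]
```

The whole hit list has to be collapsed before a page of results can be cut from it. This is likely one source of the performance problems mentioned above.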
Edition-centric access, Edition-centric display, Optional Work display
This would mimic the current treatment of Editions and Authors. A search would retrieve and display Editions. However, the display of retrieved records would include something (a link or button) that lets you move to a Work view if one is available for that record. One advantage of this is that it would only require a Work record for the books that have more than one Edition (guesstimated at 25% of the database). The disadvantage is that Works will be less obvious to users, and most users will still see multiple editions in displays for the small percentage of works that have many editions (like all of our Mark Twain examples).
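The sparse table behind this option can be sketched as follows. The input `groups` is hypothetical: a mapping from a candidate Work ID to its edition IDs, as some work-matching step might produce it. Only groups with more than one edition get a Work. Result rows stay Editions and carry a Work link only where one exists.

```python
from __future__ import annotations

from typing import Optional


def sparse_work_table(groups: dict[str, list[str]]) -> dict[str, str]:
    """Map edition ID -> Work ID, but only for groups with more than one edition."""
    table: dict[str, str] = {}
    for work_id, ed_ids in groups.items():
        if len(ed_ids) > 1:
            for ed_id in ed_ids:
                table[ed_id] = work_id
    return table


def result_rows(
    hit_edition_ids: list[str], table: dict[str, str]
) -> list[dict[str, Optional[str]]]:
    """Results stay Editions; a Work link is attached only where a Work exists."""
    return [{"edition": ed, "work_link": table.get(ed)} for ed in hit_edition_ids]
```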

Note that, based on the OCLC statistics, if we create a Work record for each work (even those that have a single edition), we will increase the number of records in the database by about 75%. Creating a Work record only when there are multiple editions, however, may add complexity to the display.
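The estimate is a ratio. One Work record per work grows the record count by (number of works) / (number of editions). Work records only for multi-edition works grow it by (number of multi-edition works) / (number of editions). A figure of about 75% therefore corresponds to roughly three Work records for every four Edition records. The helper below is a hypothetical illustration of that arithmetic and is not taken from any existing tool.

```python
from __future__ import annotations


def growth_in_records(editions_per_work: list[int], multi_edition_only: bool) -> float:
    """Fractional increase in record count from adding Work records.

    editions_per_work holds one entry per work: how many Edition records it has.
    """
    total_editions = sum(editions_per_work)
    if multi_edition_only:
        new_works = sum(1 for n in editions_per_work if n > 1)
    else:
        new_works = len(editions_per_work)
    return new_works / total_editions
```

Take the made-up distribution `[1] * 6 + [2, 4]`, which has eight works and twelve editions. A Work record for every work adds 8/12, about 67%. Work records only for the two multi-edition groups add 2/12, about 17%.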

"}, "title": "FRBRization in the Open Library", "last_modified": {"type": "/type/datetime", "value": "2008-08-17 18:15:28.429732"}, "key": "/about/frbrization", "type": {"key": "/type/page"}, "id": 17867179, "revision": 3}