Last edited by Edward Betts
May 20, 2008 | History

Matched Amazon author handling

Trying to load Amazon data that has a matching LC record. Some rough ideas about how we should handle author records.

Use the catalog.merge.names.match_name() method to find if a name from MARC data is the same as a name from Amazon.

Might need to rewrite match_name a bit to handle titles better.

Try each of these steps for each Amazon author:

If name matches a MARC author, nothing to do.

If role == 'Author':
  If name matches a MARC contribution:
    If MARC contribution has no role, move them to the author list.
    If MARC contribution has a role, 'editor' for example, they are probably not an author, and shouldn't be moved to the author list.
  If name doesn't match a MARC author or MARC contribution, add it to the Open Library list of authors. (see 'Adding an author' below)

If role != 'Author':
  If name in MARC contribution list:
    If MARC contribution has a role, do nothing.
    If MARC contribution is missing a role, add the role from Amazon.
  If name not in MARC contribution list, add it with the role.
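The steps above can be sketched in Python roughly as follows. This is only an illustration: the Person and Record classes, simple_match() and merge_amazon_author() are hypothetical names invented for this sketch, and simple_match() is a crude stand-in for catalog.merge.names.match_name(), whose real signature is not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Person:
    # Hypothetical container for one name and its (optional) role.
    name: str
    role: Optional[str] = None


@dataclass
class Record:
    # Hypothetical view of the MARC-derived record: authors and contributions.
    authors: List[Person] = field(default_factory=list)
    contributions: List[Person] = field(default_factory=list)


def simple_match(a: str, b: str) -> bool:
    # Stand-in for match_name(): case-insensitive exact comparison only.
    return a.strip().lower() == b.strip().lower()


def merge_amazon_author(rec: Record, amazon: Person,
                        match: Callable[[str, str], bool] = simple_match) -> str:
    """Apply the steps to one Amazon author; return a label for what happened."""
    # Name already matches a MARC author: nothing to do.
    if any(match(a.name, amazon.name) for a in rec.authors):
        return "unchanged"

    # First MARC contribution with a matching name, if any.
    contrib = next((c for c in rec.contributions if match(c.name, amazon.name)), None)

    if amazon.role == 'Author':
        if contrib is None:
            rec.authors.append(Person(amazon.name))
            return "added author"
        if contrib.role is None:
            # Contribution without a role: promote to the author list.
            rec.contributions.remove(contrib)
            rec.authors.append(contrib)
            return "promoted"
        # Contribution has a role such as 'editor': leave it alone.
        return "unchanged"

    # role != 'Author'
    if contrib is None:
        rec.contributions.append(Person(amazon.name, amazon.role))
        return "added contribution"
    if contrib.role is None:
        contrib.role = amazon.role
        return "added role"
    return "unchanged"
```

Passing the matcher in as a parameter means the real match_name() could be swapped in later without touching the branching logic.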

Adding an author

Promoting contributor to author

Matching dates is an interesting question because it is based on string matching. Dates from MARC tend to be just years, but users could have edited them in Open Library into full dates. Infogami's "like" matching supports wildcards only at the end of strings, for performance reasons, so it is not possible to search for "*1950". We need to search on just the author name, then test the dates on each record to see if they match.
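A rough sketch of that second step, testing dates on records already found by name. Everything here is an assumption made for illustration: the candidate records are plain dicts, the keys birth_date and death_date are hypothetical, and comparing only the first four-digit year is one possible rule, not a settled one. A missing date on either side is treated as "no evidence against", which may be too lenient.

```python
import re
from typing import Dict, List, Optional

# First standalone four-digit number in a date string, e.g. "3 May 1950" -> "1950".
YEAR_RE = re.compile(r"\b(\d{4})\b")


def year_of(date: Optional[str]) -> Optional[str]:
    """Return the first four-digit year in a date string, or None."""
    if not date:
        return None
    m = YEAR_RE.search(date)
    return m.group(1) if m else None


def dates_match(a: Optional[str], b: Optional[str]) -> bool:
    """Compare two date strings by year only; a missing year never rules out a match."""
    ya, yb = year_of(a), year_of(b)
    return ya is None or yb is None or ya == yb


def filter_by_dates(candidates: List[Dict[str, str]],
                    birth: Optional[str] = None,
                    death: Optional[str] = None) -> List[Dict[str, str]]:
    """Keep the name-matched candidates whose dates are compatible."""
    return [
        c for c in candidates
        if dates_match(c.get("birth_date"), birth)
        and dates_match(c.get("death_date"), death)
    ]
```

For example, a MARC birth date of "1926" would be compatible with a user-edited "12 June 1926" but not with "1960".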

Difficult cases:

  1. The New Whole Foods Encyclopedia by Rebecca Wood. Author has two roles, Author and Illustrator. See http://www.amazon.com/dp/0140250328
  2. Penguin Book of More New Zealand Jokes. Amazon lists three authors, all have 'Editor' as their role. MARC doesn't include any roles. It lists one author and two contributors. Should all three be authors, or all three be contributors? I think authors. See http://www.amazon.com/dp/0140279962
  3. Goatperson and other tales. Amazon lists Michael Leunig as 'Illustrator', LC has 'Michael Leunig' in the author field. Records like this should be left unchanged in the Open Library database. See http://www.amazon.com/dp/0140291407
  4. A-Z common symptom answer guide. Amazon lists two author names that match a single contributor in the MARC data. Need to handle this case.
  5. A rag called happiness. Single author, "Verma, Nirmal" (OL) should match "Verma" (Amazon). No need to create second author.
  6. Improving reading skills. Single author, "Milan Spears, Deanne." (OL) should match "Deanne Spears" (Amazon)
  7. Contemporary topics 1. Amazon repeats authors' surnames as extra authors. Should detect and handle this.
  8. Memoirs of a beatnik. "Di Prima, Diane." (OL) should match "Diane DiPrima" (Amazon)
  9. Life with an idiot. Different author name transliteration: "Erofeev, V. V." (OL) doesn't match "Victor Erofeyev" (Amazon)
  10. A legend in his own mind. OL has "Jones, Bob, 1926- ill.", Amazon has "Jones (Author)". Should be matched as same person and kept as an illustrator.
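Cases 5, 6 and 8 suggest a looser comparison than exact string equality. The following is a minimal sketch of one possible approach, not the behaviour of match_name(): name_tokens() and loose_match() are hypothetical helpers. Names are flipped from "Surname, Forenames" order, reduced to lowercase letter-only tokens, and then considered a match if they are equal once spaces are ignored, or if one name's tokens are a subset of the other's. The subset rule is deliberately permissive and is probably only safe when there is a single candidate on each side, as in the single-author cases above.

```python
import re
from typing import List


def name_tokens(name: str) -> List[str]:
    """Lowercase letter-only tokens, with 'Surname, Forenames' flipped first."""
    if "," in name:
        surname, _, rest = name.partition(",")
        name = rest + " " + surname
    # [^\W\d_]+ matches runs of letters (including accented ones) only.
    return re.findall(r"[^\W\d_]+", name.lower())


def loose_match(a: str, b: str) -> bool:
    """Permissive comparison of two personal names."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return False
    # Same letters once spacing is ignored: "Di Prima" vs "DiPrima".
    if "".join(ta) == "".join(tb):
        return True
    # One name is a subset of the other: "Verma" vs "Verma, Nirmal".
    small, large = sorted((set(ta), set(tb)), key=len)
    return small <= large


print(loose_match("Verma, Nirmal", "Verma"))
print(loose_match("Milan Spears, Deanne.", "Deanne Spears"))
print(loose_match("Di Prima, Diane. ", "Diane DiPrima"))
print(loose_match("Erofeev, V. V.", "Victor Erofeyev"))
```

The first three calls report a match. The last one does not: differing transliterations (case 9) would need something beyond token comparison.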

Bad matches

  1. Teach Yourself Philosophy by Mel Thompson wrongly matches Philosophy by Brooke Noel Moore and Kenneth Bruder
  2. Selected Writings by Gérard de Nerval wrongly matches Selected Writings by Sandor Ferenczi

History

May 20, 2008 Edited by Edward Betts add one
May 20, 2008 Edited by 77.101.168.57 another bad match
May 20, 2008 Edited by 77.101.168.57 transliteration
May 20, 2008 Edited by 77.101.168.57 more
May 14, 2008 Created by Edward Betts start page