Trying to load Amazon data that has a matching Library of Congress (LC) record. Some rough ideas about how we should handle author records.
Use the catalog.merge.names.match_name() method to determine whether a name from MARC data is the same as a name from Amazon. We might need to rewrite match_name a bit to handle titles better.
Try each of these steps for each Amazon author:
- If the name matches a MARC author, there is nothing to do.
- If role == 'Author':
  - If the name matches a MARC contribution:
    - If the MARC contribution has no role, move it to the author list.
    - If the MARC contribution has a role ('editor', for example), the person is probably not an author and shouldn't be moved to the author list.
  - If the name doesn't match a MARC author or MARC contribution, add it to the Open Library list of authors (see 'Adding an author' below).
- If role != 'Author':
  - If the name is in the MARC contribution list:
    - If the MARC contribution has a role, do nothing.
    - If the MARC contribution is missing a role, add the role from Amazon.
  - If the name is not in the MARC contribution list, add it with the role.
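The decision steps above can be sketched in Python roughly as follows. This is a minimal sketch, not Open Library code: Person, Record, names_match and merge_amazon_author are hypothetical names, and names_match is only a crude stand-in for catalog.merge.names.match_name(), whose real behaviour is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    role: Optional[str] = None  # None means MARC recorded no role


@dataclass
class Record:
    authors: List[Person] = field(default_factory=list)        # MARC authors
    contributions: List[Person] = field(default_factory=list)  # MARC contributors


def names_match(a: str, b: str) -> bool:
    """Hypothetical stand-in for catalog.merge.names.match_name().

    Treats "Smith, John" and "John Smith" as equal by comparing the
    sorted, lower-cased words of each name.
    """
    def words(name: str) -> List[str]:
        return sorted(name.replace(",", " ").replace(".", " ").lower().split())
    return words(a) == words(b)


def find(people: List[Person], name: str) -> Optional[Person]:
    return next((p for p in people if names_match(p.name, name)), None)


def merge_amazon_author(rec: Record, name: str, role: str) -> None:
    if find(rec.authors, name):
        return  # already a MARC author: nothing to do
    contrib = find(rec.contributions, name)
    if role == "Author":
        if contrib is None:
            # Placeholder for the 'Adding an author' lookup
            rec.authors.append(Person(name))
        elif contrib.role is None:
            # Contributor without a role: promote to author
            rec.contributions.remove(contrib)
            rec.authors.append(contrib)
        # Contributor with a role (e.g. editor): leave it alone
    elif contrib is None:
        rec.contributions.append(Person(name, role))  # new contributor with Amazon role
    elif contrib.role is None:
        contrib.role = role  # fill in the missing role from Amazon


# Usage: afterwards the authors are "Smith, John" and "Doe, Jane";
# "Roe, Rick" stays an editor and "Ann Lee" is added as an illustrator.
rec = Record(authors=[Person("Smith, John")],
             contributions=[Person("Doe, Jane"), Person("Roe, Rick", "editor")])
for amazon_name, amazon_role in [("John Smith", "Author"), ("Jane Doe", "Author"),
                                 ("Rick Roe", "Author"), ("Ann Lee", "Illustrator")]:
    merge_amazon_author(rec, amazon_name, amazon_role)
```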
Adding an author
- Search for the Amazon-style author name ("John Smith"); if found, use that.
- Search for the MARC-style author name ("Smith, John"); if a single author is found, use that.
- Otherwise, add a new Amazon-style author.
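A sketch of this lookup order, assuming an in-memory list of author dicts in place of a real Open Library search. amazon_to_marc and add_author are hypothetical helpers, and treating the last word as the surname is a simplifying assumption.

```python
from typing import Dict, List, Optional

Author = Dict[str, Optional[str]]


def amazon_to_marc(name: str) -> str:
    """'John Smith' -> 'Smith, John'. Naive: assumes the last word is the surname."""
    parts = name.split()
    if len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def add_author(authors: List[Author], amazon_name: str) -> Author:
    """Pick or create the author record for an Amazon author name.

    `authors` is a hypothetical in-memory list standing in for an Open Library
    author search; each entry has a "key" and a "name".
    """
    # 1. Amazon-style name ("John Smith"): use it if found
    for a in authors:
        if a["name"] == amazon_name:
            return a
    # 2. MARC-style name ("Smith, John"): use it only if exactly one record matches
    marc_name = amazon_to_marc(amazon_name)
    hits = [a for a in authors if a["name"] == marc_name]
    if len(hits) == 1:
        return hits[0]
    # 3. Otherwise add a new Amazon-style author (key would be assigned on save)
    new: Author = {"key": None, "name": amazon_name}
    authors.append(new)
    return new
```

Step 2 insists on a single hit presumably because several author records can share a MARC-style name while being different people, so an ambiguous result is safer treated as no match.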
Promoting contributor to author
- Search Open Library for the author in MARC style ("Smith, John"), including name and dates in the match.
  Matching dates is an interesting problem because it relies on string matching. Dates from MARC tend to be just years, but users could have edited them in Open Library into full dates. Infogami's "like" matching only supports wildcards at the end of strings, for performance reasons, so it is not possible to search for "*1950". Instead, we need to search on the author name alone, then test the dates on each record to see if they match.
- Search Open Library for the author in Amazon style ("John Smith"); use it if found.
- If no match is found, add an author record to Open Library with the name in MARC style (?).
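A sketch of these promotion steps, including the date check done after a name-only search. It assumes author records are dicts with optional birth_date and death_date strings, and that MARC dates look like "1926-" or "1926-2001"; all function names are hypothetical. Treating a date that is missing on the Open Library side as "no evidence against" is a design choice of this sketch, not something the notes specify.

```python
import re
from typing import Dict, List, Optional, Tuple

YEAR = re.compile(r"\b\d{4}\b")


def years(text: Optional[str]) -> List[str]:
    """Four-digit years in a free-text date, e.g. '12 March 1926' -> ['1926']."""
    return YEAR.findall(text or "")


def parse_marc_dates(dates: str) -> Tuple[Optional[str], Optional[str]]:
    """MARC dates such as '1926-' or '1926-2001' -> (birth year, death year)."""
    birth, _, death = dates.partition("-")
    b, d = years(birth), years(death)
    return (b[0] if b else None, d[0] if d else None)


def dates_match(marc_dates: str, record: Dict) -> bool:
    birth, death = parse_marc_dates(marc_dates)
    for marc_year, field in ((birth, "birth_date"), (death, "death_date")):
        ol_years = years(record.get(field))
        # Only a date present on both sides can rule a record out
        if marc_year and ol_years and marc_year not in ol_years:
            return False
    return True


def promote_contributor(authors: List[Dict], marc_name: str, marc_dates: str,
                        amazon_name: str) -> Dict:
    # 1. MARC-style name: match on the name only (as the query would), then
    #    check dates in Python, since "*1950"-style searches aren't possible
    for a in authors:
        if a["name"] == marc_name and dates_match(marc_dates, a):
            return a
    # 2. Amazon-style name
    for a in authors:
        if a["name"] == amazon_name:
            return a
    # 3. No match: create a MARC-style author record (key assigned on save)
    new = {"key": None, "name": marc_name}
    authors.append(new)
    return new
```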
Difficult cases:
- The New Whole Foods Encyclopedia by Rebecca Wood. The author has two roles: Author and Illustrator. See http://www.amazon.com/dp/0140250328
- Penguin Book of More New Zealand Jokes. Amazon lists three authors, all with 'Editor' as their role. MARC doesn't include any roles; it lists one author and two contributors. Should all three be authors, or all three be contributors? I think authors. See http://www.amazon.com/dp/0140279962
- Goatperson and other tales. Amazon lists Michael Leunig as 'Illustrator'; LC has 'Michael Leunig' in the author field. Records like this should be left unchanged in the Open Library database. See http://www.amazon.com/dp/0140291407
- A-Z common symptom answer guide. Amazon lists two author names that match a single contributor in the MARC data. Need to handle this case.
- A rag called happiness. Single author, "Verma, Nirmal" (OL) should match "Verma" (Amazon). No need to create a second author.
- Improving reading skills. Single author, "Milan Spears, Deanne." (OL) should match "Deanne Spears" (Amazon).
- Contemporary topics 1. Amazon repeats the authors' surnames as extra authors. We should detect and handle this.
- Memoirs of a beatnik. "Di Prima, Diane." (OL) should match "Diane DiPrima" (Amazon).
- Life with an idiot. Different author name transliteration: "Erofeev, V. V." (OL) doesn't match "Victor Erofeyev" (Amazon).
- A legend in his own mind. OL has "Jones, Bob, 1926- ill.", Amazon has "Jones (Author)". These should be matched as the same person and kept as an illustrator.
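Several of these difficult cases come down to loose name matching. The sketch below is one hypothetical way to cover the Verma, Milan Spears, Di Prima, Jones and repeated-surname cases; the helper names, the regular expressions and the short list of MARC role abbreviations are assumptions, not part of match_name().

```python
import re
from typing import List, Tuple

# Hypothetical sample of MARC role abbreviations that trail a heading
ROLE_ABBREVS = re.compile(r"\b(?:ill|ed|tr|comp)\.")


def name_tokens(name: str) -> List[str]:
    """Lower-cased name words without dates or Amazon role labels."""
    name = re.sub(r"\([^)]*\)", " ", name)          # "Jones (Author)" -> "Jones"
    name = re.sub(r"\d{4}-?(?:\d{4})?", " ", name)  # "1926-" or "1926-2001"
    return re.findall(r"[^\W\d_]+", name.lower())   # letters only, Unicode-aware


def split_marc(name: str) -> Tuple[List[str], List[str]]:
    """'Milan Spears, Deanne.' -> (['milan', 'spears'], ['deanne'])."""
    surname, _, forenames = ROLE_ABBREVS.sub(" ", name).partition(",")
    return name_tokens(surname), name_tokens(forenames)


def loose_match(marc_name: str, amazon_name: str) -> bool:
    surname, forenames = split_marc(marc_name)
    amazon = name_tokens(amazon_name)
    if not surname or not amazon:
        return False
    last, given = amazon[-1], amazon[:-1]
    # "DiPrima" vs "Di Prima", and "Spears" vs "Milan Spears"
    if last not in ("".join(surname), surname[-1]):
        return False
    if not given or not forenames:
        return True  # surname only on one side, e.g. Amazon "Verma"
    a, m = given[0], forenames[0]
    # First forenames must agree, allowing an initial ("J." vs "John")
    return a == m or (len(m) == 1 and a.startswith(m)) or (len(a) == 1 and m.startswith(a))


def drop_repeated_surnames(amazon_names: List[str]) -> List[str]:
    """Drop bare surnames that repeat another listed author's surname."""
    tokens = [name_tokens(n) for n in amazon_names]
    surnames = {t[-1] for t in tokens if len(t) > 1}
    return [n for n, t in zip(amazon_names, tokens)
            if not (len(t) == 1 and t[0] in surnames)]
```

This sketch still fails on the Erofeev/Erofeyev transliteration case, and it does nothing about the A-Z case where two Amazon names match a single MARC contributor.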
Bad matches
- Teach Yourself Philosophy by Mel Thompson wrongly matches Philosophy by Brooke Noel Moore and Kenneth Bruder
- Selected Writings by Gérard de Nerval wrongly matches Selected Writings by Sandor Ferenczi
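The notes don't say how these matches were made. Since both bad pairs have entirely different authors, one possible extra guard is to reject a candidate match when both sides list authors but share no surname. This is a hypothetical check sketched under that assumption; surnames and authors_compatible are illustrative names, and accent differences (e.g. "Sandor" vs "Sándor") would need separate normalization.

```python
import re
from typing import List, Set


def surnames(names: List[str]) -> Set[str]:
    """Crude surname set: MARC 'Surname, Forenames' or Amazon 'Forenames Surname'."""
    out = set()
    for name in names:
        words = name.split()
        if "," in name:
            surname = name.split(",", 1)[0]
        elif words:
            surname = words[-1]
        else:
            continue
        out.add(re.sub(r"\W", "", surname.lower()))  # "Di Prima" -> "diprima"
    out.discard("")
    return out


def authors_compatible(authors_a: List[str], authors_b: List[str]) -> bool:
    """Reject a candidate match when both sides list authors but share no surname."""
    a, b = surnames(authors_a), surnames(authors_b)
    if not a or not b:
        return True  # nothing to compare; leave the decision to other checks
    return bool(a & b)
```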
History
- Created May 14, 2008
- 14 revisions
May 20, 2008 | Edited by Edward Betts | add one |
May 20, 2008 | Edited by 77.101.168.57 | another bad match |
May 20, 2008 | Edited by 77.101.168.57 | transliteration |
May 20, 2008 | Edited by 77.101.168.57 | more |
May 14, 2008 | Created by Edward Betts | start page |