Cataloging Guidelines
From Project Gutenberg, the first producer of free electronic books (ebooks).
Possible tasks
- Splitting/merging authors/verifying author attributions
- "E.g., yesterday there came in a batch of "R.M. Ballantyne" which the catalog did not attribute to "Ballantyne, R. M. (Robert Michael), 1825-1894" because of the missing space between R. and M.
You could peruse the "Browse by Author" pages and check on all authors with suspiciously similar names. Make sure they are different persons and check the book attributions against LoC, Wikipedia etc." (Marcello, November 8, 2007 3:56:56 AM PST)
- Adding LoC subject classifications
- Some catalogers don't like these, but most do consider them valuable
- Two (external) lists of LoC classes are: http://www.itsmarc.com/crs/LCSO0001.htm and http://www.loc.gov/catdir/cpso/lcco/lcco.html
- You can see a list of all the LoC classes currently defined at PG by typing * into the LoC class search box.
- Be careful about PZ's from LC-- PZ1-4 were used in the past to group popular fiction in English (regardless of original language) outside the normal order. This is no longer common usage, and the Library of Congress has stopped using PZ1-4 (but has not updated existing call numbers to be in line with current practice), so treat PZ call numbers from LC with caution, and if it doesn't seem right to you, check in WorldCat for a better alternative.
- Note that Canadian literature should most likely be PS instead of PR.
- You can add more than one LoCC if the work usefully fits into more than one category.
- In case of very broad classes (like PR, PT, PZ), the next step would be using numbers after the two characters, which is already done with the F classes. If you create such a numbered class, please look at the way the LoC distributes the numbers (they use number ranges) and choose the first value. Example: the LoC has F2201-F2300 for general history of South America, so we have F2201 for this topic.
- Adding ToC notes for story collections
- "Another thing that I think is pretty uncontroversial (though I might be wrong) is to add 505 contents notes for things like collections of stories or essays. Put all the components in one 505 note, not each component in a separate one. Tastes differ on the exact format. You can see what I do in this record: http://www.gutenberg.org/etext/20953
I think Marcello favors the style here: http://www.gutenberg.org/etext/2327" (Joyce, November 8, 2007 6:00:19 AM PST)
- "This improves access because 505's are checked in title searches, so if someone is looking for a particular short story but doesn't know the name of the collection, they'll be able to find it by title if it's in a contents note. I don't worry about the non-filing characters field in 505's, but I add MARC indicators 0_ (zero blank). You can read about 505's here (the b-with-a-slash symbol indicates a blank): http://www.oclc.org/bibformats/en/5xx/505.shtm" (Joyce, November 8, 2007 6:00:19 AM PST)
- Add information from REPosted books to the catalog
- Check the archives of the posted list; you have to sign up with the list first, but you can disable having messages sent, so it won't fill up your mailbox.
- Careful check of catalog record details for Librivox audiobooks.
- I think you can select these by choosing category "Audio Book, human-read" and perhaps limiting to items with "librivox" in their full-text.
- Get periodicals and multi-volume sets to sort in order.
- See Punch for instance. One way to work on it (from Marcello)-- Use the "Batch-Edit Titles" link on the "Catalog Admin" page. Enter a regular expression (RegExp) like this: ^Punch,.*1920 (that will let you tackle a year at a time). Things that would need fixing to get Punch sorted in order:
- Change date format from "April 1, 1914" to "1914-04-01".
- Regularize spelling of "Vol."
- Regularize on 3-digit volume numbers (note that some volume numbers may need correction).
- Once Punch is sorting in order, we might also add editor information for issues that don't have it. Note that to avoid interfering with the chronological sort, editors will need to have "No-Heading" status.
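The title fixes listed for Punch (ISO dates, one spelling of "Vol.", 3-digit volume numbers) can be sketched in JavaScript, the language of the bookmarklet further down this page. This is only a rough illustration and not part of the catalog software: the function names and the sample titles are made up, and it assumes English month names and Arabic volume numbers. Volume numbers that are simply wrong still need correcting by hand.

```javascript
// Month names used to recognise dates such as "April 1, 1914".
const MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// Rewrite "April 1, 1914" as "1914-04-01" so that titles sort by date.
function isoDates(title) {
  const pattern = new RegExp(
    "\\b(" + MONTHS.join("|") + ")\\s+(\\d{1,2}),\\s*(\\d{4})\\b", "g");
  return title.replace(pattern, (match, month, day, year) => {
    const mm = String(MONTHS.indexOf(month) + 1).padStart(2, "0");
    return year + "-" + mm + "-" + day.padStart(2, "0");
  });
}

// Regularize "Volume 99", "Vol 99" or "vol. 99" to "Vol. 099".
function regularizeVolume(title) {
  return title.replace(/\bvol(?:ume)?\.?\s*(\d+)/gi,
    (match, num) => "Vol. " + num.padStart(3, "0"));
}

// Apply both fixes to one title (hypothetical helper).
function normalizeTitle(title) {
  return regularizeVolume(isoDates(title));
}

// The selection expression quoted above, for working on one year at a time.
const year1920 = /^Punch,.*1920/;

// Made-up sample title, for illustration only.
console.log(normalizeTitle("Punch, or the London Charivari, Volume 99, April 1, 1914"));
// prints "Punch, or the London Charivari, Vol. 099, 1914-04-01"
```

Running something like this over a list of titles before pasting the results into "Batch-Edit Titles" would show which titles change; anything it leaves untouched probably uses a format the sketch does not cover.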
Useful links
- Cataloger's Reference Shelf
- Library of Congress catalog
- WorldCat meta-catalog
- A Summary of Commonly Used MARC 21 Fields
- Portuguese National Library Catalog
- "Select: Catálogo --> Pesquisar em: --> Palavras em autor --> (Author's name)" said Ricardo
Catalog record review notes
Background on catalog record review
When a new etext is posted to Project Gutenberg, a corresponding catalog record is automatically created. This is great because it means there is immediate basic access to newly posted texts. However, sometimes there are things about the automatically created records which need fixing up by a human cataloger for optimal access. These notes are intended to provide a suggested procedure for fixing up a new record.
Some useful websites
- Library of Congress Authorities
- Library of Congress Online Catalog
- WorldCat
- More specialized sites:
- Internet Speculative Fiction Database. Useful when working on science fiction. It can be finicky about finding a match for your search. A name in direct order is more likely to find a match, so try "John Smith" instead of "Smith, John"
- Library of Congress Classification Outline.
- Hollis Catalog. Handy when working on Chinese texts.
- Portuguese National Library. Useful for looking up Portuguese authors (search on words in author).
- University of Florida Library. Useful subject source for some juvenile titles.
Catalog record review procedure
Check names of creators, etc., to see if they need merging or modification.
It may not be possible to catch all names that need merging (say if there's an obscure pseudonym for the author already in the PG catalog, one might not know to merge that with the form on the current record), but at least look at the author in the PG author browse view, or use the catalogers' author look-up page to search on the last name to see if there's an obvious merging candidate nearby.
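One cheap way to spot obvious merging candidates is to compare names after evening out the spacing of initials, which is what went wrong with "R.M. Ballantyne" in the example under Possible tasks. The following JavaScript is a minimal sketch under that assumption. The function names are hypothetical, it only compares names written in the same order (it will not match "R.M. Ballantyne" against "Ballantyne, R. M."), and every group it reports still needs a human check against LoC or WorldCat.

```javascript
// Build a comparison key: lower-case, single spaces, and a space after
// the period of every initial, so "R.M." and "R. M." compare equal.
function nameKey(name) {
  return name
    .replace(/\.(?=\p{L})/gu, ". ") // "R.M." -> "R. M."
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
}

// Return the groups of distinct name forms that share a key.
function suspectGroups(names) {
  const groups = new Map();
  for (const name of new Set(names)) {
    const key = nameKey(name);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(name);
  }
  return [...groups.values()].filter((forms) => forms.length > 1);
}

console.log(suspectGroups(["R.M. Ballantyne", "R. M. Ballantyne", "Jane Austen"]));
// prints [ [ 'R.M. Ballantyne', 'R. M. Ballantyne' ] ]
```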
Also be aware that a title can be automatically linked to the wrong author. The easiest way to check may be to look up the title at LC or WorldCat and compare their author entries for the book with ours. You can use your head, too-- someone born in the 1860's didn't edit a work published in the 1850's!
For a newly-added author (with just one associated title, say), check in the LoC name authorities file to see if our form of the name is as complete as it could be (not just "Melville, H." as the author of Moby Dick for instance). Optionally, you may add one or more aliases for significantly different forms of the author's name (see Charles Dickens for an example), and perhaps a link to a Wikipedia entry or other source on the author. Aliases should have "Heading" status if they will file far from the main entry alphabetically, but "No Heading" status if they will file right next to the main entry (to reduce clutter).
If our information for the author doesn't include any dates, check the LoC name authorities to see if there are some dates to add (but of course only if a name authority record you find with dates actually corresponds to our author). Note that our date fields don't accept non-numeric data, so date info like "fl. 1600-1615" can't be entered in the date fields. Suggested work-around: include that information in the name field instead. BC dates must be entered as negative numbers, and will display with "BC".
If date information is unambiguous, you need only enter the birth and/or death date in the "earliest" fields. Only if there is ambiguity about one of the dates do you need to use both the "earliest" and "latest" fields. Some authors have an uncertain date, like "Du Haillan, Bernard de Girard, seigneur, 1535?-1610" or "Xuan, Ding, 1832-1880?". To make a date display with a following question mark "?" if you don't have a range of dates to enter, enter that date only in the "latest" field. To do occasional checks for authors that would benefit from this treatment (they will look better in the public display), go into the Authors interface and search for "*?".
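The date-entry rules above can be summarized as a small JavaScript sketch. It is an illustration only: the function name and the property names `earliest` and `latest` are made up to mirror the "earliest" and "latest" fields, and it handles a single birth or death date, not a range of possible dates.

```javascript
// Decide which field a single birth or death date goes into.
// Returns null for anything non-numeric (e.g. "fl. 1600-1615"),
// which belongs in the name field instead.
function dateFields(text) {
  const m = /^(\d{1,4})(\?)?(\s*B\.?C\.?)?$/i.exec(text.trim());
  if (!m) return null;
  let year = Number(m[1]);
  if (m[3]) year = -year; // BC dates are entered as negative numbers
  // An uncertain date ("1880?") goes only in "latest" so that it
  // displays with a question mark; a certain date goes in "earliest".
  return m[2]
    ? { earliest: null, latest: year }
    : { earliest: year, latest: null };
}

// "Xuan, Ding, 1832-1880?": birth is certain, death is uncertain.
console.log(dateFields("1832"));  // prints { earliest: 1832, latest: null }
console.log(dateFields("1880?")); // prints { earliest: null, latest: 1880 }
```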
Look at the creators/editors/translators/illustrators/etc. (if more than one) to see if some (possibly all but one) of them should be linked with "No Heading."
This reduces redundant title entries in the list of search results users get, but does not prevent access through searches on the "No Heading" person. One way to decide who gets a Heading: search the title at the Library of Congress or WorldCat and see who's in the 100 field in a record for the title. That person gets "Heading", everyone else gets "No Heading." Note that when the main author is also listed as an editor you will get an error message when you try to change their editor link from "Heading" to "No Heading" (it's not a problem if the editor is a different person). The only work-around I've found is to remove the editor link and recreate it, this time with "No Heading". Or if you feel it's redundant, just remove it.
Skim record for anything odd
An example would be a malformed autogenerated note field (it will look obviously screwy). If you're not sure how to fix it, mention it on gutcat.
If there is no language code attached to the bib record, it may be because we don't yet have the language in the list of codes. Marcello advises:
In doubt consult:
http://www.ethnologue.com
use the three letter code (ISO 639-3) and the main language name in the header to insert new languages into the database. (So if anybody complains about the name you simply refer them to ethnologue.)
Alternatively refer to this list of ISO 639-2 codes:
http://www.loc.gov/standards/iso639-2/php/code_list.php
Check title for possible typos and correct number of non-filing characters
A warning flag for typos is if you search the title in LC and WorldCat and get no hits (but don't be surprised at no hits for science fiction short stories). If you get no hits, check in the text itself and see if the title in the header matches what's on the "title page". If there is an error in title, correct the 245. Note that you will need to specify a language (if it is not yet specified) to save any changes to the title that you make. If you make any changes at the beginning of the title, make sure that the number of non-filing characters remains correct. The number of non-filing characters is the number of letters in any initial article plus one for the following space, so for "The " it's 4, for "An " it's 3, etc. If a title (or author) typo is found in the header, report to the errata team. A significant author typo would be an actual misspelling of part of the name, not just a different form of the name. So for L. Frank Baum, "L. Fran Baum" would be a typo but "Lyman Frank Baum" wouldn't. If you have reason to suspect a typo, compare the spelling in the header to the spelling in the etext proper. If it doesn't match, you have probably found a typo.
Even if you don't edit the 245, it's worth checking to make sure the number of non-filing characters is correct, especially for a title in a language other than English. If it is a language with which you are unfamiliar, you can find out how many non-filing characters are needed by searching the title in another library catalog and then choosing the MARC display. The second digit after the "245" field label is the number of non-filing characters in the title. If you suspect that the first word of the title is an article, try omitting that word when you do your search. Additionally, if the title begins with punctuation (e.g. " ' or ...), then edit the non-filing character field to take that into account, so that the title will be filed starting at the first non-punctuation and non-space character.
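The counting rule for non-filing characters (leading punctuation and spaces, plus an initial article and the space after it) can be sketched in JavaScript as follows. This is a hedged illustration, not the catalog's own logic: the function name is made up and the article lists are deliberately tiny assumptions, so for other languages the MARC display of another library's record remains the safer source.

```javascript
// Hypothetical, incomplete article lists; extend with care.
const ARTICLES = {
  en: ["the", "an", "a"],
  de: ["der", "die", "das", "ein", "eine"],
};

// Count the characters to skip before the first filing character.
function nonFilingCount(title, lang = "en") {
  // Leading punctuation and spaces do not file.
  const lead = /^[\s\p{P}]*/u.exec(title)[0].length;
  const rest = title.slice(lead).toLowerCase();
  for (const article of ARTICLES[lang] || []) {
    // The article must be a whole word followed by a space:
    // its letters plus one for the space.
    if (rest.startsWith(article + " ")) {
      return lead + article.length + 1;
    }
  }
  return lead;
}

console.log(nonFilingCount("The Picture of Dorian Gray")); // prints 4
console.log(nonFilingCount("An Essay on Man"));            // prints 3
console.log(nonFilingCount("...And Then There Were None")); // prints 3
```

Treating a leading quotation mark followed by an article as both non-filing (so that "\"The End\"" gives 5) is an assumption of this sketch.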
Optional extras
Add LoCC (Library of Congress Class)
- Be careful about PZ's from LC-- PZ1-4 were used in the past to group popular fiction in English (regardless of original language) outside the normal order. This is no longer common usage, and the Library of Congress has stopped using PZ1-4 (but has not updated existing call numbers to be in line with current practice), so treat low PZ call numbers from LC with suspicion, and if possible check in WorldCat for a better alternative.
- Note that Canadian literature should most likely be PS instead of PR.
- You can add more than one LoCC if the work usefully fits into more than one category.
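For the numbered classes mentioned under Possible tasks (take the first value of the LoC's number range, as with F2201 for F2201-F2300), a lookup might be sketched like this in JavaScript. Only the F2201-F2300 range comes from these guidelines; the function name, the table layout, and the fallback to the bare letters are assumptions for illustration.

```javascript
// Number ranges already turned into PG classes. Only this one entry
// is taken from the guidelines; add others from the LoC outline.
const RANGES = [
  { letters: "F", first: 2201, last: 2300 }, // general history of South America
];

// Map an LoC call number to a PG class: the first value of a known
// range, otherwise just the class letters. Returns null if the call
// number does not start with letters followed by digits.
function pgClass(callNumber) {
  const m = /^([A-Z]{1,3})\s*(\d+)/.exec(callNumber.trim().toUpperCase());
  if (!m) return null;
  const letters = m[1];
  const number = Number(m[2]);
  const range = RANGES.find(
    (r) => r.letters === letters && number >= r.first && number <= r.last);
  return range ? letters + range.first : letters;
}

console.log(pgClass("F2230 .S65")); // prints F2201
console.log(pgClass("PR4034 .P7")); // prints PR
```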
Add appropriate subject heading(s)
Look up the title in the Library of Congress (and possibly WorldCat) to find subject headings to use. For more tips, see the subject cataloging notes.
Add 505 (contents note) if applicable
Include all info in a single contents note. Ignore "non-filing characters" field. See etext:10023 for an example. I usually only include a 505 if I've found one I can copy and paste from another catalog, but if you do that, make sure that the contents of our edition matches what you're pasting in. (OCLC docs)
Add 010 (LCCN) if applicable
If you find a record at the Library of Congress with a 260 which matches the publication info on the title page of our text, you might add an 010 field and paste in the LC Control No. from the LC record. See etext:30674 for an example. If the 010 includes a space (usually the case if it includes any letters), you need not remove it (the link will work either way). Records with an 010 beginning with "unk" are poor quality; I wouldn't bother adding an 010 in that case.
Add 240 (uniform title) if applicable
I interpret "language" field for 240 to be the primary language of the text in the 240 field itself. So "Picture of Dorian Gray. French" would get language "English." (OCLC docs)
Add 246 (alternate title) if applicable
Doesn't come up too much except for texts in Chinese; see special cataloging procedures. (OCLC docs)
For more information...
about any of these fields, see the OCLC bibliographic formats and standards document
Other advice
Sometimes it is useful to see the upload message from the text-preparer. For this, you would need to subscribe to the whitewashers' email list. Ask on gutcat about how to subscribe. You can generally skim through the messages searching on the word "note" to see if there are any items that require changes in the bib record for the text. You may also find it useful to subscribe to the posted list, though if you are on the whitewashers' list it may not be necessary.
Useful automation bits
javascript:(function () {for (var n=0; n<document.links.length;n++) { if (/etext/.test(document.links[n].href)) { document.links[n].href="http://www.gutenberg.org/catalog/admin/mn_books_loccs?mode=add&step=update&fk_loccs=BS&fk_books="+document.links[n].href.match(/[0-9]+/) } } })()
- This is a bit of javascript, which, if run on a page, will turn all the etext links on the page into links that will put the books into the BS LoC subject class; just change the bit that says "BS" to whatever you want it to be to put things in other categories.
See Also
- Cataloging Progress -- a place to post incomplete projects so they don't get lost and may be furthered by others
- Special cataloging procedures -- notes on reviewing catalog records for texts in Chinese.
- Subject cataloging notes -- notes and tips about adding subjects to catalog records.
- Programming Project Ideas -- a place to list suggestions for things that could be done to improve the software/website