Date: 9-15-93

Files: README (this file), cmudict.0.1.Z (compressed), cmulex.0.1.Z,
       cmudict.0.2.Z (compressed), cmudict.0.3.Z (compressed),
       cmulex.0.3.Z, phoneset.0.1, phoneset.0.3.

This directory contains pronunciation dictionaries (cmudict.0.1.Z is the
first one we put out; cmudict.0.3.Z is the latest and most up-to-date)
containing approximately 100k words and their transcriptions. Lists of the
words are in cmulex.0.1.Z and cmulex.0.3.Z. We use these dictionaries at
CMU in our speech understanding systems.

The phone set for this dictionary contains 39 phones, which can be found
in phoneset.0.3. Stress is indicated by means of a numeral [012] attached
to a vowel:

    0 = no stress
    1 = primary stress
    2 = secondary stress

Alternate transcriptions are identified with a numeral in parentheses as
part of the lexical entry.

We generated this dictionary using the following independent sources:

  - a 20k+ general English dictionary, built by hand at CMU
    (extensively proofed and used).
  - a 200k+ UCLA-proofed version of the Shoup dictionary.
  - a 32k subset of the Dragon dictionary.
  - a 53k+ dictionary of proper names, synthesiser-generated, unproofed.
  - a 200k dictionary generated with Orator, unproofed.
  - a 200k dictionary generated with Mitalk, unproofed.

Entries that occur solely in copyrighted sources, such as the Dragon
dictionary, are not currently included in this dictionary. If you have
words and transcriptions that you would like included in this unrestricted
resource, please send them to Robert L. Weide (weide@cs.cmu.edu) and we
will consider them for an upcoming version.

All of the above sources were preprocessed, and the transcriptions in the
current cmudict.0.1 were selected from the transcriptions in the sources
or a combination thereof. We have removed some potentially unreliable
transcriptions from this dictionary, including those based on only one
source, and will reintroduce them once we have verified the transcriptions.
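The stress numerals and parenthesized alternate markers described above can
be read mechanically. The following is a minimal sketch in Python, not part
of the distribution. It assumes a file layout that this README does not
spell out: one entry per line, with the headword first and the phones
following, separated by whitespace, and phones written in upper-case
letters. The name parse_entry and the sample lines are hypothetical and
only illustrate the conventions; they are not quoted from the dictionary.

```python
import re

# ASSUMPTION: one entry per line, headword first, then whitespace-separated
# phones. The README does not state the file layout.
# A headword may end in a parenthesized numeral marking an alternate.
ENTRY_RE = re.compile(r"^(?P<word>\S+?)(?:\((?P<alt>\d+)\))?$")

# ASSUMPTION: phones are upper-case letters; vowels carry one digit 0, 1 or 2.
PHONE_RE = re.compile(r"^(?P<phone>[A-Z]+)(?P<stress>[012])?$")

STRESS_NAMES = {0: "no stress", 1: "primary stress", 2: "secondary stress"}


def parse_entry(line):
    """Return (word, alt_index, [(phone, stress), ...]) for one entry line.

    alt_index is None when the headword has no parenthesized numeral;
    stress is None for phones that carry no stress digit.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected a headword and at least one phone: {line!r}")

    head = ENTRY_RE.match(fields[0])
    word = head.group("word")
    alt = int(head.group("alt")) if head.group("alt") else None

    phones = []
    for token in fields[1:]:
        m = PHONE_RE.match(token)
        if m is None:
            raise ValueError(f"unrecognized phone token: {token!r}")
        stress = int(m.group("stress")) if m.group("stress") else None
        phones.append((m.group("phone"), stress))
    return word, alt, phones


if __name__ == "__main__":
    # Illustrative lines only, written to match the assumed layout.
    for sample in ("READ  R EH1 D", "READ(2)  R IY1 D"):
        print(parse_entry(sample))
```

Keeping the alternate index separate from the headword lets a caller group
all transcriptions of one word under a single key. Phone names should be
checked against phoneset.0.3 rather than against the pattern above, which
accepts any run of upper-case letters.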
CMU does not guarantee the accuracy of this dictionary, nor its suitability
for any specific purpose. In fact, we expect a number of errors, omissions
and inconsistencies to remain in the current result. We intend to
continually update the dictionary as we make progress in correcting them.
We will make subsequent versions available via anonymous ftp, and those
who would like notification when updated versions are available should
send email to weide@cs.cmu.edu.

We welcome input from users: send e-mail to Robert L. Weide
(weide@cs.cmu.edu) for comments and suggestions on the content of the
dictionary, or to Peter Jansen (pjj@cs.cmu.edu) for questions regarding
the combination process.

The Carnegie Mellon Pronouncing Dictionary [cmudict.0.1] is Copyright 1993
by Carnegie Mellon University. Use of this dictionary, for any research or
commercial purpose, is completely unrestricted. If you make use of or
redistribute this material, we would appreciate acknowledgement of its
origin.

Finally, if you add words to or correct words in this dictionary, we would
like the additions and corrections sent to us (weide@cs) for consideration
in a subsequent version. All final entries will be approved by Robert L.
Weide and Peter Jansen, editors of the dictionary.