American English Discussion -- Page 3

Modern Vocabulary

While many words have been added to our lexical store, the main reason is almost certainly that the body of Shakespeare's writings is intentionally homogeneous, whereas the million-word database covers a wide range of texts. Shakespeare's works do not include, for example, words from scientific, medical, or mathematical texts.

One-Million Word Database

If every word in the one-million-word database were used equally often, each word form would occur about 16 times and each lemma about 26 times. In actuality, the rate of repetition of individual words, and thus their frequency, is extremely uneven. The overall statistics are quite striking: the 100 most frequent words account for a full 47.4 percent of all the running words (tokens) in the text, and the 100 most frequent lemmas account for 49.6 percent. Covering 80 percent of the entire one-million-word text takes only 2,854 different word forms belonging to 2,124 distinct lemmas.

Our 3000 Word List

We have listed the 3000 most frequently used English words in alphabetically arranged groups of two hundred. Amazingly, our own list of lemmas, derived from this million-word database by slightly different means, consists of 2,126 lemmas, nearly the same as the 2,124 cited above.
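The coverage figures quoted for the one-million-word database (the average number of occurrences per word form, the share of text covered by the most frequent words, and the number of word forms needed to reach 80 percent) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names are invented for this example, the input is a toy sentence rather than the database itself, and it counts word forms only. Counting lemmas would first require mapping each form to its lemma, which is not shown here.

```python
from collections import Counter


def mean_occurrences(tokens):
    """Average occurrences per distinct word form if all were used equally often."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(tokens) / len(set(tokens))


def coverage_of_top(tokens, n):
    """Fraction of all running words covered by the n most frequent word forms."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


def forms_needed(tokens, target):
    """Smallest number of most-frequent word forms whose counts reach `target` coverage."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    total = len(tokens)
    running = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running / total >= target:
            return rank
    return len(counts)


# Toy input (an assumption for illustration, not data from the database).
sample = "the cat saw the dog and the dog saw the cat".split()

print(mean_occurrences(sample))     # average uses per distinct word form
print(coverage_of_top(sample, 1))   # share of text covered by the single top word
print(forms_needed(sample, 0.8))    # word forms needed to cover 80 percent
```

Run against a real tokenized corpus, the same three functions would produce figures comparable in kind to those quoted above, although the exact values depend on how the text is tokenized.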