What do these words have in common?
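
a, and, away, big, blue, can, come, down, find, for, funny, go, help, here, I, in, is, it, jump, little, look, make, me, my, not, one, play, red, run, said, see, the, three, to, two, up, we, where, yellow, you
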
Your mind naturally looks for patterns and, as you read, you probably start to put the list into categories. For example, you might look for grammatical classes:

articles: a, the
conjunctions: and
prepositions: for, in, to
adverbs: away, down, here, not, up, where
adjectives: big, red, blue, yellow, funny, little, one, two, three
verbs: can, come, find, go, help, is, jump, look, make, play, run, said, see
pronouns: I, it, me, my, we, you

But this organization raises more questions. Where’s the article an? Why aren’t there any nouns? Why do the pronouns not include he, she, and they?

The answer is historical.

The 40 words come from a list that Edward William Dolch compiled in 1936 and published in 1948. Dolch favored a whole-word approach to learning language, as opposed to the then-current approach of learning to read from knowledge of letter sounds and grammar. His list was probably the first attempt to quantify and qualify the words that young learners of English should be expected to acquire. The words above accompany a list of the first 95 nouns a student would be expected to learn, 15 of which are animal names. The proportion is so high because Dolch drew his list from popular children’s books of the day. For the same reason, a few of the terms are relatively low frequency, for example, Christmas and Santa Claus.

Although the Dolch word list is mostly an academic footnote now, the rationale and collection methodology endure. In fact, they have become more popular and have been greatly enhanced by computer-based corpus analysis, which can quantify millions of words at a time, culled from sources such as newspapers and transcripts of audio and video programs. From the perspective of the language teacher, the aim remains the same: to determine the most frequent words a learner is likely to encounter and preemptively teach them.

You can use free online software to input large collections of text and see how frequently particular words occur. A word comes to be known as a keyword based on its statistical significance, specifically how much more often it appears on a list than other words. But there’s no set formula for determining the cutoff for this significance; it’s up to the researcher to decide which words are important. Typically, a corpus analyst (that could be you!) cuts off the top and bottom 10 percent of the words, as they’re likely to be too common on the one hand or too obscure on the other.
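If you are curious what such software does under the hood, here is a minimal sketch in Python. It is not how any particular online tool works. It assumes that "the top and bottom 10 percent" means 10 percent of the distinct words once they are ranked by frequency, and the function names and the sample sentence are invented for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case."""
    # Treat runs of letters as words and keep internal apostrophes (king's).
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(words)


def trim_extremes(freqs, fraction=0.10):
    """Drop the most and least frequent `fraction` of the distinct words.

    Assumption: the cutoff applies to distinct words ranked by frequency.
    Words with equal counts keep their order of first appearance.
    """
    ranked = [word for word, _ in freqs.most_common()]
    cut = int(len(ranked) * fraction)
    return ranked[cut:len(ranked) - cut]


# Invented sample text, not taken from any published version of the story.
sample = (
    "the king's son heard the song and the king's son climbed the tower "
    "and the witch saw the son"
)

freqs = word_frequencies(sample)
candidates = trim_extremes(freqs)

for word in candidates:
    print(word, freqs[word])
```

With a real corpus you would paste in thousands of words rather than one sentence, and the choice of which words to drop among those that occur equally often is arbitrary, which is one more reason the final decision rests with the researcher.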

Preparing for a recent talk on second-language teaching methodologies to Bulgarian teachers at a Pearson-sponsored event in Sofia, I input a 1,400-word version of the Grimm Brothers’ story “Rapunzel” of “let down your long hair” fame. In its simplest form, the story revolves around a pregnant woman who convinces her husband to steal from a neighbor’s garden. When the garden’s owner—a witch—catches the husband, he’s forced to promise her his unborn daughter. When the girl reaches puberty, the jealous witch seals her in a tall, isolated tower. A prince finds the innocent Rapunzel and complications arise.

As expected, the version I selected heavily featured the ten most common words in English: the, be, to, of, and, a, in, that, have, I. The word the, for example, occurs 85 times in the Rapunzel story while many other words only appear once or twice. An unexpected finding, however, was the appearance of the word son, used a significant eight times. Son? I didn’t remember any son character. But re-reading this version of the story made it clear: Rather than use the word prince, the writer settled on the phrase king’s son.


The writer presumably consulted a word list that classed prince as a low-frequency word and decided that king’s son was a reasonable replacement.

This choice points to one of the big issues in keyword identification: Beyond frequency, how do you decide what words should and should not be on a list to be taught to language learners? Four other measures are whether a word is:

  • vital to meaning;
  • challenging to understand or paraphrase;
  • a useful concept;
  • important to a semantic field.

Vital to meaning
In the “Rapunzel” story, there are many low-frequency words that, if omitted, would make the story difficult to understand. These include the name Rapunzel itself, a German word for the salad herb rampion. Other less frequently used words are tower and thorns.

Challenging to understand or paraphrase
The word pregnant might be considered unsuitable for young learners, and many versions of the story gloss over the fact that Rapunzel is carrying twins shortly after meeting the prince. But where the idea of pregnancy is kept as a part of the story, paraphrasing the word pregnant with euphemisms such as being with child, in the family way, or in her confinement only leads to misunderstanding among language learners. Pregnancy is not a new concept in this case, but rather one that any older learner would mentally translate.

A useful concept
While the word climb and the phrases climb up and climb down might suffice, the author of this story uses the words ascend and descend, both for the father entering the witch’s garden and for the witch and the prince going up and down the tower. Other synonyms, including clambered, are used for variety. An editor might narrow these choices to ones that are easier for a novice reader to learn and use.

Important to a semantic field
Some words help populate particular semantic fields. The phrase once upon is not typically found in academic English or business English, but it is extremely common in the fairy-tale genre to mark the start of a story: Once upon a time . . . . Similarly, the height of the tower is stated in ells, a medieval measure of distance; one ell equals 1.143 meters. The use of ells adds authenticity, although likely at the expense of intelligibility.

Many corpora specialize in particular genres and are collected so that word frequency can be analyzed. The Longman Communication 3000, for example, is based on a review of 390 million words. Its 3,000 most frequently used headwords for learners of English are enough for one to understand 86 percent of what is normally read or heard in English. Other lists, like Averil Coxhead’s Academic Word List (AWL), look at those words most likely to be encountered by university students. However, any such list can seem arbitrary at times. For example, the AWL includes hypothesis and theory, but not the word likely to bridge the two: experiment.
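A coverage figure like that 86 percent is, at heart, simple arithmetic: the number of running words in a text that appear on the list, divided by the total number of running words. The sketch below shows the calculation in Python. It is only an illustration: the tiny word list and the sentence are invented, and it matches exact word forms, whereas a headword list would also count inflected forms such as sealed under seal.

```python
import re


def coverage(text, known_words):
    """Return the share of running words in `text` found in `known_words`."""
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


# Hypothetical mini word list standing in for a real frequency list.
word_list = {"the", "a", "in", "to", "and", "her", "witch", "tower"}

# Invented sentence based on the plot summary above.
sentence = "The witch sealed her in a tall tower"

share = coverage(sentence, word_list)
print(f"{share:.0%} of the running words are on the list")
```

Running the same function over a whole textbook chapter against a real frequency list would give a rough sense of how much of the chapter a learner who knows that list can be expected to recognize.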

Corpus analysis of keywords can also be used in the opposite way, finding out how a word has been used in different contexts. The word baroque, for example, turns out to have applications in architecture, carpets, city planning, décor, gardens, music, painting, and sculpture.

There are odd categories, such as musical metaphors, as in Angela Carter’s description of the sound of a carriage, as cited in the British National Corpus:

It rattled along the road, waking a baroque concerto of echoes from rotting red-brick gables, and somewhere, in a front-room shrouded with dirty net curtains, a baby began to cry. (Carter, 1993)
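Corpus tools usually display such contexts as a concordance, with the keyword lined up in the middle of each line and a few words on either side. Here is a minimal sketch of the idea in Python. The function name and the window of three words are assumptions for illustration, and the second sample sentence is invented; only the first clause comes from the Carter quotation above.

```python
import re


def concordance(text, keyword, width=3):
    """Return each occurrence of `keyword` with `width` words on each side."""
    # Keep hyphenated and apostrophized words (red-brick, king's) whole.
    tokens = re.findall(r"\w+(?:['’-]\w+)*", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


sample = (
    "It rattled along the road, waking a baroque concerto of echoes "
    "from rotting red-brick gables. "
    "The palace has a baroque facade."  # invented second example
)

for line in concordance(sample, "baroque"):
    print(line)
```

Reading down a column of such lines is a quick way to see which nouns a word like baroque tends to keep company with.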

By exploring keywords in a corpus, students can build their schemata, learning the subtle applications of a new word. Rapunzel’s father could have done so and learned more about the word refuse.

It would have saved so much trouble.

Tasks for Teachers

1. Collect a large corpus of your students’ writing, preferably already in digital form. Paste the corpus into a corpus analysis generator such as Lextutor KeyWords Extractor (http://www.lextutor.ca/key/). The program will produce a list of how often individual words are used and illustrate the contexts in which each appears. In your students’ writings, look for repeated errors, such as misspelled words and words used incorrectly. Consider ways to teach them.

2. Look at the vocabulary you are currently teaching. Is it useful? Are the words high frequency, vital to meaning, challenging to understand or paraphrase, useful concepts, or important to a semantic field? If your vocabulary items touch none of these categories, you may be focusing on low-frequency words that students are unlikely to use or remember.

Tasks for Learners

1. Start with a low-frequency keyword from one of your textbooks or assignments. Draw a schema, or mind map, for all the meanings of the word. When you can’t add any more, swap mind maps with other students and let them add to your schema while you add to theirs. See how much you can learn about the multiple meanings of new words.

2. For a week, keep a list of new words you learn and where you learn them. For example, new words may appear in your textbooks, in popular culture, in conversations, and in novels or other books you read. At the end of the week, examine why learning each word might be useful or not.


Dr. Ken Beatty, teacher trainer, writer, and TESOL Professor, has promoted best teaching and learning practices from primary through university levels in 300+ sessions in 27 countries. He is the author of 130+ textbooks, including books in the Pearson series Learning English for Academic Purposes (LEAP).