A New Academic Word List

The most important words for understanding academic text

Introduction to the New Academic Word List

The first version of the New Academic Word List (NAWL 1.0) is finished and available at the following links. The list is available in alphabetical order, with inflected forms, and with Standard Frequency Indices (Carroll, Davies, & Richman, 1971; Carroll, 1971).

  • How many words are in the NAWL?

The NAWL is a list of 963 words derived from an academic corpus containing about 288 million words.

  • What are the components used in building the corpus?

The first major component of the corpus is the Cambridge English Corpus, which comprises academic journals and non-fiction, student essays, and academic discourse. According to the Cambridge English Corpus website, the corpus includes “text from academic books and journals from the UK and US covering a wide range of disciplines and topics.” This part of the corpus contained about 249 million words (see Figure 1) and yielded a single word list with frequencies.

Figure 1

The rest of the corpus was assembled from other sources. The spoken part was compiled from two corpora of spoken English: the Michigan Corpus of Academic Spoken English (MICASE) and the British Academic Spoken English (BASE) corpus. These corpora were combined and re-sorted into four categories (Arts and Humanities, Life Sciences, Social Sciences, and Physical Sciences), giving four separate word lists with frequencies.

Figure 2: Frequencies taken from the British Academic Spoken English (BASE) corpus and the Michigan Corpus of Academic Spoken English (MICASE).

The third and final part, Textbooks, was assembled from a corpus of published textbooks, including many of the top 100 best-selling textbooks. The textbooks were divided into the same four categories as the spoken English corpus, yielding four more word lists with frequencies.

The list was derived as follows. The final corpus was analyzed using the procedures outlined in Carroll, Davies, and Richman (1971) and Carroll (1971) to obtain three values for each word: D, an index of dispersion over the nine word lists (one from the Cambridge English Corpus, four from the spoken corpora, and four from the textbooks); U, the estimated frequency per million words, adjusted for dispersion; and SFI, the Standard Frequency Index. The words from the New General Service List (NGSL) were accounted for, and a list of candidates was selected based on frequency, dispersion, and appropriateness.
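The three measures can be illustrated with a minimal Python sketch. It is not the code used to build the NAWL. The formulas follow the form in which Carroll's measures are commonly summarized (D as a normalized entropy of per-list relative frequencies, U as a dispersion-weighted frequency per million, and SFI = 10 × (log10 U + 4)) and should be checked against Carroll (1971) before any serious use. The function names, and the frequencies and list sizes in the example, are hypothetical.

```python
import math


def dispersion_d(freqs, sizes):
    """Dispersion index D: 1.0 when a word is spread evenly, 0.0 when it
    occurs in only one word list (normalized-entropy form, an assumption)."""
    n = len(freqs)
    # Relative frequency of the word within each word list.
    p = [f / s for f, s in zip(freqs, sizes)]
    total = sum(p)
    if n < 2 or total == 0:
        return 0.0
    # Entropy of the normalized proportions; zero entries contribute nothing.
    entropy = math.log(total) - sum(x * math.log(x) for x in p if x > 0) / total
    # Clamp to [0, 1] to absorb floating-point rounding.
    return min(1.0, max(0.0, entropy / math.log(n)))


def adjusted_frequency_u(freqs, sizes):
    """Estimated frequency per million words, adjusted for dispersion."""
    corpus_size = sum(sizes)
    total_freq = sum(freqs)
    d = dispersion_d(freqs, sizes)
    # Size-weighted frequency used as the floor when dispersion is low.
    f_min = sum(f * s for f, s in zip(freqs, sizes)) / corpus_size
    return (1_000_000 / corpus_size) * (total_freq * d + (1 - d) * f_min)


def standard_frequency_index(u):
    """SFI = 10 * (log10(U) + 4); U must be positive."""
    if u <= 0:
        raise ValueError("U must be positive to compute the SFI")
    return 10 * (math.log10(u) + 4)


if __name__ == "__main__":
    # Hypothetical counts of one word in nine word lists, with made-up sizes.
    freqs = [1200, 15, 9, 22, 4, 40, 31, 18, 27]
    sizes = [200_000_000, 2_000_000, 2_000_000, 2_000_000, 2_000_000,
             5_000_000, 5_000_000, 5_000_000, 5_000_000]
    u = adjusted_frequency_u(freqs, sizes)
    print("D  =", round(dispersion_d(freqs, sizes), 3))
    print("U  =", round(u, 3))
    print("SFI =", round(standard_frequency_index(u), 1))
```

Under this definition, a word with U = 1 (one occurrence per million words) has an SFI of 40, and each tenfold increase in U adds 10 to the SFI.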

References

Carroll, J. B. (1971). Statistical analysis of the corpus. In The American Heritage Word Frequency Book (pp. xxi–xl). Boston: Houghton Mifflin.

Carroll, J. B., Davies, P., & Richman, B. (1971). Guide to the alphabetical list. In The American Heritage Word Frequency Book. Boston: Houghton Mifflin.