The General Service List (GSL), published by Michael West in 1953, is still one of the most influential word lists available despite its age. It gives details of the 2,000 most common words in English (depending on how 'word' is defined, it may actually contain 3,372 words). Unlike more recent lists, the GSL includes information about the various senses of each word together with the percentage of use of each sense. This allows us to produce a frequency count of the different senses of the common words in English. Such a sense-based list differs from all the other word lists currently available (including the GSL itself), which provide frequency counts for word forms. A sense-based GSL is especially useful when considering polysemous words, as the different senses of these words have very different ranks.
Creating a sense-based GSL
Although the GSL has been heavily criticised (it is based on data from the 1920s and so is out of date; its structure is very inconsistent; and some of the choices made in presenting words are very strange, such as combining will and would into a single entry), it is the only data source I am aware of that gives detailed frequency information about the different senses of words. Making a sense-based GSL, however, requires several adaptations to the original data.
The basic data I used is the full list produced by James Dickins (available
To create the sense list from this:
• Headwords for which subcategories of senses exist were deleted
• For words where the GSL gives no frequency data (most annoyingly the and be), proportions of use in the British National Corpus (BNC) were used
• For words where no percentage data is given for the different senses (notably if), the percentages were calculated from a random sample taken from the BNC
• Where the percentages for the senses totalled more than 100%, adaptations were made based on a random sample from the BNC
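The core arithmetic behind these steps can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to build the list: the function name, the data layout and all the figures are invented for the example, and where the sense percentages total more than 100% the sketch simply rescales them proportionally as a stand-in for the adjustments that were made from BNC samples.

```python
def sense_frequencies(word_freq, senses):
    """Split each headword's frequency across its senses and rank the result.

    word_freq: dict mapping headword -> frequency count
    senses:    dict mapping headword -> list of (sense label, percentage of use)
    Both structures are hypothetical; the GSL data is not stored in this form.
    """
    rows = []
    for headword, freq in word_freq.items():
        sense_list = senses.get(headword)
        if not sense_list:
            # No sense breakdown: the word stays as a single entry.
            rows.append((headword, "", float(freq)))
            continue
        total = sum(percent for _, percent in sense_list)
        # Assumption: if the percentages exceed 100, rescale them proportionally.
        # (The real list was adjusted using random samples from the BNC instead.)
        divisor = max(total, 100.0)
        # The headword itself is dropped and replaced by one row per sense.
        for label, percent in sense_list:
            rows.append((headword, label, freq * percent / divisor))
    # Highest frequency first, so list position corresponds to rank.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Invented figures, purely for illustration.
example_freq = {"bank": 1000, "book": 500, "case": 600}
example_senses = {
    "bank": [("river edge", 20), ("financial institution", 80)],
    "case": [("instance", 70), ("container", 50)],  # totals 120%
}

for rank, (word, label, freq) in enumerate(
    sense_frequencies(example_freq, example_senses), start=1
):
    print(rank, word, label, freq)
```

With these made-up numbers the two senses of bank end up at opposite ends of the ranking, which is the effect a form-based count hides.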
The full sense-based GSL, together with the form-based GSL, can be downloaded as an Excel file here.