Stats calculators: Word frequency classes

Van Huyssteen, Gerhard B. 2021. “Stats calculators: Word frequency classes.” https://gerhard.pro/software/stats-calculators-word-frequency-classes/.

Introduction

Here I provide two calculators to determine word frequency classes: one computes a relative frequency class (N) based on Perkuhn et al. (2012), and the other computes a value on a logarithmic Zipf scale (Z) based on Van Heuven et al. (2014). Both calculators take as input the frequency F(n) of the word (or multiword item) in a corpus. The N calculator also requires:

  1. the frequency of the most frequent word F(m) in that corpus.
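As a rough illustration, the N calculation can be sketched in Python. The sketch assumes the commonly cited formulation N = floor(log2(F(m) / F(n)) + 0.5), which is not spelled out on this page; the function name and parameter names are hypothetical and chosen only for this example.

```python
import math


def frequency_class(f_word: int, f_most_frequent: int) -> int:
    """Relative frequency class N of a word (hypothetical helper).

    Assumed formula: N = floor(log2(F(m) / F(n)) + 0.5), where F(n) is the
    frequency of the word and F(m) the frequency of the most frequent word
    in the same corpus.
    """
    if f_word <= 0:
        raise ValueError("the word frequency F(n) must be positive")
    if f_most_frequent < f_word:
        raise ValueError("F(m) cannot be smaller than F(n)")
    # Adding 0.5 before flooring rounds the log ratio to the nearest class.
    return math.floor(math.log2(f_most_frequent / f_word) + 0.5)


if __name__ == "__main__":
    # A word with 250 occurrences where the top word has 1,000 occurrences.
    print(frequency_class(250, 1000))
```

Under this formulation the most frequent word itself falls into class 0, and each higher class corresponds to roughly a halving of the frequency.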


The Zipfian calculator also requires:

  1. the number of word tokens F(N) in the corpus; and
  2. the number of word types F(V) in the corpus.
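The Z calculation can be sketched in the same way. This sketch assumes the formula reported by Van Heuven et al. (2014), Zipf = log10((F(n) + 1) / ((F(N) + F(V)) / 1,000,000)) + 3; the function name and parameter names are hypothetical, and the calculator on this page may differ in details such as rounding.

```python
import math


def zipf_value(f_word: int, n_tokens: int, n_types: int) -> float:
    """Zipf-scale value of a word (hypothetical helper).

    Assumed formula:
        Z = log10((F(n) + 1) / ((F(N) + F(V)) / 1_000_000)) + 3
    where F(n) is the word frequency, F(N) the number of word tokens and
    F(V) the number of word types in the corpus.
    """
    if f_word < 0:
        raise ValueError("the word frequency F(n) cannot be negative")
    if n_tokens <= 0 or n_types <= 0:
        raise ValueError("token and type counts must be positive")
    # Tokens plus types, expressed in millions; the +1 on the word frequency
    # keeps the value defined for words that do not occur in the corpus.
    size_in_millions = (n_tokens + n_types) / 1_000_000
    return math.log10((f_word + 1) / size_in_millions) + 3.0


if __name__ == "__main__":
    # 99 occurrences in a corpus with 900,000 tokens and 100,000 types.
    print(zipf_value(99, 900_000, 100_000))
```

Adding 3 shifts the scale from frequency per million words to frequency per billion words, so that values for attested words are positive.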


In another post, you can also find an “Afrikaans version” of the calculators below, plus some additional statistics. Those calculators come preloaded with the frequency of the most frequent word and the number of word types, based on the frequency counts in the corpora that are available in the VivA Korpusportaal. These figures are updated regularly.

Generic calculator

Links

Many online calculators for corpus linguistics are available, such as Lancaster Stats Tools online and Paul Rayson’s Log-likelihood and effect size calculator, to name but two.

References

  • Perkuhn, R., H. Keibel, and M. Kupietz. 2012. Korpuslinguistik. Paderborn: Wilhelm Fink Verlag.
  • Van Heuven, W. J. B., P. Mandera, E. Keuleers, and M. Brysbaert. 2014. “SUBTLEX-UK: A new and improved word frequency database for British English.” Quarterly Journal of Experimental Psychology 67: 1176–1190.

 

In addition to the descriptions by the original authors, you can find descriptions in Afrikaans in the following publications:

English: calculator; corpus linguistics; online calculator; statistics; word frequency; word frequency class; Zipf; Zipf scale

Afrikaans: aanlyn berekenaar; berekenaar; korpuslinguistiek; statistiek; woordfrekwensie; woordfrekwensieklas; Zipf; Zipfskaal

English: Two calculators are provided to determine word frequency classes: one computes a relative frequency class (N), and the other a value on a logarithmic Zipf scale (Z).


Afrikaans: Twee berekenaars om woordfrekwensieklasse te bepaal, word voorsien: Die een is ’n relatiewe frekwensieklas (N), die ander ene ’n logaritmiese Zipfskaal (Z).
