English: genre classification, resource-scarce languages, text classification algorithm, term frequency, inverse document frequency
Afrikaans: genreklassifikasie, hulpbronskaars tale, inverse dokumentfrekwensie, teksklassifikasiealgoritme, termfrekwensie
English: In this article we present research on the development of automatic genre classification systems for resource-scarce languages. The main approaches to text classification from the literature are presented and weighed against each other in an experimental phase to identify the most appropriate approach for use as a genre classification system. A fixed feature set is extracted for seven classes from the available training data for each of the six languages under scrutiny and paired with each classification algorithm in order to test the algorithms’ performance. The algorithm showing the best results is support vector machines, in conjunction with term frequency and inverse document frequency features.
On: Afrikaans; Sepedi; Sesotho; Setswana; isiXhosa and isiZulu
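
The abstract names term frequency and inverse document frequency as the features paired with the best-performing classifier, but does not state the exact weighting used. As an illustration only, the sketch below computes the textbook variant (relative term frequency multiplied by the natural log of the number of documents over the document frequency). The function name `tf_idf` and the toy English documents are hypothetical and not taken from the article; in a full system, vectors like these would be the input to a classifier such as a support vector machine.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document.

    Assumed weighting (not confirmed by the article):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up documents standing in for texts of different genres.
docs = [
    "the court ruled on the appeal".split(),
    "the team won the match".split(),
    "the minister opened the court".split(),
]
vectors = tf_idf(docs)
```

A term that occurs in every document (here "the") receives a weight of zero, while a term confined to one document (here "appeal") is weighted more heavily than one shared by two documents ("court"), which is the property that makes these features useful for separating classes.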