Snyman, Van Huyssteen & Daelemans 2011

Snyman, Dirk P., Gerhard B. Van Huyssteen, and Walter Daelemans. 2011. “Automatic genre classification for resource scarce languages.” Proceedings of the 2011 Conference of the Pattern Recognition Association of South Africa, Vanderbijlpark, South Africa.

English: genre classification, resource scarce languages, text classification algorithm, term frequency, inverse document frequency

Afrikaans: genreklassifikasie, hulpbronskaars tale, inverse dokumentfrekwensie, teksklassifikasiealgoritme, termfrekwensie

English: In this article we present research on the development of automatic genre classification systems for resource scarce languages. The main approaches to text classification from the literature are presented and weighed against each other during an experimental phase to identify the most appropriate approach for use as a genre classification system. A fixed feature set is extracted for seven classes from the available training data for each of the six languages under scrutiny and paired with each classification algorithm in order to test the algorithms’ performance. The algorithm showing the best results is support vector machines, in conjunction with term frequency and inverse document frequency features.


Afrikaans: 

In: English

On: Afrikaans; Sepedi; Sesotho; Setswana; isiXhosa; isiZulu