Snyman, Dirk P., Gerhard B. Van Huyssteen, and Walter Daelemans. 2014. “Outomatiese genreklassifikasie vir Afrikaans [Automatic genre classification for Afrikaans].”
Snyman, Van Huyssteen & Daelemans 2014
Abstract
In text processing, metadata about a particular text plays an important role. Metadata is often generated by automatic text classification systems, which assign a text to one or more predefined classes or categories based on its contents. One of the dimensions along which a text can be classified is its genre. This study proposes the development of an automatic genre classification system in a resource-scarce environment. It aimed to investigate the techniques and approaches generally used for automatic genre classification and to identify the best approach for Afrikaans, a resource-scarce language. In developing such a system, a set of variables must be considered because they influence the performance of machine learning approaches: the algorithm used, the amount of training data, and the representation of the data as features. If these variables are handled correctly, an optimal combination of them can be identified to develop a genre classification system successfully. In this article, a genre classification system is developed using a multinomial naive Bayes (MNB) algorithm with a bag-of-words feature set. The resulting system achieves an f-score (a performance measure) of 0.929.
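To illustrate the kind of approach the abstract names, the following is a minimal sketch of a multinomial naive Bayes classifier over bag-of-words counts, together with a per-class f-score. It is not the authors' implementation. The whitespace tokenizer, the smoothing constant `alpha`, the two toy genres, and the English toy sentences are all assumptions made for illustration; the abstract does not specify the paper's corpus, preprocessing, or how the f-score was averaged.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace (crude bag of words).
    return text.lower().split()


class MultinomialNB:
    def __init__(self, alpha=1.0):
        # alpha is the Laplace smoothing constant (assumed value, not from the paper).
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per genre
        self.word_counts = defaultdict(Counter)  # word frequencies per genre
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f_score(gold, predicted, positive):
    # Harmonic mean of precision and recall for one class.
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data with two hypothetical genres.
train_texts = [
    "the minister announced a new policy today",
    "police arrested two suspects last night",
    "mix the flour and sugar then bake",
    "add salt and stir the sauce slowly",
]
train_labels = ["news", "news", "recipe", "recipe"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("bake the flour with sugar"))
print(model.predict("police announced a new policy"))
```

On real data the classifier would be trained on a labeled genre corpus and the f-score computed on held-out texts; the toy sentences here only show the mechanics.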
Written in:
Afrikaans
Dealing with:
Afrikaans
Keywords
Afrikaans, genre, genre classification, human language technology, machine learning
Afrikaans keywords
Afrikaans, genre, genreklassifikasie, masjienleer, mensetaaltegnologie