Puttkammer, Martin J., and Gerhard B. Van Huyssteen. 2006. “Automatic text segmentation of Afrikaans using memory-based learning.” Proceedings of the 2006 Conference of the Pattern Recognition Association of South Africa. Pretoria: CSIR/Meraka.
English: Afrikaans, human language technology, machine learning, text segmentation
Afrikaans: Afrikaans, masjienleer, mensetaaltegnologie, tekssegmentering
English: A text segmentor for the identification of sentences, named entities, words, abbreviations and punctuation in Afrikaans texts is described in this paper. The task is viewed as an integrated annotation process, and a memory-based classifier is hence trained to perform the task. Compared to baseline results for other languages, the classifier performs quite well (overall f-score of 97.79% on the full tag set), especially in consideration of the relatively small training data set used. The paper concludes with directions for future research.
Afrikaans:
In: English
On: Afrikaans