Puttkammer & Van Huyssteen 2006

Puttkammer, Martin J., and Gerhard B. Van Huyssteen. 2006. “Automatic text segmentation of Afrikaans using memory-based learning.” Proceedings of the 2006 Conference of the Pattern Recognition Association of South Africa. Pretoria: CSIR/Meraka.

English: Afrikaans, human language technology, machine learning, text segmentation

Afrikaans: Afrikaans, masjienleer, mensetaaltegnologie, tekssegmentering

English: A text segmentor for the identification of sentences, named entities, words, abbreviations and punctuation in Afrikaans texts is described in this paper. The task is viewed as an integrated annotation process, and a memory-based classifier is hence trained to perform the task. Compared to baseline results for other languages, the classifier performs quite well (overall f-score of 97.79% on the full tag set), especially in consideration of the relatively small training data set used. The paper concludes with directions for future research.

In: English

On: Afrikaans