Puttkammer & Van Huyssteen 2006

Puttkammer, Martin J., and Gerhard B. Van Huyssteen. 2006. “Automatic text segmentation of Afrikaans using memory-based learning.” Proceedings of the 2006 Conference of the Pattern Recognition Association of South Africa. Pretoria: CSIR/Meraka.



Abstract

A text segmentor for the identification of sentences, named entities, words, abbreviations and punctuation in Afrikaans texts is described in this paper. The task is viewed as an integrated annotation process, and a memory-based classifier is hence trained to perform the task. Compared to baseline results for other languages, the classifier performs quite well (overall f-score of 97.79% on the full tag set), especially in consideration of the relatively small training data set used. The paper concludes with directions for future research.

Written in:

English

Dealing with:

Afrikaans

Keywords

Afrikaans, human language technology, machine learning, text segmentation

Afrikaans keywords

Afrikaans, masjienleer, mensetaaltegnologie, tekssegmentering