English: Afrikaans, compound, compound splitting, human language technology, morphology, spelling checker
Afrikaans: Afrikaans, kompositum, kompositumverdeling, mensetaaltegnologie, morfologie, samestelling, samestellingverdeling, speltoetser
English: Current spelling checkers for Afrikaans still fall short of the desired linguistic performance, especially with respect to high lexical recall and error precision. One of the main problems is that Afrikaans is an agglutinative language with high lexical generative power through concatenative compound formation. This means that the lexicon in an Afrikaans spelling checker can never account for all possible compounds, and other means should therefore be sought to recognise valid compounds. In this article, we investigate two approaches to finding compound boundaries. First, we describe a longest string-matching algorithm, which searches for known words at the beginning and end of the compound. Next, a machine-learning approach using decision trees is implemented. Results of both approaches are presented, indicating that the longest string-matching algorithm outperforms the machine-learning approach. However, the machine-learning approach has many advantages over the longest string-matching algorithm. The article concludes with a discussion of the advantages and disadvantages of the systems, remaining problems and possible solutions.
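To make the idea of longest string matching concrete, the following is a minimal sketch in Python, not the implementation evaluated in the article. It assumes a plain set of known words as the lexicon, a hypothetical minimum part length `min_len`, and no handling of linking morphemes; the names `LEXICON` and `split_compound` are illustrative only. The function looks for the longest known word at the beginning of the string, then for the longest known word at the end of the remainder, and recurses on anything left in between.

```python
# Toy lexicon (assumption): a real spelling checker would load a full word list.
LEXICON = {"spel", "toetser", "boek", "winkel"}


def split_compound(word, lexicon, min_len=3):
    """Return a list of known parts covering `word`, or None if no split is found.

    `min_len` is an assumed lower bound on the length of a compound part.
    """
    if word in lexicon:
        return [word]
    # Try the longest known word at the beginning first (proper prefixes only).
    for end in range(len(word) - min_len, min_len - 1, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        rest = word[end:]
        # Then try the longest known word at the end of the remainder.
        for start in range(0, len(rest) - min_len + 1):
            tail = rest[start:]
            if tail not in lexicon:
                continue
            middle = rest[:start]
            if not middle:
                return [head, tail]
            # Anything between head and tail must itself be splittable.
            inner = split_compound(middle, lexicon, min_len)
            if inner is not None:
                return [head] + inner + [tail]
    return None


print(split_compound("speltoetser", LEXICON))
print(split_compound("boekwinkel", LEXICON))
```

With this toy lexicon, `speltoetser` is split into `spel` and `toetser`, while a string that contains no known words yields `None`. Because the sketch prefers the longest match at each step, it can choose a wrong boundary when a shorter first part would have been correct. That limitation belongs to this simplified version and is not a finding reported in the article.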