Van Huyssteen & Puttkammer 2007

Van Huyssteen, Gerhard B., and Martin J. Puttkammer. 2007. “Accelerating the annotation of lexical data for less-resourced languages.” Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007): 1505-1508.



Abstract

The development of digital resources is an expensive and time-consuming endeavor, especially in the case of less-resourced languages. In this paper, we describe a freely available, open-source system, called TurboAnnotate, for bootstrapping linguistic data for machine-learning purposes, or for manually creating gold standards or other annotated lists. A detailed description of the design and functionalities of the tool is given, focusing on how the requirements of end-users are being addressed through it. It is indicated that TurboAnnotate does not only promise to help increase the accuracy of human annotators, but also to save enormously on human effort in terms of time.

Written in:

English

Dealing with:

Afrikaans and Setswana

Keywords

machine learning, bootstrapping, linguistic data, TurboAnnotate, Afrikaans, Setswana

Afrikaans keywords

Afrikaans, masjienleer, Setswana, skoenlussteekproefneming, taalkundige data, TurboAnnotate