Schlünz, Georg I., Etienne Barnard, and Gerhard B. Van Huyssteen. 2010. “Part-of-speech effects on text-to-speech synthesis.” Proceedings of the 2010 Conference of the Pattern Recognition Association of South Africa:257-262.

English: Afrikaans, human language technology, machine learning, POS tagger, POS tagging

Afrikaans: Afrikaans, masjienleer, mensetaaltegnologie, woordsoortetiketteerder, woordsoortetikettering

English: One of the goals of text-to-speech (TTS) systems is to produce natural-sounding synthesised speech. Towards this end, various natural language processing (NLP) tasks are performed to model the prosodic aspects of the TTS voice. One of the fundamental NLP tasks used is the part-of-speech (POS) tagging of the words in the text. This paper investigates the effects of POS information on the naturalness of a hidden Markov model (HMM) based TTS voice when additional resources are not available to aid in the modelling of prosody. It is found that, when a minimal feature set is used for the HMM context labels, the addition of POS tags does improve the naturalness of the voice. However, the same effect can be accomplished by including segmental counting and positional information instead of the POS tags.

Written in: English

Languages studied: Afrikaans and English