Van Huyssteen & Tiberius 2023

Van Huyssteen, Gerhard B., and Tiberius, Carole. 2023. Towards a lexical database of Dutch taboo language. Paper presented at eLex 2023, Brno, Czech Republic. https://elex.link/elex2023/

Design document: Version 2.0.0 available for commenting

English: Dutch, e-lexicography, lexical database, ontology, profanity, swearing, taboo language

Afrikaans: e-Leksikografie, leksikale databasis, Nederlands, ontologie, skel, taboetaal, vloek

English: Over the past 45 years, at least eighteen paper-based Dutch dictionaries of taboo language (or taboo-related language) have been published (i.e., as visible works of lexicography). However, none of these are available as (linked) lexical data that could be integrated into natural language processing (NLP) tools and applications (i.e., as invisible works of lexicography). In this paper, we describe the development of a comprehensive lexical database of taboo language (LDTL) for Dutch (TaboeLex) that can be integrated into NLP tools and applications. TaboeLex will be made available as open data, i.e., as a freely available, structured, annotated lexicon that can be linked to other data in the future. The paper focusses on the first phase of the project, namely defining and designing TaboeLex.


Afrikaans: Oor die afgelope 45 jaar is ten minste agtien Nederlandse papiergebaseerde woordeboeke van taboetaal (of taboe-verwante taal) gepubliseer (d.w.s. as sigbare werke van leksikografie). Nie een hiervan is egter beskikbaar as (gekoppelde) leksikale data wat in natuurliketaalprosesseringshulpmiddels en -toepassings (d.i. as onsigbare werke van leksikografie) geïntegreer kan word nie. In hierdie referaat beskryf ons die ontwikkeling van ’n omvattende leksikale databasis van taboetaal (LDTT) vir Nederlands (TaboeLex) wat in natuurliketaalprosesseringshulpmiddels en -toepassings geïntegreer kan word. TaboeLex sal as oop data beskikbaar gestel word, dit wil sê as ’n vrylik beskikbare, gestruktureerde, geannoteerde leksikon wat in die toekoms aan ander data gekoppel kan word. Die referaat fokus op die eerste fase van die projek, naamlik om TaboeLex te definieer en ontwerp.

In: English

On: Dutch