Blog

  • Eiselen & Van Huyssteen 2023

    Eiselen, Roald, and Van Huyssteen, Gerhard B. 2023. A comparison of statistical tests for Likert-type data: The case of swearwords. Journal of Open Humanities Data 9:1–13. DOI: https://doi.org/10.5334/johd.132.

  • Van Huyssteen et al 2023b

    Van Huyssteen, Gerhard B., Rabé, Monique, and Puttkammer, Martin J. 2023. Ouderdoms- en inhoudsadvies vir Afrikaanse boeke vir kinders: resultate van ’n eerste kwalitatiewe en kwantitatiewe ondersoek [Age and content advisories for Afrikaans children’s books: Results of a first qualitative and quantitative investigation]. LitNet Akademies (Geesteswetenskappe) 20:185–212. https://doi.org/10.56273/1995-5928/2023/j20n1b7.

    Dataset

    Dataset: Age and content advisories for Afrikaans books for children [xlsx]

  • Van Huyssteen et al 2023c

    Van Huyssteen, Gerhard B., Eiselen, Roald, and Du Toit, Jaco. 2023. A dataset of self-reported attitudes to Afrikaans swearwords. Journal of Open Humanities Data 9:1–8. DOI: https://doi.org/10.5334/johd.127.

    Dataset

    Afrikaans swearword scores [zip]

  • Van Huyssteen & Tiberius 2023

    Van Huyssteen, Gerhard B., and Tiberius, Carole. 2023. Towards a lexical database of Dutch taboo language. Paper presented at eLex 2023, Brno, Czech Republic. https://elex.link/elex2023/

    Design document: Version 2.0.0 available for commenting


    Download XLSX

  • Van Huyssteen et al 2023a

    Van Huyssteen, Gerhard B., Breed, Adri, Butler, Anneke, Botha, Lande, Partridge, Maristi, and Pilon, Suléne. 2023. ʼn Metodologie vir die beskrywing van konstruksionaliseringsnetwerke: Konstruksies met [in] as gevallestudie [A methodology for the description of constructionalisation networks: Constructions with [in] as a case study]. Stellenbosch Papers in Linguistics Plus 67:29–42. DOI: 10.5842/67-1-932.

    See downloadable PDF, RIS file, keywords, and abstract at the end of this post (below figures).

    Datasets

    Dataset 1: [in] swearword constructions [xlsx]

    Figures in the article

    Click on a figure for a larger version.

    Figure 2: Polysemy network of “in” in HAT

    Figure 3: Polysemy network of “in” in WAT

    Figure 4: Polysemy network of “in” in corpus data

    Figure 5: Network of “in” constructions with swearwords

    Figure 6: Amalgamated polysemy network of “in”
  • Van Huyssteen, Puttkammer & Rabé 2022

    Van Huyssteen, Gerhard B., Puttkammer, Martin J., and Rabé, Monique. 2022. Do Afrikaans digital parents want content and age advisories for books? Digital Humanities in precarious times. Vanderbijlpark, South Africa: North-West University.

  • Van Huyssteen 2022d

    Van Huyssteen, Gerhard B. 2022. Drie duisend f-bomme en granate: Temas in vergelykende vloekkunde [Three thousand f-bombs and grenades: Themes in comparative maledictology]. Keynote speaker: Conference of the Suid-Afrikaanse Vereniging vir Neerlandistiek [South African Society for Dutch Studies]. Nijmegen: Radboud University.

  • Van Huyssteen 2022c

    Van Huyssteen, Gerhard B. 2022. Suid-Afrikaanse Taalkunde: Vloek in Afrikaans [South African Linguistics: Swearing in Afrikaans]. Guest lectureship. Leiden: Leiden University.

  • Van Huyssteen 2022b

    Van Huyssteen, Gerhard B. 2022. Kan godslastering ooit humoristies wees? [Can blasphemy ever be humorous?]. Vloekcoza research blog. https://vloek.co.za/blogs/navorsing/kan-godslastering-humoristies-wees.

  • Van Huyssteen 2022a

    Van Huyssteen, Gerhard B. 2022. Wat ons van ‘fok’ weet (en nie weet nie) [What we (don’t) know about fok]. LitNet Akademies (Geesteswetenskappe) 19:428-452.

  • Virtuele Instituut vir Afrikaans 2022a

    Virtuele Instituut vir Afrikaans (VivA). 2022. Sintaksisterminologie vir die Algemene Afrikaanse Grammatika [Syntax terminology for the Algemene Afrikaanse Grammatika]. 1.0 edn. https://viva-afrikaans.org. Available: https://gerhard.pro/publications/viva2022a.

  • Van Huyssteen & Eiselen 2021a

    Van Huyssteen, Gerhard B, and Roald Eiselen. 2021. “Oor feekse en helleveë [On shrews and harridans].” Tydskrif vir Geesteswetenskappe 61 (4-1):1129-1155. doi: 10.17159/2224-7912/2021/v61n4-1a9.

  • Eiselen & Van Huyssteen 2021

    Eiselen, Roald, and Gerhard B. Van Huyssteen. 2021. “Using ordinal logistic regression to analyse self-reported usage of, and attitudes towards swearwords.” International Conference of the Digital Humanities Association of Southern Africa 2021, Virtual, 29 November to 3 December.

  • Van Huyssteen & Pilon 2021

    Van Huyssteen, Gerhard B., and Suléne Pilon. 2021. “Standaardisering as ’n produk van die tydsgees [Standardisation as a product of the Zeitgeist].” Ontlaering – Geworteldheid: Die onderrig van Afrikaans in spesifieke ruimtes, Virtual, 29-30 April.

  • Van Huyssteen 2021a

    Van Huyssteen, Gerhard B. 2021. “Swearing in South Africa: Multidisciplinary research on language taboos.” International Conference of the Digital Humanities Association of Southern Africa 2021, South Africa, 29 November to 3 December.

  • Van Huyssteen & Eiselen 2021b

    Van Huyssteen, Gerhard B. & Eiselen, Roald. 2021. How Afrikaans women became fierce-tempered. Zürich Workshop on Afrikaans Linguistics. Zürich, Switzerland. 4-5 October. https://vloek.co.za/leesstof/kongresmateriaal/fierce-tempered-afrikaans-women.

  • Van Huyssteen 2021b

    Van Huyssteen, Gerhard B. 2021. When a word is befok. Afrikaans Grammar Workshop III. Amsterdam, The Netherlands. 29 September – 1 October. https://vloek.co.za/leesstof/kongresmateriaal/when-a-word-is-befok-agw-2021.

  • Van Huyssteen 2019

    Van Huyssteen, Gerhard B. 2019. “Vloek Afrikaanssprekendes regtig? Betroubaarheid van ’n eerste grootskaalse meningspeiling se resultate [Do Afrikaans speakers really swear? Reliability of the results of a first large-scale opinion poll].” Vloek.co.za. https://vloek.co.za/blogs/navorsing/vloek-afrikaanssprekendes-regtig.

  • Van Huyssteen 2020

    Van Huyssteen, Gerhard B. 2020. Sake voortspruitend: ’n Reaksie op Michael le Cordeur se hervormingstrategie vir die Afrikaans-skoolkurrikulum [Matters arising: A response to Michael le Cordeur’s reform strategy for the Afrikaans school curriculum]. Research blog. LitNet Akademies en skole. 04 November. https://www.litnet.co.za/sake-voortspruitend-n-reaksie-op-michael-le-cordeur-se-hervormingstrategie-vir-die-afrikaans-skoolkurrikulum/.

  • What the swearword?!

    Project: What the Swearword?! Multidisciplinary research and scientific communication on cursing

    OVERVIEW

    Swearing is a fascinating phenomenon that gives us deep insights not only into human cognition and neurophysiology, but also into social interactions and power dynamics. However, very little multidisciplinary research has been done on swearing in the South African context – a lacuna that this project aims to fill with insights from the Digital Humanities, and with inputs from and implications for linguistics, literary studies, journalism and communication studies, psychology, sociology, law, philosophy and ethics, cultural anthropology and history, pediatrics, neurology and other neurosciences.

    In several sub-projects, we will answer questions like the following:

    • If a website contains swearing, what legal obligations does the owner/developer have?
    • Should parents protect their children from hearing swear words?
    • What is the best way to determine objective offensiveness ratings for swearwords, e.g., to determine advisories for films and/or books?
    • How does it happen that an Afrikaans word like be·fok (a verbalized form of fuck) can mean, among others, both ‘good’ (as in Dit was nou befok gewees! ‘That was really fucking A’), and ‘angry’ (as in Hy is al weer befok! ‘He is once again fucked off!’)?
    • How is swearing used as a linguistic innovation that causes short-term and/or rapid language change?
    • What are the views on swearing of writers, dramatists, poets, TV and film makers, producers, directors, actors, musicians, editors, journalists, podcasters, bloggers?
    • How and why do these content creators apply self-censorship with regard to swearing? What is the impact of cancel culture on the language they use in the content they create?
    • What is the interaction between swearing and societal change?
    • What is the neurological impact when someone hears a racial, homophobic, or sexist slur?

    A secure, technology-rich, end-user facing project website will be set up to create awareness of and cultivate new collaborations on the project, collect usage-based data (both corpus data, and questionnaire/poll data), and to experiment with engaging and contemporary ways (like blogs, podcasts, and webinars) to communicate with the public and the scientific community alike.

    AIMS

    The primary aim of this proposed research project is to address the lacuna in knowledge on and understanding of swearing in the South African context through:

    • linguistic research (focusing primarily on Afrikaans and the languages in its ecosystem); and
    • multidisciplinary research.

    The secondary aim of this proposed project is to investigate alternative, contemporary opportunities for scholarly communication, focussing on podcasts, blogs, videos, and webinars. The research focus will be on how to:

    • incorporate and integrate peer-reviewing in such communication channels;
    • utilise such means to stimulate multidisciplinary interest and foster new collaborations;
    • use these channels to enable and fast-track research (e.g. increasing respondent participation); and
    • employ these channels to develop new sources of funding.

    The topic of swearing was chosen deliberately because it both supports and hinders these secondary goals: on the one hand, it is a popular and accessible topic in both expert and non-expert communities (supporting secondary foci (a) to (c), i.e. the first three points above), while on the other hand it is potentially contentious, which may make funding harder to obtain (hindering secondary focus (d), i.e. the fourth point).

    Other aims include:

    • to report on the research and development process in the form of:
      • 70 blogs (at least);
      • 50 podcasts (at least, of which 30 will be interviews with content creators);
      • 3 webinars (at least);
      • 1 database: Afrikaans Vloekepedia;
      • 1 corpus: Afrikaans Twitter corpus (but also other corpora that might be developed; made available under an appropriate licence, and to be distributed by the South African Centre for Digital Language Resources);
      • 1 project website;
      • 4 scholarly articles or chapters in books, to be published in relevant South African or international journals and peer-reviewed conference proceedings;
      • 4 conference presentations; and
      • 2 Master’s dissertations;
    • to contribute towards human capital development and growth of the pool of experts in descriptive linguistics and computational linguistics in South Africa by offering bursaries, grants or contract work to:
      • one post-doctoral fellow;
      • two Master’s students;
      • twelve assistants (undergraduate students; four per year) for data collection and annotation;
    • to identify new research issues and problems as they unfold in the research and development process; and
    • to foster new collaborative networks for future multidisciplinary research.

    DURATION

    2019-ongoing

    FUNDED PARTIALLY BY:

    • Suid-Afrikaanse Akademie vir Wetenskap en Kuns (South Africa)
    • North-West University (Potchefstroom, South Africa)

    Barter agreements with the following institutions also support the research:

    • BlueTek Computers, Potchefstroom (South Africa)
    • Afrikaans.com (South Africa)
    • WatKykJy.co.za (South Africa)

    The following institutions contributed data to the project:

    • Woordeboek van die Afrikaanse Taal (WAT) (Stellenbosch, South Africa)
    • Handwoordeboek van die Afrikaanse Taal (HAT) (Cape Town, South Africa)
    • Centre for Text Technology (CTexT), North-West University (Potchefstroom, South Africa)

    PROJECT URLS

    PROJECT MEMBERS AND COLLABORATORS

    • Project leader
      • Gerhard B van Huyssteen (NWU)
    • Researchers
      • Liesbeth Augustinus (KU Leuven)
      • Lande Botha (NWU)
      • Adri Breed (NWU)
      • Anneke Butler (NWU)
      • Peter Dirix (KU Leuven)
      • Roald Eiselen (NWU)
      • Maristi Partridge (NWU)
      • Suléne Pilon (UP)
      • Martin Puttkammer (NWU)
      • Gerhard van Huyssteen (NWU)
    • (Guest) lecturers
      • Elmarie Claassens (private)
      • Hanlie Degenaar (NWU)
      • Roald Eiselen (NWU)
      • Tanja Gaustad (NWU)
      • Ankebe Kruger (NWU)
      • Greg Lamb (NWU/private)
      • Suléne Pilon (UP)
      • Martin Puttkammer (NWU)
      • Gerhard van Huyssteen (NWU)
    • PhD/MA students
      • Colette Combrink (NWU)
      • Benito Trollip (NWU)
      • Mart-Mari van der Merwe (UP)
    • Honours students
      • Bianca Gouws (NWU)
      • Carla Kershoff (NWU)
      • Corine Raath (NWU)
      • Maroné van Veijeren (UP)
      • Heidi Venter (NWU)
    • Undergraduate students
      • Simoné Koekemoer (NWU)
    • Web and social media editors
      • Colette Combrink (NWU; 2019/2020)
      • Monique Rabie (NWU)
    • Data collection, data analysis, and data processing
      • Willem Botha (WAT)
      • Colette Combrink (NWU)
      • André du Plessis (WAT)
      • Jaco du Toit (NWU)
      • Roald Eiselen (NWU)
      • Griffin (WatKykJy)
      • Jana Luther (NWU)
      • Corine Raath (NWU)
      • Gerhard van Huyssteen (NWU)
    • Website development
      • Eddie Dednam (BlueTek Computers)
      • Cornelius van der Walt (BlueTek Computers)
      • Gerhard van Huyssteen (NWU)
    • Graphic design
      • Sue de Kock (private)
    • Bloggers:
      • Elsabé Brits (private)
      • Griffin (private)
      • Riaan Grobler (private)
      • Mart-Mari van der Merwe (UP)
      • Gerhard van Huyssteen (NWU)
    • Podcasters:
      • Elmarie Claassens (private)
      • Gifford Peché (Decibel Studios)
      • Gerhard van Huyssteen (NWU)

    We would like to acknowledge the inputs of Liesbeth Augustinus and Peter Dirix (Catholic University of Leuven, Belgium) in the initial conceptualization of this project, as well as Suléne Pilon (UP) in the ongoing re-conceptualization of the project.

    ETHICAL CLEARANCE

    Ethical clearance for the research project was obtained through the Language Matters Ethics Committee of the NWU (ethics number: NWU-00632-19-A7). Additional ethics clearance for one of the master’s students was obtained from the Faculty of Humanities (UP), with reference number 16002360 (HUM017/0920).

    OUTPUTS

    PEER-REVIEWED PUBLICATIONS

    1. Van Huyssteen, Gerhard B, and Roald Eiselen. 2021. “Oor feekse en helleveë [On shrews and harridans].” Tydskrif vir Geesteswetenskappe.
    2. Van Huyssteen, Gerhard B. 2021. “Swearing in South Africa: Multidisciplinary research on language taboos.” International Conference of the Digital Humanities Association of Southern Africa 2021, South Africa, 29 November to 3 December.
    3. Eiselen, Roald, and Gerhard B. Van Huyssteen. 2021. “Using ordinal logistic regression to analyse self-reported usage of, and attitudes towards swearwords.” International Conference of the Digital Humanities Association of Southern Africa 2021, Virtual, 29 November to 3 December.

    CONFERENCE PRESENTATIONS

    1. Van Huyssteen, Gerhard B. 2021. “Swearing in South Africa: Multidisciplinary research on language taboos.” International Conference of the Digital Humanities Association of Southern Africa 2021, South Africa, 29 November to 3 December.
    2. Eiselen, Roald, and Gerhard B. Van Huyssteen. 2021. “Using ordinal logistic regression to analyse self-reported usage of, and attitudes towards swearwords.” International Conference of the Digital Humanities Association of Southern Africa 2021, Virtual, 29 November to 3 December.
    3. Van Huyssteen, Gerhard B. & Eiselen, Roald. 2021. How Afrikaans women became fierce-tempered. Zürich Workshop on Afrikaans Linguistics. Zürich, Switzerland. 4-5 October. https://vloek.co.za/leesstof/kongresmateriaal/fierce-tempered-afrikaans-women.
    4. Van Huyssteen, Gerhard B. 2021. When a word is befok. Afrikaans Grammar Workshop III. Amsterdam, The Netherlands. 29 September – 1 October. https://vloek.co.za/leesstof/kongresmateriaal/when-a-word-is-befok-agw-2021.
    5. Van Huyssteen, Gerhard B. 2020. Wat de vloekwoord?! [What the swearword?!]. Jaarlikse Skrywersdag [Annual Writer’s Day]. Faculty of Education, North-West University, Potchefstroom, South Africa. 19 February.
    6. Van Huyssteen, Gerhard B. 2019. Vloek met flair [Swear with flair]. Virtual Institute for Afrikaans’s symposium on swearing. Pretoria, South Africa. 15 November.

    PEER-REVIEWED BLOGS

    1. Van Huyssteen, Gerhard B. 2019. “Vloek Afrikaanssprekendes regtig? Betroubaarheid van ’n eerste grootskaalse meningspeiling se resultate [Do Afrikaans speakers really swear? Reliability of the results of a first large-scale opinion poll].” Vloek.co.za. https://vloek.co.za/blogs/navorsing/vloek-afrikaanssprekendes-regtig.

    OTHER BLOGS

    1. Van Huyssteen, Gerhard B. 2019. “Waar kom die woord “testikel” vandaan? [Where does the word “testikel” come from?]” Vloekcoza-blog. https://vloek.co.za/blogs/navorsing/waar-kom-die-woord-testikel-vandaan.

    PODCASTS

    1. Wat de vloekwoord?!

    RESOURCES

  • Van Huyssteen & Verhoef 1998

    Van Huyssteen, Gerhard B., and Marlene Verhoef. 1998. “Die diachronie van antesedentherhaling in Afrikaans [The Diachronics of Antecedent Repetition in Afrikaans].” Southern African Society for Dutch Studies Conference, University of Cape Town, Cape Town.

  • Van Huyssteen 2021

    Van Huyssteen, Gerhard B. 2021. Afrikaans stoplist for corpus research 1.0. Potchefstroom: Centre for Text Technology (CTexT), North-West University.

  • Stats calculators: Frequency information in VivA’s Afrikaans corpus collection

    Van Huyssteen, Gerhard B. 2021. “Stats calculators: Frequency information in VivA’s Afrikaans corpus collection.” https://gerhard.pro/software/stats-calculators-frequency-info-viva/.

    Introduction

    Here I provide a number of word frequency calculators for some of the Afrikaans corpora in the Virtual Institute for Afrikaans’ (VivA) corpus portal. These calculators already have the frequency of the most frequent word and the number of word types included, based on the frequency counts in the corpora that are available in the VivA Korpusportaal. These frequencies/numbers are updated regularly.

    All of these calculators require as input the frequency of the word (or multiword item) F(n) in one or more of the corpora. For the tf-idf (term frequency–inverse document frequency) the number of documents in which the word occurs F(d) is also required. All these numbers can be obtained easily from VivA’s corpus portal.

    Based on the input, the following results are calculated automatically:

    1. Relative frequency class (N) based on Perkuhn et al. (2012), plus its interpreted frequency category based on Van Huyssteen’s (2017b) proposal.
    2. Zipfian scale (Z) based on Van Heuven et al. (2014)
    3. Frequency per million words (fpmw)
    4. Frequency per thousand words (fptw)
    5. Frequency relative to most frequent word (f(n)”) (also called strengthened frequency)
    6. Term frequency–inverse document frequency (tf-idf)
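
    As a rough illustration of measures 3 to 6 in the list above, the following Python sketch computes them by hand. The function names, the corpus size `n_tokens`, and the document count `n_docs` are hypothetical; the calculators on this page have the corpus figures built in. The tf-idf variant shown (raw term frequency multiplied by the base-10 logarithm of the inverse document frequency) is only one common formulation and is assumed here, not taken from the calculators.

```python
import math


def per_million(f_n: int, n_tokens: int) -> float:
    """Frequency per million words (fpmw) for a word with raw frequency f_n."""
    return f_n * 1_000_000 / n_tokens


def per_thousand(f_n: int, n_tokens: int) -> float:
    """Frequency per thousand words (fptw)."""
    return f_n * 1_000 / n_tokens


def relative_to_most_frequent(f_n: int, f_m: int) -> float:
    """Frequency of the word relative to the most frequent word, F(n) / F(m)."""
    return f_n / f_m


def tf_idf(f_n: int, f_d: int, n_docs: int) -> float:
    """Assumed tf-idf variant: raw term frequency times log10(n_docs / F(d))."""
    return f_n * math.log10(n_docs / f_d)


if __name__ == "__main__":
    # Invented example figures, not taken from any VivA corpus.
    print(per_million(250, 50_000_000))
    print(per_thousand(250, 50_000_000))
    print(relative_to_most_frequent(250, 2_000_000))
    print(tf_idf(250, 10, 1_000))
```

    The inputs F(n) and F(d) correspond to the values you look up in the corpus portal; everything else in the sketch is a stand-in for figures the calculators supply themselves.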


    In another post, you can also find generic versions of the first two calculators (N and Z), which you can use with corpora not included in the list below.

    Instructions

    1. In VivA’s corpus portal, obtain the frequency (F(n)) of your search string (e.g. word form, lemma, etc.) in each of the corpora that you are interested in. If you want to calculate the tf-idf, also get the number of documents (F(d)) in which the word occurs in each corpus.
    2. For each corpus in the table below, enter the F(n) and/or F(d) in the white cells.
      • You don’t need to enter these frequencies for all the corpora – only for those you are interested in.
      • However, if you enter the frequencies of your search string in all the corpora, the total for that corpus collection will be calculated (in the row Total:). The two totals in the coloured rows are for either:
        • orange: corpora in VivA’s comprehensive corpus collection (omvattende versameling), excluding transcriptions of speech corpora; or
        • yellow: corpora in VivA’s exclusive corpus collection (eksklusiewe versameling), excluding corpora of historical texts.
      • The two corpora at the bottom (VSK and THT) are excluded from the calculations for the totals, since these corpora contain texts that are fundamentally different from those in the other corpora.
    3. Copy the table, or the relevant sections of it, into your article.


    Abbreviations

    Links

    A multitude of online calculators for corpus linguistics are available, such as Lancaster Stats Tools online, and Paul Rayson’s Log-likelihood and effect size calculator (to name but a few).

    References

    • Perkuhn, R., Keibel, H. & Kupietz, M. 2012. Korpuslinguistik. Paderborn: Wilhelm Fink Verlag.
    • Van Heuven, W. J. B., P. Mandera, E. Keuleers, and M. Brysbaert. 2014. “Subtlex-UK: A new and improved word frequency database for British English.” Quarterly Journal of Experimental Psychology 67: 1176-1190.

    In addition to the descriptions by the original authors, you can find descriptions in Afrikaans in the following publications:

  • Stats calculators: Word frequency classes

    Van Huyssteen, Gerhard B. 2021. “Stats calculators: Word frequency classes.” https://gerhard.pro/software/stats-calculators-word-frequency-classes/.

    Introduction

    Here I provide two calculators to determine word frequency classes: one for a relative frequency class (N) based on Perkuhn et al. (2012), and the other for a logarithmic Zipfian scale (Z) based on Van Heuven et al. (2014). Both calculators need as input the frequency of the word (or multiword item) F(n) in a corpus. The N calculator also requires:

    1. the frequency of the most frequent word F(m) in that corpus.


    The Zipfian calculator also requires:

    1. the number of word tokens F(N) in the corpus; and
    2. the number of word types F(V) in the corpus.
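
    A minimal sketch of the two calculations, assuming the commonly cited formulations: N = floor(log2(F(m) / F(n)) + 0.5) for the relative frequency class, and Z = log10((F(n) + 1) / ((F(N) + F(V)) / 1,000,000)) + 3 for the Zipf value as described by Van Heuven et al. (2014). The function names are illustrative only, and the calculators on this page may round or present results differently.

```python
import math


def frequency_class(f_n: int, f_m: int) -> int:
    """Relative frequency class N (assumed formulation).

    f_n: frequency of the word, F(n)
    f_m: frequency of the most frequent word in the corpus, F(m)
    A word in class N is roughly 2**N times less frequent than the
    most frequent word.
    """
    return math.floor(math.log2(f_m / f_n) + 0.5)


def zipf_value(f_n: int, n_tokens: int, n_types: int) -> float:
    """Zipf value Z with add-one smoothing.

    n_tokens: number of word tokens in the corpus, F(N)
    n_types: number of word types in the corpus, F(V)
    """
    size_in_millions = (n_tokens + n_types) / 1_000_000
    return math.log10((f_n + 1) / size_in_millions) + 3


if __name__ == "__main__":
    # Invented example figures, not taken from any real corpus.
    print(frequency_class(1_000, 1_000_000))
    print(zipf_value(999, 9_000_000, 1_000_000))
```

    Both measures are logarithmic, so a difference of one step means a factor of two on the N scale and a factor of ten on the Z scale.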


    In another post, you can also find an “Afrikaans version” of the calculators below, plus some additional statistics. These calculators already have the frequency of the most frequent word and the number of word types included, based on the frequency counts in the corpora that are available in the VivA Korpusportaal. These frequencies/numbers are updated regularly.

    Generic calculator



    Links

    A multitude of online calculators for corpus linguistics are available, such as Lancaster Stats Tools online, and Paul Rayson’s Log-likelihood and effect size calculator (to name but a few).

    References

    • Perkuhn, R., Keibel, H. & Kupietz, M. 2012. Korpuslinguistik. Paderborn: Wilhelm Fink Verlag.
    • Van Heuven, W. J. B., P. Mandera, E. Keuleers, and M. Brysbaert. 2014. “Subtlex-UK: A new and improved word frequency database for British English.” Quarterly Journal of Experimental Psychology 67: 1176-1190.

    In addition to the descriptions by the original authors, you can find descriptions in Afrikaans in the following publications:

  • Breed et al. 2021

    Breed, Adri, Nadine Fouché, Nina Brink, Marlie Coetzee, Cecilia Erasmus, Sophia Kapp, Suléne Pilon, Gerhard B. Van Huyssteen, and Roné Wierenga. 2021. “Content developers as stakeholders in the blended learning ecosystem: The Virtual Institute for Afrikaans’ Language Education Portal as a case study.” In Re-Envisioning and Restructuring Blended Learning for Underprivileged Communities, edited by Chantelle Bosch, Dorothy Joy Laubscher and Lydia Kyei-Blankson, 124-144. Hershey: IGI Global.

  • MorfAf (Morfologie van Afrikaans): Part 2 (2020)

    MorfAf (Morfologie van Afrikaans) is a series on the structure of complex Afrikaans words. Part 1 (2018) begins with an exploration of basic concepts (such as morphemes, affixes, stems and roots), while several more difficult issues are examined in more detail in part 2 (2020). The aim is not only to gain an overview of Afrikaans word formation, but also to be introduced to general concepts in morphology, specifically construction morphology. Analyses of words are based on the written form of the words (orthography), as well as on the origin of the words (etymology).

  • Strata in the development of Afrikaans

    The grammar and lexicon of Afrikaans can be divided into two primary strata (i.e. diachronic layers or etymological tiers), viz:

    1. a primary native stratum, i.e. Germanic, specifically Low Saxon-Low Franconian; and
    2. a primary non-native stratum, i.e. Classical, specifically Latin and Ancient Greek.

    However, a number of secondary strata could also be identified, viz:

    1. a secondary native stratum, namely an English stratum; and
    2. secondary non-native strata, including:

    Lexical items from languages like French, Spanish, Italian, Russian, Japanese, Mandarin, Hindi, etc. normally entered the Afrikaans lexicon either via Dutch or English. For more information and references, see Taalportaal.

    For purposes of this post, I have compiled a sortable Excel spreadsheet (see below), listing the most important (but not the only) languages in the development of Afrikaans. For each language, a general abbreviation that is often used in dictionaries is provided, together with ISO codes (where available).

    I have also constructed a time line, stating an estimated beginning and end date for each language/variant. These languages and dates represent my own interpretation (and amalgamation) of various literature sources, as listed below.

    The following diagram provides a bird’s-eye view of the data.


    • Abbreviations are followed by a full-stop; language codes don’t have a full-stop.
    • Click on the image to view it in more detail. You can then also right-click on the image to download it in high resolution.
    • It is also available here as SVG (for best results, right-click on the link and open it in a new tab/window).
    • Please reference this image as:
      Van Huyssteen, Gerhard B. 2021. Strata in die ontwikkeling van Afrikaans. Version 1.0. Available: https://gerhard.pro/teaching/Afrikaans-strata
    • An English version can be made available on request.
    • Please send comments or suggestions to me via the Contact Me page.

    Spreadsheet

    References

    I have consulted numerous articles on Wikipedia (English, Dutch, Afrikaans, German), starting with the article on Indo-European languages – Wikipedia.

    Beekes, Robert S. P. 2011. Comparative Indo-European Linguistics: An introduction. 2nd ed. Amsterdam: John Benjamins.

    Bloemhoff, Henk, and Nanne Streekstra. 2013. Basisboek historische taalkunde. Groningen: Kleine Uil.

    Campbell, Lyle. 2004. Historical Linguistics: An Introduction. 2nd ed. Edinburgh: Edinburgh University Press.

    Durkin, Philip. 2009. The Oxford Guide to Etymology. Oxford: Oxford University Press.

    Philippa, M., F. Debrabandere, A. Quak, Tanneke Schoonheim, and Nicoline Van der Sijs. 2003-2009. Etymologisch Woordenboek van het Nederlands. 4 vols. Amsterdam: Amsterdam University Press.

    Van Bree, Cor. 1987. Historische grammatica van het Nederlands. 2nd ed. Dordrecht: Foris Publications.

    Van Bree, Cor. 2016. Leerboek voor de historische grammatica van het Nederlands: Deel 1 – Gotische grammatica, Inleiding, Klankleer. 2nd ed. Leiden: Universiteit Leiden.

    Van Bree, Cor. 2020. Leerboek voor de historische grammatica van het Nederlands: Deel 2 – Flexie, Woordvorming. 2nd ed. Leiden: Universiteit Leiden.

    Van der Sijs, Nicoline. 2004. Taal als mensenwerk: het ontstaan van het ABN. Den Haag: SDU.

    Van Veen, P. A. F., and Nicoline Van der Sijs. 1997. Etymologisch woordenboek: de herkomst van onze woorden. 2nd ed. Utrecht/Antwerpen: Van Dale Lexicografie.

    Van der Sijs, Nicoline. 2019. 15 eeuwen Nederlandse taal. Gorredijk: Sterck & De Vreese.

  • Etymology of food names

    GrootFM’s Know-vember 2020

    The following four sound clips were originally produced for GrootFM’s 2020 Know-vember campaign. I added some visuals and text to accompany them.

    https://youtu.be/myVFKI0_OUo

    Have you perhaps also grown up with one of the most delicious desserts out there? Thick milk (amasi) with sugar! Learn more about how thick milk leads to cottage cheese, and cottage cheese to triangular cottage cheese freezer tart.

    Are a koeksister and a koesister the same thing? And is it spelled koeksister or actually koeksuster? And where do boerkaiing and koekmakranka fit into the picture?

    A few milk derivatives || (c) 2020 Van Huyssteen

    The 2017 AWS (Afrikaans word-list and spelling rules) acknowledged the words brownie (next to sjokoladebruintjie), and smoothie (next to gladdejantjie) for the first time as Standard Afrikaans words. Where do these words come from?

    Two South African favourites are Hertzoggies and Smutsies – two kinds of jam tartlets with a history as rich as the South African political history.

  • Van Huyssteen 2020b

    Van Huyssteen, Gerhard B. 2020. Afrikaans morphology. Taalportaal. Taalportaal Consortium: http://bit.ly/taalportaal-afr-morphology.

  • Van Huyssteen (ed.) 2020a

    Van Huyssteen, Gerhard B., ed. 2020. Taalportaal: Afrikaans phonology, morphology, and syntax. 1.0 ed. Taalportaal Consortium: http://bit.ly/taalportaal-afr.

  • MorfAf (Morfologie van Afrikaans): Part 1 (2018)

    MorfAf (Morfologie van Afrikaans) is a series on the structure of complex Afrikaans words. Part 1 kicks off with an exploration of basic concepts (such as morphemes, affixes, stems and roots), after which various word-formation processes are examined in more detail in part 2. The aim is not only to gain an overview of Afrikaans word formation, but also to be introduced to general concepts in morphology, specifically construction morphology. Analyses of words are based on the written form of the words (orthography), as well as on the origin of the words (etymology).



  • Taxonomy of Afrikaans part-of-speech categories


    This taxonomy represents all part-of-speech categories and sub-categories in Afrikaans, together with examples for each (sub-)category. The categorisation is based on the morphosyntactic features of a word, and not on semantics.

    • Click on the image to view it in more detail. You can then also right-click on the image to download it in high resolution.
    • The English version is available here as SVG (for best results, right-click on the link and open it in a new tab/window).
    • An Afrikaans version is available here as PNG and here as SVG.
    • Please reference this image as:
      Van Huyssteen, Gerhard B. 2022. Part-of-speech categories. Version 1.3. Taalportaal. Available: https://taalportaal.org/taalportaal/topic/pid/topic-20200506074902722
    • Please send comments or suggestions to me via the Contact Me page.

    POS tagger tagset

    • The tagset below is widely used in corpora that have been tagged automatically with CTexT’s POS tagger.
    • During 2022/23, this tagset will be revised and aligned with the above taxonomy, so that end-users have a unified set of tags and terminology. Results will be updated here.
    AOA B.NW.oortreffend.attributief
    AOP B.NW.oortreffend.predikatief
    ASA B.NW.stellend.attributief
    ASP B.NW.stellend.predikatief
    AVA B.NW.vergrotend.attributief
    AVP B.NW.vergrotend.predikatief
    BO BW.oortreffend
    BS BW.stellend
    BV BW.vergrotend
    KN VG.neweskikkend
    KO VG.onderskikkend
    LB LID.bepaald
    LO LID.onbepaald
    NA S.NW.abstrak
    NEE EIE.eienaam.enkelvoud.basis
    NEED EIE.eienaam.enkelvoud.diminutief
    NEM EIE.eienaam.meervoud.basis
    NEMD EIE.eienaam.meervoud.diminutief
    NM S.NW.massanaam
    NME S.NW.maatnaam.enkelvoud.basis
    NMED S.NW.maatnaam.enkelvoud.diminutief
    NMM S.NW.maatnaam.meervoud.basis
    NMMD S.NW.maatnaam.meervoud.diminutief
    NSE S.NW.soortnaam.enkelvoud.basis
    NSED S.NW.soortnaam.enkelvoud.diminutief
    NSM S.NW.soortnaam.meervoud.basis
    NSMD S.NW.soortnaam.meervoud.diminutief
    NVE S.NW.versamelnaam.enkelvoud.basis
    NVED S.NW.versamelnaam.enkelvoud.diminutief
    NVM S.NW.versamelnaam.meervoud.basis
    NVMD S.NW.versamelnaam.meervoud.diminutief
    PA VNW.aanwysend
    PB VNW.betreklik
    PDHEB VNW.derde.manlik.enkelvoud.besitlik
    PDHEDP VNW.derde.manlik.enkelvoud.gemarkeerd.persoonlik
    PDHENP VNW.derde.manlik.enkelvoud.ongemarkeerd.persoonlik
    PDHEW VNW.derde.manlik.enkelvoud.wederkerend
    PDMB VNW.derde.meervoud.besitlik
    PDMP VNW.derde.meervoud.persoonlik
    PDMW VNW.derde.meervoud.wederkerend
    PDOENP VNW.derde.onsydig.enkelvoud.ongemarkeerd.persoonlik
    PDOEW VNW.derde.onsydig.enkelvoud.wederkerend
    PDVEB VNW.derde.vroulik.enkelvoud.besitlik
    PDVEDP VNW.derde.vroulik.enkelvoud.gemarkeerd.persoonlik
    PDVENP VNW.derde.vroulik.enkelvoud.ongemarkeerd.persoonlik
    PDVEW VNW.derde.vroulik.enkelvoud.wederkerend
    PEEB VNW.eerste.enkelvoud.besitlik
    PEEDP VNW.eerste.enkelvoud.gemarkeerd.persoonlik
    PEENP VNW.eerste.enkelvoud.ongemarkeerd.persoonlik
    PEEW VNW.eerste.enkelvoud.wederkerend
    PEMB VNW.eerste.meervoud.besitlik
    PEMP VNW.eerste.meervoud.persoonlik
    PEMW VNW.eerste.meervoud.wederkerend
    PO VNW.onbepaald
    PTEB VNW.tweede.enkelvoud.besitlik
    PTEDP VNW.tweede.enkelvoud.gemarkeerd.persoonlik
    PTENP VNW.tweede.enkelvoud.ongemarkeerd.persoonlik
    PTEW VNW.tweede.enkelvoud.wederkerend
    PTMB VNW.tweede.meervoud.besitlik
    PTMP VNW.tweede.meervoud.persoonlik
    PTMW VNW.tweede.meervoud.wederkerend
    PV VNW.vraend
    PW VNW.wederkerig
    RA R.afkorting
    RF R.formule
    RK R.akroniem.letterklankwoord
    RL R.akroniem.letternaamwoord
    RO R.ongeklassifiseerd
    RPF U.prefiks
    RS R.simbool
    RSF U.suffiks
    RV R.vreemdetaalwoord
    RWD U.Woorddeel
    SVS VS.voorsetsel
    THAB TW.hooftelwoord.adjektief.bepaald
    THAO TW.hooftelwoord.adjektief.onbepaald
    THBB TW.hooftelwoord.bywoord.bepaald
    THBO TW.hooftelwoord.bywoord.onbepaald
    THNB TW.hooftelwoord.naamwoord.bepaald
    THPB TW.hooftelwoord.voornaamwoord.bepaald
    THPO TW.hooftelwoord.voornaamwoord.onbepaald
    TRAB TW.rangtelwoord.adjektief.bepaald
    TRAO TW.rangtelwoord.adjektief.onbepaald
    TRBB TW.rangtelwoord.bywoord.bepaald
    TRBO TW.rangtelwoord.bywoord.onbepaald
    TRPB TW.rangtelwoord.voornaamwoord.bepaald
    TRPO TW.rangtelwoord.voornaamwoord.onbepaald
    UDS U.dis
    UE U.enklities
    UPB U.partikel.betreklik
    UPD U.partikel.deel
    UPE U.partikel.skakel
    UPG U.partikel.graad
    UPI U.partikel.infinitief
    UPO U.partikel.ontkenning
    UPS U.partikel.genitief
    UPV U.partikel.vergelyking
    UPW U.partikel.ww
    UXD U.eks/daar
    VTHOG WW.ongemarkeerd.hoof.onskeibaar.oorganklik
    VTHOK WW.ongemarkeerd.hoof.onskeibaar.koppel
    VTHOO WW.ongemarkeerd.hoof.onskeibaar.onoorganklik
    VTHOV WW.ongemarkeerd.hoof.onskeibaar.voorsetsel
    VTHSG WW.ongemarkeerd.hoof.skeibaar.oorganklik
    VTHSO WW.ongemarkeerd.hoof.skeibaar.onoorganklik
    VTUOA WW.teenwoordig.hulp.onskeibaar.aspek
    VTUOM WW.teenwoordig.hulp.onskeibaar.modaliteit
    VTUOP WW.teenwoordig.hulp.onskeibaar.modus
    VUOT WW.hulp.onskeibaar.tyd
    VVHOG WW.gemarkeerd.hoof.onskeibaar.oorganklik
    VVHOK WW.gemarkeerd.hoof.onskeibaar.koppel
    VVHOO WW.gemarkeerd.hoof.onskeibaar.onoorganklik
    VVHOV WW.gemarkeerd.hoof.onskeibaar.voorsetsel
    VVUOA WW.verlede.hulp.onskeibaar.aspek
    VVUOM WW.verlede.hulp.onskeibaar.modaliteit
    VVUOP WW.verlede.hulp.onskeibaar.modus
    W TSW.tussenwerpsel
    ZE U.sinseinde
    ZM U.sinmiddel
    ZPL U.links-parentese
    ZPR U.regs-parentese
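
    Since every entry in the list above pairs a tag code with a dot-separated feature path, the list can be loaded into a simple lookup table. The following is a minimal Python sketch, not part of CTexT's tooling. The names `parse_tagset`, `split_description` and `DOTTED_CATEGORIES` are hypothetical, and the decision to treat B.NW and S.NW as single category labels is an assumption based on how they appear in the list.

```python
# Category abbreviations that themselves contain a dot.
# Assumption: B.NW and S.NW are single labels, as the list above suggests.
DOTTED_CATEGORIES = ("B.NW", "S.NW")


def split_description(description):
    """Split a dotted description into features, keeping dotted categories whole."""
    for category in DOTTED_CATEGORIES:
        prefix = category + "."
        if description.startswith(prefix):
            return [category] + description[len(prefix):].split(".")
    return description.split(".")


def parse_tagset(text):
    """Map each tag code to the list of features in its description."""
    tagset = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each entry has the form "<TAG> <description>".
        tag, description = line.split(None, 1)
        tagset[tag] = split_description(description)
    return tagset


# Small excerpt copied from the list above.
SAMPLE = """
ASA B.NW.stellend.attributief
NSE S.NW.soortnaam.enkelvoud.basis
KN VG.neweskikkend
"""

tagset = parse_tagset(SAMPLE)
print(tagset["NSE"])  # ['S.NW', 'soortnaam', 'enkelvoud', 'basis']
```

    Keeping the features as a list makes it easy to filter tags by any single feature, for example all tags whose features include `diminutief`.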

    Acknowledgement

    • This taxonomy is the result of many years of discussions between me, Suléne Pilon, and Adri Breed; although we still disagree on aspects of the image (will two people ever agree on a full taxonomy of part-of-speech categories?), the intellectual property of this taxonomy is as much theirs as mine.
    • Over many years, and especially in recent times, I have had many conversations with students and colleagues about these categories. I am indebted to all of them, and specifically (alphabetically) to Johanita Kirsten and Bertus van Rooy.
    • All errors and fallacies remain mine.
  • Van Huyssteen 2019

    Van Huyssteen, Gerhardus B. 2019. Website: vloek.co.za. Available: https://vloek.co.za. Pretoria: Viridevert NPC.

  • Bibliography of Afrikaans morphology

    Last updated: 30 May 2022

    This bibliography contains research publications on Afrikaans morphology that were published since July 1989.

    For a comprehensive bibliography of the period prior to this, see Combrink (1990:427-435). 

    You can also search for ⟨morf*⟩ or ⟨morph*⟩ in the field [Enige Woord] in the Digital Bibliography of Afrikaans Linguistics. This will result in a rather comprehensive (but not necessarily complete) list of bibliographic references.

    Please send suggestions for changes and additions to the list below, or to morphology-related references in the Digital Bibliography of Afrikaans Linguistics, to Gerhard van Huyssteen.

    Section 1: Descriptive and theoretical morphology

    Section 2: Computational morphology

    Descriptive and theoretical morphology

    1. Beyer, Herman Louis. 1995. “Die leksikografiese hantering van morfologies gemerkte geslagsopposisie pare in Afrikaanse woordeboeke, met spesifieke verwysing na die Verklarende Handwoordeboek van die Afrikaanse Taal.” MA, Universiteit Stellenbosch.
    2. —. 1997. “Aard en leksikografiese hantering van sogenaamde geslagtelik neutrale lede van Afrikaanse geslagsopposisiepare.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 15 (4): 107-115. https://doi.org/10.1080/10118063.1997.9724119.
    3. Breed, Adri. 2012. “Die grammatikalisering van aspek in Afrikaans: ʼn Semantiese studie van perifrastiese progressiewe konstruksies [The grammaticalisation of aspect in Afrikaans: a semantic study of periphrastic progressive constructions].” PhD, Department of Afrikaans, North-West University, Local.
    4. —. 2016. “Aspek in Afrikaans: ‘n Teoretiese beskrywing.” Tydskrif vir Geesteswetenskappe 56 (1): 62-80. https://doi.org/10.17159/2224-7912/2016/v56n1a5. http://www.scielo.org.za/scielo.php?script=sci_arttext&pid=S0041-47512016000100005&lang=pt.
    5. Butler, Anneke. 2016. “Die deelwoord as ‘n ánder vorm van die werkwoord.” Tydskrif vir Geesteswetenskappe 56 (1): 81-101. http://www.scielo.org.za/scielo.php?script=sci_arttext&pid=S0041-47512016000100006&lang=pt.
    6. Carstens, Adelia. 1990. “Komposisionaliteitsbeginsel en die beskrywing van geleentheidskomposita.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 8 (3): 99-106.
    7. —. 1992. “Kollokasies: vrye verbindings of lekseme?” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 10 (1): 1-11.
    8. Carstens, Adelia, and Piet H. Swanepoel. 1993. “Komposisionaliteit, motivering en die grammatika van Afrikaans.” South African Journal of Linguistics = Suid Afrikaanse Tydskrif vir Taalkunde 14: 1-78.
    9. Coetzee, Anna E. 1995. “Kaboems, kabolder, kerjakker, karbonkel, karfoefel: vanwaar die hele kaboedel? [Kaboems, kabolder, kerjakker, karbonkel, karfoefel: whence the whole caboodle?].” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde, no. 28: 27-44. https://doi.org/10.1080/10118063.1995.9724028.
    10. —. 1999. “Woordstatus, spelling en betekenis.” Tydskrif vir Taalonderrig = Journal for Language Teaching 33 (4): 305-315.
    11. —. 2000. “Reduksieprosesse: produktiewe woordvormingsmiddele in Afrikaans.” Tydskrif vir Taalonderrig = Journal for Language Teaching 34 (4): 311-322. https://collections.nwu.ac.za/dbtw-wpd/textbases/bibliografie-afrikaans/documents-dbat/tydsktaalonderrig_des2000_311-322.pdf.
    12. —. 2001. “Van morfologie tot sintaksis: taalvorme van die Bolandse wingerdwerkers.” edited by A. Carstens and H. Grebe, 21-29. Pretoria: Van Schaik.
    13. —. 2002. “Die 2002-AWS en grammatikale subkategorieë.” Tydskrif vir Taalonderrig = Journal for Language Teaching 36 (3-4): 377-385.
    14. Coetzee, Anna E., and Joan Kruger. 2004. “Die Afrikaanse verkleinwoord 1: ‘n Morfo-semantiese grammatika.” Journal for Language Teaching 38 (2): 316-332. http://hdl.handle.net/10520/EJC59852.
    15. Combrink, Johan G. H. 1989. “Afrikaanse morfologie: ‘n oorsig.” In Inleiding tot die Afrikaanse taalkunde, edited by T. J. R. Botha, Fritz A. Ponelis, J. G. H. Combrink and F. F. Odendal, 220-254. Pretoria: Academica.
    16. —. 1990. Afrikaanse morfologie: Capita Exemplaria [Afrikaans morphology: Capita Exemplaria]. Pretoria: Academica.
    17. —. 1990. Inkorting van naamwoorde en naamwoordstukke. Pretoria: Linguistevereniging van Suider-Afrika.
    18. —. 1992. “Die kader vir beskrywing van inkorting: repliek op Prinsloo se ander perspektief.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 10 (2): 81-89.
    19. Conradie, C. Jac. 1992. “Tempus in Afrikaanse Bybelvertalings.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 10 (2): 60-67.
    20. —. 1998. “Preteritumverlies in vroeë Afrikaans.” Tydskrif vir Geesteswetenskappe 38 (1): 6-20.
    21. —. 1998. “Tempusgebruik in Afrikaanse narratiewe.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 16 (2): 37-43.
    22. —. 2003. “Voltooide deelwoorde in verklarende Afrikaanse woordeboeke.” edited by W. F. Botha, 174-182. Stellenbosch: Buro van die WAT.
    23. —. 2004. “Ikonisiteit en Afrikaanse reduplikasie.” Tydskrif vir Taalonderrig/Journal for Language Teaching 38 (2): 334-348.
    24. —. 2010. “Het Nederlands en de standaardisering van het Afrikaanse werkwoordsysteem.” edited by M. Van der Wal and E. Francken, 67-83. Amsterdam: Stichting Neerlandistiek VU.
    25. —. 2011. “Is ‘regtig’ rêrig Duits ‘richtig’? [Is ‘regtig’ really German ‘richtig’?].” Tydskrif vir Geesteswetenskappe 51 (4): 716-729. http://www.scielo.org.za/pdf/tvg/v51n4/v51n4a16.pdf.
    26. —. 2012. “The Dutch-Afrikaans participial prefix ge-: a case of degrammaticalization?” edited by A. Kemenade and N. De Haas, 129-154. Amsterdam: John Benjamins.
    27. Den Besten, Hans. 1989. “From Khoekhoe foreignertalk via Hottentot Dutch to Afrikaans: the creation of a novel grammar.” edited by Martin Pütz and Rene Dirven, 207-249. Frankfurt am Main: Peter Lang.
    28. —. 1996. “Associative DPs.” Linguistics in the Netherlands 13: 13-24. https://doi.org/10.1075/avt.13.04bes. https://benjamins.com/catalog/avt.13.04bes/fulltext.
    29. —. 2001. “The complex ancestry of the Afrikaans associative constructions.” edited by Adelia Carstens and Heinrich Grebe, 49-58. Pretoria: Van Schaik.
    30. —. 2012. “On the “verbal suffix” -UM of Cape Dutch Pidgin: morphosyntax, pronunciation and origin.” In Roots of Afrikaans: selected writings of Hans den Besten, edited by Ton Van der Wouden, 123-132. Amsterdam: John Benjamins.
    31. —. 2013. “Hulle weet nie eers hoe om ‘n baba te abba nie. Opmerkingen over het Afrikaans.” edited by Nicoline Van der Sijs, Jan Stroop and Fred Weerman, 237-245. Amsterdam: Uitgeverij Bert Bakker.
    32. Den Besten, Hans, Carla Luijks, and P.T. Roberge. 2003. “Reduplication in Afrikaans.” In Twice as meaningful. Reduplication in Pidgins, Creoles and other contact languages, edited by S. Kouwenberg, In Westminster Creolistics Series 8, 271–287. London: Battlebridge.
    33. Hantson, André. 2001. “Casusaanduiding in het Afrikaans en andere (Indo-) Germaanse talen : een vergelijkende schets.” Tydskrif vir Geesteswetenskappe 41 (1): 1-20.
    34. Jansen, Carel, Robert Schreuder, and Anneke Neijt. 2007. “The influence of spelling conventions on perceived plurality in compounds: a comparison of Afrikaans and Dutch.” Written Language & Literacy 10 (2): 185-194.
    35. Jenkinson, Alf G. 1991. “Strukturalistiese en generatiewe benaderings tot die morfologie : ‘n metodologiese vergelyking.” Acta Academica 23 (3): 120-138.
    36. —. 1993. “Die probleem van fleksie en afleiding in Afrikaans.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde Supplement 18: 100-122. http://collections.nwu.ac.za/dbtw-wpd/textbases/bibliografie-afrikaans/documents-dbat/sattaalkunde_nov1993_100-122.pdf.
    37. Klopper, R. M. 1989. “Attributiewe adjektiewe in Afrikaans.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 7 (3): 105-111.
    38. Kotzé, Ernst. 2009. “Adjektiwiese verbuiging in Afrikaans herbesoek.” In Afrikaans. Een drieluik, edited by Hans Den Besten, Frans Hinskens and Jerzy Koch, 125-132. Amsterdam: Stichting Neerlandistiek VU.
    39. Kürschner, Sebastian. 2009. “Morphological non-blocking in Dutch plural allomorphy: a contrastive approach.” Language Typology and Universals 62 (4): 285-306. https://doi.org/10.1524/stuf.2009.0022.
    40. Lamont, Andrew. 2017. “The small matter of the Afrikaans diminutive.” Proceedings of the Linguistic Society of America 2. https://doi.org/10.3765/plsa.v2i0.4076.
    41. Le Roux, Cecile. 1989. “On the interface of morphology and syntax: evidence from verb-particle combinations in Afrikaans.” MA, Department of General Linguistics, University of Stellenbosch.
    42. Lee, A. S. 1991. Die voegsel ‘-s’ in Afrikaans. Johannesburg: SAUK.
    43. Lubbe, Hendrik Johannes. 1993. “Die klempatrone van Afrikaans.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 11 (2): 49-59.
    44. —. 1993. “Klem in Afrikaans en die superswaar rym.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 18: 78-99.
    45. —. 1994. “Leksikale Fonologie as ‘n beskrywingsraamwerk van fonologiese en morfologiese interaksie.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 12 (4): 124-134.
    46. —. 1995. “Die organisasie van die Afrikaanse leksikon [The organization of the Afrikaans lexicon].” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 13 (3): 108-119. https://doi.org/10.1080/10118063.1995.9723986.
    47. —. 1996. “Fonologiese en morfologiese interaksie in Afrikaans.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 31: 113-137.
    48. Neijt, Anneke, Robert Schreuder, and Carel Jansen. 2013. “Van boekenbonnen en feëverhale: De tussenklank e(n) in Nederlandse en Afrikaanse samenstellingen: vorm of betekenis?” Nederlandse Taalkunde 15 (2): 125-147. https://doi.org/10.5117/NEDTAA2010.2.VAN_436.
    49. Otterman, S. 1993. “Semantic interpretation of Afrikaans diminutives.”
    50. Perold, M. 1990. “Die grammatikale funksie en semantiese waarde van die voorvoegsel be- in Afrikaans.”
    51. Ponelis, Fritz A. 1994. “Uitgebreide stamme.” In Nuwe perspektiewe op die geskiedenis van Afrikaans: opgedra aan Edith H. Raidt, edited by Gerrit Olivier and Anna E. Coetzee, 84-89. Halfweghuis: Southern Book.
    52. Pretorius, Erin. 2017. Spelling out P: a unified syntax of Afrikaans adpositions and V-particles. LOT dissertation series. Utrecht: Netherlands Graduate School of Linguistics / Landelijke (LOT).
    53. Prinsloo, Anton F. 1992. “Combrink se inkortingsreëls: ‘n ander perspektief.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 10 (2): 75-81.
    54. Richter, Cornelia Magdalena. 2020. “‘n Morfologies-sistematiese analise van die komposisiestrukture in “vier pogings in linguistiese sinaps-opsporing” in Mede-wete (2014) deur Antjie Krog.” MA, Departement Afrikaans en Nederlands, Duits en Frans, Universiteit van die Vrystaat.
    55. Robbers, K. B. M. 1997. “Non-finite verbal complements in Afrikaans: a comparative approach.”
    56. Savini, Marina. 1992. “On the (non-) discreteness of morphological categories with special reference to affix categories in Afrikaans.” DLitt et Phil, Unisa.
    57. —. 2012. “Phrasal compounds in Afrikaans: a generative analysis.” Stellenbosch Papers in Linguistics 12 (0). https://doi.org/10.5774/12-0-102.
    58. Savini-Beck, Marina. 1993. “The old and the new: changing perspectives on the distinction between inflection and derivation in Afrikaans.” Acta Academica 25 (1): 30-48.
    59. —. 1995. “Constraints on affixation in Afrikaans: some preliminary findings.” South African Journal of Linguistics 13 (1): 38-44. https://doi.org/10.1080/10118063.1995.9723973.
    60. Slomanson, P. 2005. “The verbal morphosyntax of non-canonical contact languages: Malay-derived constraints and the inflectional domain in Afrikaans and Sri Lankan Malay.” PhD, University of New York.
    61. Southwood, Frenette. 2005. “A comparison of the responses to three comprehension and three production tasks assessing the morpho-syntactic abilities of Afrikaans-speaking preschoolers.” Per Linguam : a Journal of Language Learning = Per Linguam : Tydskrif vir Taalaanleer 21 (1): 36-59. https://hdl.handle.net/10520/EJC86963.
    62. —. 2006. “The comprehension and production of plural forms of nouns by 6-year-old Afrikaans-speaking children with and without specific language impairment.” Per Linguam : a Journal of Language Learning = Per Linguam : Tydskrif vir Taalaanleer 22 (2): 29-39. https://doi.org/10.5785/22-2-65.
    63. Stanisław, Prędota. 2012. “On the Morphology of Proverbs in Afrikaans and Dutch.” Academic Journal of Modern Philology 1: 99-106. http://cejsh.icm.edu.pl/cejsh/element/bwmeta1.element.desklight-7f1d708d-8e4d-4772-9772-3e0e5f92e3a1.
    64. Trollip, Benito. 2016. “‘n Beskrywing van die valensiemorfeem in Afrikaans vanuit ‘n kognitiewe gebruiksgebaseerde beskrywingsraamwerk [A description of the valence morpheme in Afrikaans from a cognitive usage-based descriptive framework].” MA, Department of General Linguistics, North-West University, Local.
    65. Trollip, Eddie Benito. 2020. “Denominal adjectives in Afrikaans: The cases of ·agtig and ·e·rig.” SKASE Journal of Theoretical Linguistics 17 (5): 27-41. http://www.skase.sk/Volumes/JTL47/pdf_doc/02.pdf.
    66. Trollip, Eddie Benito, and Gerhard B. Van Huyssteen. 2018. “The linking morpheme in Afrikaans: a Cognitive Grammar description.” SKASE Journal of Theoretical Linguistics [online] 15 (3): 37-70. http://www.skase.sk/Volumes/JTL38/pdf_doc/03.pdf.
    67. Van der Walt, J. L. 1991. “The acquisition of English morphemes by Afrikaans primary school pupils.” Tydskrif vir Taalonderrig = Journal for Language Teaching 25 (2): 1-13.
    68. Van Huyssteen, Gerhard B. 2000. “Die reduplikasiekonstruksie in Afrikaans: aspekte van `n kognitiewe gebruiksgebaseerde beskrywingsraamwerk vir Afrikaans [The reduplication construction in Afrikaans: aspects of a cognitive usage-based model for Afrikaans].” PhD, Department of General Linguistics, Potchefstroom University for CHE, Local.
    69. —. 2004. “Motivating the composition of Afrikaans reduplications: a cognitive grammar analysis.” In Studies in Linguistic Motivation, edited by Günter Radden and Klaus U. Panther, 269-292. Berlin: Mouton de Gruyter.
    70. —. 2010. “(Re)defining component structures in morphological constructions: a cognitive grammar perspective.” In Cognitive Approaches to Word-Formation, edited by Alexander Onysko and Sascha Michel, 97-126. Berlin: Mouton de Gruyter.
    71. —. 2014. “Morfologie [Morphology].” In Kontemporêre Afrikaanse Taalkunde [Contemporary Afrikaans Linguistics], edited by W. A. M. Carstens and Nerina Bosman. Pretoria: Van Schaik.
    72. —. 2016. “Die ortografiese realisering van komposita met en afleidings van multiwoordeiename [The orthographical realisation of compounds with and derivations of multiword proper names].” LitNet Akademies (Geesteswetenskappe) 13 (3). http://www.litnet.co.za/die-ortografiese-realisering-van-komposita-met-en-afleidings-van-multiwoordeiename/.
    73. —. 2017. “Morfologie [Morphology].” In Kontemporêre Afrikaanse Taalkunde [Contemporary Afrikaans Linguistics], edited by W. A. M. Carstens and Nerina Bosman, 177-214. Pretoria: Van Schaik Uitgewers.
    74. —. 2018. “The ‘hulle’ and ‘goed’ constructions in Afrikaans.” In The construction of words: Advances in construction morphology, edited by Geert Booij, In Studies in Morphology, 399-437. New York: Springer.
    75. —. 2018. “‘n Korpusondersoek na ‘huidiglik’ [A corpus exploration of ‘huidiglik’].” Literator 39 (2): a1527. https://doi.org/10.4102/lit.v39i2.1527.
    76. —. 2018. “Norme vir ‘huidiglik’ [Norms for ‘huidiglik’].” Literator 39 (2): a1526. https://doi.org/10.4102/lit.v39i2.1526.
    77. Van Huyssteen, Gerhard B., and Daan P. Wissing. 2007. “Datagebaseerde aspekte van Afrikaanse reduplikasies [Data-based aspects of Afrikaans reduplications].” Southern African Linguistics and Applied Language Studies 25 (3): 419-439.
    78. Van Marle, Jaap. 1994. “De derivationele morfologie: een vergeten hoofdstuk uit de geschiedenis van het Afrikaans.” In Nuwe perspektiewe op die geskiedenis van Afrikaans: opgedra aan Edith H. Raidt, edited by Gerrit Olivier and Anna E. Coetzee, 90-101. Halfweghuis: Southern Book.
    79. Van Niekerk, Angelique. 2005. “Handelsname: ’n vorm van leksikale vernuwing teen die agtergrond van globalisering.” Southern African Linguistics and Applied Language Studies 23 (1): 39-58. https://doi.org/10.2989/16073610509486373.
    80. Van Niekerk, A. E. 1989. “Die leksikografiese hantering van komposita.”
    81. Van Niekerk, A. E., and P. Harteveld. 1991. “Die leksikografiese hantering van neo-klassieke en pseudo-sintaktiese komposita.” Lexikos 1: 281-297.
    82. Van Niekerk, Lariza. 2002. “‘n Korpusanalise van Afrikaanse eksosentriese komposita.”
    83. —. 2006. “Funksionele aspekte van Afrikaanse eksosentriese komposita.”
    84. Van Rensburg, F. I. J. 1994. “Die gebruik van die verboë en die onverboë adjektief in Van Wyk Louw se poësie.” Tydskrif vir Geesteswetenskappe 34 (2): 77-89.
    85. Waher, Hester. 1994. “Vernederlandsing en die Afrikaanse afleidingsaffikse: getuienis uit Kaapse Moeslim-Afrikaans.” edited by Chris Van der Merwe, Hester Waher and Joan Hambidge, 112-122. Cape Town: Rondebosch: Universiteit van Kaapstad.
    86. Wissing, Daan P. 1989. “Die klempatrone van Afrikaanse en Nederlandse simplekse: ‘n vergelyking.” Literator 10 (2): 50-65. https://collections.nwu.ac.za/dbtw-wpd/textbases/bibliografie-afrikaans/documents-dbat/literator_aug1989_50-65.pdf.
    87. —. 1996. “Meervoudsvorme in Afrikaans: 25 jaar later.” South African Journal of Linguistics = Suid-Afrikaanse Tydskrif vir Taalkunde 14, no. 31: 95-112. https://doi.org/10.1080/10118063.1996.9724378.
    88. —. 2019. “Herbesoek aan Afrikaanse klemtoon: Is dit (nog) ’n inisiëleklemtoontaal? [Revisiting stress in Afrikaans: is it (still) a stress-initial language?].” LitNet Akademies (Geesteswetenskappe) 16 (2). https://www.litnet.co.za/herbesoek-aan-afrikaanse-klemtoon-is-dit-nog-n-inisieleklemtoontaal/.
    89. Zwart, Jan-Wouter. 2017. “A note on the periphrastic past in Afrikaans.” Stellenbosch Papers in Linguistics 48: 1-8. https://doi.org/10.5774/48-0-276.

    Computational morphology

    1. Daelemans, Walter, Hendrik J. Groenewald, and Gerhard B. Van Huyssteen. 2009. “Prototype-based active learning for lemmatization.” Proceedings of Recent Advances in Natural Language Processing (RANLP).
    2. De Stadler, L. G., and M. W. Coetzer. 1992. “A morphological parser for Afrikaans.” edited by R. P. Botha, M. Sinclair and W. Winckler, 439-448. Stellenbosch: Departement Algemene Taalwetenskap.
    3. Fick, M., and C. J. Swanepoel. 2010. “Afrikaanse lettergreepverdelingspatrone / Afrikaans syllabification patterns.” Suid-Afrikaanse Tydskrif vir Natuurwetenskap en Tegnologie 29 (2): 48-65.
    4. Groenewald, Hendrik J., and Gerhard B. Van Huyssteen. 2008. “Outomatiese lemma-identifisering vir Afrikaans [Automatic lemmatisation for Afrikaans].” Literator 29 (1): 65-91.
    5. Pilon, Suléne. 2005. “Outomatiese Afrikaanse woordsoortetikettering [Automatic Afrikaans part-of-speech tagging].” MA (cum laude), Department of General Linguistics, North-West University, Local.
    6. Pilon, Suléne, Martin J. Puttkammer, and Gerhard B. Van Huyssteen. 2008. “Die ontwikkeling van ‘n woordafbreker en kompositumanaliseerder vir Afrikaans [The development of a hyphenator and compound analyser for Afrikaans].” Literator 29 (1): 21-41.
    7. Puttkammer, Martin J. 2006. “Outomatiese Afrikaanse tekseenheididentifisering [Automatic Afrikaans tokenisation, sentencisation and named-entity recognition].” MA (cum laude), Department of General Linguistics, North-West University, Local.
    8. Puttkammer, Martin J., and Gerhard B. Van Huyssteen. 2006. “Automatic text segmentation of Afrikaans using memory-based learning.” Proceedings of the 2006 Conference of the Pattern Recognition Association of South Africa, Pretoria.
    9. Savini, Marina. 2012. “Phrasal compounds in Afrikaans: a generative analysis.” Stellenbosch Papers in Linguistics 12 (0). https://doi.org/10.5774/12-0-102.
    10. Van Huyssteen, Gerhard B., and Marelie H. Davel. 2010. “Learning rules and categorization networks for language standardization.” Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) Workshop on Extracting and Using Constructions in Computational Linguistics.
    11. Van Huyssteen, Gerhard B., and Suléne Pilon. 2009. “Rule-based conversion of closely-related languages: A Dutch-to-Afrikaans convertor.” Proceedings of the 2009 Conference of the Pattern Recognition Association of South Africa.
    12. Van Huyssteen, Gerhard B., and Menno M. Van Zaanen. 2004. “Learning compound boundaries for Afrikaans spelling checking.” Proceedings of 1st Workshop on International Proofing Tools and Language Technologies, Patras.
    13. Van Huyssteen, Gerhard B., and Ben Verhoeven. 2014. “A taxonomy for Afrikaans and Dutch compounds.” Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014): The First Workshop on Computational Approaches to Compound Analysis (ComAComA), Dublin, Ireland.
    14. Verhoeven, Ben, Walter Daelemans, and Gerhard B. Van Huyssteen. 2012. “Classification of noun-noun compound semantics in Dutch and Afrikaans.” Proceedings of the 23rd Annual Symposium of the Pattern Recognition Association of South Africa, Pretoria, South Africa.
    15. Verhoeven, Ben, and Gerhard B. Van Huyssteen. 2013. “More than only noun-noun compounds: towards an annotation scheme for the semantic modelling of other noun compound types.” Proceedings of 9th Joint ACL – ISO Workshop on Interoperable Semantic Annotation.
    16. Verhoeven, Ben, Menno M. Van Zaanen, Walter Daelemans, and Gerhard B. Van Huyssteen. 2014. “Automatic compound processing: compound splitting and semantic analysis for Afrikaans and Dutch.” Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014): The First Workshop on Computational Approaches to Compound Analysis (ComAComA), Dublin, Ireland.

  • Van Huyssteen 2019a

    Van Huyssteen, Gerhard B. 2019. “Wat gaan word van geskrewe Standaardafrikaans? [What is going to happen to Standard Afrikaans?].” In SA Akademie vir Wetenskap en Kuns: Verlede, hede toekoms (1909-2019), edited by Jacques Van der Elst, 86-89. ISBN: 978-0-949976-97-0. Pretoria: SAAWK.

  • Genre Classification

    Project: Genre Classification for South African Languages

    DURATION

    2011-2012

    FUNDED BY

    • Dutch Language Union (Belgium, The Netherlands)
    • Department of Arts and Culture (South Africa)

    PROJECT URLS

    • http://gcsal.sf.net/
    • https://gerhard.pro/genre-classification/

    PROJECT LEADERS

    • GERHARD B VAN HUYSSTEEN – PROJECT LEADER AND LINGUISTICS
      • CTexT (Centre for Text Technology), North-West University, South Africa
    • WALTER DAELEMANS – COMPUTATIONAL LINGUISTICS
      • CLiPs (Computational Linguistics and Psycholinguistics), University of Antwerp, Belgium

    PROJECT COLLABORATORS

    • DIRK SNYMAN
      • CTexT (Centre for Text Technology), North-West University, Potchefstroom, South Africa

    OVERVIEW

    During 2011/12, the Department of Arts and Culture of the South African Government funded a small-scale project on genre classification for document management.

    During the project, the following tasks were undertaken:

    • We investigated appropriate ontologies and optimal supervised and unsupervised machine learning methods for the development of genre classifiers, specifically for resource-scarce languages (information captured in a master’s dissertation, and in a scholarly publication);
    • We developed genre classifiers (and their associated resources) for the ten official indigenous languages of South Africa (available here);
    • We implemented these classifiers as a web-based demo, where users can either upload a document or provide a URL for classification (depending on the chosen genre classification ontology); and
    • We organised a training event on “New Applications of Automatic Text Categorization”, presented by Prof Walter Daelemans on 25 January 2012 at the CSIR, Pretoria.

    The project was executed and managed by Trifonius, in collaboration with partners, including:

    • Prof Walter Daelemans (University of Antwerp; Belgium)
    • Centre for Text Technology (CTexT) (North-West University; Potchefstroom)
    • Human Language Technology Competence Area (Council for Scientific and Industrial Research; Pretoria)

    AIMS

    The primary aim of this project was to develop resources (including annotation protocols, and training and testing data) for the development of:

    • automatic genre classifiers for ten South African languages.

    Other secondary aims included:

    • to report on the research and development process in the form of:
      • one Master’s degree dissertation;
      • at least two scholarly papers, to be published in relevant journals or peer-reviewed conference proceedings;
      • various annotation protocols, made available publicly;
    • to contribute towards human capital development and growth of the pool of experts in descriptive linguistics and computational linguistics in South Africa, Belgium and The Netherlands by offering bursaries, grants or contract work to undergraduate and post-graduate students;
    • to extend the collaboration network between Trifonius, North-West University (NWU) and University of Antwerp (UA), by introducing young scholars and students to each other;
    • to identify new research issues as they unfold in the research and development process; and
    • to contribute to the HLT-enabling of the languages of South Africa.

    OUTPUTS

    PEER-REVIEWED PUBLICATIONS

    1. Snyman, DP, Van Huyssteen, GB & Daelemans, W. 2014. Outomatiese Genreklassifikasie vir Afrikaans [Automatic genre classification for Afrikaans]. DOI: 10.4102/satnt.v33i1.759. Suid-Afrikaanse Tydskrif vir Natuurwetenskap en Tegnologie. 33(1): 12 pp.
    2. Snyman, D, Van Huyssteen, GB & Daelemans, W. 2012. Cross-Lingual Genre Classification for Closely Related Languages. In: Proceedings of the Twenty-Third Annual Symposium of the Pattern Recognition Association of South Africa. ISBN: 978-0-620-54601-0. 29-30 November. Pretoria, South Africa. pp. 133-137.
    3. Snyman, DP, Van Huyssteen, GB & Daelemans, W. 2011. Automatic genre classification for resource scarce languages. In: Proceedings of the 2011 Conference of the Pattern Recognition Association of South Africa. ISBN: 978-0-620-51914-4. 22-25 November. Vanderbijlpark, South Africa. pp. 132-137.

    RESOURCES

    • Genre Classification Corpora for South African Languages 1.0. (Project leader, with Walter Daelemans as co-project leader, and Dirk Snyman as main collaborator and scientific programmer). Potchefstroom: NWU.
      • Corpora that can be used to train genre classifiers for South African languages.
        • Afrikaans Genre Classification Corpus  (ISLRN: 666-908-651-526-7)
        • isiNdebele Genre Classification Corpus  (ISLRN: 248-916-003-745-6)
        • isiXhosa Genre Classification Corpus  (ISLRN: 418-998-894-930-1)
        • isiZulu Genre Classification Corpus  (ISLRN: 457-135-629-106-1)
        • Sesotho Genre Classification Corpus  (ISLRN: 469-495-440-934-0)
        • Sesotho sa Leboa Genre Classification Corpus  (ISLRN: 676-872-880-082-8)
        • Setswana Genre Classification Corpus  (ISLRN: 921-735-738-409-8)
        • Siswati Genre Classification Corpus  (ISLRN: 718-674-341-027-9)
        • Tshivenda Genre Classification Corpus  (ISLRN: 098-827-706-093-4)
        • Xitsonga Genre Classification Corpus  (ISLRN: 210-849-527-713-3)
      • Cite as: Snyman, D, Van Huyssteen, GB & Daelemans, W. 2012. Genre classification corpora for South African languages 1.0. Potchefstroom: North-West University. Available at gcsal.sf.net.

    TUTORIAL

    Tutorial: New Applications of Automatic Text Categorization

    Presenter: Prof Walter Daelemans (University of Antwerp, Belgium)

    Date: 25 January 2012

    Time: 09:00-16:00

    Place: Knowledge Commons, CSIR, Pretoria

    Cost: Free

    Automatic text categorization is a mature language technology that is able to sort documents into different categories on the basis of examples. Its applications range from e-mail routing and spam filtering to topic detection and text genre assignment. A text categorization system incorporates an approach to document representation (mostly a set of relevant terms or n-grams of words found in the document), and a machine learning method. In the first part of the tutorial, this basic architecture was described at an introductory level, and an overview of state-of-the-art document representation and machine learning methods was presented.
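    The basic architecture described above (a document representation plus a machine learning method) can be illustrated with a minimal sketch. It assumes word unigrams and bigrams as the representation and multinomial Naive Bayes as the learner; both are choices made for illustration and not necessarily the methods covered in the tutorial. The training documents and the names `ngrams`, `NaiveBayes` and `TRAIN` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Represent a document as a bag of word n-grams (n = 1..n_max)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(n_docs / total_docs)
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for feat in ngrams(doc):
                if feat in self.vocab:
                    score += math.log((self.feat_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set (spam filtering is one of the applications named above).
TRAIN = [
    ("win a free prize now", "spam"),
    ("claim your free money today", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]
model = NaiveBayes().fit([d for d, _ in TRAIN], [l for _, l in TRAIN])
```

    Calling `model.predict("free prize money")` classifies a new document by comparing its n-grams with those seen for each category during training.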

    In the second part of the tutorial, we focused on more technical detail about a new application area of this technology: automatic profiling of text. In this application, we are interested in which metadata we can infer from a document. More specifically we are interested in how far we can get with text categorisation techniques in tasks like the following:

    (i) Text profiling: predicting age, gender, personality, and region of the author of the text.

    (ii) Intrinsic plagiarism detection: finding passages in text not written by the author.

    (iii) Deception detection: finding out whether reviews and reports are truthful, detecting pedophile grooming in social networks etc.

    To achieve this, we need document representations that differ from those used in other applications: instead of (patterns of) content words, we need other linguistic categories. For some of the tasks, we also need special-purpose machine learning algorithms, such as Koppel et al.’s unmasking algorithm.
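    As a rough sketch of what a representation without content words might look like, the snippet below computes relative frequencies of function words, the most frequent character trigrams, and average word length. The function-word list and feature names are hypothetical, and the sketch does not implement the unmasking algorithm; it only shows the kind of style-oriented features that could be passed to a classifier.

```python
from collections import Counter

# Hypothetical, very small function-word list; a real system would use a fuller inventory.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "i", "you", "not", "but")


def style_features(text, n=3, top_k=5):
    """Return style-oriented features that ignore what the text is about."""
    tokens = text.lower().split()
    total = max(len(tokens), 1)
    counts = Counter(tokens)

    # Relative frequency of each function word.
    feats = {f"fw:{w}": counts[w] / total for w in FUNCTION_WORDS}

    # Relative frequency of the most frequent character n-grams.
    chars = " ".join(tokens)
    grams = Counter(chars[i:i + n] for i in range(len(chars) - n + 1))
    gram_total = max(sum(grams.values()), 1)
    for gram, count in grams.most_common(top_k):
        feats[f"c{n}:{gram}"] = count / gram_total

    feats["avg_word_len"] = sum(len(t) for t in tokens) / total
    return feats


features = style_features("I do not know the name of the author but I like the text")
```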

    This workshop was hosted and organised by Trifonius, and was made possible through funding by the National Centre for Human Language Technologies of the Department of Arts and Culture, and a financial contribution by the Human Language Technology Competence Area of the CSIR. The workshop was attended by eleven scholars and students.

    DISSERTATIONS (UNPUBLISHED)

    Snyman, DP. 2012. Outomatiese genreklassifikasie vir hulpbronskaars tale [Automatic genre classification for resource-scarce languages]. MA Thesis. Potchefstroom: North-West University.

    DEMO

    Final version of a web-based demonstrator.

  • Resources for Closely-related Languages (RCRL)

    Project: Resources for Closely-related Languages

    Long name

    Human Language Technology Resources for Closely-Related Languages

    Abbreviation

    RCRL

    DURATION

    2008-2010

    FUNDED BY:

    National Research Foundation

    PROJECT URLS

    PROJECT LEADERS

    • GERHARD B VAN HUYSSTEEN
      • CTexT (Centre for Text Technology), North-West University, South Africa
    • FEBE DE WET
      • Stellenbosch University, South Africa

    PROJECT COLLABORATORS

    • Suléne Pilon (North-West University, South Africa)
    • Martin Puttkammer (North-West University, South Africa)
    • Martin Schlemmer (North-West University, South Africa)
    • Handré Groenewald (North-West University, South Africa)
    • Linsen Loots (Stellenbosch University, South Africa)
    • Thomas Niesler (Stellenbosch University, South Africa)
    • Marelie Davel (Council for Scientific and Industrial Research, South Africa)
    • Georg Schlünz (Council for Scientific and Industrial Research, South Africa)
    • Etienne Barnard (Council for Scientific and Industrial Research, South Africa)
    • Wilbert Heeringa (Meertens Institute, The Netherlands)
    • Liesbeth Augustinus (University of Leuven, Belgium)
    • Walter Daelemans (University of Antwerp, Belgium)

    OVERVIEW

    Two kinds of resources are considered fundamental for the development of Human Language Technology (HLT) applications (such as dictation software, automatic machine translation systems, or intelligent search engines), viz.:

    • Core Technologies (also sometimes called “lingware”; i.e. reusable, efficient natural language processing modules, integrated in end-user applications); and
    • Data (i.e. language data (corpora and lexica), formal descriptions of language structures (grammars), and language models) (Daelemans & Strik, 2002).

    As these resources are essential in the development of most HLT applications, it is vital to develop sophisticated, reusable resources for all the South African languages, before venturing into the development of advanced HLT applications for these languages. Although both deep and shallow methods are used for most of the processes involved in developing these resources, these methods are, on the one hand, far from perfect (Gaustad & Bouma, 2001), and, on the other hand, have mostly been developed for the commercially more important languages of the world (e.g. English, Dutch, Spanish, German, Japanese, etc.) (Cole, 1995: 111). To develop such resources for the indigenous South African languages (all of which could be considered so-called resource-scarce languages), these methods should therefore either be adapted for these languages, or alternatively, new methods must be sought to deal with the idiosyncrasies of these languages.

    One method to fast-track the development of resources for resource-scarce languages is to re-cycle (port/transfer/re-engineer) existing technologies from one language L1 to another, closely-related language L2. The basic hypothesis, which is also the hypothesis of this project, is that “[if] the languages L1 and L2 are similar enough, then it should be easier [and quicker] to recycle software applicable to L1 than to rewrite it from scratch for L2”, thereby taking care of “most of the drudgery before any human has to become involved” (Rayner et al., 1997: 65). Scannell (2006) argues that resource-scarce languages could benefit from such an approach, especially where L1 is a global, well-resourced language.

    To illustrate this hypothesis in real terms, let us assume that Dutch (L1) and Afrikaans (L2) are similar enough for purposes of technology transfer. The hypothesis then is that it would be easier and quicker to use and adapt, for example, an existing Dutch syntactic parser to parse Afrikaans sentences, than to develop an Afrikaans syntactic parser from scratch. One could therefore use the Dutch parser to parse an Afrikaans corpus, and afterwards only correct systematic errors manually or semi-automatically.

    Until 2007, only a few projects had exploited this approach, for language pairs including Irish – Scottish Gaelic (Scannell, 2006), Spanish – Catalan, Spanish – Galician (Corbí-Bellot et al., 2005), English – French, and Swedish – Danish (Rayner et al., 1997). Almost all of these projects were conducted within the context of either automatic text-based machine translation, or speech-to-speech translation.

    Although the idea to re-cycle technologies between closely-related languages is not a novel one, numerous questions and opportunities for research remain. For instance, what “similar enough” in the above hypothesis entails, is completely unclear from the literature. Also, almost all the above-mentioned projects were conducted using rule-based methods (e.g. finite-state grammars); the question remains whether technologies that have been developed using machine learning would also be appropriate for this approach. Moreover, with the exception of some linguistic and lexicographic studies (e.g. Jansen & Olivier, 1986; Prinsloo, 2006), very little similar research has been done involving South African languages.

    Hence, the central problem addressed in this project concerns an investigation of the possibilities of technology re-cycling between closely-related languages. The focus in this project is on Afrikaans, with Dutch as the closely-related, well-resourced language.

    In the period 2008-2010, we directed our attention to the experimental development of various re-usable resources (core technologies and data) for Afrikaans, including:

    • Annotated wideband speech data for large-vocabulary continuous speech recognition;
    • Pronunciation dictionary;
    • Afrikaans-Dutch/Dutch-Afrikaans convertor, including a bilingual translation dictionary;
    • High-accuracy chunker (i.e. shallow parser);
    • Improved part-of-speech tagger and lemmatiser for Afrikaans.

    REFERENCES

    • Cole, R.A. (editor in chief). 1995. Survey of the State of the Art in Human Language Technology. Available at: http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html. Accessed on: May 15, 2002.
    • Corbí-Bellot, A.M. et al. 2005. An open-source shallow-transfer machine translation engine for the Romance languages of Spain. In: Proceedings of the 10th Annual EAMT Conference. Budapest, Hungary, 30-31 May 2005.
    • Daelemans, W. & Strik, H. 2002. Actieplan voor het Nederlands in de taal- en spraaktechnologie: Prioriteiten voor basisvoorzieningen. [Action Plan for Dutch Language and Speech Technology: Priorities for Basic Resources]. Report for the Nederlandse Taalunie. Available at: http://cnts.uia.ac.be/~walter/TST/. Accessed on: April 30, 2004.
    • Gaustad, T. & Bouma, G. 2001. Accurate Stemming of Dutch for Text Classification. Computational Linguistics in the Netherlands 2001. Amsterdam: Rodopi. pp. 104-117.
    • Jansen, E. & Olivier G. 1986. Praktiese Nederlands. Pretoria: Academica.
    • Prinsloo, D.J. 2006. Compiling a Bidirectional Dictionary Bridging English and the Sotho Languages: A Viability Study. Lexikos. 16: 193-204.
    • Rayner, M. et al. 1997. Recycling Lingware in a Multilingual MT System. In: Burstein, J. & Leacock, C. From Research to Commercial Applications: Making NLP Work in Practice. Somerset, New Jersey: Association for Computational Linguistics. pp. 65-70.
    • Scannell, K. 2006. Machine translation for closely related language pairs. In: Proceedings of the LREC2006 Workshop on Strategies for developing machine translation for minority languages. European Language Resources Association: Paris.
    • Vandeghinste, V., Schuurman, I., Carl, M., Markantonatou, S. and Badia, T. 2006. METIS-II: Machine Translation for Low Resource Languages. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006). Genoa, Italy. May 24-26. European Language Resources Association: Paris.

    OUTPUTS

    PUBLICATIONS

    1. Daelemans, W, Groenewald, HJ & Van Huyssteen, GB. 2009. Prototype-based Active Learning for Lemmatization. In: Angelova, G, Bontcheva, K, Mitkov, R, Nikolov, N & Nikolov, N. (eds.). Proceedings of Recent Advances in Natural Language Processing 2009. ISSN: 1313-8502. 14-16 September 2009. Borovets, Bulgaria. pp 65-70.
    2. Davel, MH & De Wet, F. 2010. Verifying pronunciation dictionaries using conflict analysis. In: Proceedings of Interspeech. ISSN: 1990-9772. 26-30 September. Makuhari, Chiba, Japan. pp 1898-1901.
    3. De Wet, F, De Waal, A & Van Huyssteen, GB. 2011. Developing a broadband automatic speech recognition system for Afrikaans. In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011). ISSN: 1990-9772. 27-31 August. Florence, Italy. pp. 3185-3188.
    4. Heeringa, W & De Wet, F. 2008. The origin of the Afrikaans pronunciation: a comparison to West Germanic languages and Dutch dialects. In: Proceedings of the 19th Annual Symposium of the Pattern Recognition Association of South Africa. ISBN 978-0-7992-2350-7. 27-28 November. Cape Town, South Africa. pp 159-164.
    5. Heeringa, W, De Wet, F & Van Huyssteen, GB. submitted. Afrikaans and Dutch as Closely-related Languages: A Comparison to West Germanic Languages and Dutch Dialects.
    6. Loots, L, De Wet, F & Niesler, T. 2010. Extending an Afrikaans pronunciation dictionary using Dutch resources and P2P/GP2P. In: Proceedings of the 21st Annual Symposium of the Pattern Recognition Association of South Africa. ISBN: 978-0-7992-2470-2. 22-23 November. Stellenbosch, South Africa. [no page numbers].
    7. Pilon, S, Van Huyssteen, GB & Augustinus, L. 2010. Converting Afrikaans to Dutch for technology recycling. In: Proceedings of the 21st Annual Symposium of the Pattern Recognition Association of South Africa. ISBN: 978-0-7992-2470-2. 22-23 November. Stellenbosch, South Africa. pp 219-224.
    8. Schlünz, GI, Barnard, E & Van Huyssteen, GB. 2010. Part-of-Speech Effects on Text-to-Speech Synthesis. In: Proceedings of the 21st Annual Symposium of the Pattern Recognition Association of South Africa. ISBN: 978-0-7992-2470-2. 22-23 November. Stellenbosch, South Africa. pp 257-262.
    9. Van Huyssteen, GB & Davel, M. 2010. Learning Rules and Categorization Networks for Language Standardization. In: Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) Workshop on Extracting and Using Constructions in Computational Linguistics. 6 June. Los Angeles, USA. pp 39-46.
    10. Van Huyssteen, GB & Pilon, S. 2009. Rule-based Conversion of Closely-related Languages: A Dutch-to-Afrikaans Convertor. In: Nicolls, F. (ed.). Proceedings of the 20th Annual Symposium of the Pattern Recognition Association of South Africa. ISBN: 978-0-7992-2356-9. 30 November – 01 December. Stellenbosch, South Africa. pp 23-28.

    RESOURCES

    1. Van Niekerk, D, Van Huyssteen, GB & Puttkammer, MJ. 2015. Closely-related Languages Convertor V2.0.0. Potchefstroom: Centre for Text Technology (Ctext), North-West University.
      • Language independent convertor for converting text from one language to another, closely related language
      • Written in Python
      • Includes Afrikaans-to-Dutch and Dutch-to-Afrikaans wordlists (including false friends) and rules for orthographic conversion.
      • Includes METIS II test data, also with Afrikaans translations.
      • Includes code for web demo, available here
    2. 2009. Dutch-2-Afrikaans Convertor/Afrikaans-2-Dutch Convertor. (GB van Huyssteen, S Pilon, L Augustinus, MJ Puttkammer, and M Schlemmer). Potchefstroom: NWU.
      • A freely available, open-source rule-based system for converting Dutch text to Afrikaans, and vice versa.
      • Could be used as a pre-processing step in machine translation from Dutch to Afrikaans.
    3. 2010. Convertor v1.2.0. (GB Van Huyssteen, S Pilon, MJ Puttkammer, and M Schlemmer). Potchefstroom: NWU.
      • Language independent convertor for converting text from one language to another, closely related language.
      • Written in Perl.
    4. 2010. Afrikaans and Dutch Lists and Rules v1.0.2. (GB van Huyssteen and S Pilon). Potchefstroom: NWU.
      • Afrikaans-to-Dutch and Dutch-to-Afrikaans wordlists (including false friends) and rules for orthographic conversion.
      • Includes METIS II test data, also with Afrikaans translations.
    5. 2010. Resources for Closely-related Languages: Afrikaans Pronunciation Dictionary (RCRL APD) v1.4.1. (F de Wet, M Davel, L Loots, T Niesler). Potchefstroom: NWU.
      • Contains more than 24,000 Afrikaans words.
      • Developed in collaboration with Stellenbosch University and CSIR Meraka Institute.
    6. 2010. Afrikaans Radio News Corpus v1.0.0. (F de Wet). Potchefstroom: NWU.
      • 330 bulletins; circa 27 hours of audio data.
      • SABC radio news bulletins from 2001-2004, as well as from 2010 onwards.
      • Manually transcribed.
      • Available for research purposes.
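    The convertor resources listed above combine wordlists (including false friends) with rules for orthographic conversion. The following is a minimal sketch of that general design for the Dutch-to-Afrikaans direction, assuming that wordlist lookup takes precedence over the rules. The entries in `WORDLIST` and `RULES` are illustrative only and are not taken from the released lists; the function names are hypothetical and do not reflect the actual convertor's interface.

```python
import re

# Illustrative wordlist entries only; not taken from the project's wordlists.
WORDLIST = {"de": "die", "wij": "ons", "niet": "nie"}

# Illustrative orthographic rules, applied in order to words not found in the wordlist.
RULES = [
    (re.compile(r"^sch"), "sk"),  # school -> skool
    (re.compile(r"ij"), "y"),     # prijs -> prys
    (re.compile(r"^z"), "s"),     # zee -> see
]


def convert_word(word):
    """Convert one lower-cased Dutch word: wordlist first, then orthographic rules."""
    if word in WORDLIST:
        return WORDLIST[word]
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word


def convert(text):
    """Convert whitespace-separated text word by word (punctuation is not handled)."""
    return " ".join(convert_word(w) for w in text.lower().split())
```

    Giving the wordlist priority means that irregular correspondences and false friends can be handled by explicit entries, while the rules cover regular spelling differences.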

    TALKS AND POSTERS

    1. Heeringa, W & De Wet, F. 2009. The origin of Afrikaans pronunciation: a comparison to West Germanic languages and Dutch dialects. Presentation given at the 30th TABU Dag. University of Groningen, GRONINGEN, The Netherlands. 11-12 June.
    2. Heeringa, W, De Wet, F & Van Huyssteen, GB. 2011. Afrikaans and Dutch as Closely‑related Languages: a Comparison to West Germanic Languages and Dutch Dialects. Methods in Dialectology 14. University of Western Ontario, LONDON, Ontario, Canada. 2-6 August.
    3. Van Huyssteen, GB & Pilon, S. 2010. A Dutch-to-Afrikaans Convertor. 20th Meeting of Computational Linguistics in the Netherlands (CLIN) 2010. Utrecht University, UTRECHT, The Netherlands. 5 February.
    4. Pilon, S & Van Huyssteen, GB. 2011. Technology recycling for closely related languages: Dutch and Afrikaans. 21st Meeting of Computational Linguistics in the Netherlands (CLIN) 2011. University College Ghent, GHENT, Belgium. 11 February.
    5. Van Huyssteen, GB & Pilon, S. 2010. Some thoughts on a Dutch-to-Afrikaans convertor. Guest lecture, University of Antwerp, ANTWERP, Belgium. 2 February.

    DISSERTATIONS (UNPUBLISHED)

    1. Schlünz, GI. 2010. The effects of part-of-speech tagging on text-to-speech synthesis for resource-scarce languages. Unpublished MSc Eng dissertation. Potchefstroom: North-West University.

    RELATED PROJECTS AND EVENTS

    • Project: Mutual Intelligibility of Closely Related Languages
    • Workshop on comparing approaches to measuring linguistic differences
  • Automatic Compound Processing (AuCoPro)

    Project: Automatic Compound Processing  

    Abbreviation

    AuCoPro

    DURATION

    2012-2014

    FUNDED BY:

    • Dutch Language Union (Belgium, The Netherlands)
    • Department of Arts and Culture (South Africa)
    • National Research Foundation (South Africa) (Grant number: 81794)
    • European Network on Word Structure (NetWordS) (European Science Foundation) (Grant number: 5570)

    PROJECT URLS

    PROJECT LEADERS

    • GERHARD B VAN HUYSSTEEN – PROJECT COORDINATOR & LINGUISTICS 
      CTexT (Centre for Text Technology), North-West University, South Africa
    • WALTER DAELEMANS – COMPOUND SEMANTICS
      CLiPS (Computational Linguistics and Psycholinguistics), University of Antwerp, Belgium
    • MENNO VAN ZAANEN – COMPOUND SPLITTING
      TiCC (Tilburg Centre for Cognition and Communication), University of Tilburg, The Netherlands

    PROJECT COLLABORATORS

    • BEN VERHOEVEN 
      CLiPS (Computational Linguistics and Psycholinguistics), University of Antwerp, Belgium
    • NORTH-WEST UNIVERSITY (SOUTH AFRICA)
      Roald Eiselen, Benito Trollip, Joani Liversage, Zandre Botha, Martin Puttkammer, Martin Schlemmer, Carli de Wet, Nadia Schultz, Nanette Van Den Berg, Sansi Eiselen 
    • TILBURG UNIVERSITY (THE NETHERLANDS)
      Rick Smetsers, Nanne van Noord, Vincent Lichtenberg, Bas Goris, Sylvie Bruys, Suzanne Aussems
    • UNIVERSITY OF ANTWERP (BELGIUM)
      Natasja Loyens, Maxim Baetens

    OVERVIEW

    In many human language technology applications (e.g. machine translators, spelling checkers), it often happens that concatenatively written compounds (e.g. “skrywerspen”/“schrijverspen” ‘writer’s pen’) are processed incorrectly (e.g. not found in a lexicon). From a technological perspective, deficiencies related to automatic compound segmentation are particularly problematic, since concatenative compounding is a highly productive process in many languages, including Dutch and Afrikaans. Although a compound splitter has already been developed for Afrikaans (Van Huyssteen and Van Zaanen, 2004), the reported accuracy of circa 90% could be improved, and the annotation protocol and data need to be revised.

    More importantly, no stand-alone compound splitter for Dutch is available; research that has been done in this field is more than ten years old (e.g. Pohlmann and Kraaij, 1996), uses expensive resources (e.g. Ordelman et al., 2003), does complete morphological analysis (e.g. De Pauw et al., 2004), and/or has not been released for re-use in the open-source domain. In subproject 1, we will therefore attempt to develop robust compound splitters for both Afrikaans and Dutch through a combination of technology recycling (Pilon et al., 2010) and data pooling (i.e. joining (converted) training material for the two languages in one training set), as well as experimentation with sequence classification (Van Zaanen & Gaustad, 2010; Van Zaanen et al., 2011).
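    To make the segmentation task concrete, here is a minimal dictionary-based baseline that splits a concatenated compound into constituents and linking elements. It is a deliberately simple stand-in, not the machine-learning and sequence-classification approach pursued in the project; the lexicon, the set of linking elements, and the name `split_compound` are hypothetical.

```python
# Hypothetical mini-lexicon and linking elements, for illustration only.
LEXICON = {"skrywer", "pen", "boek", "rak", "hout"}
LINKERS = ("", "s", "e")


def split_compound(word, lexicon=LEXICON):
    """Return constituents (linking elements as separate items), or None if no split is found."""
    if word in lexicon:
        return [word]
    for i in range(1, len(word)):
        head = word[:i]
        if head not in lexicon:
            continue
        rest = word[i:]
        for link in LINKERS:
            if not rest.startswith(link):
                continue
            # Recursively split whatever follows the (possibly empty) linking element.
            tail = split_compound(rest[len(link):], lexicon)
            if tail:
                return [head] + ([link] if link else []) + tail
    return None
```

    With this toy lexicon, `split_compound("skrywerspen")` separates the linking element from the two constituents, while a word that cannot be covered by lexicon entries yields `None`. A baseline of this kind fails on any constituent missing from the lexicon, which is one reason to prefer learned boundary classifiers.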

    In addition to segmentation, another subpart of this proposed project will also focus on the semantic analysis of compounds – i.e. to determine that “boekrak” construes ‘case for books’, while “houtrak” means ‘case made of wood’. For more advanced HLT applications like information extraction, question answering and machine translation systems, proper semantic analysis of compounds is required. Internationally, research on automatic compound analysis has focused almost exclusively on English; no work in this regard has been done for either Afrikaans or Dutch, and this proposed project will therefore do pioneering work in this regard.

    Although linguistic research on the topic has been done for both these languages, a uniform, cross-lingual framework does not exist yet, neither does an understanding of how compounding in these two languages differs systematically (see examples above). An attempt will therefore be made to consolidate existing research on both these languages (and other languages), and to postulate a cross-lingual annotation scheme compatible with the work of Ó Séaghdha (2008).

    Since no semantic analyser exists for either language, in subproject 2 we will then develop first-generation analysers for Afrikaans and Dutch simultaneously, using bootstrapping and data pooling (i.e. first develop a small training set of Afrikaans data, then train an Afrikaans analyser, then analyse Dutch data with the Afrikaans analyser, and subsequently join the data to train a next Afrikaans and/or Dutch analyser; this process continues in small increments until desired performance has been reached). We will start with techniques that work well for English (based on distributional semantics and machine learning); see Hendrickx et al. (2010) for an overview of the current state of the art. We will try to improve these techniques and adapt them to the specific requirements of Afrikaans and Dutch.
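    The bootstrapping and data-pooling loop described above can be sketched as follows. The "analyser" here is a toy stand-in (the most frequent relation label per modifier), not the distributional-semantics classifiers mentioned in the text. The relation labels, the example compounds, the `review` hook for manual correction, and the fixed number of increments (instead of a performance-based stopping criterion) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(examples):
    """Toy analyser: most frequent relation label per modifier, with a global fallback."""
    by_modifier = defaultdict(Counter)
    overall = Counter()
    for (modifier, _head), label in examples:
        by_modifier[modifier][label] += 1
        overall[label] += 1
    fallback = overall.most_common(1)[0][0]
    table = {m: c.most_common(1)[0][0] for m, c in by_modifier.items()}
    return lambda compound: table.get(compound[0], fallback)


def bootstrap(seed, unlabelled, batch_size=2, review=lambda compound, label: label):
    """Grow a pooled training set in small increments.

    `review` stands in for an optional manual correction step (an assumption).
    """
    pooled = list(seed)
    analyser = train(pooled)
    for start in range(0, len(unlabelled), batch_size):
        batch = unlabelled[start:start + batch_size]
        # Analyse new data with the current analyser, then pool it with the rest.
        pooled += [(c, review(c, analyser(c))) for c in batch]
        analyser = train(pooled)  # retrain on the pooled data
    return analyser, pooled


# Hypothetical toy data: (modifier, head) pairs with made-up relation labels.
AFRIKAANS_SEED = [(("boek", "rak"), "FOR"), (("hout", "rak"), "MADE_OF")]
DUTCH_UNLABELLED = [("hout", "bank"), ("boek", "kast"), ("hout", "tafel")]
analyser, pooled = bootstrap(AFRIKAANS_SEED, DUTCH_UNLABELLED)
```

    In a realistic setting, the loop would stop once evaluation on held-out data reaches the desired performance, and the `review` step would let annotators correct systematic errors before the data are pooled.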

    REFERENCES

    • Daelemans, W., Buchholz, S. and Jorn Veenstra. 1999. Memory-Based Shallow Parsing. Proceedings of CoNLL-99, Bergen, Norway. June 12, 1999.
    • Davel, M. and Barnard, E. 2004. A default-and-refinement approach to pronunciation prediction. In: Proceedings of PRASA. South Africa, November 2004, pp. 119–123.
    • De Knop, S. and Dirven, R. 2008. Motion and location events in German, French and English: A typological, contrastive and pedagogical approach. In: De Knop, S. and De Rycker, T. (eds.) Cognitive Approaches to Pedagogical Grammar: A Volume in Honour of René Dirven. Berlin: Mouton de Gruyter.
    • De Pauw, G., Laureys, T., Daelemans, W. and Van Hamme, H. 2004. A Comparison of Two Different Approaches to Morphological Analysis of Dutch. In: Proceedings of the Workshop of the ACL Special Interest Group on Computational Phonology (SIGPHON). Barcelona, Spain. pp. 62-69.
    • Gast, V. forthcoming. Contrastive analysis: Theories and methods. In: Kortmann, B. and Kabatek, J. (eds.). Dictionaries of Linguistics and Communication Science: Linguistic theory and methodology. Berlin: Mouton de Gruyter.
    • González, M. D. L. Á. G., Mackenzie, J. L. and Álvarez, E. M. G. 2008. Current Trends in Contrastive Linguistics: Functional and cognitive perspectives, Amsterdam, John Benjamins.
    • Hendrickx, I, Kim, SM, Kozareva, Z, Nakov, P, Ó Séaghdha, D, Padó, S, Pennacchiotti, M, Romano, L & Szpakowicz, S. 2010. SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals. In: Proceedings of the SemEval-2 Workshop. Uppsala, Sweden.
    • Hüning, M. 2009. Semantic niches and analogy in word formation: Evidence from contrastive linguistics. Languages in Contrast. 9(2): 183-201.
    • Hüning, M. 2010. Diachronie in de synchronie. Over contrastieve taalkunde en taal(veranderings)theorie. In: Fenoulhet, J. and Renkema, J. (eds.) Internationale neerlandistiek: een vak in beweging. Gent: Academia Press.
    • Mitchell, T.M. 1997. Machine learning. Boston: McGraw-Hill.
    • Ó Séaghdha, D. 2008. Learning compound noun semantics. Technical report 735. Cambridge: University of Cambridge.
    • OECD. 2002. Proposed standard practice for surveys on research and experimental development (Frascati Manual). Eurostat.
    • Ordelman, R., Van Hessen, A. and De Jong, F. 2003. Compound decomposition in Dutch large vocabulary speech recognition. In: Proceedings of Eurospeech 2003. Geneva, Switzerland. 225–228.
    • Pilon, S, Van Huyssteen, GB and Augustinus, L. 2010. Converting Afrikaans to Dutch for technology recycling. In: Proceedings of the 21st Annual Symposium of the Pattern Recognition Association of South Africa. ISBN: 978-0-7992-2470-2. 22-23 November. Stellenbosch, South Africa. pp 219-224.
    • Pohlmann, R and Kraaij, W. 1996. Improving the precision of a text retrieval system with compound analysis. In: Proceedings of the 7th Computational Linguistics in the Netherlands (CLIN 1996). pp. 115-129.
    • Quinlan, J.R. 1987. Generating production rules from decision trees. In: McDermott, J. Proceedings of the Tenth International Joint Conference on Artificial Intelligence (IJCAI-87): 304–307.
    • Van Huyssteen, GB and Van Zaanen, MM. 2004. Learning Compound Boundaries for Afrikaans Spelling Checking. In: Proceedings of First Workshop on International Proofing Tools and Language Technologies. Patras, Greece. pp. 101-108.
    • Van Huyssteen, GB. 2005. ’n Kognitiewe gebruiksgebaseerde beskrywingsmodel vir die Afrikaanse grammatika. [A Cognitive Usage-Based Description Model for Afrikaans Grammar]. Southern African Linguistics and Applied Language Studies. 23(2): pp. 125-137.
    • Van Zaanen, M & Gaustad T. 2010. Grammatical Inference as Class Discrimination. In: Sempere, J & García, P. (eds.). Grammatical Inference: Theoretical Results and Applications. 6339, 245–257.
    • Van Zaanen, M, Gaustad T & Feijen J. 2011. Influence of Size on Pattern-based Sequence Classification. In: Van der Putten, P, Veenman, C, Vanschoren, J, Israel, M & Blockeel, H. (eds.). Proceedings of the 20th Belgian-Dutch Conference on Machine Learning. The Hague, The Netherlands. pp 53–60.
    • Veenstra, J., Van den Bosch, A., Buchholz, S., Daelemans, W. and Zavrel, J. 2000. Memory-Based Word Sense Disambiguation. Computers and the Humanities. 34(1-2): 171-177.

    AIMS

    The primary aim of this project was to develop resources (including annotation protocols, and training and testing data) for the development of:

    • robust compound splitters (subproject 1); and
    • first-generation compound analysers (subproject 2);

    for Afrikaans and Dutch, through a combination of cross-language transfer (i.e. technology recycling), data pooling, and various machine learning approaches.

    Other secondary aims included:

    • to report on the research and development process in the form of:
      • one Master’s degree dissertation;
      • two fourth-year students’ projects (mini-dissertations);
      • at least two scholarly papers, to be published in relevant journals or peer-reviewed conference proceedings;
      • various annotation protocols, made available publicly;
    • to contribute towards human capital development and growth of the pool of experts in descriptive linguistics and computational linguistics in South Africa, Belgium and The Netherlands by offering bursaries, grants or contract work to undergraduate and post-graduate students;
    • to extend the collaboration network between North-West University (NWU), Tilburg University (TU) and University of Antwerp (UA), by introducing young scholars and students to each other (i.e. extending the existing collaboration beyond Van Huyssteen–Van Zaanen–Daelemans);
    • to identify new research issues as they unfold in the research and development process; and
    • to contribute to the HLT-enabling of the languages of South Africa.

    OUTPUTS

    PEER-REVIEWED PUBLICATIONS

    1. Aussems, S, Goris, B, Lichtenberg, V, Van Noord, N, Smetsers, R, & Van Zaanen, M. 2013. Unsupervised identification of compounds. In: Proceedings of the 22nd Annual Belgian-Dutch Conference on Machine Learning (Benelearn). 3 June. Nijmegen, The Netherlands.
    2. Botha, Z., Eiselen, R., & Van Huyssteen, G. 2013. Automatic Compound Semantic Analysis using Wordnets. In: Proceedings of the Twenty-Fourth Annual Symposium of the Pattern Recognition Association of South Africa. ISBN: 978-0-86970-771-5. 3 December. Pretoria, South Africa. pp. 1-6.
    3. Van Zaanen, M, Van Huyssteen, GB, Aussems, S, Emmery, C, & Eiselen, R. 2014. The Development of Dutch and Afrikaans Language Resources for Compound Boundary Analysis. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014). May. Reykjavik, Iceland.
    4. Van Huyssteen, GB. 2014. Morfologie. In: Carstens, WAM & Bosman, N. (reds.). Kontemporêre Afrikaanse Taalkunde. ISBN 978-0-62703-019-2. Pretoria: Van Schaik Uitgewers. pp. 171-208.
      Preprint
    5. Van Huyssteen, GB & Verhoeven, B. 2014. A Taxonomy for Afrikaans and Dutch compounds. In: Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014): The First Workshop on Computational Approaches to Compound Analysis (ComAComA). ISBN: 978-1-873769-43-0. 21-22 August. Dublin, Ireland. pp. 31-40.
    6. Verhoeven, B., & Daelemans, W. 2013. Semantic Classification of Dutch Noun-Noun Compounds: A Distributional Semantics Approach. In: CLIN Journal, 3: 2-18. ISSN: 2211-4009.
    7. Verhoeven, B., Daelemans, W., & van Huyssteen, G.B. 2012. Classification of Noun-Noun Compound Semantics in Dutch and Afrikaans. In: Proceedings of the Twenty-Third Annual Symposium of the Pattern Recognition Association of South Africa. ISBN: 978-0-620-54601-0. 29-30 November. Pretoria, South Africa. pp. 121-125.
    8. Verhoeven, B, & Van Huyssteen, GB. 2013. More Than Only Noun-Noun Compounds: Towards an annotation scheme for the semantic modelling of other noun compound types. In: Proceedings of the Ninth Joint ACL – ISO Workshop on Interoperable Semantic Annotation. 19-20 March. Potsdam, Germany.
    9. Verhoeven, B, Van Zaanen, MM, Daelemans, W & Van Huyssteen, GB. 2014. Automatic compound processing: Compound splitting and semantic analysis for Afrikaans and Dutch. In: Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014): The First Workshop on Computational Approaches to Compound Analysis (ComAComA). ISBN: 978-1-873769-43-0. 21-22 August. Dublin, Ireland. pp. 20-30.

    RESOURCES

    ANNOTATION GUIDELINES FOR COMPOUND ANALYSIS
    1. Verhoeven, B., van Huyssteen, G., van Zaanen, M., & Daelemans, W. 2014. Annotation Guidelines for Compound Analysis. In: CLiPS Technical Report Series (CTRS), 5. ISSN: 2033-3544.
    2. Annotation Guidelines for Compound Segmentation.
      Annotation Guidelines for the Semantic Analysis of Noun-Noun Compounds in English, Dutch and Afrikaans. Including: Decision Tree and Paraphrasing Table
    3. Annotation Guidelines for the Semantic Analysis of Other Nominal Compounds in Dutch and Afrikaans. Specifically: Adjective-Noun, Verb-Noun, Quantifier-Noun and Preposition-Noun
    COMPOUND SEMANTICS DATASET (COMPOUNDS WITH SEMANTIC ANNOTATION)

    Afrikaans

    1. Afr-NN-FirstRound (1449 compounds) 
    2. Afr-NN-SecondRound (2328 compounds)
    3. Afr-XN (4553 compounds)

    Dutch

    1. Ned-NN-FirstRound (1766 compounds)
    2. Ned-NN-SecondRound (2000 compounds)
    3. Ned-XN (600 compounds)
    COMPOUND SPLITTING DATASET (COMPOUNDS ANNOTATED WITH CONSTITUENT BOUNDARIES AND LINKING ELEMENTS)
    1. Afrikaans (25,266 compounds)
    2. Dutch (26,000 compounds)

    TALKS

    1. Aussems, S, Bruys, S, Goris, B, Lichtenberg, V, Van Noord, N, Smetsers, R, & Van Zaanen, M. 2013. Automatically Identifying Compounds. Presented at the 23rd Meeting of Computational Linguistics in the Netherlands (CLIN 2013), Enschede, The Netherlands. 18 January 2013.
    2. Liversage, J, & Van Huyssteen, GB. 2013. Verifiëring van semantiese verhoudings in Afrikaanse naamwoord-naamwoordsamestellings. [Verification of semantic relations in Afrikaans noun-noun compounds.] Presented at the South African Microlinguistics Workshop (SAMWOP 2013), Vanderbijlpark, South Africa. 1 November 2013.
    3. Trollip, B, & Van Huyssteen, G.B. 2013. Herbeskouing van die interfiks in Afrikaans. [Reconsideration of the interfix in Afrikaans.] Presented at the South African Microlinguistics Workshop (SAMWOP 2013), Vanderbijlpark, South Africa. 1 November 2013.
    4. Van den Berg, N, & Van Huyssteen, GB. 2013. Samestellings met en afleidings van meerledige eiename. [Compounds with and derivations of multi-part proper names.] Presented at the South African Microlinguistics Workshop (SAMWOP 2013), Vanderbijlpark, South Africa. 1 November 2013.
    5. Van Huyssteen, GB, Verhoeven, B, & Daelemans, W. 2013. Bringing together interdisciplinary perspectives on compound semantics: Examples from Afrikaans and Dutch in the CompoNet database. Presented at the South African Microlinguistics Workshop (SAMWOP 2013), Vanderbijlpark, South Africa. 1 November 2013.
    6. Van Huyssteen, GB & Verhoeven, B. 2014. A Taxonomy for Afrikaans and Dutch compounds. Presented at the 25th International Conference on Computational Linguistics (COLING 2014): The First Workshop on Computational Approaches to Compound Analysis (ComAComA). 21-22 August. Dublin, Ireland.
    7. Van Zaanen, M. 2012. Automatic Compound Processing (AuCoPro) – Identification for Segmentation. Presented at ATILA 2012, Groesbeek, The Netherlands. 23 November 2012.
    8. Van Zaanen, M, Van Huyssteen, GB, Aussems, S, Emmery, C & Eiselen, R. 2014. The development of Dutch and Afrikaans language resources for compound boundary analysis. Presented at the 9th International Conference on Language Resources and Evaluation (LREC 2014). 26-31 May. Reykjavik, Iceland.
    9. Verhoeven, B, & Daelemans, W. 2012. Automatic Compound Processing (AuCoPro) – Semantic Analysis. Presented at ATILA 2012, Groesbeek, The Netherlands. 23 November 2012.
    10. Verhoeven, B, Daelemans, W, & Van Huyssteen, GB. 2013. Semantic Classification of Dutch and Afrikaans Noun-Noun Compounds. Presented at the 5th Workshop on African Language Technology (AfLaT 2013), Ghent, Belgium. 6 December 2013.
    11. Verhoeven, B, Daelemans, W, & Van Huyssteen, GB. 2013. Semantic Classification of Dutch and Afrikaans Noun-Noun Compounds. Presented at the 23rd Meeting of Computational Linguistics in the Netherlands (CLIN 2013), Enschede, The Netherlands. 18 January 2013.
    12. Verhoeven, B, Van Huyssteen, GB, & Daelemans, W. 2013. Samenstellingen in het Afrikaans en Nederlands: Automatische semantische analyse en taalkundige implicaties. [Compounding in Afrikaans and Dutch: Automatic semantic analysis and linguistic implications.] Presented at the Graduate Conference of the Department of Linguistics, University of Antwerp, Belgium. 2 October 2013.
    13. Verhoeven, B, Van Huyssteen, GB, & Daelemans, W. 2012. AuCoPro: Project Presentation and Recent Developments. Presented at Centre for Text Technology (CTexT), North-West University. Potchefstroom, South Africa. 7 September 2012.
    14. Verhoeven, B, Van Zaanen, MM, Daelemans, W & Van Huyssteen, GB. 2014. Automatic compound processing: Compound splitting and semantic analysis for Afrikaans and Dutch. Presented at the 25th International Conference on Computational Linguistics (COLING 2014): The First Workshop on Computational Approaches to Compound Analysis (ComAComA). 21-22 August. Dublin, Ireland.

    DISSERTATIONS (UNPUBLISHED)

    MASTERS
    1. Verhoeven, B. 2012. A Computational Semantic Analysis of Noun Compounds in Dutch. MA Thesis, University of Antwerp, Belgium.
    HONOURS
    1. Liversage, J. 2013. Verifiëring van semantiese verhoudings in Afrikaanse naamwoord-naamwoordsamestellings [Verification of semantic relations in Afrikaans noun-noun compounds]. Potchefstroom: North-West University. 
    2. Trollip, B. 2013. Herbeskouing van die interfiks in Afrikaanse komposita [Reconsideration of the interfix in Afrikaans compounds]. Potchefstroom: North-West University.
    3. Van den Berg, N. 2013. Samestellings met en afleidings van meerledige eiename in Afrikaans en Nederlands [Compounds with and derivations of multiword proper names in Afrikaans and Dutch]. Potchefstroom: North-West University.
    BACHELORS
    1. Trollip, B. 2012. Die klassifikasiemoontlikhede van nie-prototipiese samestellings. [The classification possibilities of non-prototypical compounds]. BA Dissertation, North-West University, Potchefstroom, South Africa.
    2. De Wet, C. 2012. Semantiese ontleding van Afrikaanse NN-samestellings. [Semantic analysis of Afrikaans NN-compounds]. BA Dissertation, North-West University, Potchefstroom, South Africa.
    3. Schultz, N. 2012. Die ontwikkeling van ‘n verteenwoordigende verwysende datastel van Afrikaanse samestellings. [The development of a representative referential dataset of Afrikaans compounds]. BA Dissertation, North-West University, Potchefstroom, South Africa.
    4. Liversage, J. 2012. Voorgestelde protokol vir die verwerking van X+N samestellings. [Proposed protocol for the processing of X+N compounds]. BA Dissertation, North-West University, Potchefstroom, South Africa.

    RELATED PROJECTS/LINKS

    • Scalise, S. CompoNet. University of Bologna, Italy.
      CompoNet is a descriptive compound database for 27 languages, including Dutch and Afrikaans.
    • Ó Séaghdha, D. Compound Noun Bibliography. University of Cambridge, United Kingdom.
      Bibliography of computational and linguistic literature relating to compound nouns.
  • Taxonomy of Afrikaans constructicon-formation processes


    This taxonomy represents all word-formation processes and other processes to form constructions in the Afrikaans constructicon.

    • Click on image to view details. You can then right-click on image to download in high resolution. You can also access an SVG version here (for best results, right click on link and open in new tab/window).
    • Image is based on Van Huyssteen (2017a).
    • Please reference this image as:
      Van Huyssteen, G.B. 2025. Taxonomy of Afrikaans constructionalisation processes. Version 1.1.0. Available: https://gerhard.pro/teaching.
    • Please send comments or suggestions to me via the Contact Me page. A version in English is available on request.

  • Taxonomy of Afrikaans morphemes


    This taxonomy represents all morphemes and other elements in Afrikaans word-formation processes.

    • Click on image to view details. You can then right-click on image to download in high resolution. You can also access an SVG version here (for best results, right click on link and open in new tab/window).
    • Image is based on Van Huyssteen (2017a).
    • Please reference this image as:
      Van Huyssteen, G.B. 2020. Taxonomy of Afrikaans morphemes. Version 1.0.7. Available: https://gerhard.pro/teaching.
    • Please send comments or suggestions to me via the Contact Me page. If needs be, I can also provide a version in English.
  • Recipe for morpheme analysis

    This step-wise process is used for orthography-based, etymology-driven morpheme analysis and annotation in Afrikaans.

    • Click on image to view details. You can then right-click on image to download in high resolution. You can also access an SVG version here (for best results, right click on link and open in new tab/window).
    • Please reference this image as:
      Van Huyssteen, G.B. 2020. Recipe for morpheme analysis. Version 1.0.1. Available: https://gerhard.pro/teaching.
    • Please send comments or suggestions to me via the Contact Me page. If needs be, I can also provide a version in English.

  • Trollip & Van Huyssteen 2018

    Trollip, Eddie Benito, and Gerhard B. Van Huyssteen. 2018. “The linking morpheme in Afrikaans: a Cognitive Grammar description.” SKASE Journal of Theoretical Linguistics [online] 15 (3):37-70.

  • Van Huyssteen 2018a

    Van Huyssteen, Gerhard B. 2018. “The ‘hulle’ and ‘goed’ constructions in Afrikaans.” In The construction of words: Advances in construction morphology, edited by Geert Booij, 399-437. New York: Springer.

  • Van Huyssteen 2018b

    Van Huyssteen, Gerhard B. 2018. “Norme vir ‘huidiglik’ [Norms for ‘huidiglik’].” Literator 39 (2):a1526. doi: https://doi.org/10.4102/lit.v39i2.1526.

  • Van Huyssteen 2018c

    Van Huyssteen, Gerhard B. 2018. “‘n Korpusondersoek na ‘huidiglik’ [A corpus exploration of ‘huidiglik’].” Literator 39 (2):a1527. doi: https://doi.org/10.4102/lit.v39i2.1527.

  • AWS 2017

    Taalkommissie van die Suid-Afrikaanse Akademie vir Wetenskap en Kuns. 2017. Afrikaanse woordelys en spelreëls [Afrikaans wordlist and spelling rules]. Eleventh ed. Cape Town: Pharos.

  • Van Huyssteen 2017a

    Van Huyssteen, Gerhard B. 2017. “Morfologie [Morphology].” In Kontemporêre Afrikaanse Taalkunde [Contemporary Afrikaans Linguistics], edited by W. A. M. Carstens and N. Bosman, 177-214. Pretoria: Van Schaik Uitgewers.

  • Van Huyssteen 2017b

    Van Huyssteen, Gerhard B. 2017. “Die aard, doel en omvang van die Afrikaanse woordelys en spelreëls. Deel 1 [The nature, goal and scope of the Afrikaanse woordelys en spelreëls. Part 1].” Tydskrif vir Geesteswetenskappe 57 (2-1):323-345. doi: 10.17159/2224-7912/2017/v57n2-1a7.

  • Van Huyssteen 2017c

    Van Huyssteen, Gerhard B. 2017. “Opname- en elimineringskriteria vir die Afrikaanse woordelys en spelreëls: Die geval emeritus. Deel 2 [Inclusion and elimination criteria for the Afrikaans wordlist and spelling rules: The case of emeritus. Part 2].” Tydskrif vir Geesteswetenskappe 57 (2-1):346-368. doi: 10.17159/2224-7912/2017/v57n2-1a8.

  • Van Huyssteen 2017d

    Van Huyssteen, Gerhard B. 2017. “Voorwoord [Preface].” In Afrikaanse woordelijs en spelreëls. Faksimilee-uitgawe [Afrikaans wordlist and spelling rules. Facsimile edition], edited by Suid-Afrikaanse Akademie vir Wetenskap en Kuns. Pretoria: Protea Boekhuis.

  • Augustinus et al 2016

    Augustinus, Liesbeth, Peter Dirix, Daniel Van Niekerk, Ineke Schuurman, Vincent Vandeghinste, Frank Van Eynde, and Gerhard B. Van Huyssteen. 2016. “AfriBooms: an online treebank for Afrikaans.” Tenth International Conference on Language Resources and Evaluation, Portorož, Slovenia.

  • Van Huyssteen 2016a

    Van Huyssteen, Gerhard B. 2016. “Die ortografiese realisering van komposita met en afleidings van multiwoordeiename [The orthographical realisation of compounds with and derivations of multiword proper names].” LitNet Akademies (Geesteswetenskappe) 13 (3). doi: http://www.litnet.co.za/die-ortografiese-realisering-van-komposita-met-en-afleidings-van-multiwoordeiename/.

  • Van Huyssteen 2016b

    Van Huyssteen, Gerhard B. 2016. “Apps of toeps, los of vas: wenke vir die klaskamer [‘Apps’ or ‘toeps’, one word or two: tips for the classroom].”

  • Van Huyssteen, Botha & Antonites 2016

    Van Huyssteen, Gerhard B., Melodi Botha, and Alex Antonites. 2016. “Die Virtuele Instituut vir Afrikaans (VivA) en markbehoeftes in die Afrikaanse gemeenskap [The Virtual Institute for Afrikaans (VivA) and market needs of the Afrikaans community].” Tydskrif vir Geesteswetenskappe 56 (2-1):410-437. doi: http://doi.org/10.17159/2224-7912/2016/v56n2-1a8.

  • Breed & Van Huyssteen 2015

    Breed, Adri, and Gerhard B. Van Huyssteen. 2015. “Aan die en besig in Afrikaanse progressiwiteitskonstruksies: ‘n korpusondersoek (2) [‘Aan die’ and ‘besig’ in Afrikaans progressive constructions: a corpus investigation (2)].” Tydskrif vir Geesteswetenskappe 55 (2):251-269. doi: 10.17159/2224-7912/2015/v55n2a7.

  • GridLine & CTexT 2015

    GridLine, and CTexT. 2015. Afrikaanse klinkende taal. Web demo (prototype). Amsterdam: GridLine.

  • Heeringa, De Wet & Van Huyssteen 2015

    Heeringa, Wilbert, Febe De Wet, and Gerhard B. Van Huyssteen. 2015. “Afrikaans and Dutch as closely-related languages: A comparison to West Germanic languages and Dutch dialects.” Stellenbosch Papers in Linguistics Plus 47:1-18. doi: 10.5842/47-0-649.

  • Puttkammer & Van Huyssteen 2015

    Puttkammer, Martin J., and Gerhard B. Van Huyssteen. 2015. Afrikaanse werkwoorde met verledetydvorme [Afrikaans verbs with past-tense forms]. Potchefstroom: Centre for Text Technology (CTexT), North-West University.

  • Van Huyssteen & Griesel 2015

    Van Huyssteen, Gerhard B., and Marissa Griesel. 2015. “Translation technology in South Africa.” In Routledge Encyclopedia of Translation Technology, edited by S. W. Chan. New York: Routledge.

  • Van Huyssteen 2015a

    Van Huyssteen, Gerhard B. 2015. “Nuwe digitale ontwikkelinge in Afrikaans: praktiese hulpmiddels [New digital developments in Afrikaans: practical resources].” University of Pretoria, PRETORIA, South Africa.

  • Van Huyssteen 2015b

    Van Huyssteen, Gerhard B. 2015. “Afrikaans negentig – Afrikaans digitaal [Afrikaans ninety – Afrikaans digital].” Gents Colloquium Over Het Afrikaans.

  • Van Huyssteen et al 2015

    Van Huyssteen, Gerhard B., Marlie Coetzee, E. Roald Eiselen, Wildrich Fourie, Ismael Lavangee, Martin J. Puttkammer, and Cornelius Van der Walt. 2015. VivA interfaces 1.0. Johannesburg: Virtual Institute for Afrikaans (VivA).

  • Van Niekerk, Van Huyssteen & Puttkammer 2015

    Van Niekerk, Daniel, Gerhard B. Van Huyssteen, and Martin J. Puttkammer. 2015. Closely related languages convertor v2.0.0. Potchefstroom: Centre for Text Technology (CTexT), North-West University.

  • Breed & Van Huyssteen 2014

    Breed, Adri, and Gerhard B. Van Huyssteen. 2014. “Aan die en besig in Afrikaanse progressiwiteitskonstruksies: die ontstaan en ontwikkeling (1) [‘Aan die’ and ‘besig’ in Afrikaans progressive constructions: origin and development (1)].” Tydskrif vir Geesteswetenskappe 54 (4):708-725.

  • CTexT 2014

    CTexT. 2014. Afrikaans NCHLT Annotated Text Corpora 1.0. Potchefstroom: Resource Management Agency.

  • Snyman, Van Huyssteen & Daelemans 2014

    Snyman, Dirk P., Gerhard B. Van Huyssteen, and Walter Daelemans. 2014. “Outomatiese genreklassifikasie vir Afrikaans [Automatic genre classification for Afrikaans].”

  • Van Huyssteen & Coetzee 2014

    Van Huyssteen, Gerhard B., and Marlie Coetzee. 2014. “The Virtual Institute for Afrikaans: digital language resources and technologies in a customer-facing web service.” PRASA 2014, CAPE TOWN, South Africa.

  • Van Huyssteen & Verhoeven 2014

    Van Huyssteen, Gerhard B., and Ben Verhoeven. 2014. “A taxonomy for Afrikaans and Dutch compounds.” Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014): The First Workshop on Computational Approaches to Compound Analysis (ComAComA), Dublin, Ireland.

  • Van Huyssteen 2014a

    Van Huyssteen, Gerhard B. 2014. “Morfologie [Morphology].” In Kontemporêre Afrikaanse Taalkunde [Contemporary Afrikaans Linguistics], edited by W. A. M. Carstens and N. Bosman. Pretoria: Van Schaik.

  • Van Huyssteen 2014b

    Van Huyssteen, Gerhard B. 2014. “Hoekom Afrikaans in die digitale era? [Why Afrikaans in the digital era?].”

  • Van Huyssteen et al 2014

    Van Huyssteen, Gerhard B., Walter Daelemans, Menno M. Van Zaanen, and Ben Verhoeven. 2014. Resources for compound processing. North-West University: Potchefstroom, South Africa; University of Antwerp: Antwerp, Belgium; University of Tilburg: Tilburg, The Netherlands.

  • Van Zaanen et al 2014

    van Zaanen, Menno M., Gerhard B. Van Huyssteen, Suzanna Aussems, Chris Emmery, and Roald Eiselen. 2014. “The development of Dutch and Afrikaans language resources for compound boundary analysis.” Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland.

  • Verhoeven et al 2014a

    Verhoeven, Ben, Menno M Van Zaanen, Walter Daelemans, and Gerhard B. Van Huyssteen. 2014. “Automatic compound processing: compound splitting and semantic analysis for Afrikaans and Dutch.” Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014): The First Workshop on Computational Approaches to Compound Analysis (ComAComA), Dublin, Ireland.

  • Verhoeven et al 2014b

    Verhoeven, Ben, Gerhard B Van Huyssteen, Menno M. Van Zaanen, and Walter Daelemans. 2014. Annotation guidelines for compound analysis. In CLiPS Technical Report Series (CTRS). Number 5. Antwerp: University of Antwerp.

  • Botha, Eiselen & Van Huyssteen 2013

    Botha, Zandré, E. Roald Eiselen, and Gerhard B. Van Huyssteen. 2013. “Improving automatic compound semantic analysis using Wordnets.” PRASA 2013.

  • Sharma Grover, Van Huyssteen & Calteaux 2013

    Sharma Grover, Aditi, Gerhard B. Van Huyssteen, and Karen Calteaux. 2013. “Towards an information ecosystem for animal disease surveillance using voice services.” Proceedings of the 3rd Annual Symposium on Computing for Development (DEV 2013), Bangalore, India.

  • Verhoeven & Van Huyssteen 2013

    Verhoeven, Ben, and Gerhard B. Van Huyssteen. 2013. “More than only noun-noun compounds: towards an annotation scheme for the semantic modelling of other noun compound types.” Proceedings of the Ninth Joint ACL – ISO Workshop on Interoperable Semantic Annotation.

  • Butler & Van Huyssteen 2012

    Butler, Anneke, and Gerhard B. Van Huyssteen. 2012. 5000 Afrikaans woorde, gekategoriseer volgens spellingprobleme [5000 Afrikaans words, categorised according to spelling problems]. Unpublished. Potchefstroom: North-West University.

  • Calteaux, Sharma Grover & Van Huyssteen 2012

    Calteaux, Karen, Aditi Sharma Grover, and Gerhard B. Van Huyssteen. 2012. “Business drivers and design choices for multilingual IVRs: A government service delivery case study.” Proceedings of the 3rd International Workshop on Spoken Languages Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa.

  • Davel & Van Huyssteen 2012

    Davel, Marlie, and Gerhard B. Van Huyssteen. 2012. Construction morphology toolkit (cx-morph-toolkit) 0.1. Potchefstroom: North-West University.

  • Sharma Grover et al 2012a

    Sharma Grover, Aditi, Annamart Nieman, Gerhard B. Van Huyssteen, and Justus Roux. 2012. “Aspects of a legal framework for language resource management.” Proceedings of the Eighth Language Resources and Evaluation Conference (LREC’12), Istanbul, Turkey.

  • Sharma Grover et al 2012b

    Sharma Grover, Aditi, Karen Calteaux, Etienne Barnard, and Gerhard B. Van Huyssteen. 2012. “A voice service for user feedback on school meals.” Proceedings of the Second Annual Symposium on Computing for Development (ACM DEV 2012), Atlanta, USA.

  • Snyman, Van Huyssteen & Daelemans 2012

    Snyman, Dirk, Gerhard B. Van Huyssteen, and Walter Daelemans. 2012. “Cross-lingual genre classification for closely related languages.” Proceedings of the Twenty-Third Annual Symposium of the Pattern Recognition Association of South Africa, Pretoria, South Africa.

  • Van Huyssteen, Sharma Grover & Calteaux 2012

    Van Huyssteen, Gerhard B., Aditi Sharma Grover, and Karen Calteaux. 2012. “Voice user interface design for emerging multilingual markets.” In Language Science and Language Technology in Africa: A Festschrift for Justus C. Roux, edited by H. S. Ndinga-Koumba-Binza and S. E. Bosch, 291-308. Stellenbosch: SUN Press.

  • Verhoeven, Daelemans & Van Huyssteen 2012

    Verhoeven, Ben, Walter Daelemans, and Gerhard B. Van Huyssteen. 2012. “Classification of noun-noun compound semantics in Dutch and Afrikaans.” Pretoria, South Africa, 2012.

  • De Wet, De Waal & Van Huyssteen 2011

    De Wet, F, A De Waal, and Gerhard B. Van Huyssteen. 2011. “Developing a broadband automatic speech recognition system for Afrikaans.” Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011):3185-3188.

  • Heeringa, De Wet & Van Huyssteen 2011

    Heeringa, Wilbert, Febe De Wet, and Gerhard B. Van Huyssteen. 2011. “Afrikaans and Dutch as closely related languages: a comparison to West Germanic languages and Dutch dialects.” Methods in Dialectology, University of Western Ontario, London, Ontario, Canada.

  • Pilon & Van Huyssteen 2011

    Pilon, Suléne, and Gerhard B. Van Huyssteen. 2011. “Technology recycling for closely related languages: Dutch and Afrikaans.” 21st Meeting of Computational Linguistics in the Netherlands (CLIN) 2011, University College Ghent, Ghent, Belgium.

  • Roux et al 2011

    Roux, Justus C., Gerhard B. Van Huyssteen, Tebogo Gumede, and Mampaka L. Mojapelo. 2011. “The South African National HLT resource management agency.” 3rd European Language Resources and Technologies (FLaReNet) Forum: Language Resources in the Sharing Age – the Strategic Agenda Forum, Università Ca’ Foscari, VENEZIA, Italy.

  • Sharma Grover, Van Huyssteen & Pretorius 2011a

    Sharma Grover, Aditi, Gerhard B. Van Huyssteen, and M. W. Pretorius. 2011. “The South African human language technology audit.” Language Resources and Evaluation.

  • Sharma Grover, Van Huyssteen & Pretorius 2011b

    Sharma Grover, Aditi, Gerhard B. Van Huyssteen, and M. W. Pretorius. 2011. “A technology audit: Human language technologies (HLT) R&D in South Africa.” Proceedings of the Portland International Conference on Management of Engineering and Technology (PICMET 2011), Portland, Oregon, USA.

  • Sharma Grover, Van Huyssteen & Pretorius 2011c

    Sharma Grover, Aditi, Gerhard B. Van Huyssteen, and M. W. Pretorius. 2011. “The South African human language technology audit: Supplemental online material.” Language Resources and Evaluation 45 (3):271-288. doi: 10.1007/s10579-011-9151-2.

  • Snyman, Van Huyssteen & Daelemans 2011

    Snyman, Dirk P., Gerhard B. Van Huyssteen, and Walter Daelemans. 2011. “Automatic genre classification for resource scarce languages.” Proceedings of the 2011 Conference of the Pattern Recognition Association of South Africa, Vanderbijlpark, South Africa.

  • Barnard, Davel & Van Huyssteen 2010

    Barnard, Etienne, Marlie H. Davel, and Gerhard B. Van Huyssteen. 2010. “Speech technology for information access: A South African case study.” AAAI Spring Symposium – Technical Report SS-10-01 (Nasfors 2007):8-13.

  • Bilingual Terminology List: Computational Linguistics

    Van Huyssteen, Gerhard B. 2010. Bilingual word list: computational linguistics/Tweetalige woordelys: rekenaarlinguistiek.

  • Pilon, Van Huyssteen & Augustinus 2010

    Pilon, Suléne, Gerhard B. Van Huyssteen, and Liesbeth Augustinus. 2010. “Converting Afrikaans to Dutch for technology recycling.” Proceedings of the 2010 Conference of the Pattern Recognition Association of South Africa:219-224.

  • Schlünz, Barnard & Van Huyssteen 2010

    Schlünz, Georg I., Etienne Barnard, and Gerhard B. Van Huyssteen. 2010. “Part-of-speech effects on text-to-speech synthesis.” Proceedings of the 2010 Conference of the Pattern Recognition Association of South Africa:257-262.

  • Sharma Grover et al 2010a

    Sharma Grover, Aditi, Gerhard B. Van Huyssteen, and M. W. Pretorius. 2010. “An HLT profile of the official South African languages.” 2nd Workshop on African Language Technology: AfLaT 2010:3-7.

  • Sharma Grover et al 2010b

    Sharma Grover, Aditi, Karen Calteaux, Gerhard B. Van Huyssteen, and M. W. Pretorius. 2010. “An overview of HLTs for South African Bantu languages.” Proceedings of the 2010 Annual Research Conference of the South African Institute for Computer Scientists and Information Technologists (SAICSIT):370-375. ISBN 978-1-60558-950-3.

  • Sharma Grover, Van Huyssteen & Pretorius 2010a

    Sharma Grover, Aditi, Gerhard B. Van Huyssteen, and M. W. Pretorius. 2010. “The South African human language technologies audit.” Proceedings of the 7th Language Resource and Evaluation Conference:2847-2850.

  • Sharma Grover, Van Huyssteen & Pretorius 2010b

    Sharma Grover, Aditi, Gerhard B. Van Huyssteen, and Marthinus W. Pretorius. 2010. “An HLT profile of the official South African languages.” The National HLT network (NHN), CSIR International Conference Centre, PRETORIA, South Africa.

  • Van Huyssteen & Davel 2010

    Van Huyssteen, Gerhard B., and M. Davel. 2010. “Learning rules and categorization networks for language standardization.” Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) Workshop on Extracting and Using Constructions in Computational Linguistics:39-46.

  • Van Huyssteen & Pilon 2010a

    Van Huyssteen, Gerhard B., and Suléne Pilon. 2010. “Some thoughts on a Dutch-to-Afrikaans convertor.” Guest lecture, University of Antwerp, ANTWERP, Belgium.

  • Van Huyssteen & Pilon 2010b

    Van Huyssteen, Gerhard B., and Suléne Pilon. 2010. “A Dutch-to-Afrikaans convertor.” 20th Meeting of Computational Linguistics in the Netherlands (CLIN) 2010, Utrecht University, UTRECHT, The Netherlands.

  • Van Huyssteen 2010

    Van Huyssteen, Gerhard B. 2010. “(Re)defining component structures in morphological constructions: a cognitive grammar perspective.” In Cognitive Approaches to Word-Formation, edited by Alexander Onysko and Sascha Michel, 97-126. Berlin: Mouton de Gruyter.

  • AWS 2009

    Taalkommissie van die Suid-Afrikaanse Akademie vir Wetenskap en Kuns. 2009. Afrikaanse woordelys en spelreëls [Afrikaans wordlist and spelling rules]. Tenth ed. Cape Town: Pharos Dictionaries.

  • Daelemans, Groenewald & Van Huyssteen 2009

    Daelemans, Walter, Hendrik J. Groenewald, and Gerhard B. Van Huyssteen. 2009. “Prototype-based active learning for lemmatization.” Proceedings of Recent Advances in Natural Language Processing 2009.

  • Van Huyssteen & Pilon 2009

    Van Huyssteen, Gerhard B., and Suléne Pilon. 2009. “Rule-based conversion of closely-related languages: A Dutch-to-Afrikaans convertor.” Proceedings of the 2009 Conference of the Pattern Recognition Association of South Africa:23-28.

  • Bilingual Terminology List: Cognitive Linguistics

    Van Huyssteen, Gerhard B. 2008. Bilingual word list: cognitive linguistics/Tweetalige woordelys: kognitiewe linguistiek.

  • Groenewald & Van Huyssteen 2008

    Groenewald, Hendrik J, and Gerhard B. Van Huyssteen. 2008. “Outomatiese lemma-identifisering vir Afrikaans [Automatic lemmatisation for Afrikaans].” Literator 29 (1):65-91.

  • Pilon, Puttkammer & Van Huyssteen 2008

    Pilon, Suléne, Martin J. Puttkammer, and Gerhard B. Van Huyssteen. 2008. “Die ontwikkeling van ‘n woordafbreker en kompositumanaliseerder vir Afrikaans [The development of a hyphenator and compound analyser for Afrikaans].” Literator 29 (1):21-41.

  • Van Huyssteen & Bosch 2008

    Van Huyssteen, Gerhard B., and Sonja E. Bosch. 2008. “Voorwoord: Mensetaaltegnologie vir Suid-Afrikaanse tale [Preface: Human language technology for South African languages].” Literator 29 (1):xi-xvii.

  • Van Huyssteen 2008a

    Van Huyssteen, Gerhard B. 2008. “Oor modelle [On models].” Inaugural lecture, North-West University, POTCHEFSTROOM, South Africa.

  • Van Huyssteen 2008b

    Van Huyssteen, Gerhard B. 2008. Bilingual word list: computational linguistics [Tweetalige woordelys: rekenaarlinguistiek]. In Literator.

  • Van Huyssteen 2008c

    Van Huyssteen, Gerhard B. 2008. “Bluff your way through cognitive grammar (aka cognitive grammar for dummies).” Special 6 hour tutorial organised by ‘Centrum voor Nederlandse Taal en Spraak’, University of Antwerp, ANTWERP, Belgium.

  • Groenewald & Van Huyssteen 2007

    Groenewald, Hendrik J., and Gerhard B. Van Huyssteen. 2007. “Work-in-progress: end-user requirements for machine-aided translation tools.” LSSA Conference, North-West University, POTCHEFSTROOM, South Africa.

  • Groenewald, Van Huyssteen & Puttkammer 2007

    Groenewald, Hendrik J., Gerhard B. Van Huyssteen, and Martin J. Puttkammer. 2007. “Evaluating wrapped progressive sampling for automatic algorithmic parameter optimisation.” International Conference Recent Advances in Natural Language Processing, RANLP:251-255.

  • Groenewald, Van Huyssteen & Van Den Bosch 2007

    Groenewald, Hendrik J., Antal Van den Bosch, and Gerhard B. Van Huyssteen. 2007. “Feature selection and parameter optimisation for effective Afrikaans lemmatisation.” Computational Linguistics in the Netherlands, University of Leuven, LEUVEN, Belgium.

  • Puttkammer, Schlemmer & Van Huyssteen 2007

    Puttkammer, Martin J., Martin Schlemmer, and Gerhard B. Van Huyssteen. 2007. “Developing web-based word-translators.” LSSA Conference, North-West University, POTCHEFSTROOM, South Africa.

  • Van Huyssteen & Groenewald 2007

    Van Huyssteen, Gerhard B., and Hendrik J. Groenewald. 2007. “‘n Heroorweging van fleksie in Afrikaans: klasse vir outomatiese Afrikaanse lemma-identifisering [Afrikaans inflection revisited: classes for automatic Afrikaans lemmatisation].” LSSA Conference, North-West University, POTCHEFSTROOM, South Africa.

  • Van Huyssteen & Puttkammer 2007

    Van Huyssteen, Gerhard B., and Martin J. Puttkammer. 2007. “Accelerating the annotation of lexical data for less-resourced languages.” Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007):1505-1508.

  • Van Huyssteen & Wissing 2007

    Van Huyssteen, Gerhard B., and Daan P. Wissing. 2007. “Datagebaseerde aspekte van Afrikaanse reduplikasies [Data-based aspects of Afrikaans reduplications].” Southern African Linguistics and Applied Language Studies 25 (3):419-439.

  • Van Huyssteen 2007

    Van Huyssteen, Gerhard B. 2007. “Designing an e-Learning system for language learning: a case study.” Innovations in E-learning, Instruction Technology, Assessment, and Engineering Education:105-110.

  • Van Huyssteen, Puttkammer, Pilon & Groenewald 2007

    Van Huyssteen, Gerhard B., Martin J. Puttkammer, Suléne Pilon, and Hendrik J. Groenewald. 2007. “Using machine learning to annotate data for NLP tasks semi-automatically.” Proceedings of International Workshop on Computer-Aided Language Processing.

  • Puttkammer & Van Huyssteen 2006

    Puttkammer, Martin J., and Gerhard B. Van Huyssteen. 2006. “Automatic text segmentation of Afrikaans using memory-based learning.” Proceedings of the 2006 Conference of the Pattern Recognition Association of South Africa. Pretoria: CSIR/Meraka.

  • Van Huyssteen 2006

    Van Huyssteen, Gerhard B. 2006. “eLearning for language learning.” IST-Africa Conference, CSIR International Convention Centre, PRETORIA, South Africa.

  • Pilon, Van Huyssteen & Van Rooy 2005

    Pilon, Suléne, Gerhard B. Van Huyssteen, and Bertus Van Rooy. 2005. “Teaching Language Technology at the North-West University.” Proceedings of the Second ACL-TNLP Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics:57-61.

  • Brits, Pretorius & Van Huyssteen 2005

    Brits, J. C., Rigardt S. Pretorius, and Gerhard B. Van Huyssteen. 2005. “Automatic Lemmatisation in Setswana: Towards a Prototype.” South African Journal of African Languages 25:37-47.

  • Van Huyssteen & Janke 2005

    Van Huyssteen, Gerhard B., and Ulrike Janke. 2005. “Developing spelling checkers for South African languages.” International Conference of the African Languages Association of Southern Africa, University of Johannesburg, JOHANNESBURG, South Africa.

  • Van Huyssteen & Pretorius 2005

    Van Huyssteen, Gerhard B., and Laurette Pretorius. 2005. “Afrikaanse tekstegnologie: ‘n inventaris [Afrikaans text technology: an inventory].” Annual Symposium of ‘Suid-Afrikaanse Akademie vir Wetenskap en Kuns’, PRETORIA, South Africa.

  • Van Huyssteen 2005a

    Van Huyssteen, Gerhard B. 2005. “‘n Kognitiewe gebruiksgebaseerde beskrywingsmodel vir die Afrikaanse grammatika [A cognitive usage-based description model for Afrikaans grammar].” Southern African Linguistics and Applied Language Studies 23 (2):125-137.

  • Van Huyssteen 2005b

    Van Huyssteen, Gerhard B. 2005. Bilingual word list: cognitive linguistics. In Southern African Linguistics and Applied Language Studies.

  • Van Huyssteen 2005c

    Van Huyssteen, Gerhard B. 2005. “On valence morphemes in Afrikaans.” 9th International Cognitive Linguistics Conference, Yonsei University, SEOUL, South Korea.

  • Van Huyssteen, Pilon & Puttkammer 2005

    Van Huyssteen, Gerhard B., Suléne Pilon, and Martin J. Puttkammer. 2005. “Using machine learning in the development of modules for proofing tools.” International Conference of the African Languages Association of Southern Africa, University of Johannesburg, JOHANNESBURG, South Africa.

  • Van Huyssteen & Van Zaanen 2004

    Van Huyssteen, Gerhard B., and Menno M. Van Zaanen. 2004. “Learning compound boundaries for Afrikaans spelling checking.” Proceedings of First Workshop on International Proofing Tools and Language Technologies. Patras: University of Patras:101-108.

  • Van Huyssteen 2004

    Van Huyssteen, Gerhard B. 2004. “Motivating the composition of Afrikaans reduplications: a cognitive grammar analysis.” In Studies in Linguistic Motivation, edited by G. Radden and K. U. Panther, 269-292. Berlin: Mouton de Gruyter.

  • Van Huyssteen, Eiselen & Puttkammer 2004

    Van Huyssteen, Gerhard B., E. Roald Eiselen, and Martin J. Puttkammer. 2004. “Re-evaluating evaluation metrics for spelling checker evaluations.” Proceedings of First Workshop on International Proofing Tools and Language Technologies. Patras: University of Patras:91-99.

  • Els & Van Huyssteen 2003

    Els, Christo J., and Gerhard B. Van Huyssteen. 2003. “Outomatiese let-ter-greep-ver-de-ling en woord-afbreking in Afrikaans [Automatic hyphenation in Afrikaans].” LSSA Conference, Rand Afrikaans University, JOHANNESBURG, South Africa.

  • Pilon & Van Huyssteen 2003

    Pilon, Suléne, and Gerhard B. Van Huyssteen. 2003. “‘n Etiketstel vir ‘n woordsoortetiketteerder vir Afrikaans [A Tagset for a Part-of-Speech Tagger for Afrikaans].” LSSA Conference, Rand Afrikaans University, JOHANNESBURG, South Africa.

  • Van Huyssteen & Van Zaanen 2003

    Van Huyssteen, Gerhard B., and Menno M. Van Zaanen. 2003. “A spellchecker for Afrikaans, based on morphological analysis.” 6th International Terminology in Advanced Management Applications Conference: Conference Proceedings:189-194.

  • Van Huyssteen 2003a

    Van Huyssteen, Gerhard B. 2003. “An introduction to human language technology in Southern Africa: resources and applications.” Southern African Linguistics and Applied Language Studies (21).

  • Van Huyssteen 2003b

    Van Huyssteen, Gerhard B. 2003. “Applying embodied construction grammar: a description of some Afrikaans morphological constructions.” 8th International Cognitive Linguistics Conference, University of La Rioja, LOGROÑO, Spain.

  • Van Huyssteen 2003c

    Van Huyssteen, Gerhard B. 2003. “Cognitive Afrikaans morphology.” Cognitive Linguistics in South Africa Colloquium, University of South Africa, PRETORIA, South Africa.

  • Van Huyssteen & Pilon 2003

    Van Huyssteen, Gerhard B., and Suléne Pilon. 2003. “The Afrikaans plural construction: an embodied construction grammar account.” Cognitive Linguistics in South Africa Colloquium, University of South Africa, PRETORIA, South Africa.

  • Van Zaanen & Van Huyssteen 2003a

    Van Zaanen, Menno M., and Gerhard B. Van Huyssteen. 2003. “Improving a spelling checker for Afrikaans.” Computational Linguistics in the Netherlands 2002: Selected Papers from the Thirteenth CLIN Meeting (47):143-156.

  • Van Zaanen & Van Huyssteen 2003b

    Van Zaanen, Menno M., and Gerhard B. Van Huyssteen. 2003. “Various uses of spelling checkers: learning, teaching, and practical experiences.” Southern African Linguistics and Applied Language Studies 21 (3):327-340.

  • Van Huyssteen 2002a

    Van Huyssteen, Gerhard B. 2002. “Teoretiese vooronderstellings van ‘n kognitiewe gebruiksgebaseerde beskrywingsmodel vir die Afrikaanse grammatika [Theoretical Assumptions of a Cognitive Usage-based Model for Afrikaans Grammar].” Southern African Linguistics and Applied Language Studies 20 (4):303-323.

  • Van Huyssteen 2002b

    Van Huyssteen, Gerhard B. 2002. “Desiderata of spellchecking/spell-checking/spell checking: towards an intelligent spellchecker for Afrikaans.” Potchefstroom, 2002.