Linguistic corpus enrichment for conjunctively written South African languages

Project Type: Node
Start Date: 1 October 2017
Project Status: Completed and delivered

 

Project Aims:

This project, developed under the Nodes Specialisation Project, makes linguistically enriched corpora available for the four official South African languages with a conjunctive orthography, i.e. isiNdebele, isiXhosa, isiZulu, and Siswati.

The parallel corpora consist of approximately 50,000 tokens each, aligned across all four languages and English, and annotated for morphology, part of speech, and lemmas. Based on the annotated corpora, we also developed core technologies for these four languages, namely lemmatisers, POS taggers, and morphological analysers.

 

Project Deliverables:

Contact details:

Please contact ctext@nwu.ac.za