Project Type: Node
Start Date: 1 October 2017
Project Status: Completed and delivered
Project Aims:
This project, developed under the Nodes Specialisation Project, makes linguistically enriched corpora available for the four official South African languages with a conjunctive orthography, namely isiNdebele, isiXhosa, isiZulu and Siswati.
The parallel corpora consist of approximately 50,000 tokens each. They are aligned across all four languages and English, and are annotated for morphology, part of speech (POS) and lemmas. Based on the annotated corpora, we also developed core technologies for these four languages, namely lemmatisers, POS taggers and morphological analysers.
Project Deliverables:
- Parallel corpora of approximately 50,000 tokens for each of the four languages
- Lemmatisers, POS taggers and morphological analysers for four languages
Contact details:
Please contact ctext@nwu.ac.za