Digitisation of Language Resources

Project Type: Node
Project Start Date: 1 April 2017
Project Status: Ongoing

Project Aims and Motivation:

The UP digitisation node focuses on preserving invaluable language (and cultural) resources for the African languages by digitising textual, video and audio material, and on giving language communities access to these digital resources via the SADiLaR repository. Digitised content delivered by the node forms the basis for HLT and NLP applications for the African languages, such as machine translation systems, writing support tools, the compilation of (electronic) dictionaries and text production verification tools. It also forms the basis for any kind of corpus-based research, whether linguistic, lexicographic or terminological. Furthermore, the availability of digitised resources raises the status of these languages as languages of higher functions, with the end goal of empowering speakers to exercise their citizenship in their strongest language, and increases these languages’ digital footprint.

The project aims to build language resources for the indigenous South African languages by digitising language and language-related text, audio, online and video data. It entails continuing the mass digitisation of material in all 11 official languages of South Africa. Digitisation will also cover digital resources for specific needs and projects.

Contact persons:

Prof. Elsabé Taljard, Co-Node Manager: elsabe.taljard@up.ac.za

Prof. Danie Prinsloo, Co-Node Manager: danie.prinsloo@up.ac.za