Technology offers South Africans an opportunity to cross the many language barriers that exist in a country with 11 official languages, and to communicate more effectively with one another. The Centre for Text Technology (CTexT®) focuses on the research and development of Human Language Technology (HLT) within the South African context. Their goal is to improve the interaction between humans and computers by developing modern software programs and applications. This, in turn, can improve communication between people of different language backgrounds.
Based at the North-West University, Potchefstroom Campus, CTexT® combines research expertise with the technical and administrative support needed to expand much-needed HLT resources. They conduct cutting-edge research in text technology and use it as the basis for developing innovative, relevant applications for resource-scarce languages.
As the official text node of the South African Centre for Digital Language Resources (SADiLaR), CTexT® focuses on advancing multilingualism and developing resources for indigenous South African languages. For under-resourced languages, text data is far less readily available than it is for English, for example, yet such data is essential if new big data and artificial intelligence (AI) technologies that respond to the unique South African context are to be developed.
Key projects
Linguistic corpus enrichment project
In this long-term project, CTexT® sources, collects, and processes resources such as corpora (large collections of texts), and develops core technologies, such as morphological analysers and part-of-speech taggers, that form the building blocks of language technologies for these indigenous languages.
In its other key long-term project, CTexT® focuses on developing parallel corpora for use in machine translation systems. This series of projects, which falls under the Autshumato umbrella and is funded by SADiLaR and the Department of Sport, Arts and Culture, has produced easy-to-use, open-source technologies that simplify the translation process, promote terminology standardisation, and shorten translation time. This gives the public better access to information in their mother tongue and supports effective public service delivery.
Other key technologies
Noteworthy technologies that CTexT® has developed include spelling checkers for the 10 official South African languages other than English, machine translation systems, various machine learning-based core technologies, and collections of text data and tools (e.g. for optical character recognition, language identification, and part-of-speech tagging) that support future development.