South Africa, with its rich diversity of eleven official languages, is a potential emerging market where language technology (LT) applications can contribute to the promotion of multilingualism and language development, and in doing so have a positive impact on the South African community. Effective LT applications in South Africa’s indigenous languages, however, depend on key language resources being available.
The University of South Africa (UNISA) node of SADiLaR, hosted by UNISA's Department of African Languages, specialises in language resource development for South Africa’s official languages.
One of the fundamental resources required for the development of many core language technologies and LT applications is a wordnet. A wordnet is a lexical database in which words are grouped into sets of synonyms called synsets. Conceptual-semantic and lexical relations are then indicated between the synsets contained in the wordnet.
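The structure described above can be illustrated with a minimal sketch. The class names, field names, relation labels and the English sample entries below are hypothetical and chosen for illustration only; they do not reflect the African Wordnet's actual data format or content.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous words that share one meaning (hypothetical layout)."""
    synset_id: str
    lemmas: list[str]
    gloss: str
    # Maps a relation name (e.g. "hypernym") to the ids of related synsets.
    relations: dict[str, list[str]] = field(default_factory=dict)


class Wordnet:
    """A tiny in-memory wordnet: synsets indexed by id."""

    def __init__(self) -> None:
        self._synsets: dict[str, Synset] = {}

    def add(self, synset: Synset) -> None:
        self._synsets[synset.synset_id] = synset

    def synsets_for(self, word: str) -> list[Synset]:
        # A word can belong to several synsets, one per sense.
        return [s for s in self._synsets.values() if word in s.lemmas]

    def related(self, synset_id: str, relation: str) -> list[Synset]:
        # Follow one named relation from a synset to its targets.
        source = self._synsets[synset_id]
        targets = source.relations.get(relation, [])
        return [self._synsets[t] for t in targets if t in self._synsets]


def build_toy_wordnet() -> Wordnet:
    """Build a two-synset example with illustrative English entries."""
    wn = Wordnet()
    wn.add(Synset("n-vehicle", ["vehicle"], "a conveyance that transports people or goods",
                  {"hyponym": ["n-car"]}))
    wn.add(Synset("n-car", ["car", "automobile"], "a motor vehicle with four wheels",
                  {"hypernym": ["n-vehicle"]}))
    return wn


if __name__ == "__main__":
    wn = build_toy_wordnet()
    for synset in wn.synsets_for("automobile"):
        broader = wn.related(synset.synset_id, "hypernym")
        print(synset.lemmas, "is a kind of", [b.lemmas for b in broader])
```

In this sketch, "car" and "automobile" share a synset because they are synonyms, and the "hypernym" relation links that synset to the more general "vehicle" synset.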
Wordnets are not only useful, but also indispensable components of the large automatic language understanding systems being developed and tested in academia and industry. Adding the South African languages to the wordnet web enables many such applications for each of these languages in isolation. Moreover, linking the South African wordnets to one another and to the many global wordnets makes cross-linguistic information retrieval and question answering possible and significantly aids machine translation. This is an important contribution to the empowerment of the African languages.
African languages have important and specialised terminology in specific fields. The UNISA node works to make this multilingual terminology, specifically in the fields of linguistics and literary studies, freely available in a large database so that these resources can contribute positively to teaching and learning as well as to other forms of language practice, such as language learning and interpretation. Our aim is to provide students in a linguistics or literature classroom with a free, easy-to-access termbank that includes definitions and usage examples for each term in all of the South African languages. We hope that, by being able to access complex terminology in any language, students will be better able to engage with the subject matter and deepen their understanding of this field of specialisation. Creating a central termbank in consultation with other institutions of higher learning, such as the University of Limpopo, North-West University and the University of the Free State, also makes it possible to standardise the terms in the South African languages.
The resources created in both the African Wordnet project and the Multilingual Linguistic and Literary Terminology project are provided open access through SADiLaR’s repository and web services.
The UNISA node is further committed to human capital development in the digital humanities sphere and hosts regular informal meetings as well as formalised training workshops. These events give linguists working on the two subprojects the opportunity not only to advance their language practice, translation and linguistic skills, but also to see how the resources they are developing are used. The list of multidisciplinary and collaborative research outputs on the project website speaks to this. Young researchers are also encouraged to interact with experienced project members and thereby grow into the project leaders of tomorrow.