African Wordnet and Multilingual Linguistic Terminology

Project Type: Node
Project Start Date: 1 October 2017
Project Status: Two development phases completed (end of 2017 and end of 2019), with a third currently ongoing (ending in 2024)

Project Aims:

The UNISA SADiLaR node project has two prongs: the development of language resources in the form of wordnets, and the development of linguistic terminology for nine South African languages. The African Wordnet (AfWN) project was initiated in 2010, when linguists for isiXhosa, isiZulu, Setswana and Sesotho sa Leboa began building the basic structure of a wordnet for each of these languages, following the expand approach with the English Princeton WordNet as the base. Other South African languages were added in phased development cycles, and first wordnets for Tshivenḓa, Siswati, Sesotho, Xitsonga and isiNdebele are now available for download from the SADiLaR repository.

The second subproject within the node, the Multilingual South African Linguistic Terminology project, aims to create a freely available open educational resource (OER) for South African linguistics students and scholars. Termbanks in the Lexonomy interface were populated with linguistic terms in nine South African languages (Tshivenḓa, Siswati, Sesotho, Xitsonga, isiNdebele, isiXhosa, isiZulu, Setswana and Sesotho sa Leboa), each with a definition and a usage example to illustrate the meaning of terms commonly used in linguistics classrooms and textbooks. This resource is available for download as XML files from the SADiLaR repository and can also be browsed via the Lexonomy interface.

Project Deliverables:

  1. Setup and maintenance of a dedicated SADiLaR server to host the AfWN
  2. Publications (at least one per annum), including articles in peer-reviewed journals and conference proceedings by project team members (please see the two project websites for regularly updated publication lists)
  3. Various training workshops (at least two per annum), ranging from internal workshops where project team members discuss project progress, to workshops on advanced linguistic resource development in the natural language processing domain, to which students and the wider community of stakeholders in Southern Africa are invited
  4. Co-hosting of the Global Wordnet Conference by the node and the SADiLaR Hub in 2021
  5. Regularly updated websites for both subprojects to list all project team members and highlight project events and activities such as workshops, new data releases and publications
  6. Development, standardisation and quality assurance of 500 linguistic terms, each with a definition and usage example, in nine South African languages, with English as the pivot language
  7. Integration of the developed termbank into the Lexonomy interface to enhance accessibility
  8. Development of 8 000 new synsets across 9 languages (including basic synsets, usage examples and definitions, plus quality assurance) by language experts, using the SIL Comparative African Wordlist as a seed list (a separate version of this data is also available for download as a corpus)
  9. Development of 2 000 new definitions to be added to existing synsets for 5 languages in the AfWN
  10. Integration of all wordnet development into the WordnetLoom interface, and addition of semantic relations to synsets
  11. Use of the African Wordnet (AfWN) data for language learning in the Kamusi Mobile Online Dictionary

Current Project Activities:

The project is currently in a third development phase (2022 to 2024). During this phase, the focus is on expanding the Multilingual Linguistic Termbank to also include terms from literary studies, and on expanding the African Wordnet with new synsets, updated semantic relations and an updated automatic quality assurance protocol that supports the manual quality assurance already performed in the project. Dissemination of the resources, and of the valuable knowledge gained in creating them, through workshops and publications is a further focus.

Contact Persons:

Professor Lydia Mojapelo, Node Manager: mojapml@unisa.ac.za

Professor Stanley Madonsela, Node Manager: madonfs@unisa.ac.za

Ms Marissa Griesel, Project Manager: griesel.marissa@gmail.com

African Wordnet Project website: https://africanwordnet.wordpress.com/

Multilingual Linguistic Terminology website: https://linguisticterminology.wordpress.com/