This project aims to provide information on the current state of HLT R&D in South Africa. Specifically, it aims to replicate the HLT audit completed in 2009 and to update the information on the various HLT tools, resources and applications identified in that audit. The tools, resources and applications developed since 2009 will be identified and categorised using an updated version of the technology matrix previously employed.
The African Wordnet (AWN) and Linguistic Terminology project is two-pronged and concerns the development of language resources in the form of wordnets for a variety of African languages as well as the development of linguistics terminology for all official African languages. The project comprises two work packages: Work Package 1 deals with expanding the scope of the existing African Wordnet, while Work Package 2 involves the expansion of the Open Educational Resource Term Bank (OERTB) with newly extracted linguistic terminology.
Link with a dedicated SADiLaR server to host the AWN
One article per annum in a peer-reviewed journal or peer-reviewed conference proceedings.
Training workshops and meetings with partners (including at least 1 international guest who will be funded by SADiLaR)
Digitisation of outdated study guides (in collaboration with the University of Pretoria (UP))
Term extraction of at least 500 terms for Sesotho sa Leboa and 500 terms for isiZulu (in collaboration with UP) from outdated study guides
Quality assurance on 500 extracted terms for Sesotho sa Leboa and 500 extracted terms for isiZulu
Development of 500 new term definitions in English by subject experts
Standardisation of terms in Sesotho sa Leboa and isiZulu
One (1) training workshop and meetings with partners
Development of 250 new terms each for Setswana, Sesotho, isiXhosa, isiNdebele, Siswati, Xitsonga and Tshivenda by subject experts
Quality assurance on 250 extracted terms each for Setswana, Sesotho, isiXhosa, isiNdebele, Siswati, Xitsonga and Tshivenda
Project type: <To be completed> Project Start date: <To be completed> Project Status: Completed [Finalising]
The African Wordnet (AWN) and Linguistic Terminology project is two-pronged and concerns the development of language resources in the form of wordnets for a variety of African languages as well as the development of linguistics terminology for all official African languages. In the initial project, development of the AWN was limited to 7 of the indigenous South African languages. With this expansion, we plan to add the remaining 2 languages so that all 9 indigenous South African languages are represented in the African Wordnet. In doing so, we will ensure that further development for all languages is stimulated and that a platform for further wordnet development in any of the languages is created.
The extended project comprises two work packages: Work Package 1 deals with expanding the scope of the existing African Wordnet, including the use of the AWN data for language learning, while Work Package 2 involves a wrap-up workshop for the Open Educational Resource Term Bank (OERTB) with newly extracted linguistic terminology.
Training workshop (technical, linguistic, lexicographic and corpus extraction) and meetings with partners. In the workshop, all experts (linguists, lexicographers, computer scientists) will develop a joint and general approach for the usage of AWN and other dictionary data for language learning. Open technical problems like data conversion and regular updates will be discussed;
Development of 2 000 new synsets across 2 languages (including basic synsets, usage examples and definitions, plus quality assurance) by language experts;
Usage of the African Wordnet data (AWN) for language learning:
3.1.1 Existing language data, especially dictionary data for the South African languages and AWN data, will be made available online (as both a website and a mobile app). The following challenges will be addressed:
(1) Defining the requirements from a meta-lexicographical point of view and designing the front end suitable for different dictionary user groups.
(2) Design of the data structure for the dictionary data according to (1), using a database system. The AWN data will be included and updated regularly.
(3) Programming of the website and mobile app, presenting the data according to the meta-lexicographical guidelines in (1) and using the data as described in (2).
One (1) dissemination workshop to showcase the expanded OERTB and meetings with partners;
Press release on the expansion of the OERTB for publication in Unisa's internal newsletter and for distribution to SADiLaR stakeholders.
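
As an illustration of challenge (2) under the language-learning deliverable, the following is a minimal sketch of how dictionary and AWN data might be held in a database and refreshed on a regular basis. It is not the project's actual design: the table names (synset, lemma), the column names, the helper functions create_db, upsert_synset and lookup, and the choice of SQLite are all assumptions made for the example, and the sample values are placeholders rather than real AWN entries.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the project's actual database design.
SCHEMA = """
CREATE TABLE synset (
    synset_id  TEXT PRIMARY KEY,
    language   TEXT NOT NULL,   -- language code, e.g. 'zu' for isiZulu
    definition TEXT,
    example    TEXT             -- usage example, may be missing
);
CREATE TABLE lemma (
    lemma_id  INTEGER PRIMARY KEY,
    synset_id TEXT NOT NULL REFERENCES synset(synset_id),
    form      TEXT NOT NULL     -- word form a dictionary user searches for
);
CREATE INDEX idx_lemma_form ON lemma(form);
"""


def create_db(path=":memory:"):
    """Open a database (in memory by default) and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_synset(conn, synset_id, language, definition, example, lemmas):
    """Insert or refresh one synset so that a regular update can be re-applied."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO synset VALUES (?, ?, ?, ?)",
            (synset_id, language, definition, example),
        )
        # Replace the synset's word forms with the ones from the new release.
        conn.execute("DELETE FROM lemma WHERE synset_id = ?", (synset_id,))
        conn.executemany(
            "INSERT INTO lemma (synset_id, form) VALUES (?, ?)",
            [(synset_id, form) for form in lemmas],
        )


def lookup(conn, form, language):
    """Return (synset_id, definition, example) rows for a word form."""
    rows = conn.execute(
        "SELECT s.synset_id, s.definition, s.example "
        "FROM lemma l JOIN synset s ON s.synset_id = l.synset_id "
        "WHERE l.form = ? AND s.language = ? "
        "ORDER BY s.synset_id",
        (form, language),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = create_db()
    # Placeholder data only; not taken from the AWN.
    upsert_synset(db, "demo-0001", "zu", "placeholder definition",
                  "placeholder example", ["lemma-a", "lemma-b"])
    print(lookup(db, "lemma-a", "zu"))
```

Keeping word forms in a separate table from synsets lets one synset carry several forms, and re-running upsert_synset with a newer release overwrites the earlier entry, which is one simple way to support the regular updates mentioned in (2). A website or mobile front end as described in (3) could then query such a store through a function like lookup.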