The CSIR Meraka Institute focuses on shaping South Africa’s digital future and is known for its research, development and innovation in the information and communication technology sector. Within the Institute, the Voice Computing (VC) Research Group focuses on solving the communication challenges that South Africa faces as a result of the lack of language resources and data. The Research Group delivers text-to-speech, automatic speech recognition and human language analytics to support the government’s service delivery and provide access to information. This, in turn, facilitates smarter decision-making.
On 1 July 2017, the CSIR node formally commenced the project entitled “Human Language Technology Audit 2017/2018”. The aim of the project is to update the previous HLT audit of 2009 that was conducted by Ms A Sharma-Grover as part of her research theses. The main reason for undertaking the updated HLT audit is the increase in HLT research and development activity in South Africa, specifically an increase in the number of institutions conducting HLT research and development. The HLT audit will provide SADiLaR with updated information on HLT components (software and models) and general language resources (data) for the South African languages, both in South Africa and internationally.
The project consists of a number of phases: audit design, audit instrument development, audit execution, dynamic audit updates, and the consolidation and reporting of audit results.
Two of the phases have been completed, namely:
- the audit design, including research into audit methodologies, previous audits in the field as well as workshops with HLT experts; and
- the audit instrument development, in which the instrument was built as an online tool using open-source software so that it is easy to transfer to SADiLaR once the audit execution has been completed.
The HLT audit went live in December 2017 and a number of experts in the HLT community were invited to participate in the project. The CSIR has been monitoring the progress of the audit responses and is continuing its work on the dynamic audit updates. As part of the project, the node tested various accessibility tools in order to extend the audit to a wider community. These accessibility tools were found to be compatible with most functions of the online audit.
Furthermore, the CSIR node has continued its research on the dynamic audit updates, which are similar to work by CLARIN and the Language Resources and Evaluation (LRE) Map. The HLT audit will deliver valuable data and resources, all of which will be presented to SADiLaR and to the broader HLT community.