The CSIR Voice Computing Research Group: Speech Node

The Council for Scientific and Industrial Research (CSIR) speech node, situated in Pretoria, develops localised language technology. It focuses on speech technologies such as automatic speech recognition (ASR) and text-to-speech (TTS), as well as controlled natural language processing (CNLP). The CSIR’s TTS offering, known as Qfrency, is the only commercial TTS product catering for all of the South African official languages. The Qfrency TTS suite consists of a growing number of male and female TTS voices in all of the official languages, as well as a male child voice.

The Node’s TTS research focuses on improving the naturalness of its TTS voices, with a particular emphasis on tone and prosody in the African languages, and on building TTS voices using state-of-the-art techniques. The ASR research focuses on semi-supervised harvesting of the audio data required to develop speech recognition systems on a par with international offerings, and on ASR system development for the local languages using state-of-the-art techniques. On the CNLP side, the Speech Node follows an approach known as grammar-based language modelling, using Grammatical Framework, a state-of-the-art multilingual grammar engineering framework. This approach allows for highly accurate and rich multilingual natural language processing, starting with limited domains and use-case-specific language fragments and working towards increased coverage. This is often useful in speech technology applications, especially in domains such as education or healthcare, where reliability is essential.

The focus of the Speech Node is on the education domain, where its technologies are well placed to support early literacy development and accessibility. Capabilities and resources required in this domain include expressive speech synthesis that can automatically adapt to the text of the domain, voice adaptation to create more voices with fewer resources, ASR data sets in the literacy domain, adaptation of speech-to-text systems for younger users, speech assessment, and high-accuracy multilingual natural language generation and understanding. In developing and deploying these technologies, the Node’s human-computer interaction (HCI) capability involves clients and end-users from the early design stages to ensure maximum impact.

The aim is to enable technologies like Isinkwe and Ngiyaqonda, which are apps designed to help learners overcome barriers to reading and learning. While Isinkwe focuses on the accessibility space, Ngiyaqonda tackles home language literacy development and language learning. Both apps pull together various capabilities of the Speech Node into integrated real-world solutions.

Contact person: Dr Laurette Marais, node manager (lmarais@csir.co.za)