14 May 2026

From 21 to 23 April 2026, ESCALATOR and the Council for Scientific and Industrial Research (CSIR) facilitated a three-day workshop at the University of Limpopo, bringing together participants from Humanities and Computer Science backgrounds to explore speech corpus development, Digital Humanities (DH), and language resource creation for South African languages.
The opening sessions, led by ESCALATOR’s Mrs Marissa Griesel, set the tone. Rather than diving straight into the technical material, participants first explored how digital technologies are reshaping humanities research and why that matters for a country as linguistically rich as South Africa. From there, the conversation expanded into SADiLaR’s research infrastructure and the platforms already in place for sharing and collaborating on language resources nationally.
The CSIR’s NLP Research Group, represented by Ms Ilana Wilken, Mr Franco Mak, and Mr Sthembiso Mkhwanazi, shared developments in neural text-to-speech, speech recognition, and multilingual AI applications. A recurring theme was the urgent need for locally developed datasets and for more voices to contribute to language resource creation.
Much of the workshop’s energy came from its hands-on focus. Participants worked through the full lifecycle of building a speech dataset: selecting prompts, recording audio, verifying and annotating data, and thinking through long-term storage. National initiatives like the Lwazi and NCHLT speech corpora provided instructive examples of just how complex, and how worthwhile, this work can be at scale. Sessions on ethics, consent, FAIR data principles, and POPIA compliance grounded everything in responsible practice.
By the final day, participants were recording with mobile devices, external microphones, Audacity, and camcorders, putting theory directly into practice.
Dr Annastasia Lekganyane perhaps captured the spirit of the three days best: “I came here for the development of a language corpus in research, but I’m leaving with knowledge of ASR, using multiple devices for data collection, capturing voice prompts, and [converting] video into audio files.”
For many, the three days offered more than expected: not just new ideas to think about, but practical knowledge and skills to take back to their work.
BY: Marissa Griesel