Stakeholder engagement is a crucial part of the strategic mission of the South African Centre for Digital Language Resources (SADiLaR). With the adoption of a new five-year strategic plan, the infrastructure is dedicated to promoting its mandate and establishing a local and global presence to attract potential partners in the domains of natural language processing and digital humanities. This commitment was reinforced by the infrastructure’s acceptance as a full member of CLARIN, a distributed European digital infrastructure consortium, as of 1 January 2024.
Professor Justus Roux, South Africa’s official delegate to the CLARIN General Assembly, was tasked by SADiLaR with supporting the Centre by assessing research and development projects within the CLARIN network that could lead to wider cooperation with South African academics working in the same fields. His support contract with SADiLaR commenced in February 2024 and runs until November 2024.
In April 2024, Professor Roux held encouraging discussions with some members of the CLARIN Board in Utrecht, Netherlands, followed by a presentation at Radboud University in Nijmegen on SADiLaR’s strategic vision for the next five years.
Prof Roux’s presentation highlighted SADiLaR’s strategic views as follows:
SADiLaR – the next five years
Over the next five years, SADiLaR’s infrastructure will prioritise several strategic objectives to strengthen the impact of readily available language-related technologies and digital humanities in driving transformative research. In addition, it aims to support the implementation of language policies in achieving an inclusive and transformed digital future for South Africa.
Research focus
SADiLaR is committed to advancing the scholarship of human language technologies and digital humanities in South Africa and across the African continent. The organisation aims to strengthen the knowledge production and dissemination pathways in the Global South, thereby contributing to global knowledge production.
Technology and resources
SADiLaR aims to enhance the development, deployment, and maintenance of software and technology in the domains of digital language resources and digital humanities by continuously strengthening the technical infrastructure.
Projects and services
Since its inception, SADiLaR has sustained and enabled a broad range of multiphase projects by increasing internal and external synergies.
Communication and brand visibility
The infrastructure commits itself to further enhancing and promoting effective communication, increasing brand visibility, and engaging with stakeholders in a meaningful way.
Digitisation, enablement, and promotion of South Africa’s official languages
SADiLaR will continuously contribute to and drive the vision to ensure a digital future for all official languages in South Africa. SADiLaR is involved in all three stages ensuring the longer-term availability of data and tools in the value chain. In simplified form, the value chain from SADiLaR’s perspective can be summarised as follows:
1. Raw (unprocessed) data
This stage relates to creating new data and digitising analogue data. The process includes cleaning and refining the data to ensure that good-quality metadata is included. It also entails creating, maintaining, and updating the tools and technologies required to parse African language data.
2. Dataset/technology available for reuse
After the refinement and creation of tools or the processing of data to a final format, the data and tools are released as openly as is practically possible to further downstream innovation.
SADiLaR’s current mandate and the way forward
As it stands, SADiLaR plays a critical role in the long-term preservation and maintenance of digital language resources through its repository. In this way, SADiLaR provides a place where digital language resource building blocks can be developed in specialised projects run by or in collaboration with the Centre, and then made openly available for reuse in downstream technologies. These building blocks include, but are not limited to, text, speech, and multimodal datasets; monolingual and parallel corpora; translation memories and glossaries; human language technologies such as language parsers for African languages; and more user-friendly tools such as spelling checkers created at SADiLaR’s CTexT node and text-to-speech technologies developed at SADiLaR’s CSIR node. For example, the same parallel corpora used to support machine translation systems hosted at SADiLaR can be utilised by other national and international entities to refine existing machine translation systems or develop new ones, thereby enabling downstream innovation.
Affiliation with CLARIN
South Africa is the first member country outside of Europe, and SADiLaR is the proud representative body for South Africa. Currently, CLARIN ERIC has 24 EU country members and two observers. “The CLARIN network aligns impeccably with SADiLaR’s strategic objective of strengthening stakeholder relationships and building mutually beneficial partnerships. The network will therefore extend to increase the impact of the infrastructure in the digital humanities space,” concludes Prof Roux.
SADiLaR’s full strategy document is accessible here.
(Written by Lihle Sosibo)