Activities
2024 Conference presentations
- AFRILEX – Role of SADiLaR: advice on formats, backups, licensing, software, platforms; Community of Practice – Reflections (Mr Juan Steyn and Dr Friedel Wolff) – PDF Presentation
- AFRILEX – Making sense of kuningi using a corpus linguistic analysis (Prof Langa Khumalo) – PDF Presentation
- AFRILEX – Corpus-based dictionaries for low-resource languages (Ms Mmasibidi Setaka & Prof Menno van Zaanen) – PDF Presentation
- Global AI Conference – Multilingualism: A case of the South African Centre for Digital Language Resources in developing language resources (Ms Rooweither Mabuya) – PDF Presentation
- Global Southern Forum – African Digital Humanities and the Ethics of AI (Ms Andiswa Bukula) – PDF Presentation
- Southern African Folklore Society – Digitisation as a Catalyst for Preserving Xhosa Oral Literature and Histories in the Age of Artificial Intelligence (Ms Andiswa Bukula) – PDF Presentation
2024 Conference workshops, tutorials and other events
- UCT Language Indaba – Language Resources as Enablers (Prof Langa Khumalo) – PDF Presentation
- SADC Open Science Workshop – “Democratising Knowledge through Open Science” (Prof Langa Khumalo) – PDF Presentation
- Pre-conference Workshop – Towards a sustainable National Term Bank for the official languages of South Africa: Collaboration vs Fragmentation (Prof Justus Roux, Prof Rufus Gouws,….) – PDF Presentation
- DH-IGNITE @ ALASA 2024 – (Ms Jessica Mabaso, Dr Muzi Matfunjwa, Dr Respect Mlambo, Ms Rooweither Mabuya) – PDF Presentation
- Pre-conference Workshop (SAALT) – Assessment literacy and the matter of enhancing translation practices of assessment tools (Prof Tobie Van Dyk) – PDF Presentation
- SALALS Pre-conference Workshop – Introduction to Text Analysis Tools (Dr Muzi Matfunjwa and Dr Respect Mlambo)
DH Colloquia
SADiLaR organizes monthly Digital Humanities colloquia. These typically take place on Wednesdays (in the middle of the month) from 10:00 to 11:00 SAST. During these DH colloquia a wide variety of topics are discussed, mostly on content related to Digital Humanities, sometimes focusing more on the techniques or methodologies used, sometimes focusing more on the applications or application areas.
The DH colloquia are part of Escalator’s Explorer track. You can find more information on Escalator here: https://escalator.sadilar.org/, on Escalator’s championship programme here: https://escalator.sadilar.org/champions/overview/, and on the Explorer track within Escalator’s championship programme here: https://escalator.sadilar.org/champions/explorer/. Also check out the other tracks within the Escalator championship programme as there may be tracks directly related to your interests. If you want to be a member of the Digital Humanities community, you may also want to consider joining the DHCSSza Slack. This page will provide more information on how to join (this is also free): https://escalator.sadilar.org/connect/.
If you have suggestions for speakers at the DH colloquium (or if you want to speak yourself), or if you want to provide feedback, please do not hesitate to contact Prof Menno van Zaanen: menno.vanzaanen@nwu.ac.za.
- Andreas Baumann – Frequent words are semantically more stable than rare ones: what computational modeling, corpus analysis, and psycholinguistic databases can tell us about lexico-semantic change (2 September 2024)
- Tim Brookes – Writing Beyond Writing (14 August 2024)
- Rory du Plessis – “Are they human or are they data?” Digital archives and the creation of humanising stories (17 July 2024)
- Maciej Ogrodniczuk – Universal Discourse: Towards a multilingual model of discourse relations (12 June 2024)
- Johannes Sibeko – Is it written to be read? A case of readability in Sesotho (15 May 2024)
- Iris Auda and Pule kaJanolintji – isiBheqe: First additional script Language Pedagogy in African Digital Orthographies – The case of isiBheqe soHlamvu digital tools for use in language and linguistics learning (10 April 2024)
- Robyn Berghoff and Emanuel Bylund – What do we study when we study multilingualism? A bibliometric(-adjacent) analysis of the field (13 March 2024)
- Hanél Duvenage – Data in healthcare: efforts digitisation and digitalisation (21 February 2024)
- Phillip Ströbel – Innovating Historical Scholarship: The Bullinger Digital Project (31 January 2024)
SWiP Events
- SWiP Writing Competition – The competition provides an opportunity for both new and existing editors to engage in content creation while fostering a sense of community and collaboration.
- Two-day authorship workshops – Conducted across 6 regions, targeting 10 universities and training 20 participants per university.
- SWiP side event & exhibition at SFSA2024 – Preserving Languages & Scientific Information: Accessible Knowledge for All
- SWiP Project Launch – The event introduced a collaboration aiming to preserve African languages and open up access to scientific information in South Africa.
Projects
African Wordnet and Multilingual Linguistic Terminology
The African Wordnet (AfWN) and Multilingual Linguistic Terminology project is two-pronged and concerns the development of language resources in the form of wordnets for the South African languages as well as the development of linguistics terminology for nine South African languages.
Project Start Date: 1 October 2017
Project Status: Phase 1 & 2 complete; Phase 3 in progress
Communicative Development Inventories for all South Africa’s eleven official languages
The aim of this project is to collect and digitise data on children’s language development from 8 to 30 months and, from these data, to construct and validate Communicative Development Inventories (CDIs). CDIs are parent-completed questionnaires (for infants 8-18 months and toddlers 16-30 months) about children’s vocabulary, gesture and grammatical abilities. The inventories will cover all official South African languages: Setswana, Sesotho, isiXhosa, Xitsonga, Afrikaans, Sesotho sa Leboa, Tshivenda, isiNdebele, Siswati, isiZulu and South African English.
Project Start Date: 1 January 2018
Project Status: Phase 1 complete; Phase 2 in progress
Corpus and system development for automatic captioning of official speeches
The primary aim of the proposed project is to create a corpus of automatically transcribed government speeches. The CSIR proposes to start with the current president (Mr Cyril Ramaphosa) and then expand the corpus with speeches made by previous presidents and/or other members of parliament. A secondary aim is to initiate the development of an automatic speech recognition system that could serve as a first step towards addressing the need for automatic captioning expressed by GCIS.
Project Start Date: 1 April 2020
Project Status: Project in progress
Development of a multi-level, multi-genre learner corpus of academic writing
Development of a multi-genre, multi-level learner corpus of academic writing in order to develop, refine and implement an online academic writing tool.
Project Start Date: 1 March 2017
Project Status: Project completed
Digitisation of language resources
Building language resources for the indigenous South African languages through the digitisation of language and language-related text, audio, online and video data. This project entails the continuation of mass digitisation for all 11 official languages of South Africa. Digitisation will also include digital resources for specific needs and projects.
Project Start Date: 1 April 2017
Project Status: Project ongoing
Enabling localised language technology applications: A Computational Wide coverage resource grammar for isiZulu
The CSIR node of SADiLaR recently completed a project whose main aim was to deliver to the research community a high-quality, computational, wide-coverage resource grammar (WCRG) for isiZulu. WCRGs unlock opportunities for the South African languages to participate in multilingual research, nationally and internationally.
Project Start Date: 1 April 2021
Project Status: Project completed
Escalator
This programme will ensure the sustainability of the community by contributing to the development of leaders at the public universities and research centres. Through the interventions, both champions and other community members will build skills and confidence in using digital tools and methodologies in their own research and teaching. The programme will align closely with other institutional, regional, national and international digital capacity and community development initiatives, as well as infrastructure initiatives.
Project Start Date: 1 December 2020
Project Status: Project in progress
Exploring fair and unbiased testing
Creation of a protocol for fair and unbiased testing.
Project Start Date: 1 March 2017
Project Status: Project completed
Harvesting existing sources of speech data for HLT development in South Africa
The aim of the project is to explore different possibilities for the (semi-)automatic harvesting of existing sources of speech data to create resources that can be used to develop new speech technologies and improve existing ones. Ultimately, the aim of the project is to enlarge the existing speech corpora for all South Africa’s official languages. This will entail the collection of appropriate speech and text data for L1 to L6, enabling the development of baseline ASR systems, followed by the development and release of automatically transcribed speech data and updated harvesting procedures for the remaining languages (L7 to L11).
Project Start Date: 1 April 2018
Project Status: Project completed
Health Resources in the South African Languages
A systematic review of the health resources available for the South African languages, culminating in an index of health resources. A wide range of resources form part of the index, including screening questionnaires, diagnostic assessments, and intervention programmes designed for health professionals.
Project Start Date: 1 November 2018
Project Status: Project completed
Human Language Technologies Audit 2017/2018
This project aims to provide information on the current state of HLT R&D in South Africa. Specifically, it aims to replicate the HLT audit completed in 2009 and to update the information on the various HLT tools, resources and applications identified in that audit. The tools, resources and applications developed since 2009 will be identified and categorised using an updated version of the technology matrix previously employed.
Project Start Date: 1 July 2017
Project Status: Project completed
Linguistic corpus enrichment for conjunctively written South African languages
This project, developed under the Nodes Specialisation Project, makes linguistically enriched corpora available for the four official South African languages with a conjunctive orthography, i.e. isiNdebele, isiXhosa, isiZulu, and Siswati.
Project Start Date: 1 October 2017
Project Status: Project completed
Mobile Dictionary application framework
The project aims to develop an open-source hybrid mobile application framework that will allow for online access to a TMS and dictionary API, managed through a TMS API manager (TAM) and offline access to local dictionary content. The framework will create a shared codebase supporting the deployment of both Android and iOS apps from their respective marketplaces. This framework will expand access to dictionaries to allow users to not only gain online access to dictionaries but also provide users with an option to store dictionary content in a local database on mobile devices.
Project Start Date: 1 August 2020
Project Status: Project in progress
Multimedia Digital Corpus of siPhûthî
A multimodal digital corpus of siPhûthî as spoken in South Africa and Lesotho.
The compilation of a multimodal corpus of siPhûthî recordings containing narratives, conversations, interviews, folktales, oral histories and poems is a central feature of the project. The audio and video recordings are transcribed, translated, and annotated. The corpus covers a wide range of topics and includes recordings from a large number of speakers from different generations and geographic locations. This corpus is due to be completed in 2024 and will be made available in the SADiLaR repository.
The siPhûthî corpus is being developed to serve as a resource for community members as well as academics from various disciplines. The corpus provides insights not only for linguists, but also for historians, geographers, and cultural anthropologists. Most importantly, it also serves as a cultural and historical memory for community members.
Project Start Date: 1 August 2019
Project Status: Project in progress
Parallel corpora for English-isiXhosa and English-Siswati
This project entails the collection and processing of bilingual data for the development of English–isiXhosa and English–Siswati corpora.
Project Start Date: 1 July 2019
Project Status: Project completed
Spoken data corpus for Afrikaans, Setswana, Sesotho sa Leboa
The phonetics and phonology of Coloured Afrikaans have as yet barely received any serious attention. This is largely due to the lack of adequate spoken data corpora. Without such corpora, no complete and reliable acoustic descriptions are possible, and satisfactory sociolinguistic studies are also unlikely. The main aim of this project is to fill this gap. The first phase of the project will focus on Coloured Afrikaans. Subsequent projects are planned for Setswana and Sesotho sa Leboa.
Project Start Date: 1 January 2020
Project Status: Project completed
Through the lens Ex Machina: using NLP and statistical learning methods to model eyewitness statements and choosing behavior
The primary aim of this project is to develop and put to trial a new, innovative way of analysing and using eyewitness statements and descriptions to predict eyewitness identification performance. This has not been done before with natural language processing or machine learning methods, and this could solve the current difficulty of analysing large quantities of verbal data.
Project Start Date: 1 November 2018
Project Status: Project completed
Towards multilingual academic literacy testing for Secondary and Higher Education
Develop a translation protocol for academic literacy tests (this protocol will also consider the possibility of bias in translations), and translate and refine academic literacy tests for the following languages: English, Afrikaans, isiXhosa, isiZulu, Setswana and Sesotho.
Project Start Date: 1 January 2020
Project Status: Project in progress
Tracing History Trust: VC Daghregister transcription project
The project is exclusively focussed on the digitisation and transcription of a number of VOC Journals (VC series) of the Cape of Good Hope, vested in the Western Cape Archives and Record Services, Cape Town. Its main purpose is to make available in the public domain historical linguistic material (in particular Afrikaans and Dutch) contained in documents of the Dutch East India Company (VOC), which offer numerous examples of diachronic and synchronic importance. Of the entire period (1651-1795), the years 1671 to 1679 will be completed during this phase of the project.
Project Start Date: 1 October 2018
Project Status: Project completed