Latest News
The one-day SWiP event comprised an exciting panel discussion by experts in language preservation, cultural preservation, and the digitisation of information, followed by an introductory mini-workshop on Wikipedia focusing on editing, translating, and making content available online.
-
Digitisation of Language resources
Project Type: Node
Project Start Date: 1 April 2017
Project Status: Ongoing
Project Aims and Motivation: The UP digitisation node focuses on the preservation of invaluable language (and cultural) resources for the African languages by digitising textual, video and audio material, and providing language communities with access to these digital resources via the SADiLaR repository. Digitised content delivered by…
-
Linguistic corpus enrichment for conjunctively written South African languages
Project Type: Node
Start Date: 1 October 2017
Project Status: Completed and delivered
Project Aims: This project, developed under the Nodes Specialisation Project, makes linguistically enriched corpora available for the four official South African languages with a conjunctive orthography, i.e. isiNdebele, isiXhosa, isiZulu, and Siswati. The parallel corpora consist of approximately 50,000 tokens each, aligned…
-
Mobile Dictionary application framework
Project Type: Node
Project Start Date: 1 August 2020
Project Status: In Progress
Project Aims: The project aims to develop an open-source hybrid mobile application framework that will allow for online access to a TMS and dictionary API, managed through a TMS API manager (TAM), and offline access to local dictionary content. The framework will create a…
-
Parallel corpora for English-isiXhosa and English-Siswati
Project Type: Node
Project Start Date: 1 July 2019
Project Status: Completed and delivered English-Siswati corpus
Project Aims: This project entailed the collection and processing of bilingual data to develop a 2-million-word English–Siswati parallel-aligned corpus that can be used to train machine translation systems. The data was acquired by crawling various South African web domains and human…
-
Project Expansion and further refinement of a multi-level, multi-genre learner corpus academic writing
Project Type: Node
Project Start Date: 1 October 2020
Project Status: Completed [Finalising]
Project Aims: Redevelopment of the Write-it Course in SADiLaR’s Moodle environment
Project Deliverables: Moodle Course 1.1 Task Analysis 1.2 Planning your writing 1.3 Searching for information 1.4 Academic writing 1.5 Structure of a paragraph 1.6 Introduction and conclusion 1.7 Introduction to argument…
-
Towards multilingual academic literacy testing for Secondary and Higher Education
Project Type: Node
Project Start Date: 1 January 2020
Project Status: In Progress
Project Aims: Develop a translation protocol for academic literacy tests (this protocol will also consider the possibility of bias in translations), and translate and refine academic literacy tests for the following languages: English, Afrikaans, isiXhosa, isiZulu, Setswana and Sesotho
Project Deliverables: Literature…
-
Spoken data corpus for Afrikaans, Setswana, Sesotho sa Leboa
Project Type: Open Call
Project Start Date: 1 January 2020
Project Status: Completed
Project Aims: The phonetics and phonology of Coloured Afrikaans have as yet barely received any serious attention. This is largely due to the lack of adequate spoken data corpora. Without such corpora, no complete and reliable acoustic descriptions are possible. In relation to this, satisfactory…
-
Corpus and system development for automatic captioning of official speeches
Project Type: SADiLaR Node – CSIR Speech Node
Project Start Date: 1 April 2020
Project Status: In progress
Project Aims: The primary aim of the proposed project is to create a corpus of automatically transcribed government speeches. The CSIR proposes to start with the current president (Mr Cyril Ramaphosa) and then expand the corpus with speeches made…
-
SADiLaR Publications
List of published and/or submitted research output (conference or journal papers, book chapters and other academic dissemination)
Bosch, S. and Griesel, M. 2018. African Wordnet: facilitating language learning in African languages. 9th Global Wordnet Conference, Singapore: Nanyang Technological University (NTU).
Type: Conference Paper
Link: (1) https://aclanthology.org/2018.gwc-1.36/ (2) http://compling.hss.ntu.edu.sg/events/2018-gwc/pdfs/GWC2018_paper_22.pdf
Baumann, A. and Wissing, D.P. 2018. Stabilising…