Fourth Workshop on Resources for African Indigenous Languages (RAIL)

The 4th RAIL (Resources for African Indigenous* Languages) workshop will be co-located with EACL 2023 in Dubrovnik, Croatia. The RAIL workshop is an interdisciplinary platform for researchers working on resources (data collections, tools, etc.) specifically targeted towards African indigenous languages. In particular, it aims to create the conditions for the emergence of a scientific community of practice that focuses on data, as well as on computational linguistic tools, specifically designed for or applied to indigenous languages found in Africa.

Previous workshops showed that the problems presented (and their solutions) are not only applicable to African languages. Many issues, such as the use of different scripts and properties like tone, are also relevant to other low-resource languages. As such, these languages share similar challenges. This allows researchers working on languages with such properties (including non-African languages) to learn from each other, especially on issues pertaining to language resource development.

The RAIL workshop has several aims. First, it brings together researchers working on African indigenous languages, forming a community of practice for people working on indigenous languages. Second, it aims to reveal currently unknown or unpublished existing resources (corpora, NLP tools, and applications), resulting in a better overview of the current state of the art, and it also allows for discussions on novel, desired resources for future research in this area. Third, it enhances the sharing of knowledge on the development of low-resource languages. Finally, it enables discussions on how to improve both the quality and the availability of the resources.

The workshop has “Impact of impairments on language resources” as its theme, but submissions on any topic related to properties of African indigenous languages may be accepted. Suggested topics include (but are not limited to) the following:

  • Digital representations of linguistic structures
  • Descriptions of corpora or other data sets of African indigenous languages
  • Building resources for (under-resourced) African indigenous languages
  • Developing and using African indigenous languages in the digital age
  • Effectiveness of digital technologies for the development of African indigenous languages
  • Revealing unknown or unpublished existing resources for African indigenous languages
  • Developing desired resources for African indigenous languages
  • Improving quality, availability and accessibility of African indigenous language resources

*: The term “indigenous languages” as used in the RAIL workshop refers to non-colonial languages (in this case, those used in Africa). In no way is this term intended to cause harm or discomfort to anyone. Many of these languages were, or still are, marginalised, and the aim of the workshop is to bring attention to the creation, curation, and development of resources for these languages in Africa.


Submission requirements:

We invite papers on original, unpublished work related to the topics of the workshop. Submissions presenting completed work may consist of up to eight (8) pages of content plus additional pages of references. The final camera-ready version of an accepted long paper is allowed one additional page of content (so up to nine pages) so that reviewers’ feedback can be incorporated.

Submissions need to use the EACL stylesheets. Submission is electronic, in PDF format, through the START system. Reviewing is double-blind, so make sure to anonymize your submission (e.g., do not provide author names, affiliations, project names, etc.). Limit the number of self-citations (anonymized citations should not be used). Accepted papers will be published in the ACL workshop proceedings.


Programme:

8:30–9:00 Registration and opening remarks
9:00–9:25 IsiXhosa Intellectual Traditions Digital Archive: Digitizing isiXhosa texts from 1870–1914; Jonathan Schoots, Amandla Ngwendu, Jacques De Wet and Sanjin Muftic
9:25–9:50 Preparing the Vuk’uzenzele and ZA-gov-multilingual South African multilingual corpora; Richard Lastrucci, Jenalea N. Rajab, Matimba Shingange, Daniel Njini and Vukosi Marivate
9:50–10:15 Automatic Spell Checker and Correction for Under-represented Spoken Languages: Case Study on Wolof; Thierno Ibrahima Cissé and Fatiha Sadat
10:15–10:55 Morning tea break
10:55–11:20 SpeechReporting Corpus: annotated corpora of West African traditional narratives; Ekaterina Aplonova, Izabela Jordanoska, Timofey Arkhangelskiy and Tatiana Nikitina
11:20–11:45 Analyzing political formation through historical isiXhosa text analysis: Using frequency analysis to examine emerging African Nationalism in South Africa; Jonathan Schoots
11:45–12:05 Unsupervised Cross-lingual Word Embedding Representation for English-isiZulu; Derwin T. Ngomane, Vukosi Marivate, Jade Abbott and Rooweither Mabuya
12:05–12:30 Investigating Sentiment-Bearing Words- and Emoji-based Distant Supervision Approaches for Sentiment Analysis; Ronny Koena Mabokela, Mpho Raborife and Turgay Celik
12:30–14:00 Lunch break
14:00–14:25 Towards a Swahili Universal Dependency Treebank: Leveraging the Annotations of the Helsinki Corpus of Swahili; Kenneth M. Steimel, Sandra Kübler and Daniel Dakota
14:25–14:50 Evaluating the Sesotho rule-based syllabification system on Sepedi and Setswana words; Johannes Sibeko and Mmasibidi Setaka
14:50–15:15 Deep learning and low-resource languages: How much data is enough? A case study of three linguistically distinct South African languages; Roald Eiselen and Tanja Gaustad
15:15–15:40 Comparing methods of orthographic conversion for Bàsàá, a language of Cameroon; Alexandra O’Neil, Daniel G. Swanson, Robert Pugh, Francis Tyers and Emmanuel Ngue Um
15:40–16:20 Afternoon tea break
16:20–16:45 Mini But Mighty: Efficient Multilingual Pretraining with Linguistically-Informed Data Selection; Tolúlọpẹ́ Ògúnrẹ̀mí, Dan Jurafsky and Christopher D. Manning
16:45–17:10 Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities; Atnafu Lambebo Tonja, Tadesse Destaw Belay, Israel Abebe Azime, Abinew Ali Ayele, Moges Mehamed, Olga Kolesnikova and Seid Muhie Yimam
17:10–17:35 A Corpus-Based List of Frequently Used Words in Sesotho; Johannes Sibeko and Orphée De Clercq
17:35–18:00 Vowels and the Igala Language Resources; Mahmud Mohammed Momoh
18:00–18:05 Closing remarks


Important dates:

Submission deadline: 20 February 2023 (extended from 13 February 2023)

Notification of acceptance: 13 March 2023 (slightly delayed due to missing reviews)

Camera-ready deadline: 27 March 2023

RAIL workshop: 6 May 2023

Programme Committee

Ayodele James Akinola, Michigan Technological University, USA
Dimakatso Mathe, University of Limpopo, South Africa
Elsabé Taljard, University of Pretoria, South Africa
Emmanuel Ngue Um, University of Yaoundé I, Cameroon
Febe de Wet, Stellenbosch University, South Africa
Friedel Wolff, South African Centre for Digital Language Resources (SADiLaR), South Africa
Gilles-Maurice de Schryver, Ghent University, Belgium
Hussein Suleman, University of Cape Town, South Africa
Innocentia Mhlambi, University of the Witwatersrand, South Africa
Johannes Sibeko, Nelson Mandela University, South Africa
Lorraine Shabangu, University of the Witwatersrand, South Africa
Makanjuola Ogunleye, Virginia Tech, USA
Maria Keet, University of Cape Town, South Africa
Marissa Griesel, University of South Africa, South Africa
Mpho Raborife, University of Johannesburg, South Africa
Muzi Matfunjwa, South African Centre for Digital Language Resources (SADiLaR), South Africa
Papi Lemeko, Central University of Technology, South Africa
Pule Phindane, Central University of Technology, South Africa
Richard Ajah, University of Uyo, Nigeria
Roald Eiselen, Centre for Text Technology, North-West University, South Africa
Sara Petrollino, Leiden University, the Netherlands
Sibonelo Dlamini, University of KwaZulu-Natal, South Africa
Tanja Gaustad van Zaanen, Centre for Text Technology, North-West University, South Africa
Tunde Ope-Davies, University of Lagos, Nigeria
Valencia Wagner, Sol Plaatje University, South Africa
Vukosi Marivate, University of Pretoria, South Africa


Organising Committee

Rooweither Mabuya, South African Centre for Digital Language Resources (SADiLaR), South Africa
Don Mthobela, Cam Foundation
Mmasibidi Setaka, South African Centre for Digital Language Resources (SADiLaR), South Africa
Menno van Zaanen, South African Centre for Digital Language Resources (SADiLaR), South Africa