2nd Workshop on Resources for African Indigenous Languages


2nd Workshop on Resources for African Indigenous Languages (RAIL)


The South African Centre for Digital Language Resources (SADiLaR) is organising the second RAIL workshop in the field of African indigenous language resources. The workshop aims to bring together researchers who are interested in showcasing their research and thereby boosting the field of African indigenous languages. It will provide an overview of the current state of the art and emphasise the availability of African indigenous language resources, including both data and tools. It will also allow information sharing among researchers interested in African indigenous languages and start discussions on improving the quality and availability of these resources. Many African indigenous languages currently have few or no resources available, and they are often structurally quite different from better-resourced languages, requiring the development and use of specialised techniques. By bringing together researchers from different fields (e.g., (computational) linguistics, sociolinguistics, language technology) to discuss the development of language resources for African indigenous languages, we hope to boost research in this field.

The Resources for African Indigenous Languages (RAIL) workshop is an interdisciplinary platform for researchers working on resources (data collections, tools, etc.) specifically targeted towards African indigenous languages.  It aims to create the conditions for the emergence of a scientific community of practice that focuses on data, as well as tools, specifically designed for or applied to indigenous languages found in Africa. 


Suggested topics include the following:

  • Computational linguistics for African indigenous languages
  • Descriptions of corpora or other data sets of African indigenous languages
  • Building resources for (under-resourced) African indigenous languages
  • Developing and using African indigenous languages in the digital age
  • Effectiveness of digital technologies for the development of African indigenous languages
  • Revealing unknown or unpublished existing resources for African indigenous languages
  • Developing desired resources for African indigenous languages
  • Improving quality, availability and accessibility of African indigenous language resources


Submission Guidelines

Link for submissions: DHASA Conference – ConfTool – Login

RAIL 2021 invites the following types of submissions:

  • Full papers of 4 to 8 pages (plus additional pages for references if needed), which must strictly follow the DHASA style guide, which will be available on the conference website (Style guides | DHASA 2021)
  • Papers must be submitted through the DHASA submission platform (ConfTool) and will be peer-reviewed.

When sending in your submission, be sure to select RAIL 2021 Submissions.


Important dates:

  • Submission deadline: 13 September 2021 
  • Extension on submission deadline: 20 September 2021
  • Final extension on submission deadline: 30 September 2021
  • Date of notification: 15 October 2021
  • Extension on date of notification: 25 October 2021
  • Camera ready copy deadline: 10 November 2021
  • RAIL Workshop: 29 November 2021, 08:30 – 13:00 SAST

Programme: Session Chair – Benito Trollip

Time | Title | Speaker(s) / Author(s)
08:30 – 08:40 | Opening and Welcoming | Mmasibidi Setaka
08:40 – 09:00 | Development of linguistically annotated parallel language resources for four South African languages | Tanja Gaustad, Martin J. Puttkammer
09:00 – 09:20 | New uses for old books: Description of digitised corpora based Setswana language collection at WITS Cullen Africana Collection | Malebogo Thabong, Nina Lewin, Taariq Surtee
09:20 – 09:40 | Digitising Afrikaans: Establishing a protocol for digitalizing historical sources for Early Afrikaans (1675-1925) as a possible template for indigenous South African languages | Roné Wierenga, Wannie Carstens
09:40 – 10:00 | Investigating the feasibility of harvesting broadcast speech data to develop resources for South African languages | Jaco Badenhorst, Febe de Wet
10:00 – 10:20 | A novel method for redefining language ecology and endangerment in Nigeria – towards a geospatial | Imelda Udoh, Moses Ekpenyong, Eno-Abasi Urua, Harrison Adeniyi, Gregory Obiamalu, Ayo Yusuff, Ogbonna Anyanwu, Ebitare Obikudo
10:20 – 10:25 | Masakhane: Bridging the gap between NLP practitioners and linguists | Olanrewaju Samuel
10:25 – 10:30 | Carpentries session | Mmasibidi Setaka
10:30 – 11:00 | BREAK |
11:00 – 11:20 | An Open Source System for Crowd Sourcing an African Language Short Story Corpus | Benson Muite
11:20 – 11:40 | Training Cross-Lingual embeddings for Setswana and Sepedi | Mack Makgatho, Vukosi Marivate, Tshephisho Sefara, Valencia Wagner
11:40 – 12:00 | Wordsmith Tools as an Enabler for Text Analysis | Rooweither Mabuya
12:00 – 12:20 | Canonical Segmentation and Syntactic Morpheme Tagging of Four Resource-scarce Nguni Languages | Jakobus S. du Toit, Martin J. Puttkammer
12:20 – 12:40 | Using MonoConc Pro to teach and learn lexical collocations in Xitsonga | Respect Mlambo, Muzi Matfunjwa
12:40 – 13:00 | CLOSING |



The RAIL workshop will be co-located with the DHASA conference, and therefore registration will run through the DHASA website.

Participants will have to register for the conference and choose to attend the RAIL workshop during the registration process. 


Organising committee

Rooweither Mabuya
Mmasibidi Setaka
Deon Du Plessis
Dimakatso Mathe
Respect Mlambo
Liané Van Den Bergh
Cascious Mofokeng
Muzi Matfunjwa
South African Centre for Digital Language Resources (SADiLaR), South Africa


Programme committee

Ayodele James Akinola, Michigan Technological University, USA
Sonja Bosch, University of South Africa, South Africa
Elias Malete, University of the Free State, South Africa
Emmanuel Ngue Um, University of Yaoundé I, Cameroon
Pule Phindane, Central University of Technology, South Africa
Felix Ameka, Leiden University, Netherlands
Elsabé Taljard, University of Pretoria, South Africa
Mpho Raborife, University of Johannesburg, South Africa
Marissa Griesel, University of South Africa, South Africa
Roald Eiselen, North-West University, South Africa
Sree Thottempudi, South African Centre for Digital Language Resources, South Africa
Deon du Plessis, South African Centre for Digital Language Resources, South Africa
Dimakatso Mathe, South African Centre for Digital Language Resources, South Africa
Benito Trollip, South African Centre for Digital Language Resources, South Africa
Muzi Matfunjwa, South African Centre for Digital Language Resources, South Africa