First workshop on Resources for African Indigenous Languages (RAIL)

Free, online workshop

LREC 2020

The South African Centre for Digital Language Resources (SADiLaR) is organizing a workshop in the field of African indigenous language resources (originally expected to be held at the LREC 2020 conference in Marseille, France). The workshop aims to bring together researchers who are interested in showcasing their research and thereby boosting the field of African indigenous languages. It provides an overview of the current state of the art and emphasizes the availability of African indigenous language resources, including both data and tools. Additionally, it allows for information sharing among researchers interested in African indigenous languages and for starting discussions on improving the quality and availability of these resources. Many African indigenous languages currently have no or very limited resources available. They are also often structurally quite different from better-resourced languages, which requires the development and use of specialized techniques. By bringing together researchers from different fields (e.g., (computational) linguistics, sociolinguistics, language technology) to discuss the development of language resources for African indigenous languages, we hope to boost research in this field.

The Resources for African Indigenous Languages (RAIL) workshop is an interdisciplinary platform for researchers working on resources (data collections, tools, etc.) specifically targeted towards African indigenous languages. It aims to create the conditions for the emergence of a scientific community of practice that focuses on data, as well as tools, specifically designed for or applied to indigenous languages found in Africa. With the UNESCO-supported International Year of Indigenous Languages, there is currently much interest in indigenous languages. The United Nations Permanent Forum on Indigenous Issues mentioned that “40 percent of the estimated 6,700 languages spoken around the world were in danger of disappearing” and that “languages represent complex systems of knowledge and communication and should be recognized as a strategic national resource for development, peace building and reconciliation.” As such, the workshop falls within one of the hot topic areas of this year’s conference: “Less Resourced and Endangered Languages”.

Topics include the following:

  • Language collections (description and creation)
  • Lexicography
  • Syntactic analysis
  • Computational linguistic tools
  • Wordnets


Unfortunately, due to the Covid-19 pandemic, the RAIL workshop will not be held in Marseille, France this year. Instead, the workshop will take place online. Participation is free, but participants need to register on Eventbrite. Details on how to join the workshop will be sent to registered participants. The workshop will take place on Saturday 16 May 2020 from 9:00 until 13:00 SAST.


  • Free virtual workshop on African indigenous languages
  • 16 May 9:00 to 13:00 SAST
  • Register through Eventbrite



09:00-09:10 Opening and introduction

09:10-09:30 Endangered African Languages Featured in a Digital Collection: The Case of the ‡Khomani San | Hugh Brody Collection

Kerry Jones and Sanjin Muftic

09:30-09:50 Usability and Accessibility of Bantu Language Dictionaries in the Digital Age: Mobile Access in an Open Environment

Thomas Eckart, Sonja Bosch, Uwe Quasthoff, Erik Körner, Dirk Goldhahn and Simon Kaleschke

09:50-10:10 Investigating an Approach for Low Resource Language Dataset Creation, Curation and Classification: Setswana and Sepedi

Vukosi Marivate, Tshephisho Sefara and Abiodun Modupe

10:10-10:30 Comparing Neural Network Parsers for a Less-resourced and Morphologically-rich Language: Amharic Dependency Parser

Binyam Ephrem Seyoum, Yusuke Miyao and Baye Yimam Mekonnen

10:30-10:50 Mobilizing Metadata: Open Data Kit (ODK) for Language Resource Development in East Africa

Richard Griscom

10:50-11:20 Coffee break

11:20-11:40 A Computational Grammar of Ga

Lars Hellan

11:40-12:00 Navigating Challenges of Multilingual Resource Development for Under-Resourced Languages: The Case of the African Wordnet Project

Marissa Griesel and Sonja Bosch

12:00-12:20 Building Collaboration-based Resources in Endowed African Languages: Case of NTeALan Dictionaries Platform

Elvis Mboning Tchiaze, Jean Marc Bassahak, Daniel Baleba, Ornella Wandji and Jules Assoumou

12:20-13:00 Closing

Identify, Describe and Share your LRs!

Describing your LRs in the LRE Map is now a normal practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences). To continue the efforts initiated at LREC 2014 about “Sharing LRs” (data, tools, web-services, etc.), authors will have the possibility, when submitting a paper, to upload LRs to a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, may become a new “regular” feature for conferences in our field, thus contributing to the creation of a common repository where everyone can deposit and share data.

As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2020 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.

Submission Guidelines

RAIL 2020 asks for full papers of 4 to 8 pages (plus additional pages for references if needed), which must strictly follow the LREC stylesheet that will be available on the conference website. Papers must be submitted through START and will be peer-reviewed.

Important dates

  • Submission deadline: 16 February 2020
  • Extended deadline: 23 February 2020
  • Date of notification: 13 March 2020
  • Camera-ready copy deadline: 2 April 2020
  • Workshop online: 9:00 until 13:00 SAST, 16 May 2020

Organizing Committee

South African Centre for Digital Language Resources (SADiLaR), South Africa

Programme committee

  • Richard Ajah, University of Uyo, Nigeria
  • Ayodele James Akinola, Chrisland University, Nigeria
  • Felix Ameka, Leiden University, the Netherlands
  • Sonja Bosch, University of South Africa, South Africa
  • Ibrahima Cissé, University of Humanities, Mali
  • Roald Eiselen, Eiselen software consulting, South Africa
  • Tanja Gaustad, Centre for Text Technology, South Africa
  • Elias Malete, University of the Free State, South Africa
  • Dimakatso Mathe, South African Centre for Digital Language Resources, South Africa
  • Elias Mathipa, University of South Africa, South Africa
  • Fekede Menuta, Hawassa University, Ethiopia
  • Innocentia Mhlambi, Wits University, South Africa
  • Emmanuel Ngue Um, University of Yaoundé I, Cameroon
  • Guy de Pauw, Antwerp University and Textgain, Belgium
  • Sara Petrollino, Leiden University, the Netherlands
  • Pule Phindane, Central University of Technology, South Africa
  • Danie Prinsloo, University of Pretoria, South Africa
  • Martin Puttkammer, Centre for Text Technology, South Africa
  • Justus Roux, Stellenbosch University, South Africa
  • Msindisi Sam, Rhodes University, South Africa
  • Gilles-Maurice de Schryver, Ghent University, Belgium
  • Lorraine Shabangu, Bangula Lingo Centre, South Africa
  • Elsabé Taljard, University of Pretoria, South Africa

