The South African Centre for Digital Language Resources (SADiLaR) hosted a successful fourth workshop on Resources for African Indigenous Languages (RAIL) in Dubrovnik, Croatia. The annual workshop, which took place on 6 May 2023 as part of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), was organised by SADiLaR’s Rooweither Mabuya, Mmasibidi Setaka and Prof Menno van Zaanen, and the CAM Foundation’s Don Mthobela.
“The RAIL workshop was a great success. All our attendees were fully engaged throughout the session even though it was a full-day workshop,” says Mabuya, who co-chaired the event with Setaka. “The workshop provides an interdisciplinary platform for researchers working on African indigenous languages, particularly those languages that are under-resourced,” she explains. “It brings together researchers interested in showcasing their research; provides an overview of the current availability of African indigenous language resources, including data collections as well as tools; and allows for information sharing and discussions on improving the quality and availability of the resources.”
Many African indigenous languages currently have few or no resources available and are often structurally quite different from better-resourced languages, so they require the development and use of specialised techniques.
Growing a scientific community
“By bringing together researchers from different fields, such as (computational) linguistics, sociolinguistics and language technology, to discuss the development of language resources for African indigenous languages, we hope to boost research in this field,” says Setaka. “Ultimately, we aim to create the conditions for the emergence of a scientific community of practice that focuses on data, as well as tools, specifically designed for or applied to indigenous languages found in Africa.”
Both Mabuya and Setaka, who are digital humanities researchers at SADiLaR in isiZulu and Sesotho respectively, have been involved with the RAIL workshop since its inception in 2019. The first two workshops were virtual events co-located with the Language Resources and Evaluation Conference (LREC) in 2020 and the Digital Humanities Association of Southern Africa (DHASA) conference in 2021, whereas the third RAIL workshop was hosted as an in-person event in 2022 at the 10th Southern African Microlinguistics Workshop in Potchefstroom.
“This year’s workshop, which was one of 13 workshops accepted at the EACL 2023 conference, had a total of 14 papers and one findings paper presented,” Mabuya remarks. “It was a hybrid format as some participants were not able to travel – seven papers were presented in person and eight were presented virtually.”
Excellent feedback
Mabuya and Setaka were fortunate to travel to Croatia and attend the EACL 2023 conference in person, thanks to travel grants that they received to present their papers at the conference. “Mmasibidi Setaka received the Diversity and Inclusion Subsidy aimed at supporting scholars from marginalised regions, whereas I received a grant for the student volunteer programme aimed at supporting early-career scholars to attend the conference and also assist with conference duties,” Mabuya says.
The two researchers both presented papers on behalf of their co-authors at the RAIL workshop. Mabuya’s paper was titled ‘Unsupervised Cross-lingual Word Embedding Representation for English-isiZulu’ (with Derwin T Ngomane, Vukosi Marivate, Jade Abbott and Rooweither Mabuya as authors).
“Our paper received such great feedback in comments and questions from the audience,” Mabuya recalls. “One of the participants even asked for a meet-up with my co-authors as he needed some assistance in his own research which was similar to ours.”
Setaka was equally pleased with the response she received for the paper she presented, titled ‘Evaluating the Sesotho rule-based syllabification system on Sepedi and Setswana words’ (with Johannes Sibeko and Mmasibidi Setaka as authors). “People were very interested, and so delighted to learn that there’s a workshop dedicated to natural language processing (NLP) in Africa.”
Making new connections
Reflecting on the conference and overall success of the workshop, Setaka says: “The conference brought together a diversity of people interested in the many aspects of NLP. The workshop itself was a great success with a lot of participation from the audience. The fact that our workshop was accepted at EACL was a great highlight for me, considering the nature of EACL and its standing in the NLP community.”
Mabuya adds that she made some great connections. “It was a big conference with numerous interesting talks and presentations, and I got to meet some amazing scholars. Regarding our workshop, it was great that we had authors who have been submitting their research to RAIL each year since it started. This shows the quality of our workshop and the work published in our proceedings. We also have an excellent programme committee who assist with reviewing the submissions.”
The Proceedings of the Fourth Workshop on Resources for African Indigenous Languages (RAIL 2023) are now available, and the SADiLaR team is looking forward to receiving submissions for the fifth instalment of RAIL once the call for papers is out. “The aim is to grow the workshop to greater heights and also allow other scholars to help organise it going forward,” Setaka concludes.
(Written by Birgit Ottermann)