Menno van Zaanen, professor in Digital Humanities at the South African Centre for Digital Language Resources (SADiLaR), recently spent a productive two weeks at the University of Gdańsk in Poland to conduct teaching activities and interdisciplinary research in digital humanities. He was invited by Dr Karolina Rudnicka from the Faculty of Languages as part of the university’s 4th edition of the ‘Visiting Professors UG’ programme, which is aimed at increasing the internationalisation of education by exposing students and staff to excellent researchers from around the world.
Van Zaanen’s visit during May 2024 involved giving two guest lectures, collaborating with students on a small joint research project, and working with Dr Rudnicka on an interdisciplinary research publication.
“I first met Menno in May 2023 during a visit to SADiLaR where we discovered our shared research interests,” recalls Dr Rudnicka, who is an assistant professor at the Institute of Applied Linguistics within the Faculty of Languages. “Knowing he had never been to Gdańsk or Poland, I saw an opportunity through our university’s ‘Visiting Professors’ programme. Menno was excited to visit, so we applied and successfully secured the funding. It was a productive and enjoyable visit for everyone. Additionally, it was a pleasure to show Menno our campus, city, and the beautiful surrounding nature.”
Van Zaanen shared his personal journey from computer science to digital humanities in his first guest lecture, at the Institute of Applied Linguistics, by recounting some examples of his initial research in digital humanities and highlighting the pitfalls he experienced. “The field of digital humanities merges digital tools and computational techniques with humanities and social sciences (HSS), opening avenues for innovative research, but it presents both opportunities and challenges,” Van Zaanen says. “By sharing some of my stumbling blocks, I hope to stop other researchers from making the same mistakes. I also introduced SADiLaR’s Escalator project, which is aimed at fostering widespread adoption of digital research methodologies in HSS through capacity development and awareness initiatives,” he adds.
Designing formal models to learn languages
For his second guest lecture, Van Zaanen discussed the formal means of describing natural language learning at the University of Gdańsk’s Institute of Computer Science. “My lecture was about how we can design formal models – essentially using mathematics – to describe how we can learn languages. This is mostly focused on syntax, which describes the rules of how sentences can be put together from words. It requires information (and choices) on the language formalism, which defines how we can represent which words can be placed where in the sentence, and the learning formalism. The learning formalism describes what the process of learning is (what information is given to a learner, for instance) and what efficient learning means (from a mathematical perspective).”
Exploring the social networks of an Oscar Wilde novel
For the small joint research project with the students, Van Zaanen actively took part in the research activities and supervised the outputs provided by the students.
“The collaboration with the students was wonderful. Together with the natural language processing students, we explored the social networks (characters and their relationships) in translations of Oscar Wilde’s novel The Picture of Dorian Gray. We compared the original English text to translations in German, Polish and Dutch, with the expectation that the social networks would be the same. However, they were not. We now need to figure out exactly why they are not the same – it could be due to computational issues, translation preferences or language preferences.”
One of the goals of the small joint research project was to present the research at a conference and publish the results in a journal. “Following a week of very focused work, we submitted abstracts to two conferences – one has already been accepted while the other is still awaiting an outcome,” Van Zaanen remarks. “I think it is fantastic that this collaboration leads to academic outputs. As far as I know, it will be the first for all of the students.”
Says Rudnicka: “Our students had a great time meeting Menno and collaborating on a joint research project. We’re still working on it with them, as they will present the results at a Young Science Congress in July, and we will also be writing the article.”
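To give a sense of what comparing character networks across translations can look like in code, here is a minimal Python sketch. It is not the method used in the project, which the article does not describe. It assumes that two characters are linked whenever their names occur in the same sentence, and it uses invented toy sentences and a hypothetical alias table (`ALIASES`) rather than the text of the novel.

```python
import re
from itertools import combinations

# Hypothetical alias table: surface form in the text -> canonical character name.
# A real project would need one table per language (inflected names, titles, etc.).
ALIASES = {"Dorian": "Dorian", "Basil": "Basil", "Henry": "Henry"}


def cooccurrence_edges(text, aliases):
    """Return the set of character pairs that share at least one sentence."""
    edges = set()
    # Naive sentence split on ., ! and ?; real texts need a proper tokenizer.
    for sentence in re.split(r"[.!?]+", text):
        present = {canon for form, canon in aliases.items() if form in sentence}
        for a, b in combinations(sorted(present), 2):
            edges.add((a, b))
    return edges


def jaccard(edges_a, edges_b):
    """Overlap between two edge sets: 1.0 means identical networks."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


# Invented toy sentences, not taken from the novel or its translations.
english = "Dorian met Basil. Basil spoke to Henry. Dorian thanked Henry."
german = "Dorian traf Basil. Er sprach mit Henry. Dorian dankte Henry."

edges_en = cooccurrence_edges(english, ALIASES)
edges_de = cooccurrence_edges(german, ALIASES)
print(sorted(edges_en - edges_de))  # links found only in the English toy text
print(jaccard(edges_en, edges_de))
```

In the toy German text a pronoun ("Er") stands in for a name, so one link disappears from the network. This is one simple way in which a translator's choices, or the limits of name matching, could make networks differ between languages.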
AI-powered writing assistants of the English language
Van Zaanen also made time to work on an interdisciplinary research publication with Dr Rudnicka concerning the influence of AI-powered writing assistants on the English language. An important part of this was customising techniques, including sequence alignment, that Van Zaanen had developed and applied in earlier work. The visit made this possible and allowed experiments to be run on the first datasets prepared by Rudnicka.
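For readers unfamiliar with sequence alignment, the sketch below shows the general idea in Python using the textbook Needleman-Wunsch global alignment over word tokens. It is an illustration only and not Van Zaanen's customised technique. The scoring values and the two example sentences are assumptions made for this sketch.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align token lists a and b; return (score, aligned pairs).

    A pair (x, None) means x has no counterpart in b; (None, y) the reverse.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Invented example: a sentence before and after a hypothetical rewrite.
original = "I think this is very good".split()
revised = "I think this is excellent".split()

total, pairs = align(original, revised)
for left, right in pairs:
    print(left, "->", right)
```

Aligning a text with its rewritten version in this way exposes which words were kept, replaced or dropped, which is the kind of information a study of writing assistants might want to count.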
Fifth workshop on Resources for African Indigenous Languages (RAIL)
Following his visit to Poland, Van Zaanen travelled to Torino, Italy, where he attended the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), along with SADiLaR’s digital humanities researcher in Siswati, Dr Muzi Matfunjwa. The six-day hybrid conference (from 20 to 25 May 2024) brought together researchers and practitioners in computational linguistics, speech, multimodality, and natural language processing, with special attention to evaluation and the development of resources that support work in these areas.
SADiLaR hosted the Fifth Resources for African Indigenous Languages (RAIL) workshop on the last day of the LREC-COLING 2024 conference.
“The Resources for African Indigenous Languages (RAIL) workshop is an interdisciplinary platform for researchers working on resources (such as data collections and tools), specifically targeted towards African indigenous languages,” says Van Zaanen, who is a member of the RAIL 2024 organising committee, together with SADiLaR colleagues Rooweither Mabuya, Mmasibidi Setaka and Dr Muzi Matfunjwa. “Our aim is to create the conditions for the emergence of a scientific community of practice that focuses on data, as well as computational linguistic tools specifically designed for or applied to indigenous languages found in Africa.”
Creating resources for less-resourced languages
The theme for this year’s RAIL workshop was ‘Creating resources for less-resourced languages’.
“Many African languages are under-resourced while only a few of them are somewhat better resourced. These languages often share interesting properties, such as writing systems or tone, making them different from most high-resourced languages,” Van Zaanen explains. “From a computational perspective, these languages lack enough linguistic resources to undertake high-level development of Human Language Technologies (HLT) and Natural Language Processing (NLP) tools, which in turn impedes the development of African languages in these areas.”
According to Van Zaanen, past workshops made it clear that the problems and solutions presented are not only applicable to African languages but also relevant to many other low-resource languages. “Because these languages share similar challenges, this workshop provided researchers with opportunities to work collaboratively on issues of language resource development and learn from each other.”
The fifth workshop was well attended and offered an interesting range of presentations from researchers across the African continent, including Ethiopia, Senegal, Algeria, Nigeria and Southern Africa. “We received a number of high-quality submissions this year, which I personally really enjoyed,” says Van Zaanen. “And, as always, I loved the informal vibe of the RAIL workshop, which allows people to really ask questions and discuss things, instead of just criticising each other.”
Written by Birgit Ottermann
Photo by Marcel Jakubowski, University of Gdańsk