Building a termbank for SA’s official languages

Linguistics is the scientific study of language. In a multilingual society like South Africa, we need linguists who can provide the skills, insights and expertise to ensure all our South African languages remain relevant in our fast-changing world. To build up the field of linguistics in nine of South Africa’s official languages, the University of South Africa (UNISA) node of SADiLaR, which focuses on language resource development, has created a linguistics termbank which is now freely available online. Users need only register and create a profile in the Lexonomy web portal and navigate to the Multilingual Linguistics Termbank. From there they can search common linguistics terms in Setswana, isiZulu, isiXhosa, Sesotho sa Leboa, Sesotho, Siswati, Xitsonga, Tshivenḓa or isiNdebele, with English as the pivot language.

“The goal of the Multilingual Linguistics Termbank is twofold,” says Marissa Griesel, project manager and specialist researcher for the UNISA node of SADiLaR. “The first is to provide an open-access resource to linguistics students as a multilingual classroom support tool. The second is to begin to standardise linguistics terms in the languages taught in the Department of African Languages at Unisa and at other higher education institutions, thereby strengthening these languages of scholarship in the field of linguistics.”

The team who worked on the project included linguists from various education, government, and private institutions.

An open educational resource for linguistics

Because UNISA is a comprehensive open distance e-learning institution, it was a priority to build a resource that also gives back to its own community. The termbank is accessible to anybody with Internet access, and it offers definitions for 500 linguistics terms in the nine languages.

It is therefore possible for an isiXhosa linguistics student to input a term they are learning about and get back a definition for that term in their mother tongue, as well as a usage example to contextualise the term, and cross-reference the data from a related language such as isiZulu.

“We hope that by enabling students to access complex terminology in the South African languages they are better able to engage with the subject matter and enhance their understanding of this field of specialisation,” says Professor Mampaka Lydia Mojapelo, node co-manager and associate professor in UNISA’s Department of African Languages.

Building the termbank

The two founding members of this project, Professor Mampaka Lydia Mojapelo and Professor Rose Masubelele, are linguists in Sesotho sa Leboa and isiZulu respectively, and both taught various linguistics courses in UNISA’s Department of African Languages. Their research into the availability of language resources was the driving force behind the project, which began by setting up initial termbanks for these two languages. Linguists from the other languages were then invited to use the pilot lists to expand the resource to their languages.

“The first step in the project was to collect existing terms that our academic predecessors worked so hard to establish in the linguistics classroom. These were mainly contained in old resources that are now out of print,” says Mojapelo. “We referenced old study guides, dictionaries and other linguistics textbooks used in the department to identify key terms in Sesotho sa Leboa and isiZulu. It was important to also include the English equivalent as a binding or pivot language so that cross-referencing could be done.”

After compiling this initial list of terms in the pilot project, linguists from the other languages were invited to join the project and expand it to include seven additional languages.

“This was not just a translation exercise. Each language has its own grammar, structure and linguistic phenomena that do not necessarily occur in the others,” says Griesel. “The team that formed was made up of linguists with strong teaching experience. Their experience and knowledge of the specific languages they were working on were critical to the success of the project.”

“As we embarked on the project,” says Griesel, “we realised that for many of the languages different terms were used for the same linguistic concepts across institutions.”

The final step was a workshop where linguistics teachers and stakeholders from different institutions, who were not initially part of the project, could hash out the terms in the various languages and work towards standardising them.

“The original workshop date had to be delayed because of COVID-19 and lockdowns,” says Griesel. “But eventually, in July 2022, we managed to make it happen.”

Team members invited at least two linguists from each of the nine indigenous languages. In total, 42 delegates attended, and for two days the group discussed the definitions, usage examples and what problems might be common across the different languages.

The final product: a freely available Multilingual Linguistics Termbank for South African languages

The final product is now available for anyone to access through an online portal: Multilingual Linguistics Termbank.

“While the product is now available there is always more work to be done, either in expanding the term list or improving the quality. Our next focus will be to add terms from the literary domain,” says Griesel. “Users are welcome to send comments and suggestions for improvement to the project team via the project website. Future projects and research outputs are also shared on this platform.”

The termbank is also available on the SADiLaR repository as an XML file for anyone to download and use for resource development.

“This is a concerted effort to create a large-scale shared open-educational resource for linguistics in the official South African languages. The terms have always existed in landmark textbooks, some of which are out of print, and this resource aims at preserving the knowledge captured. This is also at the core of the National Development Plan 2030, which states that ‘Quality education encourages technology shifts and innovation that are necessary to solve present-day challenges’,” says Mojapelo.

Griesel adds: “But more than that, we hope it will be a first step to growing the study and development of our indigenous languages to support a truly multilingual society in the fourth industrial revolution.”