University of Pretoria: Digitisation Node

To ensure that African languages are able to take up their rightful place in the digital age, the development of digitised language resources is essential. The South African Centre for Digital Language Resources (SADiLaR), a national centre supported by the Department of Science and Innovation (DSI), works in collaboration with the University of Pretoria (UP), which serves as SADiLaR's digitisation node. This node is housed within the Department of African Languages in the UP Faculty of Humanities.

The main function of the UP digitisation node is to create language resources for African languages by digitising different kinds of language material. The digitised output is then made available on the SADiLaR platform to be used by researchers and developers of Human Language Technologies. The platform is an open resource, and the data hosted on it will be freely available.

The Department of African Languages is fortunate to have access to a well-stocked resource centre, which has been built up over many years and contains valuable language data in the form of books, audio and video material for African languages. Making these data available in digital (machine-readable) format not only preserves them but also turns them into an invaluable tool in the creation of digital resources for African language development.

Key projects

Our digitisation activities are centred on three types of data: textual, audio and audio-visual material.

Textual material

Converting text data into digitised format is the main focus of the digitisation node. Many African language books are out of print and no longer commercially available. With the necessary permission and copyright clearance from publishers, these books are converted to digital format through Optical Character Recognition (OCR) scanning. Other textual material being digitised includes back copies of popular magazines, a collection of dictionary index cards bequeathed to the department by the late N J van Warmelo and copies of MA dissertations and PhD theses written in African languages.


Audio material

Audio material that is digitised is mostly data on audio cassettes, many of which have been retrieved from the archives of the first language laboratory at UP. Other valuable material includes audio notes made by linguists specialising in African languages as they conducted field work. These date back to the early 1960s.


Audio-visual material

Among the valuable material that UP has in its collection are priceless video recordings of interviews with Northern Sotho authors. Video recordings of lectures on linguistics and literature in isiZulu, Setswana and Sesotho sa Leboa also form part of the collection.

Contact details

Ms Michelle Goosen – UP Project Manager
Professor Danie Prinsloo – UP Node Manager
Professor Elsabe Taljard – UP Node Manager