SADiLaR offers a variety of services that support the curation, development, distribution, and maintenance of language resources. These resources are made available through the SADiLaR Language Resource Repository as downloadable content through the Language Resource Catalogue and Language Resource Index.

Resources in the repository include:

  • electronic text and speech data (such as domain-specific text collections, wordlists, dictionaries, translation memories and aligned multilingual corpora)
  • multimodal resources
  • tools, applications and platforms that support the processing of data and development of new technologies.

How to use the resources

Access to the repository is open to anyone with an interest in language technologies. All resources in the repository are searchable and discoverable, but not all of them are downloadable.

The repository contains metadata for all of its assets, both those that can be downloaded and those that must be requested from the person listed as the contact person for the specific asset.

Where the files of a resource are available for download, you can download them after accepting the terms of use.

Downloadable resources come in a range of formats, including MP3 sound recordings, PDF documents and usable applications, among others.

If you wish to submit a resource, please see the detailed instructions on the SADiLaR website under Resource Guidelines.

Who can submit?

Anyone with research data in the fields of languages, humanities and social sciences can submit their data to the SADiLaR repository. The process is easier if your university is registered with SAFIRE, but we offer support throughout the submission process. If you have a dataset, we can help you ensure that your research is made available in a secure environment with the right licensing procedures.

For more information on how to submit a resource, visit our Resource Guidelines page.

The following resources are currently available from SADiLaR and its partner nodes:

  • Language Resource Index

    A digital index of language resources that are available for South African languages from various research and private institutions, both nationally and internationally. All Index items contain metadata, including developer details, specifications, and contact information. Not all resources listed in the Index are available as distributable resources from SADiLaR, but the Index does include all resources that are available for download via the Language Resource Catalogue.

    Language researchers who have datasets available can register their resources, digital or otherwise, with the Language Resource Index through the Resource procedures.

  • Language Resource Catalogue

    A digital collection of language resources, in various modalities, that are available for download from SADiLaR.
    Data providers who would like to distribute their resources on the SADiLaR site can upload metadata and resources via the repository. All resources are reviewed by SADiLaR before being made available.

  • Student data repository

    Are you a Master's or PhD student? Would you like your research data to be used in future research projects? Do you want to contribute to the academic community and give access to your research data via a trusted and internationally accredited repository?

    SADiLaR offers researchers a repository for accessing language resources, datasets and tools in all of the official languages of South Africa, as well as a few other African languages. The repository is free for anyone to use and also offers a platform for submitting your data. Join the open science movement by letting SADiLaR support you in making your research data available for maximum impact and visibility. We will also help you structure your data according to international standards.