{"id":6685,"date":"2023-07-20T14:32:34","date_gmt":"2023-07-20T14:32:34","guid":{"rendered":"https:\/\/sadilar.org\/repository-sadilar\/"},"modified":"2023-08-29T12:56:16","modified_gmt":"2023-08-29T12:56:16","slug":"repository-sadilar","status":"publish","type":"post","link":"https:\/\/sadilar.org\/en\/repository-sadilar\/","title":{"rendered":"SADiLaR\u2019s Language Resource Repository Empowers Language Research"},"content":{"rendered":"<div class=\"googlefontscall\"><\/div>\n<div class=\"pagebuilderckparams\" data-colorpalettefromtemplate=\"\" data-colorpalettefromsettings=\",,,,\" data-styles=\"\"><\/div>\n<div id=\"row_ID1689863287157\" class=\"rowck ckstack3 ckstack2 ckstack1 uick-sortable\" style=\"position: relative;\" data-gutter=\"2%\" data-nb=\"1\">\n<style class=\"ckcolumnwidth\">[data-gutter=\"2%\"][data-nb=\"1\"]:not(.ckadvancedlayout) [data-width=\"100\"] {width:100%;}[data-gutter=\"2%\"][data-nb=\"1\"].ckadvancedlayout [data-width=\"100\"] {width:100%;}<\/style>\n<div class=\"inner animate clearfix\">\n<div id=\"block_ID1689863287157\" class=\"blockck\" style=\"position: relative;\" data-real-width=\"100%\" data-width=\"100\">\n<div class=\"ckstyle\"><\/div>\n<div class=\"inner animate resizable\">\n<div class=\"innercontent uick-sortable\">\n<div id=\"ID1689863327125\" class=\"cktype\" style=\"position: relative;\" data-type=\"text\">\n<div class=\"ckstyle\"><\/div>\n<div class=\"cktext inner\" style=\"position: relative;\" spellcheck=\"false\">\n<p>The curation, distribution and maintenance of reusable digital text and speech resources for South Africa&#8217;s official languages is of vital concern for research and development in the field of language technology. The data is important not only for the development of tools for facilitation of communication between different language groups but also for empowering local languages for use in modern communication systems. 
The South African Centre for Digital Language Resources (SADiLaR) has taken on this crucial guardian role through its\u00a0<a href=\"https:\/\/repo.sadilar.org\/handle\/20.500.12185\/540\">Language Resource Repository<\/a>. To date, it contains hundreds of items in multiple languages which are available to the public through an open-access platform.<\/p>\n<p>\u201cSADiLaR\u2019s Language Resource Repository has over 400 records of items in multiple languages, even a few languages from outside South Africa,\u201d says Dr Friedel Wolff, SADiLaR\u2019s Technical Manager. \u201cSome of the items themselves describe a resource that is itself multilingual or, for example, software that supports several languages. Not every resource in your language might interest you, but it might just be what some researcher or software engineer needs to build something exciting for your language.\u201d<\/p>\n<p><strong>Giving permanence to resources<\/strong><\/p>\n<p>The various types of available resources range from electronic text and speech data (such as domain-specific text collections, wordlists, dictionaries, translation memories and aligned multilingual corpora) to multimodal resources and tools, applications and platforms that support the processing of data and development of new technologies.<\/p>\n<p>According to Wolff, the research data stored in SADiLaR\u2019s repository is of immeasurable value to researchers. \u201cMuch of the research data on the repository was costly and time-consuming to create. Some required expert knowledge or computing power that few of us have access to,\u201d he comments. \u201cThe repository makes these available to anyone who is interested, and the idea with repositories like these is that the repository should outlive any specific research topic, researcher&#8217;s interest or industry fad \u2013 in other words, it tries to give some permanence to these resources. 
Providing this permanence is maybe too hard and tedious for many of the creators, and not always easy to justify in their place of employment. This provides a centralised access point, without trying to take away any of the credit to the people who put the work into creating them,\u201d he explains.<\/p>\n<p><strong>Central point of access<\/strong><\/p>\n<p>Dr Benito Trollip, a digital humanities researcher at SADiLaR and an enthusiastic user of and contributor to the repository, echoes the above. \u201cThe SADiLaR Language Resource Repository provides a (in principle) permanent platform for the availing of linguistic data to the broader community (that includes not only researchers). It takes one curious person to see what is out there for less well-known languages and they start developing useful technology,\u201d says Trollip.<\/p>\n<p>When it comes to the repository being a central point of access, Trollip emphasises how difficult it can be to utilise an existing linguistic data source if it, or information about it (where the data is of a sensitive nature), is not made available.<\/p>\n<p>\u201cIt often took a lot of time and hard work to generate and curate that data. 
In my humble opinion, we should move away from the mindset of owning, developing and using data solely for our own gain or professional and financial benefit, and rush toward a mindset of sharing data to enable and empower the community at large,\u201d he says.<\/p>\n<p><strong>Integral tool<\/strong><\/p>\n<p>Dr Laurette Marais, manager of SADiLaR\u2019s speech node at the\u00a0Council for Scientific and Industrial Research (CSIR), and her team have experienced the advantages of SADiLaR\u2019s repository as both contributors and users: they shared their valuable resources with others, which enabled the development of commercial products, and also benefited by accessing resources that they did not create themselves.<\/p>\n<p>\u201cFor the CSIR Voice Computing research group, also known as the\u00a0<a href=\"..\/index.php\/en\/about\/sadilar-nodes\/csir-node\">Speech Node of SADiLaR<\/a>, the Resource Repository has become an integral tool in the planning and execution of our research agenda, both as a reliable venue for sharing the data that we gather and produce, but also as a first port of call when we require language resources for our projects. A notable contribution of ours to the repository was high-quality speech data from our Lwazi 3 project, which we have also used to develop our commercial suite of TTS voices, named Qfrency,\u201d says Marais.<\/p>\n<p>\u201cWe have in the past and still are contributing speech data aimed at training automatic speech recognition (ASR) systems. Furthermore, the repository has served as an essential source when we require text data in any of the South African languages. I believe that any student or researcher in language technology in South Africa should be familiar with the repository and what it has to offer, especially given the resource-scarce nature of our languages.\u201d<\/p>\n<p><strong>A short history<\/strong><\/p>\n<p>Interestingly, the repository actually predates SADiLaR. 
It was launched in 2012 by the North-West University\u2019s Centre for Text Technology as the Resource Management Agency (RMA) with funding from the Department of Arts and Culture\u2019s National Centre for Human Language Technologies. When SADiLaR was launched in 2019 with the support of the Department of Science and Innovation (following an incubation and development phase since 2016), the RMA was incorporated into SADiLaR\u2019s Language Resource Repository. SADiLaR took over full responsibility for the curation and maintenance of the repository thereafter.<\/p>\n<p><strong>Submit a resource<\/strong><\/p>\n<p>If you have developed a language resource and wish to make it usable and\/or discoverable for others, SADiLaR\u2019s repository is an excellent option. It is a secure environment with the correct licensing procedures for anyone with\u00a0research data in the fields of languages, humanities and social sciences. For more information on how to submit a resource, please visit the\u00a0<a href=\"https:\/\/www.sadilar.org\/index.php\/en\/guidelines\/resource-guidelines\">SADiLaR Resource Guidelines<\/a>\u00a0page.<\/p>\n<p><em>(Written by Birgit Ottermann)<\/em><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"ckstyle\"><\/div>\n<\/div>\n<div id=\"row_ID1689863316849\" class=\"rowck ckstack3 ckstack2 ckstack1 uick-sortable\" style=\"position: relative;\" data-gutter=\"2%\" data-nb=\"1\">\n<style class=\"ckcolumnwidth\">[data-gutter=\"2%\"][data-nb=\"1\"]:not(.ckadvancedlayout) [data-width=\"100\"] {width:100%;}[data-gutter=\"2%\"][data-nb=\"1\"].ckadvancedlayout [data-width=\"100\"] {width:100%;}<\/style>\n<div class=\"inner animate clearfix\">\n<div id=\"block_ID1689863316849\" class=\"blockck\" style=\"position: relative;\" data-real-width=\"100%\" data-width=\"100\">\n<div class=\"ckstyle\"><\/div>\n<\/div>\n<\/div>\n<div class=\"ckstyle\"><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The curation, distribution 
and maintenance of reusable digital text and speech resources for South Africa&#8217;s official languages is of vital concern for research and development in the field of language technology. The data is important not only for the development of tools for facilitation of communication between different language groups but also for empowering local [&hellip;]<\/p>\n","protected":false},"author":252,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[730],"tags":[],"class_list":["post-6685","post","type-post","status-publish","format-standard","hentry","category-general"],"acf":[],"_links":{"self":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/posts\/6685","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/users\/252"}],"replies":[{"embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/comments?post=6685"}],"version-history":[{"count":1,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/posts\/6685\/revisions"}],"predecessor-version":[{"id":7012,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/posts\/6685\/revisions\/7012"}],"wp:attachment":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/media?parent=6685"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/categories?post=6685"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/tags?post=6685"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}