{"id":6457,"date":"2021-11-30T13:29:03","date_gmt":"2021-11-30T13:29:03","guid":{"rendered":"https:\/\/sadilar.org\/corpus-and-system-development\/"},"modified":"2021-11-30T13:29:03","modified_gmt":"2021-11-30T13:29:03","slug":"corpus-and-system-development","status":"publish","type":"post","link":"https:\/\/sadilar.org\/en\/corpus-and-system-development\/","title":{"rendered":"Corpus and system development for automatic captioning of official speeches"},"content":{"rendered":"<div class=\"googlefontscall\"><\/div>\n<div class=\"pagebuilderckparams\" data-colorpalettefromtemplate=\"\" data-colorpalettefromsettings=\",,,,\" data-styles=\"\"><\/div>\n<div class=\"rowck ckstack3 ckstack2 ckstack1 uick-sortable\" id=\"row_ID1638278822762\" data-gutter=\"2%\" data-nb=\"1\" style=\"position: relative;\">\n<style class=\"ckcolumnwidth\">[data-gutter=\"2%\"][data-nb=\"1\"]:not(.ckadvancedlayout) [data-width=\"100\"] {width:100%;}[data-gutter=\"2%\"][data-nb=\"1\"].ckadvancedlayout [data-width=\"100\"] {width:100%;}<\/style>\n<div class=\"inner animate clearfix\">\n<div class=\"blockck\" id=\"block_ID1638278822762\" data-real-width=\"100%\" data-width=\"100\" style=\"position: relative;\">\n<div class=\"ckstyle\"><\/div>\n<div class=\"inner animate resizable\">\n<div class=\"innercontent uick-sortable\">\n<div id=\"ID1638278822787\" class=\"cktype\" data-type=\"text\" style=\"position: relative;\">\n<div class=\"tab_effects ckprops\" fieldslist=\"\"><\/div>\n<div class=\"tab_blocstyles ckprops\" blocbackgroundpositionend=\"100\" blocbackgrounddirection=\"topbottom\" blocbackgroundimageattachment=\"scroll\" blocbackgroundimagerepeat=\"no-repeat\" blocbackgroundimagesize=\"auto\" blocbordertopstyle=\"solid\" blocborderrightstyle=\"solid\" blocborderbottomstyle=\"solid\" blocborderleftstyle=\"solid\" blocbordersstyle=\"solid\" blocshadowinset=\"0\" 
fieldslist=\"blocbackgroundpositionend,blocbackgrounddirection,blocbackgroundimageattachment,blocbackgroundimagerepeat,blocbackgroundimagesize,blocalignementleft,blocalignementcenter,blocalignementright,blocalignementjustify,blocbordertopstyle,blocborderrightstyle,blocborderbottomstyle,blocborderleftstyle,blocbordersstyle,blocshadowinset\"><\/div>\n<div class=\"tab_edition ckprops\" fieldslist=\"\"><\/div>\n<div class=\"ckstyle\">\n<style><\/style>\n<\/div>\n<div class=\"cktext inner\" style=\"position: relative;\" spellcheck=\"false\">\n<p><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\" data-mce-style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><strong>Project Type:<\/strong><span style=\"font-weight: 400;\" data-mce-style=\"font-weight: 400;\"> SADiLaR Node &#8211; CSIR Speech Node<\/span><span style=\"font-weight: 400;\" data-mce-style=\"font-weight: 400;\"><br \/><\/span><strong>Project Start Date: <\/strong><span style=\"font-weight: 400;\" data-mce-style=\"font-weight: 400;\">1 April 2020&nbsp;<br \/><\/span><strong>Project Status<\/strong><span style=\"font-weight: 400;\" data-mce-style=\"font-weight: 400;\">: In progress<\/span><\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\" data-mce-style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><strong>Project Aims:&nbsp;<\/strong><\/span><\/p>\n<p><span style=\"font-weight: 400; font-family: 'trebuchet ms', geneva, sans-serif;\" data-mce-style=\"font-weight: 400; font-family: 'trebuchet ms', geneva, sans-serif;\">The primary aim of the proposed project is to create a corpus of automatically transcribed government speeches. The CSIR proposes to start with the current president (Mr Cyril Ramaphosa) and then expand the corpus with speeches made by previous presidents and\/or other members of parliament. 
A secondary aim is to initiate the development of an automatic speech recognition system that could serve as a first step towards addressing the need for automatic captioning expressed by GCIS.<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\" data-mce-style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><strong>Project Deliverables:<\/strong><\/span><\/p>\n<ul>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\" data-mce-style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Resources transferred from the GCIS archive will be utilised to produce the following:&nbsp;<\/span>\n<ul>\n<li><span style=\"font-weight: 400; font-family: 'trebuchet ms', geneva, sans-serif;\" data-mce-style=\"font-weight: 400; font-family: 'trebuchet ms', geneva, sans-serif;\">Evaluation data set (5 hours in total)&nbsp;<\/span><\/li>\n<li><span style=\"font-weight: 400; font-family: 'trebuchet ms', geneva, sans-serif;\" data-mce-style=\"font-weight: 400; font-family: 'trebuchet ms', geneva, sans-serif;\">Report on ASR performance evaluation<\/span><\/li>\n<\/ul>\n<\/li>\n<li>Corpus and related documentation transferred to SADiLaR. Depending on the availability of speeches from the GCIS archive, the project will provide approximately 10 hours of speech per year spanning a 7-year period, which should yield approximately 100 hours of speech in total. 
This corpus will be released under a&nbsp;non-commercial, non-exclusive, research license, as GCIS is the proprietary owner thereof.<\/li>\n<li>Research outputs describing the released corpus, the acoustic analysis and findings<\/li>\n<li>The baseline captioning system.<\/li>\n<\/ul>\n<\/div><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"ckstyle\"><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Project Type: SADiLaR Node &#8211; CSIR Speech Node Project Start Date: 1 April 2020&nbsp;Project Status: In progress Project Aims:&nbsp; The primary aim of the proposed project is to create a corpus of automatically transcribed government speeches. The CSIR proposes to start with the current president (Mr Cyril Ramaphosa) and then expand the corpus with speeches made [&hellip;]<\/p>\n","protected":false},"author":246,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[730],"tags":[],"class_list":["post-6457","post","type-post","status-publish","format-standard","hentry","category-general"],"acf":[],"_links":{"self":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/posts\/6457","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/users\/246"}],"replies":[{"embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/comments?post=6457"}],"version-history":[{"count":0,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/posts\/6457\/revisions"}],"wp:attachment":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/media?parent=6457"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/categories?post=6457"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\
/\/sadilar.org\/en\/wp-json\/wp\/v2\/tags?post=6457"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}