{"id":6470,"date":"2021-11-30T13:36:21","date_gmt":"2021-11-30T13:36:21","guid":{"rendered":"https:\/\/sadilar.org\/linguistic-corpus-enrichment\/"},"modified":"2021-11-30T13:36:21","modified_gmt":"2021-11-30T13:36:21","slug":"linguistic-corpus-enrichment","status":"publish","type":"post","link":"https:\/\/sadilar.org\/en\/linguistic-corpus-enrichment\/","title":{"rendered":"Linguistic corpus enrichment for conjunctively written South African languages"},"content":{"rendered":"<div class=\"googlefontscall\"><\/div>\n<div class=\"pagebuilderckparams\" data-colorpalettefromtemplate=\"\" data-colorpalettefromsettings=\",,,,\" data-styles=\"\"><\/div>\n<div class=\"rowck ckstack3 ckstack2 ckstack1 uick-sortable\" id=\"row_ID1638279331714\" data-gutter=\"2%\" data-nb=\"1\" style=\"position: relative;\">\n<style class=\"ckcolumnwidth\">[data-gutter=\"2%\"][data-nb=\"1\"]:not(.ckadvancedlayout) [data-width=\"100\"] {width:100%;}[data-gutter=\"2%\"][data-nb=\"1\"].ckadvancedlayout [data-width=\"100\"] {width:100%;}<\/style>\n<div class=\"inner animate clearfix\">\n<div class=\"blockck\" id=\"block_ID1638279331714\" data-real-width=\"100%\" data-width=\"100\" style=\"position: relative;\">\n<div class=\"ckstyle\"><\/div>\n<div class=\"inner animate resizable\">\n<div class=\"innercontent uick-sortable\">\n<div id=\"ID1638279331735\" class=\"cktype\" data-type=\"text\" style=\"position: relative;\">\n<div class=\"tab_effects ckprops\" fieldslist=\"\"><\/div>\n<div class=\"tab_blocstyles ckprops\" blocbackgroundpositionend=\"100\" blocbackgrounddirection=\"topbottom\" blocbackgroundimageattachment=\"scroll\" blocbackgroundimagerepeat=\"no-repeat\" blocbackgroundimagesize=\"auto\" blocbordertopstyle=\"solid\" blocborderrightstyle=\"solid\" blocborderbottomstyle=\"solid\" blocborderleftstyle=\"solid\" blocbordersstyle=\"solid\" blocshadowinset=\"0\" 
fieldslist=\"blocbackgroundpositionend,blocbackgrounddirection,blocbackgroundimageattachment,blocbackgroundimagerepeat,blocbackgroundimagesize,blocalignementleft,blocalignementcenter,blocalignementright,blocalignementjustify,blocbordertopstyle,blocborderrightstyle,blocborderbottomstyle,blocborderleftstyle,blocbordersstyle,blocshadowinset\"><\/div>\n<div class=\"tab_edition ckprops\" fieldslist=\"\"><\/div>\n<div class=\"ckstyle\">\n<style><\/style>\n<\/div>\n<div class=\"cktext inner\" style=\"position: relative;\" spellcheck=\"false\">\n<p><strong>Project Type: <\/strong>Node<br \/><strong>Start Date: <\/strong>1 October 2017<br \/><strong>Project Status: <\/strong>Completed and delivered<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Project Aims:<\/strong><\/p>\n<p class=\"paragraph\" style=\"vertical-align: baseline;\" data-mce-style=\"vertical-align: baseline;\"><span class=\"normaltextrun\"><span lang=\"EN-ZA\" style=\"font-size: 10.0pt; font-family: 'Arial',sans-serif;\" data-mce-style=\"font-size: 10.0pt; font-family: 'Arial',sans-serif;\">This project, developed under the <span style=\"color: black; background: white;\" data-mce-style=\"color: black; background: white;\">Nodes Specialisation Project,<\/span> makes linguistically enriched corpora available for the four official South African languages with a conjunctive orthography, i.e. isiNdebele, isiXhosa, isiZulu, and Siswati. <\/span><\/span><\/p>\n<p class=\"paragraph\" style=\"margin: 0cm; vertical-align: baseline;\" data-mce-style=\"margin: 0cm; vertical-align: baseline;\"><span class=\"normaltextrun\"><span lang=\"EN-ZA\" style=\"font-size: 10.0pt; font-family: 'Arial',sans-serif;\" data-mce-style=\"font-size: 10.0pt; font-family: 'Arial',sans-serif;\">The parallel corpora consist of approximately 50,000 tokens each, aligned between all four languages and English and annotated for morphology, part of speech and lemmas. 
Based on the annotated corpora, we also developed core technologies, namely lemmatisers, POS taggers and morphological analysers for these four languages.<\/span><\/span><span class=\"eop\"><span lang=\"EN-ZA\" style=\"font-size: 10.0pt; font-family: 'Arial',sans-serif;\" data-mce-style=\"font-size: 10.0pt; font-family: 'Arial',sans-serif;\">&nbsp;<\/span><\/span><\/p>\n<p class=\"paragraph\" style=\"margin: 0cm; vertical-align: baseline;\" data-mce-style=\"margin: 0cm; vertical-align: baseline;\">&nbsp;<\/p>\n<p><strong>Project Deliverables:<\/strong><\/p>\n<ul>\n<li>50,000-token <a href=\"https:\/\/hdl.handle.net\/20.500.12185\/546\" data-mce-href=\"https:\/\/hdl.handle.net\/20.500.12185\/546\">parallel corpus for four languages<\/a><\/li>\n<li><a href=\"https:\/\/hdl.handle.net\/20.500.12185\/548\" data-mce-href=\"https:\/\/hdl.handle.net\/20.500.12185\/548\">Lemmatisers, POS taggers and morphological analysers<\/a> for four languages<\/li>\n<\/ul>\n<p><strong>Contact details:<\/strong><\/p>\n<p>Please contact <a href=\"mailto:ctext@nwu.ac.za\">ctext@nwu.ac.za<\/a>&nbsp;<\/p>\n<\/div><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"ckstyle\"><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Project Type: Node Start Date: 1 October 2017 Project Status: Completed and delivered &nbsp; Project Aims: This project, developed under the Nodes Specialisation Project, makes linguistically enriched corpora available for the four official South African languages with a conjunctive orthography, i.e. isiNdebele, isiXhosa, isiZulu, and Siswati. 
The parallel corpora consist of approximately 50,000 tokens each, aligned [&hellip;]<\/p>\n","protected":false},"author":246,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[730],"tags":[],"class_list":["post-6470","post","type-post","status-publish","format-standard","hentry","category-general"],"acf":[],"_links":{"self":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/posts\/6470","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/users\/246"}],"replies":[{"embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/comments?post=6470"}],"version-history":[{"count":0,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/posts\/6470\/revisions"}],"wp:attachment":[{"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/media?parent=6470"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/categories?post=6470"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sadilar.org\/en\/wp-json\/wp\/v2\/tags?post=6470"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}