Enabling localised language technology applications: A computational wide-coverage resource grammar for isiZulu

Project Type: Node
Project Start Date: 1 April 2020
Project Status: Completed 

Project Aims:

The CSIR node of SADiLaR recently completed a project whose main aim was to deliver a high-quality, computational, wide-coverage resource grammar (WCRG) for isiZulu to the research community. WCRGs unlock opportunities for the South African languages to participate in multilingual research, nationally and internationally.

The project focused on developing the foundational components of the WCRG: the isiZulu resource grammar (RG) itself, a lexicon aimed at enabling wide coverage, and a framework for development and evaluation based on a manually curated treebank. In addition, an extension module was developed to enable chunk parsing via the grammar, and a web service was developed to provide parsing and linearisation functionality. A web user interface showcases the isiZulu RG and makes it available to end users in the Natural Language Processing (NLP) community.

Project Deliverables:

1. Resource Grammar for isiZulu

Implementation of isiZulu RG functions, merged into the official GF RGL repository

Access at: https://github.com/GrammaticalFramework/gf-rgl

2. GF Lexicon modules

Monolingual and multilingual GF concrete and abstract syntax modules

Access at: https://github.com/GrammaticalFramework/gf-rgl

Phrase-level adjectival qualificative GF concrete and abstract syntax modules

Access at: https://github.com/LauretteM/gf-afwn

3. Treebanks

A manually curated treebank of 1000 sentences and a set of treebanks for regression testing

Access at: https://github.com/LauretteM/gf-zulu-resources

Automatically generated treebanks: VulaBula Graded Reader treebank, isiZulu Wordnet usage examples treebank

Access at: https://github.com/LauretteM/gf-zulu-resources

4. GF chunk extension module

GF modules PChunk.gf and PChunkZul.gf, merged into the official GF RGL repository

Access at: https://github.com/GrammaticalFramework/gf-rgl

5. REST API web service and a web user interface

A web service for parsing of isiZulu sentences and linearisation of abstract parse trees.

Access at: https://rhonda.qfrency.com/api/v1/mt/zulurg/v1

A web user interface to serve end users of the RG.

Access at: https://grammar.qfrency.com/

6. Capacity development and research outputs

Slides presented at GF Summer School

Access at: https://github.com/LauretteM/gf-zulu-resources

Slides presented at GF online seminars

Access at: https://github.com/LauretteM/gf-zulu-resources  

International workshop

Listed here: https://www.eventbrite.co.uk/e/language-technology-for-education-in-the-south-african-languages-registration-349430665527

Title: Approximating a Zulu GF concrete syntax with a neural network for natural language understanding

Presented at CNL 2021

Access at: https://sadilar.org/wp-content/uploads/2021/11/2021.cnl-1.4.pdf

Title: Extending the Usage of Adjectives in the Zulu AfWN

Presented at GWC 2023

Access at: https://sadilar.org/wp-content/uploads/2021/11/GWC2023_paper_5400.pdf

Title: Parsing Zulu text using Grammatical Framework

Submitted to CLIRAI (special session) 2023

Not available yet.

Title: Leveraging a resource grammar for developing language resources for Zulu

Submitted to Language Resources and Evaluation

Not available yet.

 

Contact Person:

Dr Laurette Marais, node manager: LMarais@csir.co.za