Technologies

One of SADiLaR's supporting functions is to give researchers easy access to technologies that can support their research. Over the course of the project, SADiLaR will provide this access through relatively simple web interfaces to the data and to some of the technologies available from the SADiLaR repository. Only a limited number of technologies are currently available, but the list will be extended regularly to include not only technologies available from SADiLaR but also services provided by our partner nodes. The services listed below are not necessarily developed by SADiLaR; they are hosted by SADiLaR as agreed with external parties.

Currently, the following technologies are available: 

Corpus portal

A service that allows for online searches in the corpora available from SADiLaR, and includes the following functionality:

  • Key word in context searches;
  • Word and frequency list generation;
  • Filtering of data based on metadata annotations;
  • Part-of-speech and lemma-based searches. 
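To illustrate what the first two functions produce, here is a minimal Python sketch of a key-word-in-context search and a frequency list. It is not the corpus portal's implementation: the function names (`tokenise`, `frequency_list`, `kwic`), the naive tokeniser and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenise(text):
    """Naive tokeniser (an assumption): lowercase, keep runs of word characters."""
    return re.findall(r"[\w']+", text.lower())


def frequency_list(tokens):
    """Return (token, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Hypothetical sample text, not taken from any SADiLaR corpus.
sample = "The cat sat on the mat. The dog saw the cat."
tokens = tokenise(sample)

for left, key, right in kwic(tokens, "cat", window=2):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>12} [{key}] {right}")

for token, count in frequency_list(tokens):
    print(f"{token}\t{count}")
```

A real corpus search would also take the metadata filters and the part-of-speech and lemma annotations into account, which this sketch leaves out.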

CTexT NCHLT web services

A collection of 61 text technologies for automatically processing textual input, made available both as web services and through a simple user interface. The technologies are available for all indigenous South African languages, and include:

  • Optical character recognition engines;
  • Language identifier;
  • Sentence boundary detection;
  • Tokenisers;
  • Part-of-speech taggers;
  • Named-entity recognisers;
  • Phrase chunkers.

Autshumato machine translation web services (MTWS)

The MTWS is a unified interface through which anyone can gain access to the MT systems developed in the Autshumato project. It can provide sentence, document and web page translation in any of the available language pairs. New MT systems can easily be added to the MTWS, making them instantly available to anyone with an internet connection. Currently the MTWS supports the following language pairs:

  • English to Afrikaans;
  • English to isiZulu;
  • English to Sesotho sa Leboa;
  • English to Setswana;
  • English to Xitsonga.

Voyant Tools

Voyant Tools is a web-based text reading and analysis environment. It is a scholarly project that is designed to facilitate reading and interpretive practices for digital humanities students and scholars as well as for the general public.

What you can do with Voyant:

  • Use it to learn how computer-assisted analysis works. Check out our examples that show you how to do real academic tasks with Voyant.
  • Use it to study texts that you find on the web or texts that you have carefully edited and have on your computer.
  • Use it to add functionality to your online collections, journals, blogs or web sites so others can see through your texts with analytical tools.
  • Use it to add interactive evidence to your essays that you publish online. Add interactive panels right into your research essays (if they can be published online) so your readers can recapitulate your results.
  • Use it to develop your own tools using our functionality and code.

ZulMorph

ZulMorph is a finite-state morphological analyser for Zulu, developed using the Xerox finite-state tools lexc and xfst. It also compiles with Foma.

Zulu words in their surface form are analysed to their base form. Any meaningful word can be input, and the output will be a complete morphological analysis of that word.

Words marked with “+?” could not be analysed. In most cases this is because the stem or root of the word is not yet included in the analyser's embedded lexicon.

Most words have multiple analyses, and selecting the correct one depends on context. Such disambiguation is a separate, subsequent processing step.
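As a rough illustration of post-processing analyser output, the Python sketch below groups the analyses per word and separates unanalysed words (marked “+?”) from words with more than one analysis. It assumes one tab-separated word and analysis pair per line, as finite-state lookup utilities typically print; the exact output format of the hosted ZulMorph service is not specified here. The function names and the placeholder words and analyses are hypothetical, not real ZulMorph output.

```python
from collections import defaultdict

UNKNOWN = "+?"  # marker for words the analyser could not analyse


def group_analyses(lines):
    """Collect every analysis for each word from 'word<TAB>analysis' lines."""
    analyses = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines only separate words
        word, _, analysis = line.partition("\t")
        analyses[word].append(analysis)
    return dict(analyses)


def split_by_status(analyses):
    """Return (unanalysed words, words that still need disambiguation)."""
    unanalysed = [w for w, a in analyses.items() if a == [UNKNOWN]]
    ambiguous = {w: a for w, a in analyses.items() if len(a) > 1}
    return unanalysed, ambiguous


# Placeholder lines in the assumed format; not real analyser output.
sample_output = [
    "word1\tanalysis-a",
    "word1\tanalysis-b",
    "",
    "word2\tanalysis-c",
    "",
    "word3\t+?",
]

grouped = group_analyses(sample_output)
unanalysed, ambiguous = split_by_status(grouped)
print("Not analysed:", unanalysed)
print("Need disambiguation:", ambiguous)
```

The list of unanalysed words could be used to spot stems missing from the lexicon, while the ambiguous words would be passed on to the context-dependent disambiguation step.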