SADiLaR visits European CLARIN centres

Over the course of September and October, SADiLaR’s technical manager, Dr Roald Eiselen, visited various European centres that form part of the Common Language Resources and Technology Infrastructure (CLARIN). The aim of the visits was to establish connections that will lead to future cooperation and collaboration with entities whose activities are similar to those of SADiLaR. The visits provided useful insights into the technologies and resources available from CLARIN, and it is clear that implementing and reusing technical infrastructure from these institutions will significantly reduce the cost and time required to further our objectives.

During the first part of the trip, Roald met with Prof Jan Hajič and Dr Pavel Straňák at the Institute of Formal and Applied Linguistics, Charles University in Prague, who manage the LINDAT/CLARIN centre. As a certified CLARIN B-centre, LINDAT has been one of the primary institutes developing technologies and infrastructure within the CLARIN network. It is actively developing DSpace implementations and extensions that facilitate the integrated use of language resources, not only for Czech but also for other languages and resource centres. The institute is also involved in various machine translation activities and in the development of treebanks and of tools to analyse and search treebank data. Both areas link to activities within SADiLaR, and the tools and processes developed at LINDAT can be investigated and reused by SADiLaR and its partner nodes.

The visit to Prague was followed by meetings with Prof Maciej Piasecki, the current chair of the CLARIN National Coordinators' Forum, and his team in Wrocław, Poland. Roald was introduced to various tools used in the development of wordnets, specifically the WordnetLoom software, as well as tools to distribute and analyse linguistic data. These will be implemented by SADiLaR for use by the UNISA node in its wordnet development project. The Polish team has extensive expertise in the development of both monolingual and multilingual corpora, as well as in the annotation of corpora in various modalities. A discussion with the coordinator of user involvement in Poland, Dr Jan Wieczorek, was especially insightful, as it provided background on how to involve more people in the use of language resources and analysis tools, particularly in the humanities and social sciences beyond computational linguistics.

The next stop on the European trip was the University of Leipzig’s Natural Language Processing group, ASV. Leipzig has a well-established connection with South Africa through Prof Dr Uwe Quasthoff, who regularly attends the annual Afrilex conference and has worked with both Stellenbosch University and UNISA on projects for South African languages. ASV is also the technical hub of Germany’s CLARIN consortium, and it maintains one of the largest multilingual corpus collections through the Leipzig Corpora Collection. Meetings with Prof Quasthoff and Drs Thomas Eckart and Dirk Goldhahn gave Roald valuable insight into the procedures for corpus collection, maintenance, and distribution.

In the Low Countries, Roald visited Tilburg University, where he gave a presentation to students and faculty members on the current and future activities of SADiLaR. This visit also renewed a longstanding relationship with Prof Menno van Zaanen, who is especially interested in forming cooperation agreements between the two institutions that would enable students and researchers to work together on digital humanities projects, both in the Netherlands and in South Africa. At a meeting in Brussels between the Dutch Language Union, the Virtual Institute for Afrikaans, and SADiLaR, a jointly funded project to develop pilot language portals for South African languages was discussed; the project will commence in the near future. These portals will be proof-of-concept implementations that give students at various academic levels access to language resources they can use in their daily learning activities.

At the annual CLARIN conference in Pisa, Italy, most of the CLARIN community came together to discuss projects, current status, and future activities within the various CLARIN centres and the CLARIN ERIC. Informative presentations and posters on a variety of topics also gave SADiLaR guidance on what the future holds for language resource infrastructure and what to expect in the short to medium term. Following the conference, a visit to the CLARIN ERIC office in Utrecht gave SADiLaR an opportunity to explore the details of closer integration and cooperation with CLARIN, especially around the distribution of language resources and technologies for the South African languages. In discussions with the technical director, Dieter van Uytvanck, Roald began planning updates to many of SADiLaR’s processes, technologies, and procedures so that they align more closely with those of CLARIN. This will in turn help ensure that SADiLaR remains up to date with the latest developments and technologies available to the language resource and digital humanities communities.

SADiLaR intends to formalise agreements with many of the institutions visited during the trip, and expects close collaboration between these institutions and SADiLaR, particularly involving the researchers working on SADiLaR projects.