To access resources efficiently and accurately, the structured data describing a resource, i.e. its metadata, must conform to a minimum set of requirements. Although well-established metadata descriptions exist for various types of resources, the volume of literature and the variety of metadata structures can be intimidating. This document provides guidelines for creating basic metadata for digital language resources. The guidelines propose only a minimal set of required metadata items: enough to make digital resources retrievable and browsable, while keeping the creation of metadata records relatively simple for individuals and organisations with limited knowledge and resources.
Because the various types of language resources relevant to SADiLaR are so specialised, it would be impossible to provide a single metadata structure that applies to all of them. The metadata fields proposed in this document combine elements from the Dublin Core Metadata Initiative (DCMI), the Text Encoding Initiative (TEI), the Open Language Archives Community (OLAC), DSpace, META-SHARE, ISOcat, and the Common Language Resources and Technology Infrastructure (CLARIN). If a metadata item is not addressed in this document, consult the resource documents and sites in the final section for more detailed metadata fields.
Metadata fields
The table below lists the minimum set of mandatory fields that a metadata record should include to make language resources easily accessible and searchable. The field descriptions in this document are mostly sourced from the NRF and DSpace, with additional information from META-SHARE and Dublin Core.
Mandatory fields
| Field | Short description |
| --- | --- |
| Title | Title statement/title proper. |
| Author/Creator | Author(s) of the work. |
| Date issued | Date of publication or distribution. |
| Subject/Keywords | The topic of the resource. |
| Language | ISO 639-1/2 standard code for language of intellectual content. |
| Publisher | Entity responsible for publication, distribution, or imprint. |
| Description | A short description of the resource, which could be the abstract or table of contents. |
| Contact person name | Name of person with more information on the resource. |
| Contact person email | Contact person's email address. |
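As an illustration of how the mandatory fields could be checked before a record is submitted, the following is a minimal sketch in Python. The snake_case key names, the `missing_mandatory_fields` helper, and the example record (including its names and email address) are assumptions made for this example and are not prescribed by these guidelines or by any of the metadata standards.

```python
# Mandatory fields from the table above.
# The snake_case key names are an assumption made for this sketch.
MANDATORY_FIELDS = [
    "title",
    "creator",
    "date_issued",
    "subject",
    "language",
    "publisher",
    "description",
    "contact_name",
    "contact_email",
]


def is_empty(value):
    """Treat None, blank strings, and empty lists or tuples as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_mandatory_fields(record):
    """Return the mandatory fields that are absent or empty in a record."""
    return [field for field in MANDATORY_FIELDS if is_empty(record.get(field))]


# A made-up record, for illustration only.
example_record = {
    "title": "Example Afrikaans text corpus",
    "creator": ["A. Researcher"],
    "date_issued": "2018-05-01",
    "subject": ["corpus", "newspaper text"],
    "language": "afr",
    "publisher": "Example University",
    "description": "A small collection of newspaper articles.",
    "contact_name": "A. Researcher",
    "contact_email": "a.researcher@example.org",
}

print(missing_mandatory_fields(example_record))
print(missing_mandatory_fields({"title": "Untitled", "language": ""}))
```

A record that passes this check has a value for every mandatory field; the check says nothing about whether the values themselves are sensible.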
Additional common metadata fields
The following is a selection of commonly used metadata fields that make the metadata more useful and make it easier to search and filter items. Where possible, these fields should also be included in metadata records.
| Field | Short description |
| --- | --- |
| Contributor(s) | A person, organization, or service responsible for the content of the resource. |
| Format | The format of the resource, such as XML, text, docx, etc. |
| Medium | Physical medium of the resource. |
| Size/Extent | Size or duration of the resource. |
| URL | A URL used as homepage of an entity (e.g. of a person, organization, resource etc.) and/or where an entity (e.g. LR, document etc.) is located. |
| Date created | Date of creation or manufacture of intellectual content if different from Date issued. |
| Date copyright | Date of copyright. |
| License/Rights | Terms governing use and reproduction. |
| Identifier | ISBN/ISSN/ISMN/ISLRN |
| Citation | Human-readable, standard bibliographic citation. |
| Rights holder | A person or organization owning or managing rights over the resource. |
| Version/Edition | The specific version or edition of the original resource. |
| Description | A short description of the resource, which could be the abstract or table of contents. |
| Location | The place where the resource was produced. |
| Country | The country where the resource was produced. |
| Region | The state/province where the resource was produced. |
| City | The city where the resource was produced. |
| Coverage | The spatial or temporal coverage of the content. |
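Several of the fields listed in this document have close counterparts among the fifteen Dublin Core elements. The sketch below shows one possible way to serialise such fields as Dublin Core XML using Python's standard library. The mapping in `FIELD_TO_DC`, the snake_case keys, the `metadata` wrapper element, and the `record_to_dc_xml` helper are all assumptions made for illustration; a repository may expect a different schema or wrapper.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Assumed mapping from field keys (hypothetical names) to Dublin Core elements.
FIELD_TO_DC = {
    "title": "title",
    "creator": "creator",
    "subject": "subject",
    "language": "language",
    "publisher": "publisher",
    "description": "description",
    "contributor": "contributor",
    "format": "format",
    "rights": "rights",
    "identifier": "identifier",
    "coverage": "coverage",
}


def record_to_dc_xml(record):
    """Serialise the mapped fields of a record as a Dublin Core XML string."""
    # "metadata" is a hypothetical wrapper element chosen for this sketch.
    root = ET.Element("metadata")
    for field, element_name in FIELD_TO_DC.items():
        values = record.get(field, [])
        if isinstance(values, str):
            values = [values]  # allow a single string or a list of strings
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element_name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# A made-up record, for illustration only.
xml_text = record_to_dc_xml({
    "title": "Example corpus",
    "creator": ["A. Researcher", "B. Annotator"],
    "language": "afr",
    "format": "XML",
})
print(xml_text)
```

Fields without an entry in the mapping, such as the contact person fields, are silently skipped here and would need a separate schema or an extension.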
Language codes
| Language name | ISO 639-1 | ISO 639-2 |
| --- | --- | --- |
| Afrikaans | af | afr |
| English | en | eng |
| isiNdebele | nr | nbl |
| isiXhosa | xh | xho |
| isiZulu | zu | zul |
| Sesotho | st | sot |
| Sesotho sa Leboa | none (use nso) | nso |
| Setswana | tn | tsn |
| SiSwati | ss | ssw |
| Tshivenḓa | ve | ven |
| Xitsonga | ts | tso |
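Because the Language field accepts either ISO 639-1 or ISO 639-2 codes, records from different sources may mix two-letter and three-letter codes. The following minimal sketch normalises the codes from the table above to their three-letter form. The `to_iso_639_2` helper is hypothetical. Sesotho sa Leboa has no two-letter ISO 639-1 code, so only its three-letter code `nso` is accepted.

```python
# Two-letter ISO 639-1 codes and their ISO 639-2 equivalents, from the table above.
ISO_639_1_TO_2 = {
    "af": "afr",
    "en": "eng",
    "nr": "nbl",
    "xh": "xho",
    "zu": "zul",
    "st": "sot",
    "tn": "tsn",
    "ss": "ssw",
    "ve": "ven",
    "ts": "tso",
}

# All three-letter codes in the table; "nso" has no two-letter counterpart.
ISO_639_2_CODES = set(ISO_639_1_TO_2.values()) | {"nso"}


def to_iso_639_2(code):
    """Return the three-letter code for a two- or three-letter code in the table."""
    normalised = code.strip().lower()
    if normalised in ISO_639_2_CODES:
        return normalised
    if normalised in ISO_639_1_TO_2:
        return ISO_639_1_TO_2[normalised]
    raise ValueError(f"Unknown language code: {code!r}")


print(to_iso_639_2("zu"))
print(to_iso_639_2("NSO"))
```

Storing one consistent form of the code makes it easier to filter records by language; codes outside the table raise an error here rather than being passed through unchecked.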
Established metadata initiatives
More information, including extended descriptions of fields and additional possible fields, is available from the following resources:
- Dublin Core Metadata Initiative: http://dublincore.org/
- NRF: http://digi.nrf.ac.za/publ/index.php
- META-SHARE: http://www.meta-net.eu/meta-share/index_html
- OLAC: http://www.language-archives.org/
- TEI: http://www.tei-c.org/index.xml
- CMDI: https://www.clarin.eu/content/component-metadata