Metadata guidelines

In order to access resources efficiently and accurately, the structured data describing them, i.e. their metadata, must meet a minimum set of requirements. Although there are well-established metadata descriptions for various types of resources, the volume of literature and the number of different metadata structures can be intimidating. The following guidelines describe how to create basic metadata for digital language resources. They aim only to provide a minimal set of required metadata items that makes the retrieval and browsing of digital resources possible, while keeping the creation of metadata records relatively simple for individuals and organisations with limited knowledge and resources.

Because of the specialised nature of the various types of language resources that are relevant to SADiLaR, no single metadata structure can be applied to all possible language resources. The metadata fields proposed in this document combine elements of the Dublin Core Metadata Initiative (DCMI), the Text Encoding Initiative (TEI), the Open Language Archives Community (OLAC), DSpace, META-SHARE, ISOcat and the Common Language Resources and Technology Infrastructure (CLARIN). If a metadata item you need is not addressed in this document, consult the resources listed in the final section for more detailed metadata fields.

Metadata fields

The following tables list the metadata fields recommended for language resources. The first gives the minimum set of mandatory fields that a metadata record should include to make a language resource easily accessible and searchable; the second gives additional fields that should be included where possible. The field descriptions are mostly sourced from the NRF and DSpace, with additional information from META-SHARE and Dublin Core.

Mandatory fields

Field | Short description
Title | Title statement/title proper.
Author/Creator | Author(s) of the work.
Date issued | Date of publication or distribution.
Subject/Keywords | The topic(s) of the resource, given as subject terms or keywords.
Language | ISO 639-1 or ISO 639-2 code for the language of the intellectual content.
Publisher | Entity responsible for publication, distribution or imprint.
Description | A short description of the resource, which could be the abstract or table of contents.
Contact person name | Name of a person who can provide more information about the resource.
Contact person email | The contact person's email address.
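Where metadata records are kept in a machine-readable form, a simple automated check can catch missing mandatory fields before a record is submitted. The following Python sketch assumes a record stored as a dictionary; the key names (title, creator, date_issued and so on) are hypothetical, since these guidelines do not prescribe key names, and the language-code and email checks are deliberately loose.

```python
import re

# Hypothetical machine-readable keys for the mandatory fields; the
# guidelines themselves do not prescribe any particular key names.
MANDATORY_FIELDS = (
    "title",
    "creator",
    "date_issued",
    "subject",
    "language",
    "publisher",
    "description",
    "contact_name",
    "contact_email",
)

# Loose checks only: a two- or three-letter lowercase ISO 639 code, and an
# email address with one "@" and at least one dot in the domain part.
LANGUAGE_CODE = re.compile(r"[a-z]{2,3}")
EMAIL_ADDRESS = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_empty(value) -> bool:
    """Treat None, blank strings and lists of blank values as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        # An empty list also counts as empty, because all([]) is True.
        return all(is_empty(item) for item in value)
    return False


def missing_or_invalid(record: dict) -> list[str]:
    """Return the mandatory fields that are missing, empty or malformed."""
    problems = [field for field in MANDATORY_FIELDS if is_empty(record.get(field))]
    language = record.get("language")
    if "language" not in problems and not (
        isinstance(language, str) and LANGUAGE_CODE.fullmatch(language)
    ):
        problems.append("language")
    email = record.get("contact_email")
    if "contact_email" not in problems and not (
        isinstance(email, str) and EMAIL_ADDRESS.fullmatch(email)
    ):
        problems.append("contact_email")
    return problems


record = {
    "title": "Example isiZulu word list",
    "creator": ["A. Researcher"],
    "date_issued": "2019-05-01",
    "subject": ["lexicography"],
    "language": "zul",
    "publisher": "Example University",
    "description": "",
    "contact_name": "A. Researcher",
    "contact_email": "a.researcher@example.ac.za",
}
print(missing_or_invalid(record))  # ['description']
```

Repeatable fields such as Author/Creator and Subject/Keywords are accepted either as a single string or as a list, since a resource may have several of each.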

Additional common metadata fields

The following commonly used fields make metadata more useful and make it easier to search for and filter resources. Where possible, these fields should also be included in metadata records.

Field | Short description
Contributor(s) | A person, organisation or service that contributed to the content of the resource.
Format | The file format of the resource, such as XML, plain text or DOCX.
Medium | Physical medium of the resource.
Size/Extent | Size or duration of the resource.
URL | A URL serving as the homepage of an entity (e.g. a person, organisation or resource) and/or the location of an entity (e.g. a language resource or document).
Date created | Date of creation or manufacture of the intellectual content, if different from Date issued.
Date copyright | Date of copyright.
License/Rights | Terms governing use and reproduction.
Identifier | A unique identifier such as an ISBN, ISSN, ISMN or ISLRN.
Citation | Human-readable, standard bibliographic citation.
Rights holder | A person or organisation owning or managing rights over the resource.
Version/Edition | The specific version or edition of the original resource.
Location | The place where the resource was produced.
Country | The country where the resource was produced.
Region | The state or province where the resource was produced.
City | The city where the resource was produced.
Coverage | The spatial or temporal coverage of the content.
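Many repositories exchange records as Dublin Core. The Python sketch below shows one possible way to serialise a record into Dublin Core elements and DCMI terms with the standard-library xml.etree.ElementTree module. The record keys, the field-to-term mapping and the unqualified metadata wrapper element are assumptions made for illustration, not a mapping prescribed by these guidelines or by any particular repository; fields without an obvious Dublin Core counterpart (contact person, version, place of production) are left out.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Register prefixes so the output uses dc:/dcterms: rather than ns0:/ns1:.
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)

# Assumed mapping from hypothetical record keys to Dublin Core terms.
DC_MAPPING = {
    "title": (DC, "title"),
    "creator": (DC, "creator"),
    "subject": (DC, "subject"),
    "description": (DC, "description"),
    "publisher": (DC, "publisher"),
    "contributor": (DC, "contributor"),
    "language": (DC, "language"),
    "format": (DC, "format"),
    "identifier": (DC, "identifier"),
    "coverage": (DC, "coverage"),
    "date_issued": (DCTERMS, "issued"),
    "date_created": (DCTERMS, "created"),
    "date_copyright": (DCTERMS, "dateCopyrighted"),
    "medium": (DCTERMS, "medium"),
    "extent": (DCTERMS, "extent"),
    "license": (DCTERMS, "license"),
    "rights_holder": (DCTERMS, "rightsHolder"),
    "citation": (DCTERMS, "bibliographicCitation"),
}


def to_dublin_core(record: dict) -> ET.Element:
    """Build a flat <metadata> element with one child per field value."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for key, (namespace, term) in DC_MAPPING.items():
        value = record.get(key)
        if value is None:
            continue
        # Repeatable fields (several creators, keywords, ...) may be lists.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            ET.SubElement(root, f"{{{namespace}}}{term}").text = str(item)
    return root


record = {
    "title": "Example isiXhosa text corpus",
    "creator": ["A. Researcher", "B. Researcher"],
    "language": "xho",
    "date_issued": "2020",
    "license": "CC BY 4.0",
}
print(ET.tostring(to_dublin_core(record), encoding="unicode"))
```

Children are emitted in the order of the mapping rather than the order of the input record, which keeps the output stable across records.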

Language codes

Language name | ISO 639-1 | ISO 639-2
Afrikaans | af | afr
English | en | eng
isiNdebele | nr | nbl
isiXhosa | xh | xho
isiZulu | zu | zul
Sesotho | st | sot
Sesotho sa Leboa | (none) | nso
Setswana | tn | tsn
SiSwati | ss | ssw
Tshivenḓa | ve | ven
Xitsonga | ts | tso
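Because the Language field may hold either a two-letter or a three-letter code, it can help to normalise codes before records are compared or searched. The Python sketch below builds a lookup from the codes in the table above and converts either form to ISO 639-2; the names LANGUAGE_CODES and to_iso_639_2 are hypothetical.

```python
# Codes as listed in the table above. ISO 639-1 defines no two-letter code
# for Sesotho sa Leboa (Northern Sotho), so only its ISO 639-2 code is used.
LANGUAGE_CODES = {
    "Afrikaans": ("af", "afr"),
    "English": ("en", "eng"),
    "isiNdebele": ("nr", "nbl"),
    "isiXhosa": ("xh", "xho"),
    "isiZulu": ("zu", "zul"),
    "Sesotho": ("st", "sot"),
    "Sesotho sa Leboa": (None, "nso"),
    "Setswana": ("tn", "tsn"),
    "SiSwati": ("ss", "ssw"),
    "Tshivenḓa": ("ve", "ven"),
    "Xitsonga": ("ts", "tso"),
}

# Reverse index: every listed code (two- or three-letter) -> ISO 639-2 code.
_TO_ISO_639_2 = {}
for two_letter, three_letter in LANGUAGE_CODES.values():
    _TO_ISO_639_2[three_letter] = three_letter
    if two_letter is not None:
        _TO_ISO_639_2[two_letter] = three_letter


def to_iso_639_2(code: str) -> str:
    """Normalise a listed ISO 639-1 or ISO 639-2 code to ISO 639-2."""
    try:
        return _TO_ISO_639_2[code.strip().lower()]
    except KeyError:
        raise ValueError(f"not one of the listed language codes: {code!r}") from None


print(to_iso_639_2("zu"), to_iso_639_2(" NSO "))  # zul nso
```

Storing a single canonical form (here the three-letter code) means a search for isiZulu resources matches records tagged "zu" as well as "zul".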

Established metadata initiatives

More information, including extended field descriptions and additional fields, is available from the following resources: