Creating quality metadata for research data
Metadata for research data
Metadata is information about content that enables users to find the content, navigate to related information or share it with others. Metadata can be stored in various forms that model information as a tree (XML), linked tables (relational databases) or a graph of connected information elements (RDF).
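To make the three storage models concrete, the sketch below holds one small record as an XML tree, as two linked tables and as graph-style triples. The record, its field names and the `dataset:1` identifier are hypothetical and do not follow any particular metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; field names and values are illustrative only.
record = {
    "title": "Example sea ice thickness dataset",
    "creator": "A. Researcher",
    "site": "Example Site",
}

# Tree (XML): each field becomes a child element of one root element.
root = ET.Element("dataset")
for name, value in record.items():
    ET.SubElement(root, name).text = value
xml_text = ET.tostring(root, encoding="unicode")

# Linked tables (relational): rows in separate tables joined by a key.
datasets_table = [(1, record["title"], record["site"])]  # (dataset_id, title, site)
creators_table = [(10, 1, record["creator"])]            # (creator_id, dataset_id, name)

# Graph (RDF-style): subject, predicate, object triples about one resource.
subject = "dataset:1"
triples = [(subject, name, value) for name, value in record.items()]

print(xml_text)
print(datasets_table, creators_table)
print(triples)
```

The same three facts survive in each form; what changes is how software navigates them: by walking child elements, by joining on `dataset_id`, or by following triples that share a subject.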
Metadata is only useful if it is understandable to the people or software that use it. A variety of metadata standards have therefore been developed to enable consistent recording and reporting of metadata, addressing both syntax and semantics.
There are generic metadata standards applicable to all research data, such as the DataCite Metadata Schema.
There are also many research domain-specific metadata standards, such as those listed here:
The UK Polar Data Centre develops a catalogue of metadata records compliant with the ISO 19139 XML implementation schema for the ISO 19115 Geographic Information – Metadata standard, and uses the DataCite Metadata Schema to issue Digital Object Identifiers for datasets.
What metadata do I need in order to publish my research data?
Good-quality metadata for research data can help:
- Data Creators to share their data with others, maximise reach, receive credit when the data are cited and understand their data better when reusing them in the future
- Data Users to discover, evaluate and reuse the data that have been created
Information provided by Data Creators should address both discovery metadata (allowing Data Users to find the data) and contextual metadata (allowing Data Users to reuse the data).
The following quality metadata enable the Data Centre to publish research data in a discoverable and reusable way.
Quality metadata for research data:
- What the dataset comprises and why the work was undertaken
  - short description
  - data type, volume, structure and format (format guidance)
- Where the dataset was collected
  - site name
- When the dataset was collected, when it will be made available and what restrictions apply to its use
  - collection date
  - access constraints
  - use constraints
- Who created the dataset and who funded the work
  - full name, affiliation and ORCID* of authors and contact person
  - funding reference
- How the dataset was created, analysed and quality assessed
  - instrumentation and/or software (including version)
  - quality control and data resolution
- Contextual metadata that helps users evaluate and reuse the dataset, such as specifications of measured parameters with their units, explanations of abbreviations, and references to associated publications and projects
This quality metadata is summarised in the UK PDC metadata template, available HERE.
An example of best-practice metadata can be viewed here.
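The checklist above can be sketched as a simple record structure with a completeness check. This is a minimal illustration only: the class `DatasetMetadata`, its field names and the helper `missing_fields` are hypothetical and are not the fields of the UK PDC metadata template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetMetadata:
    # What and why
    description: str = ""
    data_format: str = ""
    # Where
    site_name: str = ""
    # When
    collection_date: str = ""
    access_constraints: str = ""
    use_constraints: str = ""
    # Who
    authors: list = field(default_factory=list)
    funding_reference: str = ""
    # How
    instrumentation: str = ""
    quality_control: str = ""


def missing_fields(record: DatasetMetadata) -> list:
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical draft with only three of the fields filled in.
draft = DatasetMetadata(
    description="Hypothetical ice core measurements",
    site_name="Example Site",
    collection_date="2020-01-15",
)
print(missing_fields(draft))
```

A check like this only confirms that something has been entered for each item; whether the description is clear enough for reuse still needs a human reader.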
The UK Polar Data Centre will use the quality metadata to issue a unique permanent Digital Object Identifier (DOI) for a dataset and make the dataset available via data catalogues, such as the UK PDC Data Catalogue, NERC Data Catalogue or Antarctic Metadata Directory.
*ORCID (Open Researcher and Contributor ID) is an alphanumeric code that uniquely identifies academic authors and contributors and connects all their affiliations and scientific contributions (such as all published articles and datasets). ORCID improves recognition and increases discoverability of all scientific output.
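Because the final character of an ORCID iD is a check digit (computed with the ISO 7064 MOD 11-2 algorithm), a mistyped iD can often be caught before a metadata record is submitted. The sketch below is a minimal illustration; the function names are hypothetical, and the iD used is the sample iD that appears in ORCID's own documentation.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, characters and check digit of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    if not all(ch in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD from ORCID documentation
print(is_valid_orcid("0000-0002-1825-0098"))  # last digit altered
```

A passing check shows only that the iD is well formed, not that it belongs to the named author.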