LibGuides: Research Data Management & Sharing: Documenting Data

Data Documentation

Data documentation usually describes the content, formats, and internal relations of your data. Aspects of your data which should be documented, as described by DMPTool's guide to documentation, are as follows:

General Overview

Title: Name of the dataset or research project that produced it
Creator: Names and addresses of the organizations or people who created the data; preferred format for personal names is surname first (e.g., Smith, Jane)
Identifier: Unique number used to identify the data, even if it is just an internal project reference number
Date: Key dates associated with the data, including project start and end dates, release date, time period covered by the data, and other dates in the data lifespan, such as the maintenance cycle and update schedule; preferred format is yyyy-mm-dd, or yyyy.mm.dd-yyyy.mm.dd for a range
Method: How the data were generated, listing equipment and software used (including model and version numbers), formulae, algorithms, experimental protocols, and other things one might include in a lab notebook
Processing: How the data have been altered or processed (e.g., normalized)
Source: Citations to data derived from other sources, including details of where the source data are held and how they were accessed
Funder: Organizations or agencies who funded the research

Content Description

Subject: Keywords or phrases describing the subject or content of the data
Place: All applicable physical locations
Language: All languages used in the dataset
Variable list: All variables in the data files, where applicable
Code list: Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. "999 indicates a missing value in the data")
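
As a small illustration of why a code list matters, the sketch below applies one to tabular data so that a documented missing-value code is not mistaken for a real measurement. The column names, the sample rows, and the MISSING_CODES mapping are hypothetical; only the "999 indicates a missing value" convention comes from the example above.

```python
import csv
import io

# Hypothetical code list: variable name -> codes that mean "missing".
MISSING_CODES = {"age": {"999"}, "income": {"999"}}


def apply_code_list(rows, missing_codes):
    """Return copies of the rows with documented missing-value codes replaced by None."""
    cleaned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            if value in missing_codes.get(name, set()):
                new_row[name] = None  # documented code for "no value recorded"
            else:
                new_row[name] = value
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample data standing in for a real data file.
SAMPLE = "id,age,income\n1,34,52000\n2,999,41000\n3,51,999\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = apply_code_list(rows, MISSING_CODES)
```

Keeping the code list as a per-variable mapping means a value such as 999 is only treated as missing in the variables where the documentation says so.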

Technical Description

File inventory: All files associated with the project, including extensions (e.g. "NWPalaceTR.WRL", "stone.mov")
File formats: Formats of the data, e.g., FITS, SPSS, HTML, JPEG, etc.
File structure: Organization of the data file(s) and layout of the variables, where applicable
Version: Unique date/time stamp and identifier for each version
Checksum: A digest value computed for each file that can be used to detect changes; if a recomputed digest differs from the stored digest, the file must have changed
Necessary software: Names of any special-purpose software packages required to create, view, analyze, or otherwise use the data
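
The file inventory and checksum items above can be produced together. The following is a minimal sketch, assuming SHA-256 as the digest algorithm (the guide does not prescribe one); the function names and the example.csv file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(folder):
    """Map each file's path (relative to folder) to its SHA-256 digest."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(stored, current):
    """List files whose digest differs from, or is absent in, the stored inventory."""
    return sorted(
        name for name, digest in current.items() if stored.get(name) != digest
    )


# Demonstration in a temporary folder with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "example.csv"
    data_file.write_text("id,value\n1,10\n", encoding="utf-8")
    stored = build_inventory(tmp)      # digests recorded in the documentation
    data_file.write_text("id,value\n1,11\n", encoding="utf-8")
    current = build_inventory(tmp)     # digests recomputed later
    changed = changed_files(stored, current)
```

In practice the stored inventory would be saved alongside the data (for example in the readme.txt file) and recomputed whenever the files are copied or deposited.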

Access

Rights: Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data
Access information: Where and how your data can be accessed by other researchers

Tools for Creating readme.txt Files

Readme.txt files are a valuable tool for documenting data; a data dictionary is another good option. A readme.txt file should describe the data and files as a whole and also describe each individual dataset or file. For tabular data, always include a variable list that explains each column heading.

Creating readme.txt Files Guide by Cornell University: Cornell University's guide on creating readme.txt files is easy to follow and includes numerous resources. It also includes a free template which can be downloaded and used to create your own readme.txt file.
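
For projects with many datasets, a readme.txt file can also be assembled programmatically. The sketch below is one possible layout built from the fields described in this guide; it is not Cornell's template, and the function name, section headings, and sample values are all hypothetical.

```python
def build_readme(general, variables):
    """Assemble plain readme text from general fields and a variable list."""
    lines = ["GENERAL OVERVIEW"]
    for label, value in general.items():
        lines.append(f"{label}: {value}")
    lines.append("")
    lines.append("VARIABLE LIST")
    for name, description in variables:
        lines.append(f"{name}: {description}")
    return "\n".join(lines) + "\n"


# Hypothetical values for illustration only.
general = {
    "Title": "Example survey dataset",
    "Creator": "Smith, Jane",
    "Date": "2020-01-15",
}
variables = [
    ("id", "Respondent identifier"),
    ("age", "Age in years; 999 indicates a missing value"),
]
readme_text = build_readme(general, variables)
```

The resulting text can then be saved next to the data, for example with `Path("readme.txt").write_text(readme_text, encoding="utf-8")` from Python's pathlib module.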

Metadata & the FAIR Principles

Metadata is a standardized way of describing data, including information on who, what, where, when, why, and how. It is important to use metadata in your documentation process so the data can be understood, reused, and integrated with other data.
The FAIR Principles were published in 2016 in Scientific Data as "The FAIR Guiding Principles for scientific data management and stewardship." FAIR stands for Findable, Accessible, Interoperable, and Reusable. To learn more about the FAIR Principles, visit the GO FAIR website.

FAIR: Findable, Accessible, Interoperable, Reusable

This chart, created by the Cambridge Crystallographic Data Centre (CCDC), shows the importance of metadata when it comes to ensuring your data is FAIR.

FAIRsharing.org

FAIRsharing.org is a curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies. FAIRsharing.org has information on more than 1600 standards, 1900 databases, and 150 policies.

Metadata Outreach

The University of Rochester has a Metadata Outreach Service led by Maggie Dull, Director of Metadata Strategies. Please feel free to contact them with any metadata questions you have.

Metadata Standards

RDA Metadata Standards Catalog
The RDA Metadata Standards Catalog is a collaborative, open directory of metadata standards applicable to research data.
DCC Disciplinary Metadata Standards
A list of metadata standards collected by the Digital Curation Centre (DCC). Search for metadata standards by discipline.
Dublin Core
Dublin Core, maintained by the Dublin Core Metadata Initiative (DCMI), is a general-purpose metadata standard.
Data Documentation Initiative
The Data Documentation Initiative (DDI) is a free international standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences.
Text Encoding Initiative
The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics.
Ecological Metadata Language
The Ecological Metadata Language (EML) defines a comprehensive vocabulary and a readable XML markup syntax for documenting research data, specifically for ecology disciplines.
ISO 19115-1:2014
ISO 19115-1:2014 defines the schema required for describing geographic information and services by means of metadata.
FGDC-CSDGM
The Content Standard for Digital Geospatial Metadata (CSDGM) is the long-standing geographic metadata standard endorsed by the Federal Geographic Data Committee (FGDC).
Flexible Image Transport System
Flexible Image Transport System (FITS) is a file format designed to store, transmit, and manipulate scientific images and associated data, specifically in the field of astronomy.
Miner Library's List of Biological and Biomedical Metadata
University of Rochester's Miner Library has a list of metadata standards useful in the biological and biomedical fields.
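
As one concrete illustration of the standards listed above, several of the general overview fields in this guide map onto Dublin Core elements. The sketch below writes a small record using the Dublin Core element set namespace; the `metadata` wrapper element, the function name, and the sample values are assumptions for illustration, and a repository may require a different container or additional elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (title, creator, date, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize (element name, value) pairs as a simple Dublin Core XML record."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values for illustration only.
record = dublin_core_record([
    ("title", "Example survey dataset"),
    ("creator", "Smith, Jane"),
    ("date", "2020-01-15"),
    ("language", "en"),
])
```

Because the record is built from name and value pairs, the same fields kept in a readme.txt file can be reused to produce machine-readable metadata.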
