
Research Data Management & Sharing: Documenting Data

This guide provides resources on research data management and sharing.

Why is Documenting Data Important?

Data documentation ensures that your data can be understood, interpreted, found, and cited by any user. Using a metadata standard helps ensure your data are FAIR, and a readme.txt file keeps everything documented and organized. Start documenting at the beginning of the research project; reconstructing documentation after the fact is difficult.

Data Documentation

Data documentation usually describes the content, formats, and internal relations of your data. As described by DMPTool's guide to documentation, the following aspects of your data should be documented:

General Overview

  • Title: Name of the dataset or research project that produced it
  • Creator: Names and addresses of the organizations or people who created the data; preferred format for personal names is surname first (e.g., Smith, Jane)
  • Identifier: Unique number used to identify the data, even if it is just an internal project reference number
  • Date: Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, such as maintenance cycle, update schedule; preferred format is yyyy-mm-dd, or yyyy.mm.dd-yyyy.mm.dd for a range
  • Method: How the data were generated, listing equipment and software used (including model and version numbers), formulae, algorithms, experimental protocols, and other things one might include in a lab notebook
  • Processing: How the data have been altered or processed (e.g., normalized)
  • Source: Citations to data derived from other sources, including details of where the source data is held and how it was accessed
  • Funder: Organizations or agencies who funded the research

Content Description

  • Subject: Keywords or phrases describing the subject or content of the data
  • Place: All applicable physical locations
  • Language: All languages used in the dataset
  • Variable list: All variables in the data files, where applicable
  • Code list: Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. "999 indicates a missing value in the data")

Technical Description

  • File inventory: All files associated with the project, including extensions (e.g. "NWPalaceTR.WRL", "stone.mov")
  • File formats: Formats of the data, e.g., FITS, SPSS, HTML, JPEG, etc.
  • File structure: Organization of the data file(s) and layout of the variables, where applicable
  • Version: Unique date/time stamp and identifier for each version
  • Checksum: A digest value computed for each file that can be used to detect changes; if a recomputed digest differs from the stored digest, the file must have changed
  • Necessary software: Names of any special-purpose software packages required to create, view, analyze, or otherwise use the data
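The checksum entry above can be sketched in code. This is a minimal example, assuming Python is available; it computes a hex digest for a file in chunks so that large data files do not need to fit in memory. The digest can be recorded in your documentation and recomputed later to detect changes.

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=8192):
    """Compute a hex digest for a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until EOF so large files stay memory-friendly
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If a recomputed digest differs from the one stored in your documentation, the file has changed since it was documented.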

Access

  • Rights: Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data
  • Access information: Where and how your data can be accessed by other researchers

Tools for Creating readme.txt files

Readme.txt files are a valuable tool for documenting data; a data dictionary is another good option. A readme.txt file should describe the project and its files as a whole, as well as each individual dataset or file. For tabular data, it must include a variable list defining every column heading.
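As a minimal sketch of what such a file might contain (the project name, file names, and variables below are illustrative, not from any particular dataset), a readme.txt for tabular data could look like:

```text
Title: Lake Ontario Water Quality Measurements, 2019-2021
Creator: Smith, Jane (Department of Biology, Example University)
Date: 2019.01.01-2021.12.31

FILE OVERVIEW
  water_quality.csv  - monthly sensor readings (CSV)
  stations.csv       - sampling station locations (CSV)

VARIABLE LIST (water_quality.csv)
  station_id  - sampling station identifier (see stations.csv)
  date        - sampling date, yyyy-mm-dd
  temp_c      - water temperature, degrees Celsius

CODE LIST
  999 indicates a missing value in the data

METHOD
  Readings collected with [instrument, model and version];
  processing steps described in processing_log.txt
```

Note how the file covers the overview, variable list, and code list elements described above, one section per element.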

Metadata & the FAIR Principles

  • Metadata is a standardized way of describing data, including information on who, what, where, when, why, and how. It is important to use metadata in your documentation process so the data can be understood, reused, and integrated with other data. 
  • The FAIR Principles were published in 2016 in Scientific Data as "The FAIR Guiding Principles for scientific data management and stewardship." FAIR stands for Findable, Accessible, Interoperable, and Reusable. To learn more about the FAIR Principles, visit the GO FAIR website.
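To make the idea of standardized metadata concrete, here is a minimal sketch of a machine-readable metadata record. The field names loosely echo the documentation elements listed above; they are illustrative assumptions, not a specific repository's required schema.

```python
import json

# Illustrative metadata record; field names and values are assumptions
# for demonstration, not a formal metadata standard.
metadata = {
    "title": "Lake Ontario Water Quality Measurements, 2019-2021",
    "creator": "Smith, Jane",                 # surname first, per the guide
    "identifier": "UR-2021-017",              # hypothetical internal reference number
    "date": "2019.01.01-2021.12.31",          # range format suggested above
    "subject": ["water quality", "limnology"],
    "language": "en",
    "format": "text/csv",
    "rights": "CC-BY-4.0",
}

# Serializing to JSON yields a record other tools and people can read
print(json.dumps(metadata, indent=2))
```

Recording fields in a consistent, machine-readable form like this is what lets the data be understood, reused, and integrated with other data.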

FAIR: Findable, Accessible, Interoperable, Reusable

This chart, created by the Cambridge Crystallographic Data Centre (CCDC), shows the importance of metadata in ensuring your data are FAIR.

FAIRsharing.org


FAIRsharing.org is a curated, informative, and educational resource on data and metadata standards, interrelated with databases and data policies. It catalogs more than 1,600 standards, 1,900 databases, and 150 policies.

Metadata Outreach

The University of Rochester has a Metadata Outreach Service led by Maggie Dull, Director of Metadata Strategies. Please feel free to contact them with any metadata questions you have. 

Metadata Standards