Why is Documenting Data Important?
Data documentation ensures that your data can be found, understood, interpreted, and cited by any user. Using a metadata standard helps make your data FAIR, and a readme.txt file keeps everything documented and organized. Start the documentation process at the beginning of the research project: reconstructing documentation after the fact is far more difficult.
Data documentation usually describes the content, formats, and internal relations of your data. Aspects of your data which should be documented, as described by DMPTool's guide to documentation, are as follows:
- Title: Name of the dataset or research project that produced it
- Creator: Names and addresses of the organizations or people who created the data; preferred format for personal names is surname first (e.g., Smith, Jane)
- Identifier: Unique number used to identify the data, even if it is just an internal project reference number
- Date: Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, such as maintenance cycle or update schedule; preferred format is yyyy-mm-dd, or yyyy-mm-dd to yyyy-mm-dd for a range
- Method: How the data were generated, listing equipment and software used (including model and version numbers), formulae, algorithms, experimental protocols, and other things one might include in a lab notebook
- Processing: How the data have been altered or processed (e.g., normalized)
- Source: Citations to data derived from other sources, including details of where the source data is held and how it was accessed
- Funder: Organizations or agencies who funded the research
- Subject: Keywords or phrases describing the subject or content of the data
- Place: All applicable physical locations
- Language: All languages used in the dataset
- Variable list: All variables in the data files, where applicable
- Code list: Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. "999 indicates a missing value in the data")
- File inventory: All files associated with the project, including extensions (e.g. "NWPalaceTR.WRL", "stone.mov")
- File formats: Formats of the data, e.g., FITS, SPSS, HTML, JPEG, etc.
- File structure: Organization of the data file(s) and layout of the variables, where applicable
- Version: Unique date/time stamp and identifier for each version
- Checksum: A digest value computed for each file that can be used to detect changes; if a recomputed digest differs from the stored digest, the file must have changed
- Necessary software: Names of any special-purpose software packages required to create, view, analyze, or otherwise use the data
- Rights: Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data
- Access information: Where and how your data can be accessed by other researchers
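The checksum entry above can be produced with standard tooling. As a minimal sketch, the following Python function uses the standard-library hashlib module to compute a SHA-256 digest (one common choice; MD5 and SHA-1 are also widely used for fixity checking). Recording this value alongside each file lets you later detect whether the file has changed.

```python
import hashlib

def sha256_checksum(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks
    so that large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until EOF; each chunk updates the running digest.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Recomputing the digest and comparing it to the stored value is the verification step: a mismatch means the file was altered or corrupted.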
Tools for Creating readme.txt files
Readme.txt files and data dictionaries are both valuable tools for documenting data. A readme.txt file should describe the project and its files as a whole, and also describe each individual dataset or file. For tabular data, it is essential to include a variable list explaining every column heading.
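A readme.txt commonly gathers many of the elements listed above into one plain-text file. The skeleton below is an illustrative template, not a prescribed format; every value in angle brackets is a placeholder to replace with your project's details.

```
README for <dataset title>

GENERAL INFORMATION
  Creator(s):   <Surname, Given name> (<affiliation>)
  Date:         <yyyy-mm-dd> (data collection period: <yyyy-mm-dd> to <yyyy-mm-dd>)
  Identifier:   <DOI or internal project reference number>
  Funder:       <funding organization and grant number>
  Rights:       <license or restrictions on use>

FILE INVENTORY
  <filename.ext>  -  <one-line description of file contents and format>

METHODS
  <How the data were generated: equipment, software and versions, protocols>

VARIABLE LIST (for each tabular file)
  <column name>  -  <definition, units, allowed values>

CODE LIST
  <code or abbreviation>  -  <meaning, e.g. "999 indicates a missing value">
```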
Metadata & the FAIR Principles
- Metadata is a standardized way of describing data, including information on who, what, where, when, why, and how. It is important to use metadata in your documentation process so the data can be understood, reused, and integrated with other data.
- The FAIR Principles were created in 2016 and published in Scientific Data as "FAIR Guiding Principles for scientific data management and stewardship." The FAIR Principles stand for Findable, Accessible, Interoperable, and Reusable. To learn more about the FAIR Principles, visit the GO FAIR website.
This chart, created by the Cambridge Crystallographic Data Centre (CCDC), shows the importance of metadata when it comes to ensuring your data is FAIR.
FAIRsharing.org is a curated, informative, and educational resource on data and metadata standards, and on how they relate to databases and data policies. It catalogs more than 1,600 standards, 1,900 databases, and 150 policies.
The University of Rochester has a Metadata Outreach Service led by Maggie Dull, Director of Metadata Strategies. Please feel free to contact them with any metadata questions you have.
RDA Metadata Standards Catalog
The RDA Metadata Standards Catalog is a collaborative, open directory of metadata standards applicable to research data.
DCC Disciplinary Metadata Standards
A list of metadata standards collected by the Digital Curation Center (DCC). Search for metadata standards by discipline.
Dublin Core, maintained by the Dublin Core Metadata Initiative (DCMI), is a general-purpose metadata standard.
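As an illustration of what a metadata record looks like in practice, here is a minimal Dublin Core record in its common XML serialization. The element names (dc:title, dc:creator, and so on) are real Dublin Core elements; all values are made-up placeholders.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example Survey Dataset</dc:title>
  <dc:creator>Smith, Jane</dc:creator>
  <dc:identifier>doi:10.xxxx/placeholder</dc:identifier>
  <dc:date>2024-01-15</dc:date>
  <dc:subject>survey methodology</dc:subject>
  <dc:language>en</dc:language>
  <dc:rights>CC BY 4.0</dc:rights>
</metadata>
```

Note how these elements map directly onto the Title, Creator, Identifier, Date, Subject, Language, and Rights items in the documentation checklist above.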
Data Documentation Initiative
The Data Documentation Initiative (DDI) is an international free standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences.
Text Encoding Initiative
The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics.
Ecological Metadata Language
The Ecological Metadata Language (EML) defines a comprehensive vocabulary and a readable XML markup syntax for documenting research data, specifically for ecology disciplines.
ISO 19115-1:2014 defines the schema required for describing geographic information and services by means of metadata.
The Content Standard for Digital Geospatial Metadata (CSDGM) is the long-standing geographic metadata standard endorsed by the FGDC.
Flexible Image Transport System
Flexible Image Transport System (FITS) is a file format designed to store, transmit, and manipulate scientific images and associated data, specifically in the field of astronomy.
Miner Library's List of Biological and Biomedical Metadata
University of Rochester's Miner Library has a list of metadata standards useful in the biological and biomedical fields.