LibGuides: Research Data Management & Sharing: Organizing Data

LabArchives (Electronic Lab Notebook)

The University of Rochester has selected LabArchives as our institution-wide electronic lab notebook solution. Researchers at UR can access LabArchives free of charge to manage both research labs and laboratory courses. Our team at UR Libraries can provide more information, help you get set up in the platform, and troubleshoot any issues you encounter. We also facilitate regular training sessions provided directly by the experts at LabArchives.

LabArchives allows you to:

Organize your research and make your notebook searchable
Back up your research work in the cloud
Share your research or keep your work secure using LabArchives’ access controls
Add DOIs and make your work citable
Use LabArchives in your classes (with optional integration with Blackboard)

See our landing page for more information.

Qualitative Research Tools

Taguette is a free and open-source tool for qualitative research. You can import your research materials, highlight and tag quotes, and export the results. Users can:

Import PDFs, Word Docs (.docx), Text files (.txt), HTML, EPUB, MOBI, Open Documents (.odt), and Rich Text Files (.rtf).
Highlight words, sentences, or paragraphs and tag them with the codes you create.
Work collaboratively with other users (if self-hosting or using app.taguette.org).
Keep ownership of their data by exporting everything, including the project, highlights, documents, and codes.

Recommended File Formats

It is imperative that you think carefully about the file formats you use to manage, share, and preserve your data, as technology is always changing, and software can become obsolete.

According to the DMPTool, formats likely to be accessible in the future are:

Non-proprietary
Open, with documented standards
In common usage by the research community
Using standard character encodings (e.g., ASCII, UTF-8)
Uncompressed (space permitting)

Examples of preferred format choices include:

Image: JPEG, JPEG 2000, PNG, TIFF
Text: plain text (TXT), HTML, XML, PDF/A
Audio: AIFF, WAVE
Containers: TAR, GZIP, ZIP
Databases: prefer XML or CSV to native binary formats

Another good resource to use to learn more about file formats is UK Data Service Guidance on Recommended Formats.
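
As a rough illustration of how the preferred-format examples above could be applied, the following Python sketch flags files whose extensions fall outside that list. The extension set and the function names are assumptions made for this example; they are not an official DMPTool list, and an extension alone cannot confirm details such as whether a PDF is actually PDF/A.

```python
from pathlib import Path

# Extensions based on the preferred-format examples in this guide.
# The exact set is an illustrative assumption, not an official list.
PREFERRED_EXTENSIONS = {
    ".jpg", ".jpeg", ".jp2", ".png", ".tif", ".tiff",  # image
    ".txt", ".html", ".xml", ".pdf",                   # text (PDF/A not verifiable by extension)
    ".aif", ".aiff", ".wav",                           # audio
    ".tar", ".gz", ".zip",                             # containers
    ".csv",                                            # tabular or database exports
}


def is_preferred(filename: str) -> bool:
    """Return True if the file extension is in the preferred set (case-insensitive)."""
    return Path(filename).suffix.lower() in PREFERRED_EXTENSIONS


def flag_non_preferred(filenames):
    """Return the file names that may need converting before sharing or archiving."""
    return [name for name in filenames if not is_preferred(name)]


print(flag_non_preferred(["results.csv", "results.xlsx", "scan.TIFF"]))
```

A flagged file is only a prompt to consider exporting a copy in an open format; the original can still be kept alongside it.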

File Renaming Tools

Advanced Renamer: Windows, Free
Bulk Rename Utility: Windows, Free
Rename-It!: Windows, Free
Renamer 6: Mac
Rename: Linux Command Line Tool, Free
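
For small jobs, a short script can do similar work to these tools. The sketch below is one possible approach in Python, not a replacement for any of the tools listed: it replaces spaces in file names with underscores and defaults to a dry run so that the planned changes can be reviewed first. The function name `plan_renames` and its behavior are assumptions made for this example.

```python
from pathlib import Path


def plan_renames(folder, dry_run=True):
    """Replace spaces in file names with underscores.

    Returns a list of (old_name, new_name) pairs. Files are only renamed
    on disk when dry_run is False.
    """
    planned = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or " " not in path.name:
            continue
        target = path.with_name(path.name.replace(" ", "_"))
        if target.exists():
            continue  # never overwrite an existing file
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling `plan_renames("my_folder")` only lists the proposed changes; passing `dry_run=False` applies them. Working on a copy of the data first is a sensible precaution.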

Structuring Files

It is important to use a consistent file structure in order to ensure all of your files can be found.
This file structure should be recorded in your readme.txt file and in your data documentation. The readme.txt file should sit at the top of the file structure hierarchy so that it is easy to find.
Try to keep raw data, processed data, code, and outputs in separate folders to avoid confusion.
File and folder names should follow a file naming convention (see Naming Files below).
The exact file structure can differ according to the needs of the researcher.

Example 1: Created by Lane Medical Library at Stanford Medicine with reference to TIER Protocol.

Example 2: A more complicated file structure, which can be generated and auto-populated with the Reproducible Science template for Cookiecutter.

.
├── AUTHORS.md
├── LICENSE
├── README.md
├── bin <- Your compiled model code can be stored here (not tracked by git)
├── config <- Configuration files, e.g., for doxygen or for your model if needed
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
├── docs <- Documentation, e.g., doxygen or scientific papers (not tracked by git)
├── notebooks <- Ipython or R notebooks
├── reports <- For a manuscript source, e.g., LaTeX, Markdown, etc., or any project reports
│ └── figures <- Figures for the manuscript or reports
└── src <- Source code for this project
  ├── data <- Scripts and programs to process data
  ├── external <- Any external source code, e.g., pull other git projects, or external libraries
  ├── models <- Source code for your own model
  ├── tools <- Any helper scripts go here
  └── visualization <- Scripts for visualisation of your results, e.g., matplotlib, ggplot2 related.
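
If installing Cookiecutter is not an option, a similar empty skeleton could be created with a few lines of Python. The sketch below is a hand-written stand-in that mirrors the folder names in the tree above; it does not reproduce the template's file contents, and the function name `create_skeleton` is an assumption made for this example.

```python
from pathlib import Path

# Folder names copied from the tree above.
FOLDERS = [
    "bin", "config",
    "data/external", "data/interim", "data/processed", "data/raw",
    "docs", "notebooks", "reports/figures",
    "src/data", "src/external", "src/models", "src/tools", "src/visualization",
]
TOP_LEVEL_FILES = ["AUTHORS.md", "LICENSE", "README.md"]


def create_skeleton(root):
    """Create the empty folder structure and placeholder files under root."""
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in TOP_LEVEL_FILES:
        (root / name).touch(exist_ok=True)  # leaves existing files untouched
    return root
```

Running `create_skeleton("my_project")` more than once is harmless, because existing folders and files are left as they are.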

Naming Files

A file naming convention (FNC) is a framework for naming your files in a way that describes what they are and their relationship to other files. It is important to create the FNC at the very beginning of the project. Make sure everyone involved in the research project is aware of the FNC and uses it consistently. Record the FNC in your readme.txt file and in the data documentation section of your research data management and sharing plan.

General rules to follow include:

Be consistent.
Keep file names as short as possible (try to stay under 32 characters).
Reserve the three-letter file extension for the file format, such as .csv.
Avoid using special characters.
Do not use spaces, as they are not recognized by some software. Use underscores (file_name), camel case (fileName), or dashes (file-name) instead.
Use the ISO 8601 date format: YYYYMMDD
To ensure the files are sequential, consider the sort order.
- Use leading zeroes in numbers. For example, 07 sorts before 10, but 7 does not. Estimate how many files you will have and use enough digits to cover that count (e.g., for fewer than 100 files use 01-99; for 100 to 999 files use 001-999).
- Consider the hierarchy of the terms in the FNC. If you want files to be organized first by date, then the date should come first. If you want to organize first by interviewee name, then the interviewee name should come first.
Always include a version number in the file name, as it can otherwise be difficult to find the "correct" version of a file.
Avoid generic file names.
Avoid acronyms that cannot be easily understood or are not explained in the readme.txt file.
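
The effect of leading zeroes on sort order can be checked with a small Python sketch. The helper name `padded` and the sample file names are assumptions made for this illustration; the sort shown is a plain character-by-character sort, and some file browsers use a different "natural" ordering.

```python
def padded(number: int, total_files: int) -> str:
    """Zero-pad number to as many digits as total_files has."""
    width = len(str(total_files))
    return str(number).zfill(width)


# Without padding, a plain text sort places 10 before 7.
unpadded = sorted(f"sample_{n}.csv" for n in (7, 70, 10))

# With padding (assuming at most 99 files), text order matches numeric order.
zero_padded = sorted(f"sample_{padded(n, 99)}.csv" for n in (7, 70, 10))

print(unpadded)     # ['sample_10.csv', 'sample_7.csv', 'sample_70.csv']
print(zero_padded)  # ['sample_07.csv', 'sample_10.csv', 'sample_70.csv']
```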

Information to consider including in your FNC:

Project name, experiment name or acronym
Initials or name of researcher
Date or range of dates when data was collected
Location or spatial information
Type of data
Type of analysis
Conditions
Description of experiment
Unique identifier
Language
Name or pseudonym of interviewee
Sample name
Version number of file (with leading zeroes)
Three-letter file extension for the file format

Include the formula for the FNC in your readme.txt file, including the meanings of any acronyms that need to be used in the FNC.

Example 1

FNC	[Date]_[Interviewee]_[DocumentType].pdf
Date	The date the interview was taken in YYYYMMDD format.
Interviewee	Pseudonym of the interviewee.
Document Type	The type of document. Notes: raw notes taken by the interviewer during the interview. Transcript: transcript created from the audio file of the interview.
Example	20220818_Noelle_Transcript.pdf
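
A convention like Example 1 can also be generated by a script so that every team member produces identical names. The following Python sketch is one way this might look; the function name `interview_filename` is an assumption made for this example.

```python
from datetime import date

# Document types allowed by the Example 1 convention.
DOCUMENT_TYPES = {"Notes", "Transcript"}


def interview_filename(interview_date: date, interviewee: str, document_type: str) -> str:
    """Build a file name of the form [Date]_[Interviewee]_[DocumentType].pdf."""
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"Unknown document type: {document_type!r}")
    if "_" in interviewee or " " in interviewee:
        raise ValueError("Pseudonym must not contain spaces or underscores")
    return f"{interview_date:%Y%m%d}_{interviewee}_{document_type}.pdf"


print(interview_filename(date(2022, 8, 18), "Noelle", "Transcript"))
# 20220818_Noelle_Transcript.pdf
```

Rejecting underscores and spaces in the pseudonym is a design choice for this sketch: it keeps the underscore free to act as the separator between the parts of the name.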

Example 2

FNC	[SampleLocation]_[Date]_[VersionNumber].csv
Sample Location	The location where the sample was taken. ERI: Lake Erie. ONT: Lake Ontario.
Date	The date the sample was taken in YYYYMMDD format.
Version Number	The version number of the table. Record as vXX.
Example	ONT_20220818_v03.csv
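
File names can also be checked against a convention automatically. The Python sketch below validates and parses names that follow Example 2, assuming a two-digit version number as in the sample file name; the pattern and the function name `parse_sample_filename` are assumptions made for this illustration.

```python
import re
from datetime import datetime

# [SampleLocation]_[Date]_[VersionNumber].csv, with location codes from Example 2.
PATTERN = re.compile(r"(?P<location>ERI|ONT)_(?P<date>\d{8})_v(?P<version>\d{2})\.csv")


def parse_sample_filename(name: str):
    """Return the parts of a valid file name as a dict, or None if it does not conform."""
    match = PATTERN.fullmatch(name)
    if match is None:
        return None
    try:
        sampled = datetime.strptime(match["date"], "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that do not form a real calendar date
    return {
        "location": match["location"],
        "date": sampled,
        "version": int(match["version"]),
    }


print(parse_sample_filename("ONT_20220818_v03.csv"))
print(parse_sample_filename("ont_2022-08-18_v3.csv"))  # None
```

Running such a check over a data folder before sharing it is a quick way to spot files that drifted from the convention.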
