It is important to organize your data so that others can easily use and understand it. Key practices for organizing your data include using a consistent folder and file structure, choosing recommended file formats, and following a file naming convention (FNC). Tools such as LabArchives, an electronic lab notebook, can help you keep your data organized and managed. Document and describe your folder structure in your readme.txt file.
The University of Rochester has selected LabArchives as our institution-wide electronic lab notebook solution. UR researchers can use LabArchives free of charge to manage both research labs and laboratory courses. Our team at UR Libraries can provide more information, help you get set up on the platform, and troubleshoot any issues you encounter. We also host regular training sessions on the platform, led by experts from LabArchives.
LabArchives allows you to:
Taguette is a free and open-source tool for qualitative research. You can import your research materials, highlight and tag quotes, and export the results. Users can:
It is essential to think carefully about the file formats you use to manage, share, and preserve your data, because technology is always changing and software can become obsolete, leaving files in older formats difficult or impossible to open.
According to the DMPTool, formats likely to be accessible in the future are:
Examples of preferred format choices include:
Another good resource for learning more about file formats is the UK Data Service's Guidance on Recommended Formats.
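To see which formats a project actually uses, a short Python script can count file extensions and flag those outside a preferred set. This is a minimal sketch: the `PREFERRED_EXTENSIONS` set below is an illustrative assumption, not the DMPTool or UK Data Service list, so replace it with the formats recommended for your type of data. `inventory_formats` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path

# Illustrative set of open, widely supported formats. This is an assumption for
# the example; check DMPTool and UK Data Service guidance for your data type.
PREFERRED_EXTENSIONS = {
    ".csv", ".tsv", ".txt", ".xml", ".json",
    ".pdf", ".tif", ".tiff", ".png", ".wav", ".flac",
}


def inventory_formats(root):
    """Count file extensions under root, split into preferred and other formats."""
    counts = Counter(
        # Lowercase so ".CSV" and ".csv" are counted together.
        path.suffix.lower() or "(no extension)"
        for path in Path(root).rglob("*")
        if path.is_file()
    )
    preferred = {ext: n for ext, n in counts.items() if ext in PREFERRED_EXTENSIONS}
    other = {ext: n for ext, n in counts.items() if ext not in PREFERRED_EXTENSIONS}
    return preferred, other
```

For example, `inventory_formats("my_project")` returns two dictionaries mapping extensions to file counts. Anything in the second dictionary is a candidate for conversion, or for a note in your readme.txt explaining why that format was kept.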
Example 1: Created by Lane Medical Library at Stanford Medicine, based on the TIER Protocol.
Example 2: A more complex file structure, which can be generated and auto-populated with the Reproducible Science template for Cookiecutter.
.
├── AUTHORS.md
├── LICENSE
├── README.md
├── bin <- Your compiled model code can be stored here (not tracked by git)
├── config <- Configuration files, e.g., for doxygen or for your model if needed
├── data
│   ├── external <- Data from third party sources.
│   ├── interim <- Intermediate data that has been transformed.
│   ├── processed <- The final, canonical data sets for modeling.
│   └── raw <- The original, immutable data dump.
├── docs <- Documentation, e.g., doxygen or scientific papers (not tracked by git)
├── notebooks <- IPython or R notebooks
├── reports <- For a manuscript source, e.g., LaTeX, Markdown, etc., or any project reports
│   └── figures <- Figures for the manuscript or reports
└── src <- Source code for this project
    ├── data <- Scripts and programs to process data
    ├── external <- Any external source code, e.g., pull other git projects, or external libraries
    ├── models <- Source code for your own model
    ├── tools <- Any helper scripts go here
    └── visualization <- Scripts for visualisation of your results, e.g., matplotlib, ggplot2 related.
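If you only need the empty folders rather than the full Cookiecutter template, a few lines of standard-library Python can create the same layout. This is a sketch, not the Cookiecutter template itself: the folder list is transcribed from the tree above, and `create_project_skeleton` is a hypothetical helper name.

```python
from pathlib import Path

# Folder layout transcribed from the example tree; adjust it to your project.
DIRECTORIES = [
    "bin",
    "config",
    "data/external",
    "data/interim",
    "data/processed",
    "data/raw",
    "docs",
    "notebooks",
    "reports/figures",
    "src/data",
    "src/external",
    "src/models",
    "src/tools",
    "src/visualization",
]
TOP_LEVEL_FILES = ["AUTHORS.md", "LICENSE", "README.md"]


def create_project_skeleton(root):
    """Create the folders and empty top-level files if they do not already exist."""
    root = Path(root)
    for directory in DIRECTORIES:
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in TOP_LEVEL_FILES:
        # touch() creates an empty file but leaves an existing file's contents alone.
        (root / name).touch(exist_ok=True)
    return root
```

Because every step tolerates existing folders and files, you can rerun the function on a project that is already underway without overwriting anything.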
A file naming convention (FNC) is a framework for naming files so that each name describes what the file contains and how it relates to other files. Create the FNC at the very beginning of the project, make sure everyone involved in the research project knows about it, and make sure all members use it consistently. Record the FNC in your readme.txt file and in the data documentation section of your research data management and sharing plan.
General rules to follow include:
Information to consider including in your FNC:
Record the pattern of your FNC in your readme.txt file, along with the meanings of any acronyms used in it.
Example 1
FNC | [Date]_[Interviewee]_[DocumentType].pdf |
Date | The date the interview was conducted, in YYYYMMDD format. |
Interviewee | Pseudonym of the interviewee. |
Document Type | The type of document: Notes (raw notes taken by the interviewer during the interview) or Transcript (a transcript created from the audio file of the interview). |
Example | 20220818_Noelle_Transcript.pdf |
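A convention like this can also be enforced in code, so that names are built and checked the same way every time. The sketch below assumes that pseudonyms contain only letters (so they can never include the `_` separator); the helper names `interview_filename` and `parse_interview_filename` are hypothetical.

```python
import re
from datetime import date, datetime

# Allowed document types, as defined in the FNC table.
DOCUMENT_TYPES = {"Notes", "Transcript"}

# [Date]_[Interviewee]_[DocumentType].pdf, with the date as YYYYMMDD.
# Assumption: pseudonyms are letters only, so they never contain "_".
INTERVIEW_PATTERN = re.compile(r"(\d{8})_([A-Za-z]+)_(Notes|Transcript)\.pdf")


def interview_filename(interview_date: date, pseudonym: str, document_type: str) -> str:
    """Build a file name that follows the interview FNC."""
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document type: {document_type!r}")
    if not re.fullmatch(r"[A-Za-z]+", pseudonym):
        raise ValueError(f"pseudonym must contain only letters: {pseudonym!r}")
    return f"{interview_date:%Y%m%d}_{pseudonym}_{document_type}.pdf"


def parse_interview_filename(filename: str):
    """Split a conforming file name into (date, pseudonym, document type)."""
    match = INTERVIEW_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"does not follow the FNC: {filename!r}")
    raw_date, pseudonym, document_type = match.groups()
    # strptime raises ValueError for impossible dates such as 20220230.
    interview_date = datetime.strptime(raw_date, "%Y%m%d").date()
    return interview_date, pseudonym, document_type
```

For example, `interview_filename(date(2022, 8, 18), "Noelle", "Transcript")` returns `"20220818_Noelle_Transcript.pdf"`, the example from the table, and `parse_interview_filename` rejects names such as `2022-08-18_Noelle_Transcript.pdf` that break the convention.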
Example 2
FNC | [SampleLocation]_[Date]_[VersionNumber].csv |
Sample Location | The location where the sample was taken: ERI (Lake Erie) or ONT (Lake Ontario). |
Date | The date the sample was taken in YYYYMMDD format. |
Version Number | The version number of the table. Record as vXX. |
Example | ONT_20220818_v03.csv |
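Version numbers are easiest to keep consistent when the next number is computed from the files already saved. The following sketch assumes versions start at v01 and stay within two digits, as the vXX format implies; `sample_filename` and `next_version` are hypothetical helper names, and the location codes come from the table.

```python
import re
from datetime import date
from pathlib import Path

# Location codes from the FNC table.
LOCATIONS = {"ERI": "Lake Erie", "ONT": "Lake Ontario"}

# [SampleLocation]_[Date]_[VersionNumber].csv, e.g. ONT_20220818_v03.csv
SAMPLE_PATTERN = re.compile(r"(ERI|ONT)_(\d{8})_v(\d{2})\.csv")


def sample_filename(location: str, sample_date: date, version: int) -> str:
    """Build a file name for one version of a sample table."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location code: {location!r}")
    if not 1 <= version <= 99:
        # vXX leaves room for two digits only.
        raise ValueError("version must be between 1 and 99")
    return f"{location}_{sample_date:%Y%m%d}_v{version:02d}.csv"


def next_version(folder, location: str, sample_date: date) -> int:
    """Return the version number for the next save of this sample's table."""
    prefix = f"{location}_{sample_date:%Y%m%d}_"
    versions = [
        int(match.group(3))
        for path in Path(folder).glob(prefix + "v*.csv")
        # Skip files that match the glob but not the full convention.
        if (match := SAMPLE_PATTERN.fullmatch(path.name))
    ]
    # No existing versions means this is the first one.
    return max(versions, default=0) + 1
```

For example, `sample_filename("ONT", date(2022, 8, 18), 3)` returns `"ONT_20220818_v03.csv"`. If a folder already holds v01 and v03 of that table, `next_version` returns 4, so a new save never overwrites an earlier version.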