
Research Data Management Toolkit: Organize Data

This guide provides information about research data management and the Tropical Data Hub (TDH) Research Data repository.

Introduction: File Directories, File Names and Version Control

Setting up file directories and naming files properly can save an enormous amount of time later.

Version control is required to ensure the authenticity of research data. Working with outdated versions wastes research time and puts data at risk.

Data cleaning (or "wrangling") is often required to prepare data for analysis and/or visualization. This process identifies and corrects errors or makes formatting more consistent. 

Data documentation and metadata are also very important, during the active data stage and when archiving data. 

File Directories

Take a look at this tweet from @micahgallen for an effective directory structure (for research projects) and notice how many researchers this resonated with!

/Projects
    /inProgress
        /ProjectName
            /docs
            /code
            /data
            /figures
    /published
    /submitted
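A layout like this can be created by hand, but a short script keeps it identical across projects. The following is a minimal Python sketch, not part of the original tweet: the function name `create_project_skeleton` and the example project name are hypothetical, and the folder names are simply copied from the structure above.

```python
from pathlib import Path

# Folder names copied from the layout above; adjust to suit your project.
STAGE_FOLDERS = ("inProgress", "published", "submitted")
PROJECT_SUBFOLDERS = ("docs", "code", "data", "figures")


def create_project_skeleton(root, project_name):
    """Create Projects/<stage> folders plus the sub-folders for one project.

    Returns the path of the new project folder under Projects/inProgress.
    Existing folders are left untouched.
    """
    projects = Path(root) / "Projects"
    for stage in STAGE_FOLDERS:
        (projects / stage).mkdir(parents=True, exist_ok=True)

    project_dir = projects / "inProgress" / project_name
    for sub in PROJECT_SUBFOLDERS:
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    return project_dir
```

For example, `create_project_skeleton(".", "ReefSurvey")` would create the folders under the current working directory.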

The UK Data Service provides more formal advice on folder structures. It notes that it helps to keep folders nested no more than three or four levels deep and to have no more than ten items in each folder.
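These two rules of thumb are easy to check automatically. The sketch below is one possible approach, assuming the limits are read as "at most four levels below the project root" and "at most ten items per folder"; the function name `check_folder_structure` and its default values are illustrative, not taken from the UK Data Service.

```python
import os
from pathlib import Path


def check_folder_structure(root, max_depth=4, max_items=10):
    """Return warnings for folders that are nested too deeply or too crowded.

    Depth is counted in levels below `root` (the root itself is level 0).
    """
    root = Path(root)
    warnings = []
    for dirpath, dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        depth = len(current.relative_to(root).parts)
        if depth > max_depth:
            warnings.append(f"{current}: nested {depth} levels below the root")
        n_items = len(dirnames) + len(filenames)
        if n_items > max_items:
            warnings.append(f"{current}: contains {n_items} items")
    return warnings
```

An empty list means the folder tree stays within both limits.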

File Names


File names should be consistent and documented. Abbreviations and codes are fine as long as they meet these criteria.

File names could include information such as:

  • Project or experiment name or acronym
  • Location/spatial coordinates
  • Researcher name/initials
  • Date or date range of experiment
  • Data type
  • Conditions
  • File version number

It's a great idea to include a readme.txt file in the directory that explains the naming format and any abbreviations or codes used.
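Building file names with a small helper is one way to keep them consistent. The sketch below is only an illustration: the order of the components, the underscore separator, the YYYYMMDD date format and the `v02` style version suffix are all assumptions, and whatever convention is chosen should be written down in the readme.txt file.

```python
from datetime import date


def build_file_name(project, location, initials, collected, data_type,
                    version, extension):
    """Join the naming components into one file name.

    Assumed convention (document yours in readme.txt):
    Project_Location_Initials_YYYYMMDD_DataType_vNN.ext
    """
    parts = [
        project,
        location,
        initials,
        collected.strftime("%Y%m%d"),  # dates written this way sort correctly
        data_type,
        f"v{version:02d}",             # zero-padded version number
    ]
    return "_".join(parts) + "." + extension


print(build_file_name("TDH", "Cairns", "JS", date(2020, 3, 5), "temp", 2, "csv"))
# prints TDH_Cairns_JS_20200305_temp_v02.csv
```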

Avoid really long file names and special characters like ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' " and | in file names, directory paths and field names. 

Spaces in file names can also cause problems for some software or web applications, so use underscores, dashes or camel case (e.g. FileName) instead.
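The two rules above can be applied to an existing name in a few lines. This is a minimal sketch under stated assumptions: the function name `clean_file_name` is hypothetical, the special characters are the ones listed above, and replacing spaces with underscores is just one of the suggested options.

```python
import re

# The special characters listed above.
SPECIAL_CHARS = set("~!@#$%^&*()`;<>?,[]{}'\"|")


def clean_file_name(name, replacement="_"):
    """Drop the listed special characters and replace runs of whitespace."""
    cleaned = "".join(ch for ch in name if ch not in SPECIAL_CHARS)
    return re.sub(r"\s+", replacement, cleaned.strip())


print(clean_file_name("my data (final)!.csv"))
# prints my_data_final.csv
```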

Re-naming multiple files is onerous, but there are bulk re-naming utilities that can help.
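For simple cases, a short script can do the same job as a dedicated utility. The following Python sketch is an assumption-laden illustration rather than a replacement for those tools: the function name `bulk_rename` is hypothetical, it only handles plain text substitution within one folder, and it defaults to a dry run so the planned changes can be reviewed first.

```python
from pathlib import Path


def bulk_rename(folder, old, new, dry_run=True):
    """Replace `old` with `new` in every file name in `folder`.

    Returns a list of (old_name, new_name) pairs. Files are only renamed
    when dry_run is False, and nothing is ever overwritten.
    """
    planned = []
    # sorted() takes a snapshot, so renaming does not disturb the loop.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or old not in path.name:
            continue
        target = path.with_name(path.name.replace(old, new))
        if target.exists():
            raise FileExistsError(f"{target.name} already exists")
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling `bulk_rename("data", " ", "_")` lists the planned changes; adding `dry_run=False` applies them. Work on a copy of the folder until the result looks right.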

Version Control


Version control is the process of managing file (or record or dataset) revisions. It is particularly important for files that undergo numerous revisions and when there are multiple members of a research team or files are shared across multiple locations.

Basic version control can be achieved by assigning unique file names and keeping a version control table to record changes. The UK Data Archive Version Control and Authenticity guide includes examples.
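A version control table can be as simple as a CSV file kept next to the data. The sketch below shows one way to append entries to such a table; the column names (`version`, `date`, `author`, `changes`) and the function name `record_version` are assumptions for illustration and are not taken from the UK Data Archive guide.

```python
import csv
from datetime import date
from pathlib import Path

# Assumed columns; adapt them to the table format your team agrees on.
FIELDS = ["version", "date", "author", "changes"]


def record_version(table_path, version, author, changes, when=None):
    """Append one row to the version control table, creating it if needed."""
    table_path = Path(table_path)
    is_new = not table_path.exists()
    with table_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header row only once
        writer.writerow({
            "version": version,
            "date": (when or date.today()).isoformat(),
            "author": author,
            "changes": changes,
        })
```

For example, `record_version("version_table.csv", "01-01", "JS", "Fixed units in column 3")` adds a dated row describing the change.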

Best practice (from the UK Data Archive guide) is to:

  • decide how many versions of a file to keep, which versions to keep, for how long and how to organize versions
  • identify milestone versions to keep, e.g. major versions rather than minor versions (keep version 02-00 but not 02-01)
  • uniquely identify different versions of files using a systematic naming convention
  • record changes made to a file when a new version is created
  • record relationships between items where needed, for example between code and the data file it is run against; between data file and related documentation or metadata; or between multiple files
  • track the location of files if they are stored in a variety of locations
  • regularly synchronise files in different locations
  • identify a single location for the storage of milestone and master versions
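The major and minor numbering used in the example above (02-00, 02-01) lends itself to a small helper that generates the next version label and identifies milestone versions. This is a sketch under assumptions: it treats the label as "major-minor" with two digits each and treats any version whose minor part is 00 as a milestone; the function names are hypothetical.

```python
def next_version(version, milestone=False):
    """Return the next label in the assumed 'major-minor' scheme.

    A milestone bumps the major number and resets the minor number;
    otherwise only the minor number increases.
    """
    major, minor = (int(part) for part in version.split("-"))
    if milestone:
        major, minor = major + 1, 0
    else:
        minor += 1
    return f"{major:02d}-{minor:02d}"


def is_milestone(version):
    """A version is treated as a milestone when its minor part is zero."""
    return int(version.split("-")[1]) == 0


print(next_version("02-00"))                  # prints 02-01
print(next_version("02-01", milestone=True))  # prints 03-00
```

Filtering a list of versions with `is_milestone` gives the set to keep if only major versions are retained.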

Other strategies for maintaining version control include: using version control facilities within the software (the guide includes exercises on applying versioning in MS Word and on synchronising files and folders using SyncToy for MS Windows); using versioning software (see the list below); using file sharing services such as Google Drive; controlling rights to file editing; and manually merging edits by multiple users.

Version Control Resources

Guides:

Software:

Training:


We acknowledge the Australian Aboriginal and Torres Strait Islander peoples as the first inhabitants of the nation and acknowledge Traditional Owners of the lands where our staff and students live, learn and work.