
Research Data Management Toolkit: Durable File Formats

This guide provides information about research data management and the Tropical Data Hub (TDH) Research Data repository

Durable file formats

You may need to use different file formats at different stages in the Research Data Management Lifecycle, but for long-term preservation you will need to store your files in a durable format. This ensures your files can be opened by others (including your future self) using readily available programs, perhaps long after the research project has concluded.

Revisit the data management snafu video in the introduction to the Toolkit to see what can happen when "bad" formats are used.

Where possible:

DO USE

- formats endorsed by standards agencies such as Standards Australia or ISO

- open formats developed and maintained by communities of interest, for example OpenDocument Format

- lossless formats

- formats widely used within a given discipline

AVOID

- proprietary formats

- formats and software at risk of obsolescence

Exporting data to plain text

You may have to use software that does not save data in a durable format because of discipline-specific or other requirements, e.g. specialised programs to capture or generate data. If you can do so without losing data integrity, export your data to a more durable format such as plain text and include it alongside the original files when you archive them. This is often possible: for example, you can export .csv files from SPSS (with value labels) and archive them alongside the .sav files.
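As a rough illustration of what a labelled plain-text export looks like, the Python sketch below writes a small table to .csv with coded values replaced by their value labels. The sample rows, the label mappings and the function name export_labelled_csv are all hypothetical. In practice the export would normally be done from within the original software (for example SPSS); this only shows the shape of the result.

```python
import csv

def export_labelled_csv(rows, fieldnames, value_labels, path):
    """Write rows (a list of dicts) to a UTF-8 .csv file.

    Coded values are replaced by their labels where a label is defined;
    values without a label are written unchanged.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            labelled = {
                col: value_labels.get(col, {}).get(row[col], row[col])
                for col in fieldnames
            }
            writer.writerow(labelled)

# Hypothetical survey data with numeric codes, as a statistics package might store it
SAMPLE_ROWS = [
    {"id": 1, "sex": 1, "region": 2},
    {"id": 2, "sex": 2, "region": 1},
]
# Hypothetical value labels: column name -> {code: label}
SAMPLE_LABELS = {
    "sex": {1: "Female", 2: "Male"},
    "region": {1: "North", 2: "South"},
}
```

Calling export_labelled_csv(SAMPLE_ROWS, ["id", "sex", "region"], SAMPLE_LABELS, "survey_labelled.csv") would produce a file that opens in any text editor. Keeping the labels in the export matters because the numeric codes alone are hard to interpret once the original software is unavailable.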


Some examples of preferred formats for data archiving:

  • Spreadsheets: Excel (.xlsx) and .csv or OpenDocument Spreadsheet (.ods)
  • Text documents: Word (.docx) and Rich Text (.rtf), PDF or OpenDocument Text (.odt)
  • Geospatial data: ESRI shapefile (.shp, .shx, .dbf), Geo-referenced TIFF (.tif), ESRI ASCII Grid (.asc)
  • Image files: lossless formats (.tif or .raw) preferred
  • Video: MPEG-4 (.mp4)
  • Audio: Free Lossless Audio Codec (.flac)
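
One way to check a project folder against a list like this is a short script. The following Python sketch flags files whose extensions are not in a set transcribed from the examples above. The function name find_non_preferred is hypothetical, and treating this list as exhaustive is an assumption, so adjust the set to suit your discipline.

```python
from pathlib import Path

# Extensions transcribed from the examples above (assumption: extend as needed)
PREFERRED_EXTENSIONS = {
    ".xlsx", ".csv", ".ods",           # spreadsheets
    ".docx", ".rtf", ".pdf", ".odt",   # text documents
    ".shp", ".shx", ".dbf", ".asc",    # geospatial data
    ".tif", ".raw",                    # images (and geo-referenced TIFF)
    ".mp4",                            # video
    ".flac",                           # audio
}

def find_non_preferred(root):
    """Return relative paths of files under root whose extension is not preferred."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in PREFERRED_EXTENSIONS
    )
```

A file flagged here is not necessarily a problem. It is a prompt to consider whether a more durable copy should be archived alongside it.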

It is also important to document data capture and storage formats as well as software used and their versions. See the Data Documentation and Metadata section of the Toolkit for more information.
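
A simple plain-text record kept with the data is one way to capture this information. The Python sketch below writes such a record as JSON. The field names, the example entries and the function name write_format_record are hypothetical and do not follow any particular metadata standard.

```python
import json

def write_format_record(path, files, software):
    """Write a plain-text (JSON) record of file formats and software versions.

    files    -- list of dicts describing each file (name, format, stage)
    software -- dict mapping software name to the version used
    """
    record = {"files": files, "software": software}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2, sort_keys=True)
    return record

# Hypothetical entries for a survey dataset
EXAMPLE_FILES = [
    {"name": "survey.sav", "format": "SPSS data file", "stage": "capture"},
    {"name": "survey_labelled.csv", "format": "CSV (UTF-8)", "stage": "archive"},
]
EXAMPLE_SOFTWARE = {"SPSS": "record the version you actually used"}
```

For example, write_format_record("formats_and_software.json", EXAMPLE_FILES, EXAMPLE_SOFTWARE) creates a small file that can be archived with the dataset and read without special software.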


Recommended Formats List

The UK Data Service maintains a list of recommended and acceptable formats for agencies, researchers and others depositing social, economic and population data in their collection.

Formats for Packaging Data

The ETH-Bibliothek (Swiss Federal Institute of Technology) provides some advice for packaging data into archives in their factsheet Recommendations for uploading data:

Packaged files can be used for archiving large collections of heterogeneous datasets, with some provisos:

  • Use archives with extensions .zip or .tar
  • Zip the data without any data compression
  • If possible, avoid encrypting the files
  • Be aware that very large packages may be difficult to open from a browser; ETH-Bibliothek recommends packages of less than 2 GB
  • Avoid long path lengths in your folder structure. Long file names combined with a detailed folder hierarchy may lead to path lengths exceeding 256 characters. This hampers further processing in Windows and WinZip cannot unpack such containers.
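
The provisos above can be sketched in Python using the standard zipfile module. This is a minimal sketch, not ETH-Bibliothek's own tooling: the function names are hypothetical, the 2 GB limit is interpreted here as 2 × 1024³ bytes, and the path-length check covers only the path inside the archive (the full path after unpacking also depends on where the package is extracted).

```python
import zipfile
from pathlib import Path

MAX_PATH_CHARS = 256               # path length limit mentioned above
MAX_PACKAGE_BYTES = 2 * 1024**3    # assumption: "2 GB" read as 2 GiB

def find_long_paths(names, limit=MAX_PATH_CHARS):
    """Return the archive member names longer than limit characters."""
    return [n for n in names if len(n) > limit]

def package_folder(folder, archive_path):
    """Pack folder into an uncompressed, unencrypted .zip archive.

    Raises ValueError if a path is too long or the data exceeds the size limit.
    Returns the list of member names written to the archive.
    """
    folder = Path(folder)
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    names = [p.relative_to(folder).as_posix() for p in files]

    too_long = find_long_paths(names)
    if too_long:
        raise ValueError(f"{len(too_long)} path(s) exceed {MAX_PATH_CHARS} characters")

    total_bytes = sum(p.stat().st_size for p in files)
    if total_bytes > MAX_PACKAGE_BYTES:
        raise ValueError("data exceeds the recommended package size")

    # ZIP_STORED means the files are stored without any compression
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path, name in zip(files, names):
            zf.write(path, arcname=name)
    return names
```

For example, package_folder("my_dataset", "my_dataset.zip") would package a hypothetical folder called my_dataset. Storing files uncompressed means the archive is no smaller than the original data, but each file remains directly recoverable even if parts of the archive are damaged.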

File Formats: Resources


We acknowledge the Australian Aboriginal and Torres Strait Islander peoples as the first inhabitants of the nation and acknowledge Traditional Owners of the lands where our staff and students live, learn and work.