Durable file formats
You may need to use different file formats at different stages of the Research Data Management Lifecycle, but for long-term preservation you will need to store your files in a durable format. This ensures your files can be opened by others (including your future self) using readily available programs, perhaps long after the research project has concluded.
Revisit the data management snafu video in the introduction to the Toolkit to see what can happen when "bad" formats are used.
Durable formats include:
- formats endorsed by standards agencies such as Standards Australia, ISO
- open formats developed and maintained by communities of interest such as OpenDocument Format
- lossless formats
- formats widely used within a given discipline
Be cautious about:
- proprietary formats
- file format and software obsolescence
Exporting data to plain text
You may have to use software that does not save data in a durable format because of discipline-specific or other requirements, for example specialised programs that capture or generate data. Where you can do so without losing data integrity, export your data to a more durable format such as plain text and archive the export alongside the original files. For example, you can export .csv files from SPSS (with value labels) and archive them alongside the .sav files.
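As a rough illustration of exporting coded data "with value labels", the sketch below writes rows to CSV and keeps each raw code next to its human-readable label. It uses only the Python standard library and does not read .sav files itself; the variable names, the `VALUE_LABELS` mapping and the `export_with_labels` helper are hypothetical and stand in for whatever your statistics package provides.

```python
import csv
import io

# Hypothetical value labels, as a statistics package might define them.
VALUE_LABELS = {
    "sex": {1: "Female", 2: "Male", 9: "Not stated"},
    "employed": {0: "No", 1: "Yes"},
}


def export_with_labels(rows, value_labels, out):
    """Write rows to CSV, adding a <name>_label column after each labelled variable."""
    if not rows:
        return
    # Column order follows the first row; label columns sit beside their codes.
    fieldnames = []
    for name in rows[0]:
        fieldnames.append(name)
        if name in value_labels:
            fieldnames.append(f"{name}_label")
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        record = dict(row)
        for name in row:
            if name in value_labels:
                # Keep the raw code and add its label; unknown codes get an empty label.
                record[f"{name}_label"] = value_labels[name].get(row[name], "")
        writer.writerow(record)


# Made-up example data, for illustration only.
rows = [
    {"id": 1, "sex": 1, "employed": 1},
    {"id": 2, "sex": 2, "employed": 0},
]
buffer = io.StringIO()
export_with_labels(rows, VALUE_LABELS, buffer)
print(buffer.getvalue())
```

Keeping both the code and the label in the export means the plain-text copy can be interpreted without the original software, while the codes remain available for analysis.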
Some examples of preferred formats for data archiving:
- Excel spreadsheet (.xlsx) and .csv or OpenDocument Spreadsheet (.ods)
- Word document (.docx) and Rich text (.rtf), PDF or OpenDocument Text (.odt)
- Geospatial data: ESRI shapefile (.shp, .shx, .dbf), Geo-referenced TIFF (.tif) and ESRI ASCII Grid (.asc)
- Image files: lossless formats (.tif or .raw) preferred
- Video: MPEG-4 (.mp4)
- Audio: Free Lossless Audio Codec (.flac)
It is also important to document data capture and storage formats, as well as the software used and its version. See the Data Documentation and Metadata section of the Toolkit for more information.
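One lightweight way to record this information is a small machine-readable file stored next to the data. The sketch below is only an assumption about what such a record might contain: the `build_manifest` helper, the file names and the software entries are hypothetical, and the Toolkit does not prescribe this layout.

```python
import json
from pathlib import Path


def build_manifest(paths, software):
    """Describe each file's format and the software (with versions) used to create it."""
    files = [
        {"name": Path(p).name, "format": Path(p).suffix.lower().lstrip(".")}
        for p in paths
    ]
    return {"files": files, "software": software}


# Hypothetical file names and software versions, for illustration only.
manifest = build_manifest(
    ["survey.sav", "survey.csv"],
    {"SPSS": "version used for data capture", "export script": "1.0"},
)
print(json.dumps(manifest, indent=2))
```

The JSON output is itself plain text, so the record stays readable without special software.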
Recommended Formats List
The UK Data Service maintains a list of recommended and acceptable formats for agencies, researchers and others depositing social, economic and population data in their collection.
Formats for Packaging Data
The ETH-Bibliothek (Swiss Federal Institute of Technology) provides some advice for packaging data into archives in their factsheet Recommendations for uploading data:
Packaged files can be used for archiving large collections of heterogeneous datasets with some provisos:
- Use archives with extensions .zip or .tar
- Zip the data without any data compression
- If possible, avoid encrypting the files
- Be aware that very large packages may be difficult to open from a browser - ETH-Bibliothek recommends packages of less than 2GB
- Avoid long path lengths in your folder structure. Long file names combined with a detailed folder hierarchy may lead to path lengths exceeding 256 characters. This hampers further processing in Windows and WinZip cannot unpack such containers.
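The provisos above can be sketched with Python's standard `zipfile` module. This is a minimal sketch, not ETH-Bibliothek's own tooling: the `package_folder` helper and the example file names are hypothetical, and the two limits simply restate the figures quoted above. It stores files without compression (`ZIP_STORED`), applies no encryption, and refuses to build a package whose contents break the size or path-length guidelines.

```python
import tempfile
import zipfile
from pathlib import Path

MAX_PATH_LENGTH = 256             # path length limit quoted in the advice above
MAX_PACKAGE_BYTES = 2 * 1024**3   # 2GB package size guideline quoted above


def package_folder(folder, archive_path):
    """Zip a folder without compression after checking path lengths and total size."""
    folder = Path(folder)
    # Sort by archive name so the package contents are listed deterministically.
    entries = sorted(
        (p.relative_to(folder).as_posix(), p)
        for p in folder.rglob("*")
        if p.is_file()
    )
    # Only the path inside the archive is checked here; the folder the archive
    # is later unpacked into adds to the final path length.
    too_long = [name for name, _ in entries if len(name) > MAX_PATH_LENGTH]
    if too_long:
        raise ValueError(f"Paths longer than {MAX_PATH_LENGTH} characters: {too_long}")
    total_bytes = sum(path.stat().st_size for _, path in entries)
    if total_bytes >= MAX_PACKAGE_BYTES:
        raise ValueError("Package contents reach 2GB; split them into smaller packages")
    # ZIP_STORED writes the files as-is, with no data compression.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, path in entries:
            archive.write(path, arcname=name)
    return [name for name, _ in entries]


# Small demonstration with made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "dataset"
    (data / "raw").mkdir(parents=True)
    (data / "raw" / "survey.csv").write_text("id,score\n1,42\n", encoding="utf-8")
    (data / "README.txt").write_text("Example dataset\n", encoding="utf-8")
    archive_file = Path(tmp) / "dataset.zip"
    packaged = package_folder(data, archive_file)
    with zipfile.ZipFile(archive_file) as check:
        stored = all(info.compress_type == zipfile.ZIP_STORED for info in check.infolist())

print(packaged, stored)
```

Running the checks before writing the archive means a problem folder is reported up front rather than discovered when someone later fails to unpack it.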
File Formats: Resources
ANDS Guide: File Formats
This ANDS guide covers institutional planning implications, obsolescence, file migration, open/proprietary formats, lossy/lossless formats, compression, standards and more.
National Archives of Australia: Long-Term File Formats
This Guideline identifies file formats that the Archives has reasonable confidence will continue to be accessible over time. 'Acceptable' preservation formats are formats that the Archives has determined are a low risk of becoming obsolete in the long term. Records in 'acceptable' formats are not normalised but are stored as they are and are monitored over time to confirm their continued accessibility.
Selecting File Formats for Long-term Preservation
A useful document from the UK National Archives. It was published in 2008, but the general principles still apply.