
Research Data Management Toolkit

This guide provides information about research data management and the Research Data JCU platform.

Introduction: Documentation and Metadata

Documenting your research data is required at various stages in the Research Data Management Lifecycle. For example, you will need some form of documentation:

  • to prepare and update a research data management plan
  • during the project in order to stay organized
  • on completion for your own recollection and re-use, and for sharing informally with colleagues
  • to include in reports to funders, technical reports, working papers or publications
  • before archiving data to ensure it is preserved correctly
  • in order to publish or share your data

Please note that you will need to create metadata (a subset of documentation) if you plan to publish or share your data, or to archive your data in a repository like Research Data JCU.

Adequate metadata ensures your data is discoverable and that others can interpret, validate, re-use and cite it correctly.

Good metadata includes data-level documentation as well as study-level documentation; it should not just describe the project or a publication.


  Research Data JCU - Data descriptions

   The description is the most important part of your archival Data Record and Data Publication and should cover:

  • Why the data was collected, to provide context for your data
  • What the dataset consists of, for example:
    • types of files (e.g. transcripts, recordings, spreadsheets etc.), file structures, formats and any relationships between them
    • data variables, e.g. how they are coded and their units. This information could be included in a codebook, README.txt or other supporting documentation (and loaded as an attachment), or embedded in the data itself instead - see the 'Where is Metadata Stored' box on this page
  • How the data was collected and processed, for example:
    • how subjects were selected/rejected, assigned to treatments/controls, how instruments were calibrated, how measurements were taken
    • how data was processed, analysed, cleaned etc. (R scripts can be loaded as attachments)
   N.B. If your Data Publication lacks sufficient detail, a data librarian will contact you for further information before publishing it.


Study-level documentation

Study-level documentation for data is often included in research data management plans and provides a high-level overview and context for the data. It is an important component of the metadata used to describe data and is key to enabling secondary users to make informed use of shared data. Some systems (like Research Data JCU) integrate RDMPs and metadata collection so that researchers don't have to re-enter this information!

Together, study-level and data-level documentation answer the why, how, when and who questions for your data. They overlap to some extent, but good study-level documentation would include:

  • the context of the project: its history and funding, aims, objectives, hypotheses, spatial and temporal coverage etc.
  • personnel: creators, data owners (IP) and custodians, roles and responsibilities, and contact details
  • data collection methods: protocols, sampling design, workflows, instruments, hardware and software used 
  • subject descriptions: keywords, Fields of Research, Socio-Economic Objective codes, discipline-based vocabulary terms
  • structure of data files and the relationships between them
  • quality assurance: calibration, validation, cleaning or other QA processes carried out on data files
  • data provenance: origin and history of the data, use of existing datasets, modifications made over time and identification of different versions
  • access: conditions for access and use, details regarding data confidentiality, and licensing arrangements
  • references to publications or other research outputs

Data-level documentation

While it may be tempting to stop at the study-level, metadata also needs to include data-level documentation as this is critical for validating, reproducing and re-using data. It could include (as applicable):

  • names, labels and descriptions for variables
  • definitions of codes and classification schemes used
  • definitions of specialised terminology or acronyms
  • codes and reasons for missing values (see also the Data Wrangling section of the Toolkit)
  • code and scripts used to derive data after collection (simple derivations such as grouping by age levels can be explained in variable and value labels)
  • weighting and grossing variables created

Data-level documentation may also be embedded in the data itself, as explained in the 'Where is Metadata Stored' box on this page.

(Source: These guidelines have been adapted from material prepared by the UK Data Service and listed in the Resources section of this page.)
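As an illustration of the data-level items listed above, the following is a minimal sketch of a codebook kept as a plain CSV file. It is not a Research Data JCU or UK Data Service format: the column names (`name`, `label`, `units`, `codes`, `missing`), the example variables and the helper functions are all hypothetical and chosen only for this example.

```python
import csv
import io

# Hypothetical codebook columns, chosen for this illustration only.
FIELDS = ["name", "label", "units", "codes", "missing"]

# Hypothetical variables; replace with the variables in your own dataset.
CODEBOOK = [
    {"name": "site_id", "label": "Sampling site identifier",
     "units": "", "codes": "", "missing": ""},
    {"name": "temp_c", "label": "Water temperature",
     "units": "degrees Celsius", "codes": "", "missing": "-999 = sensor failure"},
    {"name": "age_grp", "label": "Age group derived from age",
     "units": "", "codes": "1 = 18-34; 2 = 35-54; 3 = 55+", "missing": "9 = not stated"},
]


def codebook_to_csv(entries):
    """Return the codebook as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(data_header, entries):
    """List the data columns that have no entry in the codebook."""
    documented = {entry["name"] for entry in entries}
    return [column for column in data_header if column not in documented]


if __name__ == "__main__":
    print(codebook_to_csv(CODEBOOK))
    # Compare the codebook against the header row of a (hypothetical) data file.
    print(undocumented_columns(["site_id", "temp_c", "depth_m"], CODEBOOK))
```

A check like `undocumented_columns` can be run before archiving to spot variables that were added to the data but never described.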

The terms "data documentation", "data provenance" and "data lineage" are often (understandably) confused. Definitions vary, but they could be considered as a continuum, with data documentation at the broadest level. According to the RDA Research Data Provenance Interest Group, provenance is concerned with questions of data origins, maintenance of identity through the data lifecycle, and how we account for data modification. The Data Wrangling Handbook v0.1 likens this to the chain of custody in criminal investigations (previous owners have to be identified and held accountable for the processing and cleaning operations they have performed on the data!). Technical data lineage relies on metadata that tracks data flows at the lowest level: tables, scripts, statements, etc.
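The chain-of-custody idea can be sketched as a simple provenance log in which each processing step records who did what, and a checksum of the data before and after. This is only one possible approach, not a method prescribed by the RDA group or the Data Wrangling Handbook; the function names, the log fields and the sample data are assumptions made for this example.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def record_step(log, agent, action, before: bytes, after: bytes, timestamp: str):
    """Append one processing step to the provenance log and return it."""
    entry = {
        "step": len(log) + 1,
        "agent": agent,          # who performed the operation
        "action": action,        # what was done to the data
        "timestamp": timestamp,  # when it was done (supplied by the caller)
        "input_sha256": sha256_of(before),
        "output_sha256": sha256_of(after),
    }
    log.append(entry)
    return entry


def chain_is_unbroken(log):
    """Check that each step started from the output of the previous step."""
    return all(
        previous["output_sha256"] == current["input_sha256"]
        for previous, current in zip(log, log[1:])
    )


if __name__ == "__main__":
    # Hypothetical data: a missing-value code is replaced with a blank.
    raw = b"site,temp_c\nA,21.5\nB,-999\n"
    cleaned = b"site,temp_c\nA,21.5\nB,\n"
    log = []
    record_step(log, "J. Researcher", "replaced -999 with blank",
                raw, cleaned, "2024-01-01T00:00:00Z")
    print(json.dumps(log, indent=2))
    print(chain_is_unbroken(log))
```

Because every entry links an input checksum to an output checksum, a gap in the chain shows that the data was modified by a step nobody recorded.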


("Rosetta Stone" by Nrbelex is licensed under CC BY-SA 2.0.)

Where is Metadata Stored?

Metadata can be stored in local systems alongside the data it describes, or in data or metadata stores once it is complete. Research Data JCU is an example of an institutional metadata store and contains records (Data Records and Data Publications) for datasets generated by JCU researchers and HDR candidates. Data Publications in Research Data JCU are harvested regularly and published by Research Data Australia. The Research Data JCU system also provides secure storage for datasets which (unless restricted) are accessed directly via the catalogue or by negotiation with the data manager.

Data-level documentation/metadata such as workflows, detailed methodologies, variable descriptions, codes and units is often stored with the data (embedded) or included in its own file (e.g. a codebook or README text file) as supporting documentation.

Embedded documentation can be as simple as a key in an MS Excel spreadsheet (an additional worksheet) or it may be more complex (e.g. for software packages that include facilities for data annotation such as variable attributes, table relationships etc.). If possible, export this as a plain text file and include it with your supporting documentation, as this facilitates FAIR (Findable, Accessible, Interoperable, Reusable) data.
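A minimal sketch of the plain-text export step is shown below. It assumes the key has already been read out of the spreadsheet or software package as (term, meaning) pairs; how you obtain those pairs depends on your software and is not covered here. The function name, the sample rows and the output file name `README_key.txt` are hypothetical.

```python
from pathlib import Path


def key_to_text(title, rows):
    """Format (term, meaning) pairs as an aligned plain-text key."""
    # Pad every term to the width of the longest one so the meanings line up.
    width = max((len(term) for term, _ in rows), default=0)
    lines = [title, "=" * len(title), ""]
    for term, meaning in rows:
        lines.append(f"{term.ljust(width)}  {meaning}")
    return "\n".join(lines) + "\n"


# Hypothetical key rows, standing in for the contents of a key worksheet.
KEY_ROWS = [
    ("temp_c", "Water temperature (degrees Celsius)"),
    ("-999", "Missing value: sensor failure"),
]

if __name__ == "__main__":
    # Write the key as UTF-8 plain text next to the data files.
    Path("README_key.txt").write_text(key_to_text("Key", KEY_ROWS), encoding="utf-8")
```

Plain UTF-8 text is used here because it can be opened without the original software, which is the point of exporting embedded documentation in the first place.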

Documentation: Resources

This ANDS webinar covers data provenance, the Data Documentation Initiative (DDI) and the C2Metadata Project (automates capture of metadata describing variable transformations) being undertaken at ICPSR. Quite technical but may be of interest to social scientists and data managers (54 min.)

We acknowledge the Australian Aboriginal and Torres Strait Islander peoples as the first inhabitants of the nation and acknowledge Traditional Owners of the lands where our staff and students live, learn and work.