Documenting your research data is required at various stages in the Research Data Management Lifecycle. For example, you will need some form of documentation:
Please note that you will need to create metadata (a subset of documentation) if you plan to publish or share your data, or to archive your data in a repository like Research Data JCU.
Adequate metadata ensures your data is discoverable and that others can interpret/validate, re-use and cite it correctly.
Good metadata includes data-level documentation as well as study-level documentation - it should not just describe the project or a publication.
Research Data JCU - Data descriptions
The description is the most important part of your archival Data Record and Data Publication and should cover:
NB: If your Data Publication lacks sufficient detail, a data librarian will contact you for further information before publishing it.
Study-level documentation for data is often included in research data management plans (RDMPs) and provides a high-level overview and context for the data. It is an important component of the metadata used to describe data and is key to enabling secondary users to make informed use of shared data. Some systems (like Research Data JCU) integrate RDMPs and metadata collection so that researchers don't have to re-enter this information.
Together, study-level and data-level documentation answer the why, how, when and who questions for your data. They overlap to some extent, but good study-level documentation would include:
While it may be tempting to stop at the study level, metadata also needs to include data-level documentation, as this is critical for validating, reproducing and re-using data. It could include (as applicable):
Data-level documentation may also be embedded in the data itself -- as explained in the 'Where is Metadata Stored' box on this page.
(Source: These guidelines have been adapted from material prepared by the UK Data Service and listed in the Resources section of this page.)
The terms "data documentation", "data provenance" and "data lineage" are often (understandably) confused. Definitions vary, but they could be considered as a continuum, with data documentation at the broadest level. According to the RDA Research Data Provenance Interest Group, provenance is concerned with questions of data origins, maintenance of identity through the data lifecycle, and how we account for data modification. The Data Wrangling Handbook v0.1 likens this to the chain of custody in criminal investigations: previous owners have to be identified and held accountable for the processing and cleaning operations they have performed on the data. Technical data lineage relies on metadata that tracks data flows at the lowest level (tables, scripts, statements and so on).
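To make the chain-of-custody idea concrete, here is a minimal sketch of a provenance log that records who changed a dataset, what they did, and checksums of the data before and after. The function names, field names and sample data are all hypothetical and chosen for illustration; they are not part of Research Data JCU or any provenance standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest that identifies one version of the data."""
    return hashlib.sha256(data).hexdigest()


def record_step(log: list, actor: str, action: str, before: bytes, after: bytes) -> dict:
    """Append one processing step to the log (hypothetical record layout)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,                      # who performed the operation
        "action": action,                    # what was done to the data
        "input_sha256": checksum(before),    # identity of the data before
        "output_sha256": checksum(after),    # identity of the data after
    }
    log.append(entry)
    return entry


# Illustrative data only: a tiny CSV before and after a cleaning step.
raw = b"site,temp\nA,21.5\nB,\n"
cleaned = b"site,temp\nA,21.5\n"

log = []
record_step(log, "J. Researcher", "dropped rows with missing temp", raw, cleaned)
print(json.dumps(log, indent=2))
```

Saving a log like this as a plain text or JSON file alongside the data gives later users a simple account of how the data was modified and by whom.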
Metadata can be stored in local systems with the data it is about - or in data or metadata stores when it is complete. Research Data JCU is an example of an institutional metadata store and contains records (Data Records and Data Publications) for datasets generated by JCU researchers and HDR candidates. Data Publications in Research Data JCU are harvested regularly and published by Research Data Australia. The Research Data JCU system also provides secure storage for datasets which (unless restricted) are accessed directly via the catalogue or by negotiation with the data manager.
Data-level documentation/metadata such as workflows, detailed methodologies, variable descriptions, codes and units is often stored with the data (embedded) or included in a separate file (e.g. a codebook or README text file) as supporting documentation.
Embedded documentation can be as simple as a key in an MS Excel spreadsheet (an additional worksheet) or it may be more complex (e.g. for software packages that include facilities for data annotation such as variable attributes and table relationships). If possible, export this as a plain text file and include it with your supporting documentation, as this supports FAIR data.
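As an illustration of keeping data-level documentation in a plain text file, the following sketch writes a simple tab-separated codebook. The variable names, labels, units, codes and the codebook.txt file name are hypothetical; substitute the fields from your own dataset.

```python
import csv
import io

# Hypothetical variable descriptions; replace with your own dataset's fields.
FIELDS = ["name", "label", "units", "codes"]
VARIABLES = [
    {"name": "site_id", "label": "Sampling site identifier", "units": "", "codes": ""},
    {"name": "temp_c", "label": "Water temperature", "units": "degrees Celsius", "codes": "-99 = missing"},
]


def build_codebook(variables: list) -> str:
    """Return a tab-separated plain-text codebook, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(variables)
    return buffer.getvalue()


def write_codebook(path: str, variables: list) -> None:
    """Save the codebook as UTF-8 text so it opens without special software."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(build_codebook(variables))


if __name__ == "__main__":
    write_codebook("codebook.txt", VARIABLES)
```

A tab-separated text file is one reasonable choice because it can be read in any text editor or spreadsheet program; a README in free text would serve the same purpose.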
This ANDS webinar (54 min.) covers data provenance, the Data Documentation Initiative (DDI) and the C2Metadata Project being undertaken at ICPSR, which automates the capture of metadata describing variable transformations. It is quite technical but may be of interest to social scientists and data managers.