Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to other research outputs such as journal articles, reports and conference papers.
Data citation is important because:
Digital Object Identifiers (DOIs) are not essential for published research datasets, but are considered best practice for data citation.
Research Data JCU - DOI Minting Service
JCU is a member of the ANDS Cite My Data service, which allows Australian research organisations to mint DOIs for datasets so that they can easily be cited. We can mint DOIs for datasets in Research Data JCU under the following conditions:
Highly confidential or sensitive data with restricted access can be stored securely for you; contact researchdata@jcu.edu.au to discuss storage options. Making the metadata about your dataset public (but not the data itself) allows other researchers to discover your work and contact you to discuss it, or even to collaborate with you in future. Importantly, your data is archived in case it is ever challenged. We will not mint DOIs for these private datasets. Datasets without DOIs can still be cited, and metadata records for all datasets are harvested by Research Data Australia.
JCU researchers may also deposit metadata records in Research Data JCU that describe data held in other repositories (such as Dryad and GenBank) and link to those datasets. This increases the datasets' visibility and ensures that they are harvested by Research Data Australia and the Research Portfolio site. We cannot mint DOIs for these "secondary" datasets.
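The full list of minting conditions is not reproduced here. However, the two exclusions stated above can be expressed as a simple check: private datasets, and "secondary" records that describe data held elsewhere. This is a minimal sketch only. `DatasetRecord` and `excluded_from_doi_minting` are hypothetical names and are not part of any JCU system. Passing the check does not by itself mean a DOI will be minted.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical, simplified view of a dataset record (not a real JCU schema)."""
    title: str
    data_is_public: bool             # False for restricted or sensitive (private) datasets
    held_in_research_data_jcu: bool  # False for "secondary" records describing data held elsewhere


def excluded_from_doi_minting(record: DatasetRecord) -> bool:
    """Return True if either of the two stated exclusions applies.

    Returning False only means neither exclusion applies; the other
    minting conditions still have to be met.
    """
    if not record.data_is_public:
        # Private datasets: the metadata can be public, but no DOI is minted.
        return True
    if not record.held_in_research_data_jcu:
        # Secondary records that link to data in Dryad, GenBank, etc.
        return True
    return False
```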
DataCite provides a recommended minimum format for citing data:
You should follow your style manual or publisher's advice for citing data. If no format is suggested for datasets, adapt a standard data citation style to match the style you use for textual publications.
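DataCite's widely used basic citation pattern is "Creator (PublicationYear): Title. Publisher. Identifier". The sketch below assembles a citation in that pattern. It makes two assumptions: multiple creators are joined with semicolons, and the DOI is shown as an https://doi.org/ link. Adjust both to your own style guide. The function name `datacite_citation` is illustrative.

```python
def datacite_citation(creators, year, title, publisher, doi):
    """Format: Creator (PublicationYear): Title. Publisher. Identifier"""
    creator_str = "; ".join(creators)   # assumption: semicolon-separated creators
    clean_title = title.rstrip(".")     # avoid a doubled full stop after the title
    return f"{creator_str} ({year}): {clean_title}. {publisher}. https://doi.org/{doi}"


# Usage, with the Research Data JCU example cited further down this page:
print(datacite_citation(
    ["Martinsen, B."], 2018,
    "Student and teacher perceptions of blended learning in secondary science, Far North Queensland",
    "James Cook University", "10.4225/28/5ae672962c10f",
))
```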
The DataCite DOI Citation Formatter is a simple online tool for formatting your citation (just paste in the DOI) in hundreds of different styles.
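The same kind of formatting can also be requested programmatically through DOI content negotiation. You send a request to doi.org with an `Accept: text/x-bibliography; style=...` header, and the registration agency (such as DataCite) returns a formatted citation. The minimal sketch below uses only the Python standard library. The style name `apa` is assumed to be an available Citation Style Language style, and the helper names are hypothetical.

```python
import urllib.request


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request for a formatted citation."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Send the request; urlopen follows doi.org's redirect to the registration agency."""
    with urllib.request.urlopen(citation_request(doi, style), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_citation("10.5061/dryad.26j85")` needs network access and should return the Dryad dataset cited below as a formatted reference.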
Datasets without DOIs can also be cited. The Publication Manual of the American Psychological Association (APA) (6th ed.) includes examples:
Pew Hispanic Center. (2008). 2007 Hispanic Healthcare Survey [Data file and code book]. Available from Pew Hispanic Center Web site: http://pewhispanic.org/datasets/.
Pew Hispanic Center. (2008). 2007 Hispanic Healthcare Survey [Data file and code book]. Retrieved from http://pewhispanic.org/datasets/
In these examples, "Available from" is used when the URL leads to a download site, whereas "Retrieved from" indicates that the URL links directly to the data files.
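The choice between the two APA 6th edition wordings can be captured in a small helper. This is a sketch that reproduces the Pew Hispanic Center pattern above. The function name and parameters are hypothetical. It omits a trailing full stop after the URL, as in the "Retrieved from" example.

```python
def apa6_dataset_reference(author, year, title, description, url,
                           direct_to_files, site_name=None):
    """Build an APA 6th-style dataset reference without a DOI."""
    if direct_to_files:
        # URL goes straight to the data files.
        access = f"Retrieved from {url}"
    else:
        # URL goes to a download site; optionally name the site.
        source = f"{site_name} Web site: " if site_name else ""
        access = f"Available from {source}{url}"
    return f"{author.rstrip('.')}. ({year}). {title} [{description}]. {access}"
```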
Unpublished raw data can also be cited, e.g. Smith, J. A. (2006). [Personnel survey]. Unpublished raw data.
EndNote includes a reference type 'dataset' for versions X4 and above.
Data repositories may specify or output a particular style. Here are some examples from JCU researchers:
Dryad: Waldie PA, Almany GR, Sinclair-Taylor TH, Hamilton RJ, Potuku T, Priest MA, Rhodes KL, Robinson J, Cinner JE, Berumen ML (2016) Data from: Restricted grouper reproductive migrations support community-based management. Dryad Digital Repository. https://doi.org/10.5061/dryad.26j85
PANGAEA: McKenzie, Len J; Roder, Chantal A; Yoshida, Rudolf L (2016): Seagrass and associated benthic community data derived from field surveys at Low Isles, Great Barrier Reef, conducted July-August, 1997. Centre for Tropical Water and Aquatic Ecosystem Research, James Cook University, Townsville, PANGAEA, https://doi.org/10.1594/PANGAEA.858945
Research Data JCU: Martinsen, B. (2018). Student and teacher perceptions of blended learning in secondary science, Far North Queensland. James Cook University. (dataset) http://dx.doi.org/10.4225/28/5ae672962c10f
ANDS provides useful advice about citing dynamic data in its comprehensive guide. Dynamic data are subject to change, either through regular, systematic additions to the existing data (a growing dataset) or through modification or updating of an existing dataset (an evolving dataset). Because the content changes over time, it can be challenging to identify and cite precisely the version or subset that was used. ANDS provides two fictional examples to illustrate:
Doe, J. (2009-2011): Dynamic Data Set Title. Version: 1.2
Responsible Data Archive [evolving dataset] doi.10.1001/1234@version=1.2
Doe, J. (2009-2011): Dynamic Data Set Title. Subset: 2010-01-01 -2010-12-13
Responsible Data Archive [growing dataset] doi.10.1001/1234@range=2010-01-01-2010-12-13
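The two fictional ANDS examples differ only in how they pin down the portion of the data used: a version number for an evolving dataset, or a date range for a growing one. The sketch below builds either form from the same fields. The `@version=` and `@range=` suffixes are copied from the ANDS examples and are illustrative rather than a registered identifier syntax. The function name is hypothetical.

```python
def dynamic_citation(creator, years, title, archive, identifier,
                     version=None, date_range=None):
    """Cite an evolving dataset (by version) or a growing dataset (by date range)."""
    if (version is None) == (date_range is None):
        raise ValueError("give exactly one of version (evolving) or date_range (growing)")
    if version is not None:
        detail = f"Version: {version}"
        kind = "evolving dataset"
        suffix = f"@version={version}"
    else:
        start, end = date_range
        detail = f"Subset: {start} - {end}"
        kind = "growing dataset"
        suffix = f"@range={start}-{end}"
    return f"{creator} ({years}): {title}. {detail}. {archive} [{kind}] {identifier}{suffix}"
```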
See the Researcher Profiles, Identifiers and Engagement LibGuide for more information on ORCID.
Data publications in Research Data JCU can easily be imported into your ORCID profile via Research Data Australia. For instructions, see this short (3:13 min.) video:
Linking research publications and data improves the discovery of and access to both the literature and the data, because the links "drive traffic" between them in both directions. This facilitates the re-use, reproducibility and transparency of research, and it helps researchers receive better attribution for their published data.
Records in the Tropical Data Hub (TDH) Research Data repository include rich metadata, such as links to associated publications, websites and datasets. This metadata is harvested by Research Data Australia, where users can follow the links or visualise the relationships (powered by Research Graph).
The Scholix (Scholarly Link Exchange) initiative aims to improve these links. It is a global, community-driven, multi-stakeholder effort involving journal publishers, data centres and service providers. Where the framework is implemented, links no longer require author input or typesetting by the journal, because they are exchanged programmatically. Scopus was an early adopter, and subscribers can see an example in the journal Polyhedron (scroll to the research data link).
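To illustrate what "exchanged programmatically" can mean, the sketch below builds a simplified link record between an article and a dataset. It follows Scholix's source, relationship and target structure. It also derives the inverse link so that the connection can be followed from either side. The field names are a simplified approximation of the Scholix schema, not a complete or authoritative rendering. The helper names and the article DOI `10.1234/example-article` are hypothetical.

```python
# Relationship types paired with their inverses (simplified set).
INVERSE = {
    "References": "IsReferencedBy",
    "IsReferencedBy": "References",
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
    "IsRelatedTo": "IsRelatedTo",
}


def link_record(publication_doi, dataset_doi, provider, link_date):
    """Simplified Scholix-style record: an article that references a dataset."""
    return {
        "LinkPublicationDate": link_date,
        "LinkProvider": [{"Name": provider}],
        "RelationshipType": {"Name": "References"},
        "Source": {"Identifier": {"ID": publication_doi, "IDScheme": "doi"},
                   "Type": {"Name": "literature"}},
        "Target": {"Identifier": {"ID": dataset_doi, "IDScheme": "doi"},
                   "Type": {"Name": "dataset"}},
    }


def inverse_link(record):
    """Return the same link seen from the other side (source and target swapped)."""
    flipped = dict(record)  # shallow copy; the original record is left unchanged
    flipped["Source"], flipped["Target"] = record["Target"], record["Source"]
    flipped["RelationshipType"] = {"Name": INVERSE[record["RelationshipType"]["Name"]]}
    return flipped
```

Serialising a record with `json.dumps(..., indent=2)` gives the kind of JSON that services exchange, although real Scholix records carry more fields.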
ANDS have developed several guides and videos about DOIs and data citation:
We acknowledge the Australian Aboriginal and Torres Strait Islander peoples as the first inhabitants of the nation and acknowledge Traditional Owners of the lands where our staff and students, live, learn and work.