
Research Data Management Toolkit: DOIs and Data Citation

This guide provides information about research data management and the Tropical Data Hub (TDH) Research Data repository

What is Data Citation?

Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to outputs such as journal articles, reports and conference papers. 

Data citation is important because:

  • it is a key practice leading to recognition of data as a primary research output
  • routine citation of data facilitates reproducible and transparent research
  • only cited data can be counted and tracked to measure impact
  • citations for your published data can be included in researcher profiles (e.g. ORCID), curricula vitae, etc.
  • evidence suggests that citing data in related publications increases the citation rate of those publications
  • citation information may be incorporated into research evaluation and reward practices in future; see, for example, DORA (the Declaration on Research Assessment)

Digital Object Identifiers (DOIs) are not essential, but are considered best practice for data citation.

DOIs and the Tropical Data Hub

  DOI Minting Service

JCU is a member of the ANDS Cite My Data Service, which allows Australian research organisations to mint DOIs for datasets so they can easily be cited.

We can mint DOIs for datasets in the Tropical Data Hub (TDH) Research Data repository under the following conditions:

  • the data is open (available by direct download) or can be made available via conditional access. The data must be available in some way in order to be citable. See this section of the Toolkit for more information about controlling access to your data.
  • it hasn't had a DOI minted elsewhere i.e. the Tropical Data Hub is the primary point of publication

We can review your data deposit and mint a DOI urgently if you need this for a manuscript submission. Please email researchdata@jcu.edu.au so we can prioritise this for you. Private links for peer reviewers (if data is embargoed) are also available on request. 

Highly confidential or sensitive data where access is restricted to the data owner or custodian can be stored securely in the Tropical Data Hub archive and only the metadata made public. Other researchers can discover your work and contact you to discuss or even collaborate. Importantly, your data is archived should it ever be challenged. We will not mint DOIs for these private datasets. Datasets without DOIs can still be cited and all datasets are harvested by Research Data Australia.

JCU researchers may also deposit metadata records in the Tropical Data Hub to describe data held in other repositories (such as Dryad and GenBank) and link to these datasets. This increases their visibility and ensures they are harvested by Research Data Australia and the Research Portfolio site. We will not mint DOIs for these "secondary" datasets.

Data Citation Styles and Formats

Minimum Format

DataCite provides a recommended minimum format for citing data:

  • Required elements: Creator | Publication Year | Title | Publisher | Identifier (a URL, DOI or other persistent identifier)
  • Optional elements: Version | Resource Type (e.g. "dataset")
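
As a rough illustration, the required and optional elements above can be assembled into a citation string. The Python sketch below is only one way to do this: the `DataCitation` class, its field names and the punctuation are assumptions made for this example, not a format prescribed by DataCite or by this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the citation elements listed above."""
    creator: str
    publication_year: int
    title: str
    publisher: str
    identifier: str                      # a URL, DOI or other persistent identifier
    version: Optional[str] = None        # optional element
    resource_type: Optional[str] = None  # optional element, e.g. "dataset"

    def to_text(self) -> str:
        """Join the elements in order, skipping optional ones that are unset."""
        parts = [f"{self.creator} ({self.publication_year}): {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.publisher}.")
        if self.resource_type:
            parts.append(f"[{self.resource_type}]")
        parts.append(self.identifier)
        return " ".join(parts)


# Values taken from the Tropical Data Hub example elsewhere in this guide.
citation = DataCitation(
    creator="Martinsen, B.",
    publication_year=2018,
    title="Student and teacher perceptions of blended learning in "
          "secondary science, Far North Queensland",
    publisher="James Cook University",
    identifier="http://dx.doi.org/10.4225/28/5ae672962c10f",
    resource_type="dataset",
)
print(citation.to_text())
```

If your style manual or publisher specifies a different order or punctuation, adjust `to_text` to match.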

Citation Styles

You should follow your style manual or publisher's advice for citing data. If no format is suggested for datasets, take a standard data citation style and adapt it to match the style for textual publications.

The DataCite DOI Citation Formatter is a simple online tool for formatting your citation (just paste in the DOI) in hundreds of different styles.
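
A formatted citation can also be requested from a script rather than through the web form. The sketch below assumes the DOI content negotiation service at doi.org, which accepts an `Accept: text/x-bibliography` header with a `style` parameter. This is a different route from the online formatter named above, and the helper name `build_citation_request` is hypothetical. The request is only constructed here; sending it needs network access.

```python
import urllib.request


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a request for a formatted citation of a DOI."""
    url = f"https://doi.org/{doi}"
    # The style name is assumed to be one the service recognises, e.g. "apa".
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


# DOI taken from the Dryad example elsewhere in this guide.
request = build_citation_request("10.5061/dryad.26j85")

# To fetch the citation text (requires network access):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```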

Datasets without DOIs can also be cited. The style guide for the American Psychological Association (APA) (6th ed.) includes two examples:

Pew Hispanic Center. (2008). 2007 Hispanic Healthcare Survey [Data file and code book]. Available from Pew Hispanic Center Web site: http://pewhispanic.org/datasets/.

Pew Hispanic Center. (2008). 2007 Hispanic Healthcare Survey [Data file and code book]. Retrieved from http://pewhispanic.org/datasets/

In these examples, "Available from" is used when the URL leads to a download site, while "Retrieved from" indicates that the URL goes directly to the data files.

Unpublished raw data can also be cited e.g. Smith, J.A. (2006). [Personnel survey]. Unpublished raw data.

EndNote includes a reference type 'dataset' for versions X4 and above.


Repository Styles

Data repositories may specify or output a particular style. Here are some examples from JCU researchers:

Dryad: Waldie PA, Almany GR, Sinclair-Taylor TH, Hamilton RJ, Potuku T, Priest MA, Rhodes KL, Robinson J, Cinner JE, Berumen ML (2016) Data from: Restricted grouper reproductive migrations support community-based management. Dryad Digital Repository. https://doi.org/10.5061/dryad.26j85

PANGAEA: McKenzie, Len J; Roder, Chantal A; Yoshida, Rudolf L (2016): Seagrass and associated benthic community data derived from field surveys at Low Isles, Great Barrier Reef, conducted July-August, 1997. Centre for Tropical Water and Aquatic Ecosystem Research, James Cook University, Townsville, PANGAEA, https://doi.org/10.1594/PANGAEA.858945

Tropical Data Hub: Martinsen, B. (2018). Student and teacher perceptions of blended learning in secondary science, Far North Queensland. James Cook University. [Data Files] http://dx.doi.org/10.4225/28/5ae672962c10f


Dynamic Data

ANDS include some useful advice about citing dynamic data in their comprehensive guide. Dynamic data is data that is subject to change, either by regularly and systematically appending to existing data (a growing dataset) or by modifying or updating an existing dataset (an evolving dataset). Identifying and citing these datasets precisely can be challenging. ANDS provide two fictional examples to illustrate:

Doe, J. (2009-2011): Dynamic Data Set Title. Version: 1.2
Responsible Data Archive [evolving dataset] doi.10.1001/1234@version=1.2

Doe, J. (2009-2011): Dynamic Data Set Title. Subset: 2010-01-01 - 2010-12-13
Responsible Data Archive [growing dataset] doi.10.1001/1234@range=2010-01-01-2010-12-13

Resources

 ANDS have developed several guides and videos about DOIs and data citation:

Open Researcher and Contributor Identifier (ORCID) and Data

See the Researcher Profiles, Identifiers and Engagement LibGuide for more information on ORCID.

Datasets in the Tropical Data Hub (TDH) Research Data repository can be easily imported from Research Data Australia into your ORCID profile as shown in this short (3:13 min.) video:

Linking Publications and Data

Linking research publications and data improves discovery of, and access to, both the literature and the data, as it can "drive traffic" between them in both directions. This facilitates the re-use, reproducibility and transparency of research. It also ensures researchers receive better attribution for their published data.

Records in the Tropical Data Hub (TDH) Research Data repository contain rich metadata, including links to associated publications, websites and datasets. This metadata is harvested by Research Data Australia and users can follow the links or visualise the relationships (powered by Research Graph) as shown:

The aim of the Scholix (Scholarly Link Exchange) initiative is to improve these links. The framework is a global, community-driven, multi-stakeholder effort involving journal publishers, data centers and service providers. Where it is implemented, linking no longer requires author input or typesetting by the journal because it is done programmatically. Scopus was an early adopter, and if you subscribe you can see an example of this in the journal Polyhedron (scroll to the research data link).

APA Style Guides


We acknowledge the Australian Aboriginal and Torres Strait Islander peoples as the first inhabitants of the nation and acknowledge Traditional Owners of the lands where our staff and students live, learn and work.