Data "wrangling" -- also known as data cleaning -- may be required at different stages of the Research Data Management Lifecycle. This process identifies and corrects errors or makes formatting more consistent. It is often required to prepare data for analysis and/or visualization.
Data also needs to be cleaned before it is archived, to ensure that it is preserved correctly and not misinterpreted by other users, and to facilitate interoperability (one of the FAIR Data Principles). See the list of Resources for some useful tools and tutorials.
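As a rough illustration of what "correcting errors and making formatting more consistent" can look like in practice, here is a minimal Python sketch that uses only the standard library. The column names, the sample rows and the set of null markers are all hypothetical, chosen for the example; a real dataset would need its own rules, and the sketch assumes every row has the same number of fields as the header.

```python
import csv
import io

# Hypothetical markers that stand for "no value" in the raw file (an assumption).
NULL_MARKERS = {"", "na", "n/a", "null", "-999"}


def clean_value(value):
    """Trim surrounding whitespace and map assumed null markers to None."""
    text = (value or "").strip()
    return None if text.lower() in NULL_MARKERS else text


def clean_rows(raw_csv):
    """Return cleaned rows as dictionaries with normalised column names."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned = []
    for row in reader:
        cleaned.append(
            {
                # e.g. " Site ID " becomes "site_id"
                key.strip().lower().replace(" ", "_"): clean_value(value)
                for key, value in row.items()
            }
        )
    return cleaned


# Made-up sample data with stray spaces and inconsistent null markers.
raw = "Site ID, Species ,Count\n A1 ,Quercus alba, 12\nA2,N/A,-999\n"
rows = clean_rows(raw)
for row in rows:
    print(row)
```

Keeping the raw text untouched and writing the cleaned rows to a new structure means the original values remain available if a cleaning rule later turns out to be wrong.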
White et al. (2013) published an excellent paper, "Nine simple ways to make it easier to (re)use your data", in Ideas in Ecology and Evolution. The authors noted that much of the shared data in ecology and evolutionary biology is not easily reused because it does not follow best practices in terms of data structure, metadata and licences. They make nine specific recommendations.
Their advice is on point and highly readable - and it includes some specifics on data wrangling.
Source: White, E., Baldridge, E., Brym, Z., Locey, K., McGlinn, D. & Supp, S. (2013). Nine simple ways to make it easier to (re)use your data. Ideas in Ecology and Evolution, 6(2):1-10. doi:10.4033/iee.2013.6b.6.f