Reshaping and Cleaning Common Data

Discusses the “Tidy Data” paper by Hadley Wickham, a prominent member of the R programming language community, which addresses common problems in reshaping and cleaning data.

This chapter is from the book

6.1 Introduction

As mentioned in Chapter 4, Hadley Wickham,[1] one of the more prominent members of the R community, introduced the concept of tidy data in a paper in the Journal of Statistical Software.[2] Tidy data is a framework for structuring data sets so they can be easily analyzed and visualized. It can be thought of as a goal to aim for when cleaning data. Once you understand what tidy data is, that knowledge will make your data analysis, visualization, and collection much easier.

What is tidy data? Hadley Wickham’s paper defines it as meeting the following criteria:

  • Each row is an observation.

  • Each column is a variable.

  • Each type of observational unit forms a table.
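
To make the first two criteria concrete, here is a minimal plain-Python sketch. It assumes a small hypothetical table in which year values are stored as column headers (one of the messy layouts Wickham's paper describes), so a single row holds several observations. The hypothetical helper `to_tidy` unpivots it so that each output row is one country-year observation and each column (country, year, count) is one variable. The data and names are illustrative only; in pandas the same reshaping is usually done with `pandas.melt`.

```python
from typing import Dict, List

# Hypothetical "messy" table: years are stored in the column headers,
# so each row packs two observations (one per year) together.
messy_rows: List[Dict[str, object]] = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_tidy(
    rows: List[Dict[str, object]],
    id_vars: List[str],
    var_name: str,
    value_name: str,
) -> List[Dict[str, object]]:
    """Unpivot wide rows so that each output row is a single observation.

    Columns named in id_vars are copied unchanged; every other column
    header becomes a value of the new `var_name` column, and the cell
    under that header becomes the `value_name` column.
    """
    tidy: List[Dict[str, object]] = []
    for row in rows:
        for column, value in row.items():
            if column in id_vars:
                continue  # identifier columns are not unpivoted
            record = {key: row[key] for key in id_vars}
            record[var_name] = column  # header text, e.g. "2019", stays a string
            record[value_name] = value
            tidy.append(record)
    return tidy


tidy_rows = to_tidy(messy_rows, id_vars=["country"], var_name="year", value_name="count")
for r in tidy_rows:
    print(r)
# {'country': 'A', 'year': '2019', 'count': 10}
# {'country': 'A', 'year': '2020', 'count': 12}
# {'country': 'B', 'year': '2019', 'count': 7}
# {'country': 'B', 'year': '2020', 'count': 9}
```

After reshaping, "year" is an explicit variable rather than being spread across column names, so filtering or grouping by year no longer requires knowing which headers hold years.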

This chapter goes through the various ways to tidy data as identified in Wickham’s paper.
