12.3 From Rows to Columns: spread()

It is also possible to transform a data table from long format into wide format—that is, to spread out the prices into multiple columns. Thus, while the gather() function collects multiple features into two columns, the spread() function creates multiple features from two existing columns. For example, you can take the long format data shown in Table 12.2 and spread it out so that each observation is a band, as in Table 12.3:

# Reshape long data (Table 12.2), spreading prices out among multiple features
price_by_band <- spread(
  band_data_long, # data frame to spread from
  key = city,     # column indicating where to get new feature names
  value = price   # column indicating where to get new feature values
)

Table 12.3 A “wide” data set of concert ticket prices for a set of bands. Each observation (i.e., unit of analysis) is a band, and each feature is the ticket price in a given city.

band                 Denver  Minneapolis  Portland  Seattle
billy_strings            25           15        25       15
fruition                 40           20        50       30
greensky_bluegrass       20           30        40       40
trampled_by_turtles      40          100        20       30

The spread() function takes arguments similar to those passed to the gather() function, but applies them in the opposite direction. Here, the key argument names the column that supplies the new column names, and the value argument names the column that supplies the values for those columns. The spread() function creates a new column for each unique value in the key column and fills it with the corresponding entries from the value column. In the preceding example, the new column names (e.g., "Denver", "Minneapolis") were taken from the city feature in the long format table, and the values for those columns were taken from the price feature. This process is illustrated in Figure 12.2.
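To make the mapping from key and value to new columns concrete, here is a minimal sketch, assuming the tidyr package is installed. The data frame band_data_long_demo is a hypothetical stand-in for the long format data, built from a two-band, two-city subset of the prices in Table 12.3; the names ending in _demo are not from the original example.

```r
library(tidyr)

# Hypothetical stand-in for the long format data (subset of Table 12.3)
band_data_long_demo <- data.frame(
  band = c("billy_strings", "billy_strings", "fruition", "fruition"),
  city = c("Denver", "Seattle", "Denver", "Seattle"),
  price = c(25, 15, 40, 30),
  stringsAsFactors = FALSE
)

# Each unique value in `city` becomes a new column;
# the cells of those columns are filled from `price`
price_by_band_demo <- spread(
  band_data_long_demo,
  key = city,
  value = price
)

# Inspect the resulting column names and the reshaped table
print(colnames(price_by_band_demo))
print(price_by_band_demo)
```

The four long format rows collapse into two rows, one per band, with one price column per city.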

Figure 12.2 The spread() function spreads out a single column into multiple columns. It creates a new column for each unique value in the provided key column (city). The values in each new column will be populated with the provided value column (price).

By combining gather() and spread(), you can effectively change the “shape” of your data and what concept is represented by an observation.
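As an illustration of combining the two functions, the following sketch takes a small wide table to long format and back again. It assumes the tidyr package is installed; wide_demo, long_demo, and wide_again are hypothetical names, and the data is a subset of Table 12.3. The gather() call shown here is one common way to invoke it and may differ from how it is called elsewhere in the chapter.

```r
library(tidyr)

# Hypothetical wide data: each observation is a band (subset of Table 12.3)
wide_demo <- data.frame(
  band = c("billy_strings", "fruition"),
  Denver = c(25, 40),
  Seattle = c(15, 30),
  stringsAsFactors = FALSE
)

# gather(): each observation becomes a band-city pair (long format)
long_demo <- gather(wide_demo, key = city, value = price, -band)

# spread(): each observation is a band again (wide format)
wide_again <- spread(long_demo, key = city, value = price)

# Compare the shapes: rows and columns of each table
print(dim(long_demo))
print(dim(wide_again))
```

The same prices appear in both tables; only the unit that a row represents changes.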
