2.2 Presenting Numerical Variables

You present numerical variables by first establishing groups that represent separate ranges of values and then placing each value into the proper group. You then create a table that summarizes the groups by frequency (count) or percentage and use that table as the basis for charts such as the histogram, which this chapter explains.

The Frequency and Percentage Distribution

Concept A table of grouped numerical data that contains the names of each group in the first column, the counts (frequencies) of each group in the second column, and the percentages of each group in the third column. This table can also appear as a two-column table that shows either the frequencies or the percentages.

Example Consider the following data table, which presents the average ticket cost (in U.S. $) for each NBA team during a recent season.

NBA Ticket Cost

| Team | Average Ticket Cost | Team | Average Ticket Cost |
|---|---|---|---|
| Atlanta | 143 | Miami | 187 |
| Boston | 234 | Milwaukee | 153 |
| Brooklyn | 212 | Minnesota | 107 |
| Charlotte | 89 | New Orleans | 48 |
| Chicago | 251 | New York | 285 |
| Cleveland | 135 | Oklahoma City | 199 |
| Dallas | 124 | Orlando | 127 |
| Denver | 152 | Philadelphia | 197 |
| Detroit | 135 | Phoenix | 61 |
| Golden State | 463 | Portland | 119 |
| Houston | 177 | Sacramento | 198 |
| Indiana | 130 | San Antonio | 195 |
| L.A. Clippers | 137 | Toronto | 180 |
| L.A. Lakers | 444 | Utah | 78 |
| Memphis | 104 | Washington | 138 |

Source: Data extracted from “The Most Expensive NBA Teams to See Live,” https://bit.ly/3rvSAah.

The following frequency and percentage distribution summarizes these data using 10 groups of equal width, ranging from "0 to under 50" through "450 to under 500."

| Average Ticket Cost | Frequency | Percentage |
|---|---|---|
| 0 to under 50 | 1 | 3.33% |
| 50 to under 100 | 3 | 10.00% |
| 100 to under 150 | 11 | 36.67% |
| 150 to under 200 | 9 | 30.00% |
| 200 to under 250 | 2 | 6.67% |
| 250 to under 300 | 2 | 6.67% |
| 300 to under 350 | 0 | 0.00% |
| 350 to under 400 | 0 | 0.00% |
| 400 to under 450 | 1 | 3.33% |
| 450 to under 500 | 1 | 3.33% |
| Total | 30 | 100.00% |

Interpretation Frequency and percentage distributions enable you to quickly determine differences among the many groups of values. In this example, you can quickly see that most of the average ticket costs (24 of 30) are between $100 and $300 and that very few average ticket costs are either below $50 or above $300.
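The distribution above can be reproduced with a few lines of code. The following is a minimal sketch in Python, not the book's own method: the function name `frequency_distribution` and its parameters are hypothetical, and the list `costs` simply restates the 30 values from the data table.

```python
# Average ticket costs (U.S. $) for the 30 NBA teams, copied from the data table.
costs = [143, 234, 212, 89, 251, 135, 124, 152, 135, 463, 177, 130, 137, 444, 104,
         187, 153, 107, 48, 285, 199, 127, 197, 61, 119, 198, 195, 180, 78, 138]


def frequency_distribution(values, start=0, width=50, n_groups=10):
    """Return (label, frequency, percentage) rows for equal-width groups.

    Each group runs from `lower` to under `lower + width`, so a value that
    equals an upper boundary is counted in the next group.
    """
    counts = [0] * n_groups
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_groups:
            raise ValueError(f"{v} falls outside the defined groups")
        counts[index] += 1

    rows = []
    for i, count in enumerate(counts):
        lower = start + i * width
        label = f"{lower} to under {lower + width}"
        rows.append((label, count, 100 * count / len(values)))
    return rows


table = frequency_distribution(costs)
for label, count, pct in table:
    print(f"{label:<18}{count:>4}{pct:>9.2f}%")
print(f"{'Total':<18}{len(costs):>4}{100:>9.2f}%")
```

Using "to under" boundaries means every value belongs to exactly one group, which is why the frequencies sum to 30.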

You need to be careful in forming distribution groups because the ranges of the groups affect how you perceive the data. For example, had you grouped the average ticket costs into only two groups, "below $150" and "$150 and above," you would not have been able to see any pattern in the data.
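To see how much the choice of groups matters, the short sketch below recounts the same 30 values using only the two groups just described. It is an illustration under the same assumptions as before: the variable names are hypothetical and `costs` restates the data table.

```python
# Average ticket costs (U.S. $) for the 30 NBA teams, copied from the data table.
costs = [143, 234, 212, 89, 251, 135, 124, 152, 135, 463, 177, 130, 137, 444, 104,
         187, 153, 107, 48, 285, 199, 127, 197, 61, 119, 198, 195, 180, 78, 138]

# Count how many values fall into each of the two wide groups.
below_150 = sum(1 for c in costs if c < 150)
at_or_above_150 = len(costs) - below_150

print(f"Below $150:     {below_150}")        # prints 15
print(f"$150 and above: {at_or_above_150}")  # prints 15
```

With these two groups the data split evenly, 15 and 15, which hides both the concentration between $100 and $200 and the two very high values above $400.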

Histogram

Concept A special bar chart for grouped numerical data in which the groups are represented as individual bars on the horizontal X axis and the frequencies or percentages for each group are plotted on the vertical Y axis. In a histogram, in contrast to a bar chart of categorical data, no gaps exist between adjacent bars.

Example The following histogram presents the average ticket cost data of the preceding example. The value below each bar (25, 75, 125, 175, 225, 275, 325, 375, 425, and 475) is the midpoint, the approximate middle value for the group the bar represents. As with the frequency and percentage distribution, you can quickly see that very few average ticket costs fall in the groups beyond the bar centered at 275, that is, at $300 or more.

Interpretation A histogram reveals the overall shape of the frequencies in the groups. A histogram is considered symmetric if each side of the chart is an approximate mirror image of the other side. The histogram of this example has more values in the lower portion than in the upper portion, so it is considered to be non-symmetric, or skewed.
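The midpoints and the lopsided shape described above can be checked numerically. The sketch below is a rough illustration, not a charting recipe: it prints a text histogram with the bars drawn sideways (one row per group rather than one vertical bar), and it compares how many values fall in the lower five groups with how many fall in the upper five. All names are hypothetical, and `costs` restates the data table.

```python
# Average ticket costs (U.S. $) for the 30 NBA teams, copied from the data table.
costs = [143, 234, 212, 89, 251, 135, 124, 152, 135, 463, 177, 130, 137, 444, 104,
         187, 153, 107, 48, 285, 199, 127, 197, 61, 119, 198, 195, 180, 78, 138]

WIDTH, N_GROUPS = 50, 10  # ten groups: 0 to under 50, ..., 450 to under 500

# Frequency of each group; integer division maps a cost to its group index.
counts = [0] * N_GROUPS
for c in costs:
    counts[c // WIDTH] += 1

# Midpoint of each group: 25, 75, ..., 475.
midpoints = [i * WIDTH + WIDTH // 2 for i in range(N_GROUPS)]

# Sideways text histogram: one row per group, labeled with its midpoint.
for mid, count in zip(midpoints, counts):
    print(f"{mid:>4} | {'#' * count} {count}")

# Compare the lower half of the groups (under $250) with the upper half.
half = N_GROUPS // 2
lower_half = sum(counts[:half])
upper_half = sum(counts[half:])
print(f"Values under $250: {lower_half}, values at $250 or more: {upper_half}")
```

Here 26 values sit in the lower half of the range and only 4 in the upper half, which is the imbalance that makes the histogram skewed rather than symmetric.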

The Time-Series Plot

Concept A chart in which each point represents the value of a numerical variable at a specific time. By convention, the X axis (the horizontal axis) always represents units of time, and the Y axis (the vertical axis) always represents units of the variable.

Example Consider the following data table, which presents the number of domestic movie releases from 1990 to 2020.

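Movie Releases

| Year | Movies Released | Year | Movies Released |
|---|---|---|---|
| 1990 | 224 | 2006 | 608 |
| 1991 | 244 | 2007 | 631 |
| 1992 | 234 | 2008 | 607 |
| 1993 | 258 | 2009 | 520 |
| 1994 | 254 | 2010 | 538 |
| 1995 | 279 | 2011 | 601 |
| 1996 | 310 | 2012 | 669 |
| 1997 | 303 | 2013 | 687 |
| 1998 | 336 | 2014 | 708 |
| 1999 | 384 | 2015 | 708 |
| 2000 | 371 | 2016 | 737 |
| 2001 | 355 | 2017 | 740 |
| 2002 | 480 | 2018 | 873 |
| 2003 | 507 | 2019 | 792 |
| 2004 | 551 | 2020 | 200 |
| 2005 | 547 | | |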

Source: Data extracted from “Domestic Yearly Box Office,” https://www.boxofficemojo.com/year/.

The following time-series plot visualizes these data.

Interpretation Time-series plots can reveal patterns over time that you might not see when looking at a long list of numerical values. In this example, the plot reveals a general increase in the number of movies released between 1990 and 2019. Before the steep drop in 2020 caused by the COVID-19 pandemic, annual releases had grown from 224 in 1990 to a peak of 873 in 2018, nearly a fourfold increase, and stood at 792 in 2019.
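The pattern described above can also be checked directly from the numbers. The following sketch is an illustration only: it prints a crude text version of the time series (one row per year, with time running down the page instead of along the X axis) and computes the overall growth, the peak year, and the largest year-over-year drop. The names are hypothetical, and `releases` restates the data table.

```python
# Movies released per year, copied from the data table.
releases = {
    1990: 224, 1991: 244, 1992: 234, 1993: 258, 1994: 254, 1995: 279,
    1996: 310, 1997: 303, 1998: 336, 1999: 384, 2000: 371, 2001: 355,
    2002: 480, 2003: 507, 2004: 551, 2005: 547, 2006: 608, 2007: 631,
    2008: 607, 2009: 520, 2010: 538, 2011: 601, 2012: 669, 2013: 687,
    2014: 708, 2015: 708, 2016: 737, 2017: 740, 2018: 873, 2019: 792,
    2020: 200,
}

years = sorted(releases)

# Crude text plot: each '#' stands for roughly 25 releases.
for year in years:
    print(f"{year} {'#' * (releases[year] // 25)} {releases[year]}")

# Year-over-year change, keyed by the later year.
changes = {y: releases[y] - releases[y - 1] for y in years[1:]}

growth_1990_2019 = releases[2019] / releases[1990]
peak_year = max(releases, key=releases.get)
largest_drop_year = min(changes, key=changes.get)

print(f"2019 releases were {growth_1990_2019:.2f} times the 1990 figure")
print(f"Peak year: {peak_year} ({releases[peak_year]} releases)")
print(f"Largest drop: {largest_drop_year} ({changes[largest_drop_year]})")
```

The year-over-year changes make the 2020 break stand out: no earlier decline comes close to the fall of 592 releases between 2019 and 2020.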

The Scatter Plot

Concept A chart that plots the values of two numerical variables for each observation. In a scatter plot, the X axis (the horizontal axis) always represents units of one variable, and the Y axis (the vertical axis) always represents units of the second variable.

Example Consider the following data table, which presents the average ticket cost (in U.S. $) and the premium ticket cost (in U.S. $) for each NBA team during a recent season.

NBA Ticket Cost

| Team | Average Ticket Cost | Premium Ticket Cost |
|---|---|---|
| Atlanta | 143 | 267 |
| Boston | 234 | 448 |
| Brooklyn | 212 | 391 |
| Charlotte | 89 | 173 |
| Chicago | 251 | 493 |
| Cleveland | 135 | 268 |
| Dallas | 124 | 245 |
| Denver | 152 | 296 |
| Detroit | 135 | 266 |
| Golden State | 463 | 874 |
| Houston | 177 | 346 |
| Indiana | 130 | 252 |
| L.A. Clippers | 137 | 271 |
| L.A. Lakers | 444 | 857 |
| Memphis | 104 | 203 |
| Miami | 187 | 371 |
| Milwaukee | 153 | 301 |
| Minnesota | 107 | 204 |
| New Orleans | 48 | 89 |
| New York | 285 | 561 |
| Oklahoma City | 199 | 390 |
| Orlando | 127 | 249 |
| Philadelphia | 197 | 383 |
| Phoenix | 61 | 110 |
| Portland | 119 | 233 |
| Sacramento | 198 | 380 |
| San Antonio | 195 | 384 |
| Toronto | 180 | 338 |
| Utah | 78 | 142 |
| Washington | 138 | 271 |

The following scatter plot visualizes these data.

Interpretation A scatter plot helps reveal patterns in the relationship between two numerical variables. The scatter plot for these data reveals a strong positive linear (straight-line) relationship between the average ticket cost and the cost of a premium ticket. Based on this relationship, you can conclude that the average ticket cost is a useful predictor of the premium ticket cost. (Chapter 10 more fully discusses using one numerical variable to predict the value of another numerical variable.)
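One common way to put a number on a "strong positive linear relationship" is the Pearson correlation coefficient, which ranges from -1 to +1 and approaches +1 when points lie close to an upward-sloping straight line. This passage does not name that measure, so treat the sketch below as an outside illustration rather than the book's method. The function name `pearson_r` is hypothetical, and the two lists restate the data table in team order.

```python
from math import sqrt

# Average and premium ticket costs (U.S. $), in the team order of the data table.
average = [143, 234, 212, 89, 251, 135, 124, 152, 135, 463, 177, 130, 137, 444, 104,
           187, 153, 107, 48, 285, 199, 127, 197, 61, 119, 198, 195, 180, 78, 138]
premium = [267, 448, 391, 173, 493, 268, 245, 296, 266, 874, 346, 252, 271, 857, 203,
           371, 301, 204, 89, 561, 390, 249, 383, 110, 233, 380, 384, 338, 142, 271]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(average, premium)
print(f"Pearson r between average and premium ticket cost: {r:.3f}")

# The premium cost is roughly a constant multiple of the average cost.
ratios = [p / a for a, p in zip(average, premium)]
print(f"Premium-to-average ratio ranges from {min(ratios):.2f} to {max(ratios):.2f}")
```

For these data the coefficient comes out very close to +1, and every team's premium cost is a little under twice its average cost, which is consistent with the straight-line pattern the scatter plot shows.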
