Sports Analytics and Data Science: Understanding Sports Markets

Thomas Miller discusses the unique features of sports markets and how successful sports analytics blends business and sports savvy, modern information technology, and sophisticated modeling techniques.
This chapter is from the book
  • “Those of you on the floor at the end of the game, I’m proud of you. You played your guts out. I’m only going to say this one time. All of you have the weekend. Think about whether or not you want to be on this team under the following condition: What I say when it comes to this basketball team is the law, absolutely and without discussion.”
  • —GENE HACKMAN AS COACH NORMAN DALE IN Hoosiers (1986)

In applying the laws of economics to professional sports, we must consider the nature of sports and the motives of owners. Professional sports are different from other forms of business.

There are sellers and buyers of sports entertainment. The sellers are the players and teams within the leagues of professional sports. The buyers are consumers of sports, many of whom never go to games in person but who watch sports on television, listen to the radio, and buy sports team paraphernalia.

Sports compete with other forms of entertainment for people’s time and money. And various sports compete with one another, especially when their seasons overlap. Sports teams produce entertainment content that is distributed through the media. Sports teams license their brand names and logos to other organizations, including sports apparel manufacturers.

Sports teams are not independent businesses competing with one another. While players and teams compete on the fields and courts of play, they cooperate with one another as members of leagues. The core product of sports is the sporting contest, a joint product of two or more players or two or more teams.

Fifty-four sports and recreation activities, shown in table 1.1, are tracked by the National Sporting Goods Association (2015), which serves the sporting goods industry. In recent years, participation in baseball, basketball, football, and tennis has declined, while participation in soccer has increased. There has been growth in individual recreational sports, such as skateboarding and snowboarding. Of course, levels of participation in sports are not necessarily an indicator of levels of interest in sports as entertainment.

Table 1.1. Sports and Recreation Activities in the United States

| Activities A-H | Activities I-Y |
|---|---|
| Aerobic Exercising | Ice/Figure Skating |
| Archery (Target) | In-Line Roller Skating |
| Backpack/Wilderness Camping | Kayaking |
| Baseball | Lacrosse |
| Basketball | Martial Arts/MMA/Tae Kwon Do |
| Bicycle Riding | Mountain Biking (Off Road) |
| Billiards/Pool | Muzzleloading |
| Boating (Motor/Power) | Paintball Games |
| Bowling | Running/Jogging |
| Boxing | Scuba Diving (Open Water) |
| Camping (Vacation/Overnight) | Skateboarding |
| Canoeing | Skiing (Alpine) |
| Cheerleading | Skiing (Cross Country) |
| Dart Throwing | Snowboarding |
| Exercise Walking | Soccer |
| Exercising with Equipment | Softball |
| Fishing (Fresh Water) | Swimming |
| Fishing (Salt Water) | Table Tennis/Ping Pong |
| Football (Flag) | Target Shooting (Airgun) |
| Football (Tackle) | Target Shooting (Live Ammunition) |
| Football (Touch) | Tennis |
| Golf | Volleyball |
| Gymnastics | Water Skiing |
| Hiking | Weight Lifting |
| Hockey (Ice) | Work Out at Club/Gym/Fitness Studio |
| Hunting with Bow & Arrow | Wrestling |
| Hunting with Firearms | Yoga |

Sports businesses produce entertainment products by cooperating with one another. While it is illegal for businesses in most industries to collude in setting output and prices, sports leagues engage in cooperative output and pricing as a standard part of their business model. The number of games, indeed the entire schedule of games in a sport, is determined by the league. In fact, aspects of professional sports are granted monopoly power by the federal government in the United States.

When developing a model for a typical business or firm, an economist would assume profit maximization as a motive. But for a professional sports team, an owner’s motives may not be so easily understood. While one owner may operate his or her team for profit year by year, another may seek to maximize wins or overall utility. Another may look for capital appreciation—buying, then selling after a few years. Lacking knowledge of owners’ motives, it is difficult to predict what they will do.

Gaining market share and becoming the dominant player is a goal of firms in many industries. Not so in the business of professional sports. If one team were assured of victory in almost all of its contests, interest in those contests could wane. A team benefits by winning more often than losing, but winning all the time may be less beneficial than winning most of the time. Professional sports leagues claim to be seeking competitive balance, although there are dominant teams in many leagues.

Sports is big business, as shown by the valuations and finances of the major professional sports in the United States and worldwide. Data from Forbes for Major League Baseball (MLB), the National Basketball Association (NBA), the National Football League (NFL), and worldwide soccer teams are shown in tables 1.2 through 1.5.

Table 1.2. MLB Team Valuation and Finances (March 2015)

| Team Rank | Team | Current Value ($ Millions) | One-Year Change in Value (Percentage) | Debt/Value (Percentage) | Revenue ($ Millions) | Operating Income ($ Millions) |
|---|---|---|---|---|---|---|
| 1 | New York Yankees | 3,200 | 28 | 0 | 508 | 8.1 |
| 2 | Los Angeles Dodgers | 2,400 | 20 | 17 | 403 | -12.2 |
| 3 | Boston Red Sox | 2,100 | 40 | 0 | 370 | 49.2 |
| 4 | San Francisco Giants | 2,000 | 100 | 4 | 387 | 68.4 |
| 5 | Chicago Cubs | 1,800 | 50 | 24 | 302 | 73.3 |
| 6 | St Louis Cardinals | 1,400 | 71 | 21 | 294 | 73.6 |
| 7 | New York Mets | 1,350 | 69 | 26 | 263 | 25.0 |
| 8 | Los Angeles Angels | 1,300 | 68 | 0 | 304 | 16.7 |
| 9 | Washington Nationals | 1,280 | 83 | 27 | 287 | 41.4 |
| 10 | Philadelphia Phillies | 1,250 | 28 | 8 | 265 | -39.0 |
| 11 | Texas Rangers | 1,220 | 48 | 13 | 266 | 3.5 |
| 12 | Atlanta Braves | 1,150 | 58 | 0 | 267 | 33.2 |
| 13 | Detroit Tigers | 1,125 | 65 | 15 | 254 | -20.7 |
| 14 | Seattle Mariners | 1,100 | 55 | 0 | 250 | 26.4 |
| 15 | Baltimore Orioles | 1,000 | 61 | 15 | 245 | 31.4 |
| 16 | Chicago White Sox | 975 | 40 | 5 | 227 | 31.9 |
| 17 | Pittsburgh Pirates | 900 | 57 | 10 | 229 | 43.6 |
| 18 | Minnesota Twins | 895 | 48 | 25 | 223 | 21.3 |
| 19 | San Diego Padres | 890 | 45 | 22 | 224 | 35.0 |
| 20 | Cincinnati Reds | 885 | 48 | 6 | 227 | 2.2 |
| 21 | Milwaukee Brewers | 875 | 55 | 6 | 226 | 11.3 |
| 22 | Toronto Blue Jays | 870 | 43 | 0 | 227 | -17.9 |
| 23 | Colorado Rockies | 855 | 49 | 7 | 214 | 12.6 |
| 24 | Arizona Diamondbacks | 840 | 44 | 17 | 211 | -2.2 |
| 25 | Cleveland Indians | 825 | 45 | 9 | 207 | 8.9 |
| 26 | Houston Astros | 800 | 51 | 34 | 175 | 21.6 |
| 27 | Oakland Athletics | 725 | 46 | 8 | 202 | 20.8 |
| 28 | Kansas City Royals | 700 | 43 | 8 | 231 | 26.6 |
| 29 | Miami Marlins | 650 | 30 | 34 | 188 | 15.4 |
| 30 | Tampa Bay Rays | 625 | 29 | 22 | 188 | 7.9 |

Source. Badenhausen, Ozanian, and Settimi (2015b).

Table 1.3. NBA Team Valuation and Finances (January 2015)

| Team Rank | Team | Current Value ($ Millions) | One-Year Change in Value (Percentage) | Debt/Value (Percentage) | Revenue ($ Millions) | Operating Income ($ Millions) |
|---|---|---|---|---|---|---|
| 1 | Los Angeles Lakers | 2,600 | 93 | 2 | 293 | 104.1 |
| 2 | New York Knicks | 2,500 | 79 | 0 | 278 | 53.4 |
| 3 | Chicago Bulls | 2,000 | 100 | 3 | 201 | 65.3 |
| 4 | Boston Celtics | 1,700 | 94 | 9 | 173 | 54.9 |
| 5 | Los Angeles Clippers | 1,600 | 178 | 0 | 146 | 20.1 |
| 6 | Brooklyn Nets | 1,500 | 92 | 19 | 212 | -99.4 |
| 7 | Golden State Warriors | 1,300 | 73 | 12 | 168 | 44.9 |
| 8 | Houston Rockets | 1,250 | 61 | 8 | 175 | 38.0 |
| 9 | Miami Heat | 1,175 | 53 | 8 | 188 | 12.6 |
| 10 | Dallas Mavericks | 1,150 | 50 | 17 | 168 | 30.4 |
| 11 | San Antonio Spurs | 1,000 | 52 | 8 | 172 | 40.9 |
| 12 | Portland Trail Blazers | 940 | 60 | 11 | 153 | 11.7 |
| 13 | Oklahoma City Thunder | 930 | 58 | 15 | 152 | 30.8 |
| 14 | Toronto Raptors | 920 | 77 | 16 | 151 | 17.9 |
| 15 | Cleveland Cavaliers | 915 | 78 | 22 | 149 | 20.6 |
| 16 | Phoenix Suns | 910 | 61 | 20 | 145 | 28.2 |
| 17 | Washington Wizards | 900 | 86 | 14 | 143 | 10.1 |
| 18 | Orlando Magic | 875 | 56 | 17 | 143 | 20.9 |
| 19 | Denver Nuggets | 855 | 73 | 1 | 136 | 14.0 |
| 20 | Utah Jazz | 850 | 62 | 6 | 142 | 32.7 |
| 21 | Indiana Pacers | 830 | 75 | 18 | 139 | 25.0 |
| 22 | Atlanta Hawks | 825 | 94 | 21 | 133 | 14.8 |
| 23 | Detroit Pistons | 810 | 80 | 23 | 144 | 17.6 |
| 24 | Sacramento Kings | 800 | 45 | 29 | 125 | 8.9 |
| 25 | Memphis Grizzlies | 750 | 66 | 23 | 135 | 10.5 |
| 26 | Charlotte Hornets | 725 | 77 | 21 | 130 | 1.2 |
| 27 | Philadelphia 76ers | 700 | 49 | 21 | 125 | 24.4 |
| 28 | New Orleans Pelicans | 650 | 55 | 19 | 131 | 19.0 |
| 29 | Minnesota Timberwolves | 625 | 45 | 16 | 128 | 6.9 |
| 30 | Milwaukee Bucks | 600 | 48 | 29 | 110 | 11.5 |

Source. Badenhausen, Ozanian, and Settimi (2015a).

Table 1.4. NFL Team Valuation and Finances (August 2014)

| Team Rank | Team | Current Value ($ Millions) | One-Year Change in Value (Percentage) | Debt/Value (Percentage) | Revenue ($ Millions) | Operating Income ($ Millions) |
|---|---|---|---|---|---|---|
| 1 | Dallas Cowboys | 3,200 | 39 | 6 | 560 | 245.7 |
| 2 | New England Patriots | 2,600 | 44 | 9 | 428 | 147.2 |
| 3 | Washington Redskins | 2,400 | 41 | 10 | 395 | 143.4 |
| 4 | New York Giants | 2,100 | 35 | 25 | 353 | 87.3 |
| 5 | Houston Texans | 1,850 | 28 | 11 | 339 | 102.8 |
| 6 | New York Jets | 1,800 | 30 | 33 | 333 | 79.5 |
| 7 | Philadelphia Eagles | 1,750 | 33 | 11 | 330 | 73.2 |
| 8 | Chicago Bears | 1,700 | 36 | 6 | 309 | 57.1 |
| 9 | San Francisco 49ers | 1,600 | 31 | 53 | 270 | 24.8 |
| 10 | Baltimore Ravens | 1,500 | 22 | 18 | 304 | 56.7 |
| 11 | Denver Broncos | 1,450 | 25 | 8 | 301 | 30.7 |
| 12 | Indianapolis Colts | 1,400 | 17 | 4 | 285 | 60.7 |
| 13 | Green Bay Packers | 1,375 | 16 | 1 | 299 | 25.6 |
| 14 | Pittsburgh Steelers | 1,350 | 21 | 15 | 287 | 52.4 |
| 15 | Seattle Seahawks | 1,330 | 23 | 9 | 288 | 27.3 |
| 16 | Miami Dolphins | 1,300 | 21 | 29 | 281 | 8.0 |
| 17 | Carolina Panthers | 1,250 | 18 | 5 | 283 | 55.6 |
| 18 | Tampa Bay Buccaneers | 1,225 | 15 | 15 | 275 | 46.4 |
| 19 | Tennessee Titans | 1,160 | 10 | 11 | 278 | 35.6 |
| 20 | Minnesota Vikings | 1,150 | 14 | 43 | 250 | 5.3 |
| 21 | Atlanta Falcons | 1,125 | 21 | 27 | 264 | 13.1 |
| 22 | Cleveland Browns | 1,120 | 11 | 18 | 276 | 35.0 |
| 23 | New Orleans Saints | 1,110 | 11 | 7 | 278 | 50.1 |
| 24 | Kansas City Chiefs | 1,100 | 9 | 6 | 260 | 10.0 |
| 25 | Arizona Cardinals | 1,000 | 4 | 15 | 266 | 42.8 |
| 26 | San Diego Chargers | 995 | 5 | 10 | 262 | 39.9 |
| 27 | Cincinnati Bengals | 990 | 7 | 10 | 258 | 11.9 |
| 28 | Oakland Raiders | 970 | 18 | 21 | 244 | 42.8 |
| 29 | Jacksonville Jaguars | 965 | 15 | 21 | 263 | 56.9 |
| 30 | Detroit Lions | 960 | 7 | 29 | 254 | -15.9 |
| 31 | Buffalo Bills | 935 | 7 | 13 | 252 | 38.0 |
| 32 | St Louis Rams | 930 | 6 | 12 | 250 | 16.2 |

Source. Badenhausen, Ozanian, and Settimi (2014).

Table 1.5. World Soccer Team Valuation and Finances (May 2015)

| Team Rank | Team | Current Value ($ Millions) | One-Year Change in Value (Percentage) | Debt/Value (Percentage) | Revenue ($ Millions) | Operating Income ($ Millions) |
|---|---|---|---|---|---|---|
| 1 | Real Madrid | 3,263 | -5 | 4 | 746 | 170 |
| 2 | Barcelona | 3,163 | -1 | 3 | 657 | 174 |
| 3 | Manchester United | 3,104 | 10 | 20 | 703 | 211 |
| 4 | Bayern Munich | 2,347 | 27 | 0 | 661 | 78 |
| 5 | Manchester City | 1,375 | 59 | 0 | 562 | 122 |
| 6 | Chelsea | 1,370 | 58 | 0 | 526 | 83 |
| 7 | Arsenal | 1,307 | -2 | 30 | 487 | 101 |
| 8 | Liverpool | 982 | 42 | 10 | 415 | 86 |
| 9 | Juventus | 837 | -2 | 9 | 379 | 50 |
| 10 | AC Milan | 775 | -10 | 44 | 339 | 54 |
| 11 | Borussia Dortmund | 700 | 17 | 6 | 355 | 55 |
| 12 | Paris Saint-Germain | 634 | 53 | 0 | 643 | -1 |
| 13 | Tottenham Hotspur | 600 | 17 | 9 | 293 | 63 |
| 14 | Schalke 04 | 572 | -1 | 0 | 290 | 57 |
| 15 | Inter Milan | 439 | -9 | 56 | 222 | -41 |
| 16 | Atletico de Madrid | 436 | 33 | 53 | 231 | 47 |
| 17 | Napoli | 353 | 19 | 0 | 224 | 43 |
| 18 | Newcastle United | 349 | 33 | 0 | 210 | 44 |
| 19 | West Ham United | 309 | 33 | 12 | 186 | 54 |
| 20 | Galatasaray | 294 | -15 | 17 | 220 | -37 |

Source. Ozanian (2015).

Professional sports teams most certainly compete with one another in the labor market, and labor in the form of star players is in short supply. Some argue that salary caps are necessary to preserve competitive balance. Salary caps also help teams in limiting expenditures on players.

Most professional sports in the United States have salary caps. The 2015 salary cap for NFL teams, which carry fifty-three-player rosters, is set at $143.28 million (Patra 2015). Most teams have payrolls at or near the cap, making the average salary of an NFL player about $2.7 million. One player on an NFL team may be designated as a franchise player, restricting that player from entering free agency. The league sets minimum salaries for franchise players. For example, a franchise quarterback has a minimum salary of $18.544 million in 2015. The highest annual salary among NFL players is $22 million for Aaron Rodgers, Green Bay Packers quarterback (spotrac 2015c). The minimum annual salary is $420 thousand.
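
The average-salary figure above follows from dividing the cap by the roster size. A minimal R sketch of that arithmetic, using only the figures quoted in this paragraph and assuming a team spends exactly to the cap (the variable names are illustrative):

```r
# figures quoted in the text for the 2015 NFL season
nfl_salary_cap <- 143.28e6   # salary cap in dollars
nfl_roster_size <- 53        # players per roster

# average salary if a team's payroll sits exactly at the cap
average_salary <- nfl_salary_cap / nfl_roster_size

cat("Average salary at the cap: $",
    format(round(average_salary / 1e6, 1), nsmall = 1),
    " million\n", sep = "")
```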

NBA teams have a $70 million salary cap for the 2015–16 season, with penalties for teams going over the cap. Maximum player salaries are based on a percentage of the cap and years of service. For example, LeBron James, with ten years of experience, would have a maximum salary of $23 million (Mahoney 2015). Anthony Davis of the New Orleans Pelicans has the highest average salary among NBA players, at $29 million (spotrac 2015b). Team rosters include fifteen players under contract, with as many as thirteen available to play in any particular game. The minimum annual salary is $428,498.

Major League Baseball (MLB) has a “luxury tax” for teams with payrolls in excess of $189 million. The regular active roster has twenty-five players, or twenty-six on days with a doubleheader. A forty-man roster includes players under contract and eligible to play. Between September 1 and the end of the regular season, the active roster is expanded to forty players. The roster drops back to twenty-five players for the playoffs. The minimum MLB annual salary is $505,700 in 2015. The highest MLB annual salary is $31 million for Miguel Cabrera of the Detroit Tigers (spotrac 2015a).

Figure 1.1, a histogram lattice, shows how player salaries compare across the MLB, NBA, and NFL in August 2015. Player salary distributions are positively skewed. The mean salary across NFL players is around $1.7 million, but the median is $630 thousand. The mean salary across NBA players is around $5.1 million, with median salary $2.8 million. The mean salary across MLB players is around $4.1 million, with the median $1.1 million.

Figure 1.1. MLB, NBA, and NFL Average Annual Salaries

Sources. spotrac (2015a, 2015b, 2015c).
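
The gap between mean and median salaries is what positive skew looks like in summary statistics: a few very large contracts pull the mean upward while the median stays near the typical player. The sketch below illustrates this with simulated values drawn from a log-normal distribution. The data are hypothetical and are not the spotrac contract data behind Figure 1.1.

```r
set.seed(2015)  # arbitrary seed so the sketch is reproducible

# hypothetical salaries in $ millions, simulated from a log-normal
# distribution; these are NOT the spotrac data used for Figure 1.1
simulated_salarymm <- rlnorm(1000, meanlog = 0, sdlog = 1)

salary_mean <- mean(simulated_salarymm)
salary_median <- median(simulated_salarymm)

# in a positively skewed distribution a few very large values
# pull the mean above the median
cat("mean:", round(salary_mean, 2),
    " median:", round(salary_median, 2), "\n")
cat("mean exceeds median:", salary_mean > salary_median, "\n")
```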

Do team expenditures on players buy success? This is a meaningful question to ask for leagues that have no salary caps. Szymanski (2015) reports studies showing that between 60 and 90 percent of the variability in U.K. soccer team positions may be explained by wages paid to players. Major League Baseball has a luxury tax in place of a salary cap, and team payrolls vary widely in size. The New York Yankees have been known for having the highest payrolls in baseball. Recently, the Los Angeles Dodgers have surpassed the Yankees with the highest player payroll—more than $257 million at the end of the 2014 season (Woody 2014).

Figure 1.2 shows baseball team salaries at the beginning of the 2014 season plotted against the percentage of games won across the regular season. Notice how teams that made the playoffs in 2014, labeled with team abbreviations, have a wide range of payrolls. While the biggest spenders in baseball are often among the set of teams going to the playoffs, the relationship between team payrolls and team performance is weak at best—less than 7 percent of the variability in win/loss percentages is explained by player payrolls.

Figure 1.2. MLB Team Payrolls and Win/Loss Performance (2014 Season)

Sources. Sports Reference LLC (2015b) and USA Today (2015).

See Appendix B for team abbreviations and names.
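
The "percent of variability explained" quoted for Figure 1.2 is the squared correlation between payroll and win percentage, which equals the R-squared of a simple linear regression of one on the other. A short sketch of that equivalence follows. The six data points are made up for illustration and are not the 2014 MLB data.

```r
# hypothetical payrolls ($ millions) and win percentages for six teams;
# illustrative values only, not the 2014 MLB data plotted in Figure 1.2
example_data <- data.frame(
    millions = c(80, 95, 110, 130, 160, 230),
    winpercent = c(52, 44, 58, 47, 55, 51))

# squared Pearson correlation
r <- with(example_data, cor(millions, winpercent))
r_squared <- r^2

# the same quantity from a simple linear regression
fit <- lm(winpercent ~ millions, data = example_data)
r_squared_lm <- summary(fit)$r.squared

cat("correlation:", round(r, 3), "\n")
cat("proportion of variability explained:", round(r_squared, 3), "\n")
```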

The thesis of Michael Lewis’ Moneyball (2003) and what has become the ethos of sports analytics is that small-market baseball teams can win by spending their money wisely. Star players demand top salaries due as much to their celebrity status as to their skills. Players with high on-base percentages, overlooked by major-market teams, can be hired at much lower salaries than star players.

Teams, although associated with particular cities, can be known nationwide or worldwide. The media of television and the Internet provide opportunities for reaching consumers across the globe. A Super Bowl at the Rose Bowl in Pasadena, California or AT&T Stadium in Arlington, Texas may be attended by around 100 thousand fans (Alder 2015), while U.S. television audiences have grown to over 100 million (statista 2015).

Media revenues are important to successful sports teams. Other revenues come from business partnerships, sponsorships, advertising, and stadium naming rights. City governments understand well the power of sports to promote business. Locating sports arenas in cities can help to revitalize downtown areas, as demonstrated by the experience of the Oklahoma City Thunder. Indianapolis, Indiana promotes itself as a sports capital with the Colts and Pacers (Rein, Shields, and Grossman 2015).

Teams seek to build their brands, developing a positive reputation in the minds of consumers. Players, like fans, are attracted to teams with a reputation for hard work, courage, fair play, honesty, teamwork, and community service. The character of a team is often as important as its likelihood of winning. The Cubs are associated with Chicago, but Cubs fans may be found from Maine to California. This is despite the fact that the Cubs have not won the World Series since 1908. Teams in U.S. professional sports vie to become “America’s team,” with fans across the land wearing their logo-embossed hats and jerseys.

The demand for sports and the feelings of sports consumers are not so easily understood. Fans can be fickle and fandom fleeting. Fans can be loyal to a sport, to a team, or to individual players. Multivariate methods can help us understand how sports consumers think by revealing relationships among products or brands.

Figure 1.3 provides an example, a perceptual map of seven sports. Along the horizontal dimension, we move from individual, non-contact sports on the left-hand side, to team sports with little contact, to team sports with contact on the right-hand side. The vertical dimension, less easily described, may be thought of as relating to the aerobic versus anaerobic nature of sports and to other characteristics such as physicality and skill. Sports such as tennis, soccer, and basketball entail aerobic exercise. These are endurance sports, while football is an example of a sport that involves both aerobic and anaerobic exercise, including intense exercise for short durations. Sports close together on the map have similarities. Baseball and golf, for example, involve special skills, such as precision in hitting a ball. Soccer and hockey involve almost continuous movement and getting a ball or puck into the goal. Football and hockey have high physicality or player contact.

Figure 1.3. A Perceptual Map of Seven Sports

In many respects, professional sports teams are decidedly different from other businesses. They are in the public eye. They live and die in the media. And a substantial portion of their revenues comes from media.

Késenne (2007), Szymanski (2009), Fort (2011), Fort and Winfree (2013), Leeds and von Allmen (2014), and the edited volumes of Humphreys and Howard (2008a, 2008b, 2008c) review sports economics and business issues.

Gorman and Calhoun (1994) and Rein, Shields, and Grossman (2015) focus on alternative sources of revenue for sports teams and how these relate to business strategy. The business of baseball has been the subject of numerous volumes (Miller 1990; Zimbalist 1992; Powers 2003; Bradbury 2007; Pessah 2015). And Jozsa (2010) reviews the history of the National Basketball Association.

An overview of sports marketing is provided by Mullin, Hardy, and Sutton (2014). Rein, Kotler, and Shields (2006) and Carter (2011) discuss the convergence of entertainment and sports. Miller (2015a) reviews methods in marketing data science, including product positioning maps, market segmentation, target marketing, customer relationship management, and competitive analysis.

Sports also represents a laboratory for labor market research. Sports is one of the few industries in which job performance and compensation are public knowledge. Economic studies examine player performance measures and value of individual players to teams (Kahn 2000; Bradbury 2007). Miller (1991), Abrams (2010), and Lowenfish (2010) review baseball labor relations. And Early (2011) provides insight into labor and racial discrimination in professional sports.

Sports wagering markets have been studied extensively by economists because they provide public information about price, volume, and rates of return. Furthermore, sports betting opportunities have fixed beginning and ending times and published odds or point spreads, making them easier to study than many financial investment opportunities. As a result, sports wagering markets have become a virtual field laboratory for the study of market efficiency. Sauer (1998) provides a comprehensive review of the economics of wagering markets.
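
As a small illustration of why published odds make these markets convenient to study, the sketch below converts odds into implied probabilities and a bookmaker margin. It assumes decimal odds (total payout per unit staked, including the returned stake). The odds values are hypothetical, and the proportional normalization is one simple convention rather than a method taken from the studies cited.

```r
# implied probability from decimal odds (payout per unit staked,
# including the returned stake)
implied_probability <- function(decimal_odds) 1 / decimal_odds

# hypothetical decimal odds for the two sides of one game
decimal_odds <- c(home = 1.80, away = 2.10)

raw_probabilities <- implied_probability(decimal_odds)

# raw implied probabilities sum to more than 1; the excess is the
# bookmaker margin, often called the overround
overround <- sum(raw_probabilities) - 1

# rescale so the probabilities sum to 1 (simple proportional method)
fair_probabilities <- raw_probabilities / sum(raw_probabilities)

# net return per unit staked on each side if that side wins
net_return_if_win <- decimal_odds - 1

print(round(fair_probabilities, 3))
cat("overround:", round(overround, 4), "\n")
```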

When management objectives can be defined clearly in mathematical terms, teams use mathematical programming methods—constrained optimization. Teams attempt to maximize revenue or minimize costs subject to known situational factors. There has been extensive work on league schedules, for which the league objective may be to have teams playing one another an equal number of times while minimizing total distance traveled between cities. Alternatively, league officials may seek home/away schedules, revenue sharing formulas, or draft lottery rules that maximize competitive balance. Briskorn (2008) reviews methods for scheduling sports competition, drawing on integer programming, combinatorics, and graph theory. Wright (2009) provides an overview of operations research in sport.
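
To make the "equal number of times" requirement concrete, here is a minimal sketch of a single round-robin schedule built with the circle method, a standard combinatorial construction. It is not the integer-programming approach reviewed by Briskorn (2008), and it ignores travel distance and home/away balance. The function name `round_robin` and the team labels are hypothetical.

```r
# Circle method: keep the first team fixed, rotate the others one
# position per round, and pair positions from the outside in.
round_robin <- function(teams) {
    if (length(teams) %% 2 == 1) teams <- c(teams, "BYE")  # pad odd leagues
    n <- length(teams)
    schedule <- vector("list", n - 1)
    others <- teams[-1]
    for (round_number in seq_len(n - 1)) {
        lineup <- c(teams[1], others)
        # the home/away labels are arbitrary in this sketch
        home <- lineup[1:(n / 2)]
        away <- rev(lineup)[1:(n / 2)]
        schedule[[round_number]] <- data.frame(round = round_number,
            home = home, away = away, stringsAsFactors = FALSE)
        # rotate: move the last team to the front of the rotating group
        others <- c(others[length(others)], others[-length(others)])
    }
    do.call(rbind, schedule)
}

# hypothetical four-team league: every pair meets exactly once
schedule <- round_robin(c("A", "B", "C", "D"))
print(schedule)
```

A double round-robin, in which each pair meets home and away, could be obtained by appending the same schedule with the home and away columns swapped.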

Extensive data about sports are in the public domain, readily available in newspapers and online sources. These data offer opportunities for predictive modeling and research. Throughout the book we also identify places to apply methods of operations research, including mathematical programming and simulation.

Exhibit 1.1 shows an R program for exploring distributions of player salaries across the MLB, NBA, and NFL. The program draws on software for statistical graphics from Sarkar (2008).

Exhibit 1.2 shows an R program for examining the relationship between MLB payrolls and win-loss performance. The program draws on software for statistical graphics from Wickham and Chang (2014).

Exhibit 1.3 shows an R program to obtain a perceptual map of seven sports, showing their relationships with one another. The program draws on modeling software for multidimensional scaling.

Exhibit 1.1. MLB, NBA, and NFL Player Salaries (R)

# MLB, NBA, and NFL Player Salaries (R)

library(lattice)  # statistical graphics

# variables in contract data from spotrac.com (August 2015)
#   player: player name (contract years)
#   position: position on team
#   team: team abbreviation
#   teamsignedwith: team that signed the original contract
#   age: age in years as of August 2015
#   years:  years as player in league
#   contract: dollars in contract
#   guaranteed: guaranteed dollars in contract
#   guaranteedpct: percentage of contract dollars guaranteed
#   salary: annual salary in dollars
#   yearfreeagent: year player becomes free agent
#
#   additional created variables
#   salarymm: salary in millions
#   leaguename: full league name
#   league: league abbreviation

# read data for Major League Baseball
mlb_contract_data <- read.csv("mlb_player_salaries_2015.csv")
mlb_contract_data$leaguename <- rep("Major League Baseball",
    length = nrow(mlb_contract_data))
for (i in seq(along = mlb_contract_data$yearfreeagent))
    if (mlb_contract_data$yearfreeagent[i] == 0)
        mlb_contract_data$yearfreeagent[i] <- NA
for (i in seq(along = mlb_contract_data$age))
    if (mlb_contract_data$age[i] == 0)
        mlb_contract_data$age[i] <- NA
mlb_contract_data$salarymm <- mlb_contract_data$salary/1000000
mlb_contract_data$league <- rep("MLB", length = nrow(mlb_contract_data))
print(summary(mlb_contract_data))
# variables for plotting
mlb_data_plot <- mlb_contract_data[, c("salarymm","leaguename")]

# read data for the National Basketball Association
nba_contract_data <- read.csv("nba_player_salaries_2015.csv")
nba_contract_data$leaguename <- rep("National Basketball Association",
    length = nrow(nba_contract_data))
for (i in seq(along = nba_contract_data$yearfreeagent))
    if (nba_contract_data$yearfreeagent[i] == 0)
        nba_contract_data$yearfreeagent[i] <- NA
for (i in seq(along = nba_contract_data$age))
    if (nba_contract_data$age[i] == 0)
        nba_contract_data$age[i] <- NA
nba_contract_data$salarymm <- nba_contract_data$salary/1000000
nba_contract_data$league <- rep("NBA", length = nrow(nba_contract_data))
print(summary(nba_contract_data))
# variables for plotting
nba_data_plot <- nba_contract_data[, c("salarymm","leaguename")]

# read data for the National Football League
nfl_contract_data <- read.csv("nfl_player_salaries_2015.csv")
nfl_contract_data$leaguename <- rep("National Football League",
    length = nrow(nfl_contract_data))
for (i in seq(along = nfl_contract_data$yearfreeagent))
    if (nfl_contract_data$yearfreeagent[i] == 0)
        nfl_contract_data$yearfreeagent[i] <- NA
for (i in seq(along = nfl_contract_data$age))
    if (nfl_contract_data$age[i] == 0)
        nfl_contract_data$age[i] <- NA
nfl_contract_data$salarymm <- nfl_contract_data$salary/1000000
nfl_contract_data$league <- rep("NFL", length = nrow(nfl_contract_data))
print(summary(nfl_contract_data))
# variables for plotting
nfl_data_plot <- nfl_contract_data[, c("salarymm","leaguename")]

# merge contract data with variables for plotting
plotting_data_frame <- rbind(mlb_data_plot, nba_data_plot, nfl_data_plot)

# generate the histogram lattice for comparing player salaries
# across the three leagues in this study
lattice_object <- histogram(~salarymm | leaguename, plotting_data_frame,
    type = "density", xlab = "Annual Salary ($ millions)", layout = c(1,3))

# print to file
pdf(file = "fig_understanding_markets_player_salaries.pdf",
     width = 8.5, height = 11)
print(lattice_object)
dev.off()
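
The element-wise loops in Exhibit 1.1 that recode zeros as missing values stop with an error if a column already contains NA, because `if` cannot evaluate a missing condition. A vectorized alternative is sketched below. The helper name `recode_zero_as_na` and the small data frame are hypothetical and only mimic the column names of the contract data.

```r
# helper introduced for this sketch: recode zeros as NA,
# leaving values that are already missing untouched
recode_zero_as_na <- function(x) {
    x[!is.na(x) & x == 0] <- NA
    x
}

# small hypothetical data frame with the same column names
# as the contract data in Exhibit 1.1
example_contracts <- data.frame(
    age = c(27, 0, 31, NA),
    yearfreeagent = c(2017, 2016, 0, 2018),
    salary = c(550000, 4200000, 12500000, 900000))

example_contracts$age <- recode_zero_as_na(example_contracts$age)
example_contracts$yearfreeagent <-
    recode_zero_as_na(example_contracts$yearfreeagent)
example_contracts$salarymm <- example_contracts$salary / 1000000

print(example_contracts)
```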

Exhibit 1.2. Payroll and Performance in Major League Baseball (R)

# Payroll and Performance in Major League Baseball (R)

library(ggplot2)  # statistical graphics
library(grid)  # viewport, grid.layout, grid.newpage, pushViewport

# functions used with grid graphics to split the plotting region
# to set margins and to plot more than one ggplot object on one page/screen
vplayout <- function(x, y)
viewport(layout.pos.row=x, layout.pos.col=y)

# user-defined function to plot a ggplot object with margins
ggplot.print.with.margins <- function(ggplot.object.name,
    left.margin.pct=10,
     right.margin.pct=10,top.margin.pct=10,bottom.margin.pct=10)
    { # begin function for printing ggplot objects with margins
      # margins expressed as percentages of total... use integers
    grid.newpage()
    pushViewport(viewport(layout=grid.layout(100,100)))
    print(ggplot.object.name,
    vp=vplayout((0 + top.margin.pct):(100 - bottom.margin.pct),
        (0 + left.margin.pct):(100 - right.margin.pct)))
    } # end function for printing ggplot objects with margins

# read in payroll and performance data
# including annotation text for team abbreviations
mlb_data <- read.csv("mlb_payroll_performance_2014.csv")
mlb_data$millions <- mlb_data$payroll/1000000
mlb_data$winpercent <- mlb_data$wlpct * 100

cat("\nCorrelation between Payroll and Performance:\n")
with(mlb_data, print(cor(millions, winpercent)))

cat("\nProportion of win/loss percentage explained by payrolls:\n")
with(mlb_data, print(cor(millions, winpercent)^2))

pdf(file = "fig_understanding_markets_payroll_performance.pdf",
     width = 5.5, height = 5.5)
ggplot_object <- ggplot(data = mlb_data,
    aes(x = millions, y = winpercent)) +
    geom_point(colour = "darkblue", size = 3) +
    xlab("Team Payroll (Millions of Dollars)") +
    ylab("Percentage of Games Won") +
    geom_text(aes(label = textleft), size = 3, hjust = 1.3) +
    geom_text(aes(label = textright), size = 3, hjust = -0.25)

ggplot.print.with.margins(ggplot_object, left.margin.pct = 5,
    right.margin.pct = 5, top.margin.pct = 5, bottom.margin.pct = 5)

dev.off()

Exhibit 1.3. Making a Perceptual Map of Sports (R)

# Making a Perceptual Map of Sports (R)

library(MASS)  # includes functions for multidimensional scaling
library(wordcloud)  # textplot utility to avoid overlapping text

USE_METRIC_MDS <- FALSE  # metric versus non-metric toggle

# utility function for converting a distance structure
# to a distance matrix as required for some routines and
# for printing of the complete matrix for visual inspection.
make.distance.matrix <- function(distance_structure)
    { n <- attr(distance_structure, "Size")
      full <- matrix(0,n,n)
      full[lower.tri(full)] <- distance_structure
      full+t(full)
    }

# enter data into a distance structure as required for various
# distance-based routines. That is, we enter the upper triangle
# of the distance matrix as a single vector of distances
distance_structure <-
    as.single(c(9,11,10,5,14,4,15,6,12,13,16,1,18,2,20,7,3,19,17,8,21))

# provide a character vector of sports names
sport_names <- c("Baseball", "Basketball", "Football",
    "Soccer", "Tennis", "Hockey", "Golf")

attr(distance_structure, "Size") <- length(sport_names)  # set size attribute

# check to see that the distance structure has been entered correctly
# by converting the distance structure to a distance matrix
# using the utility function make.distance.matrix, which we had defined
distance_matrix <- unlist(make.distance.matrix(distance_structure))
cat("\n","Distance Matrix of Seven Sports","\n")
print(distance_matrix)

if (USE_METRIC_MDS)
    {
    # apply the metric multidimensional scaling algorithm and plot the map
    mds_solution <- cmdscale(distance_structure, k = 2, eig = TRUE)
    }

# apply the non-metric multidimensional scaling algorithm
# this is more appropriate for rank-order data
# and provides a more satisfactory solution here

if (!USE_METRIC_MDS)
    {
    mds_solution <- isoMDS(distance_matrix, k = 2, trace = FALSE)

    }
pdf(file = "plot_nonmetric_mds_seven_sports.pdf",
    width=8.5, height=8.5) # opens pdf plotting device
# use par(mar = c(bottom, left, top, right)) to set up margins on the plot
par(mar=c(7.5, 7.5, 7.5, 5))

# original solution
First_Dimension <- mds_solution$points[,1]
Second_Dimension <- mds_solution$points[,2]

# set up the plot but do not plot points... use names for points
plot(First_Dimension, Second_Dimension, type = "n", cex = 1.5,
    xlim = c(-15, 15), ylim = c(-15, 15))  # first page of pdf plots
# We plot the sport names in the locations where points normally go.
text(First_Dimension, Second_Dimension, labels = sport_names,
    offset = 0.0, cex = 1.5)
title("Seven Sports (initial solution)")

# reflect the horizontal dimension
# multiply the first dimension by -1 to get reflected image
First_Dimension <- mds_solution$points[,1] * -1
Second_Dimension <- mds_solution$points[,2]
plot(First_Dimension, Second_Dimension, type = "n", cex = 1.5,
    xlim = c(-15, 15), ylim = c(-15, 15))  # second page of pdf plots
text(First_Dimension, Second_Dimension, labels = sport_names,
    offset = 0.0, cex = 1.5)
title("Seven Sports (horizontal reflection)")

# reflect the vertical dimension
# multiply the second dimension by -1 to get reflected image
First_Dimension <- mds_solution$points[,1]
Second_Dimension <- mds_solution$points[,2] * -1
plot(First_Dimension, Second_Dimension, type = "n", cex = 1.5,
    xlim = c(-15, 15), ylim = c(-15, 15))  # third page of pdf plots
text(First_Dimension, Second_Dimension, labels = sport_names,
    offset = 0.0, cex = 1.5)
title("Seven Sports (vertical reflection)")

# multiply the first and second dimensions by -1
# for reflection in both horizontal and vertical directions
First_Dimension <- mds_solution$points[,1] * -1
Second_Dimension <- mds_solution$points[,2] * -1
plot(First_Dimension, Second_Dimension, type = "n", cex = 1.5,
    xlim = c(-15, 15), ylim = c(-15, 15))  # fourth page of pdf plots
text(First_Dimension, Second_Dimension, labels = sport_names,
    offset = 0.0, cex = 1.5)
title("Seven Sports (horizontal and vertical reflection)")
dev.off()  # closes the pdf plotting device

pdf(file = "plot_pretty_original_mds_seven_sports.pdf",
    width=8.5, height=8.5) # opens pdf plotting device
# use par(mar = c(bottom, left, top, right)) to set up margins on the plot
par(mar=c(7.5, 7.5, 7.5, 5))
First_Dimension <- mds_solution$points[,1]  # no reflection
Second_Dimension <- mds_solution$points[,2]   # no reflection
# wordcloud utility for plotting with no overlapping text
textplot(x = First_Dimension,
    y = Second_Dimension,
    words = sport_names,
    show.lines = FALSE,
    xlim = c(-15, 15),  # extent of horizontal axis range
    ylim = c(-15, 15),  # extent of vertical axis range
    xaxt = "n",  # suppress tick marks
    yaxt = "n",  # suppress tick marks
    cex = 1.15,  # size of text points
    mgp = c(0.85, 1, 0.85),  # position of axis labels
    cex.lab = 1.5,  # magnification of axis label text
    xlab = "",
    ylab = "")
dev.off()  # closes the pdf plotting device


pdf(file = "fig_sports_perceptual_map.pdf",
    width=8.5, height=8.5) # opens pdf plotting device
# use par(mar = c(bottom, left, top, right)) to set up margins on the plot
par(mar=c(7.5, 7.5, 7.5, 5))
First_Dimension <- mds_solution$points[,1] * -1  # reflect horizontal
Second_Dimension <- mds_solution$points[,2]
# wordcloud utility for plotting with no overlapping text
textplot(x = First_Dimension,
    y = Second_Dimension,
    words = sport_names,
    show.lines = FALSE,
    xlim = c(-15, 15),  # extent of horizontal axis range
    ylim = c(-15, 15),  # extent of vertical axis range
    xaxt = "n",  # suppress tick marks
    yaxt = "n",  # suppress tick marks
    cex = 1.15,  # size of text points
    mgp = c(0.85, 1, 0.85),  # position of axis labels
    cex.lab = 1.5,  # magnification of axis label text
    xlab = "First Dimension (Individual/Team, Degree of Contact)",
    ylab = "Second Dimension (Anaerobic/Aerobic, Other)")
dev.off()  # closes the pdf plotting device
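
As a compact companion to Exhibit 1.3, the sketch below fits the same non-metric solution from a standard `dist` object and reports the stress value that `isoMDS` returns, which gives a rough indication of how well two dimensions reproduce the rank-order dissimilarities. The object names `ranks` and `sports_dist` are introduced here for illustration. Lower stress indicates a closer fit, and no particular cutoff is assumed.

```r
library(MASS)  # isoMDS for non-metric multidimensional scaling

sport_names <- c("Baseball", "Basketball", "Football",
    "Soccer", "Tennis", "Hockey", "Golf")

# rank-order dissimilarities copied from Exhibit 1.3
ranks <- c(9,11,10,5,14,4,15,6,12,13,16,1,18,2,20,7,3,19,17,8,21)

# build a full symmetric matrix, then convert it to a dist object
n <- length(sport_names)
distance_matrix <- matrix(0, n, n,
    dimnames = list(sport_names, sport_names))
distance_matrix[lower.tri(distance_matrix)] <- ranks
distance_matrix <- distance_matrix + t(distance_matrix)
sports_dist <- as.dist(distance_matrix)

# two-dimensional non-metric solution
mds_solution <- isoMDS(sports_dist, k = 2, trace = FALSE)

print(round(mds_solution$points, 2))  # coordinates labeled by sport
cat("stress (percent):", round(mds_solution$stress, 2), "\n")
```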