# Getting Started with Data Science: Hypothetically Speaking


## Analysis of Variance

Analysis of variance (ANOVA) is the standard method for comparing means across three or more groups. The null hypothesis in this case states that the average values do not differ across the groups. The alternative hypothesis states that at least one mean value is different from the rest.

I use the F-test for ANOVA. If the probability (p-value) associated with the F-test is greater than the threshold value, which is usually 0.05 (the 5% significance level, corresponding to the 95% confidence level), we fail to reject the null hypothesis. If the p-value for the F-test is less than 0.05, we reject the null hypothesis and conclude that at least one mean value differs from the rest.
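For reference, the F statistic behind this decision rule can be written out as follows. The notation is introduced here for illustration and is not taken from the book: $k$ is the number of groups, $n_j$ the number of observations in group $j$, $N$ the total number of observations, $\bar{y}_j$ the mean of group $j$, and $\bar{y}$ the overall mean.

$$
F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}}
  = \frac{\sum_{j=1}^{k} n_j \left(\bar{y}_j - \bar{y}\right)^2 / (k - 1)}
         {\sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}_j\right)^2 / (N - k)}
$$

Under the null hypothesis of equal group means, and given the usual ANOVA assumptions, $F$ follows an F distribution with $k - 1$ and $N - k$ degrees of freedom. The p-value is the probability of observing a value at least as large as the computed $F$ under that distribution.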

I will repeat the comparison of means for the three age groups using the ANOVA test. The R code and the resulting output (see Figure 6.38) follow.

Note that the value reported under Pr(>F) is 0.0998, which is greater than 0.05. Thus, we fail to reject the null hypothesis: the data do not provide sufficient evidence that teaching evaluations differ by age group.
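A test of this kind can be sketched in R as shown below. This is a minimal illustration on simulated data, not the teaching-evaluation data set; the data frame `sim`, the factor `age.group`, and its three labels are hypothetical names chosen for the example. It fits a one-way ANOVA, reads the p-value from the Pr(>F) column, and applies the 0.05 threshold.

```r
# Simulated stand-in data: NOT the teaching-evaluation data set.
# All names here (sim, age.group, the group labels) are hypothetical.
set.seed(42)
sim <- data.frame(
  age.group = factor(rep(c("young", "middle", "old"), each = 50)),
  eval      = rnorm(150, mean = 4, sd = 0.5)  # same true mean in every group
)

# One-way ANOVA of the outcome on the grouping factor
fit <- aov(eval ~ age.group, data = sim)
tab <- summary(fit)[[1]]   # the ANOVA table
print(tab)

# The first row of the table belongs to the factor; take its p-value
p.value <- tab[["Pr(>F)"]][1]
alpha   <- 0.05            # threshold for the 95% confidence level

if (p.value < alpha) {
  decision <- "reject the null hypothesis"
} else {
  decision <- "fail to reject the null hypothesis"
}
cat(sprintf("p-value = %.4f: %s\n", p.value, decision))
```

Because the simulated groups share the same true mean, a rejection here would be a false positive, which is expected to happen about 5% of the time at this threshold.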

Let us now compare average teaching evaluations across a discretized version of beauty, which in raw form is a continuous variable. I convert the continuous variable into three categories, namely low beauty, average looking, and good looking. The R code and the resulting output (see Figure 6.39) follow.

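```
# Cut the continuous beauty score into three equal-width bins
x$f.beauty <- cut(x$beauty, breaks = 3)
x$f.beauty <- factor(x$f.beauty,
                     labels = c("low beauty", "average looking", "good looking"))
# Mean evaluation and number of observations in each category
cbind(mean.eval = tapply(x$eval, x$f.beauty, mean),
      observations = table(x$f.beauty))
# One-way ANOVA of evaluations on the beauty categories
summary(aov(eval ~ f.beauty, data = x))
```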
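To see what `cut()` with `breaks = 3` does, the following sketch applies it to a small made-up vector. The values in `score` and the labels are hypothetical and chosen only for illustration. With a single number for `breaks`, `cut()` divides the observed range into that many intervals of equal width, nudging the outer edges slightly so that the minimum and maximum are included.

```r
# Hypothetical scores standing in for a continuous variable
score <- c(-1.5, -0.9, -0.2, 0.0, 0.4, 1.1, 1.5)

# breaks = 3 splits the observed range into three equal-width intervals
bins <- cut(score, breaks = 3)
print(levels(bins))  # interval labels generated by cut()
print(table(bins))   # counts per interval; equal width does not imply equal counts

# Relabel the intervals, from lowest to highest
bins.named <- factor(bins, labels = c("low", "middle", "high"))
print(table(bins.named))
```

Since the bins are defined by width rather than by quantiles, the number of observations per category can be quite uneven, which is worth checking before running the ANOVA.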

The probability value associated with the F-test is 0.0276, which is less than 0.05, our threshold value. I therefore reject the null hypothesis and conclude that average teaching evaluations differ across the three categories of students’ perception of instructors’ appearance.