In this two-part series, part 1, "Using Principal Components in Excel," showed how principal components analysis (PCA) can be used to derive, or extract, underlying and unmeasured components that are expressed overtly in measured variables.
The analysis in part 1 used an Excel workbook that contains the rates of seven different types of crime in each of the 50 states in the U.S. We created a correlation matrix from that data set and ran it through a freeware add-in to identify two underlying components (factors). Those two components accounted for between 62% and 91% of variance in the seven original variables. That result is in line with the main goal of PCA and factor analysis: reducing the number of dimensions, and therefore simplifying the analysis without losing too much information.
Structure of Principal Components
In this case and many others, a problem remains. If you look at how the individual variables are related to the two components, it's not clear what those components represent. One characteristic of principal components analysis is that most of the variables tend to have strong loadings (which are in fact correlations) on the first factor. That's not surprising, because principal components analysis works by extracting as much variance from the measured variables as possible, assigning that variance to the first component. Only the leftover variance is available for the second, third, and subsequent components. Figure 1 shows the sort of structure we obtain, in the range C5:D11.
Figure 1 Original and rotated factor loadings for the U.S. crime rate data set.
Loadings is just another term for the correlations between the measured variables and the components or factors. Notice in Figure 1 that the first extracted factor has strong loadings with six of the seven variables; only the Murder variable has a strong loading on the second factor. If we stopped here, we might well conclude that the two underlying factors are something like "Everything Except Murder" and "Murder." Not a very illuminating conclusion.
Notice the pattern of loadings in the lower half of Figure 1, shown in the worksheet range C16:D22. Burglary, Larceny, and Auto Theft (and perhaps Robbery) load fairly strongly on Factor 1; Murder and Assault load strongly on Factor 2. Rape has a moderately strong loading on both Factor 1 and Factor 2. Despite the ambiguity of the loadings for Rape, it seems fairly clear that Factor 1 can be thought of as a Property Crimes factor, and Factor 2 can be thought of as a Personal Crimes factor. That structure is cleaner and clearer than the structure we get from the original principal components.