Thursday, 24 January 2013

Sometimes thinking visually is slow: PCA and Regression

I was in a discussion about planning a course on Statistics for Linguists where someone said they did not understand factor analysis, and then went on to describe problems with how it decided what was important. The problem seemed to be all about scaling. I came away feeling, oddly, as if I knew the answer but could not think of it.

Right, the first thing to realise is that Principal Component Analysis does exactly what it says on the can. If you see linear regression as minimizing the (squared) distance from the line to the points, where that distance is measured vertically, then the equivalent for the first principal component is minimizing the perpendicular distance from the line!
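To make the vertical versus perpendicular picture concrete, here is a minimal sketch for two variables. The data are made up, and the helper names (vertical_ss, perpendicular_ss) are mine, purely for illustration. It assumes the usual closed form for the major axis of a 2x2 covariance matrix rather than calling any statistics library.

```python
import math

# Made-up illustrative data, not from any real study.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.0]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# Sums of squares and cross-products about the means. Dividing by (n - 1)
# would give variances and the covariance, but the slopes below are the same.
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Regression of Y on X: the slope minimizing squared vertical distances.
ols_slope = sxy / sxx

# First principal component: the major axis of the covariance matrix,
# which minimizes squared perpendicular distances.
pca_angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
pca_slope = math.tan(pca_angle)


def vertical_ss(slope):
    """Sum of squared vertical distances to a line through the means."""
    return sum(((y - my) - slope * (x - mx)) ** 2 for x, y in zip(xs, ys))


def perpendicular_ss(slope):
    """Sum of squared perpendicular distances to the same line."""
    return vertical_ss(slope) / (1.0 + slope ** 2)


print("regression slope:", ols_slope)
print("principal component slope:", pca_slope)
print("vertical SS (regression, PCA):",
      vertical_ss(ols_slope), vertical_ss(pca_slope))
print("perpendicular SS (regression, PCA):",
      perpendicular_ss(ols_slope), perpendicular_ss(pca_slope))
```

Both lines pass through the point of means. The regression line should win on the vertical measure and the principal component on the perpendicular one, which is the whole difference between the two.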

The problem is then the problem there always has been: how to scale the X and Y axes to allow for the variation. There are two ways to do this. The easiest is not to do anything! That way you find the Principal Components of the Covariance Matrix. The main alternative is to standardise X and Y by dividing each by its own standard deviation or, in other words, to use standardised scores. To do this you just find the Principal Components of the Correlation Matrix.
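A small sketch of why the choice matters, again with made-up data and hypothetical helper names (cov2, first_pc_angle). It changes the units of X by a factor of 100 and compares the direction of the first principal component from the covariance matrix with the one from the correlation matrix.

```python
import math


def cov2(xs, ys):
    """Return (var_x, var_y, cov_xy) using the n - 1 denominator."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, syy, sxy


def first_pc_angle(sxx, syy, sxy):
    """Angle (radians from the X axis) of the major axis of a 2x2 matrix."""
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def angles(xs, ys):
    """First-component angle from the covariance and correlation matrices."""
    sxx, syy, sxy = cov2(xs, ys)
    r = sxy / math.sqrt(sxx * syy)
    # The correlation matrix has ones on the diagonal and r off it.
    return first_pc_angle(sxx, syy, sxy), first_pc_angle(1.0, 1.0, r)


# Made-up illustrative data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.0]

# The same X measured in units 100 times smaller.
xs_scaled = [100.0 * x for x in xs]

angle_cov_raw, angle_corr_raw = angles(xs, ys)
angle_cov_scaled, angle_corr_scaled = angles(xs_scaled, ys)

print("covariance PCA angle, raw vs rescaled X:",
      math.degrees(angle_cov_raw), math.degrees(angle_cov_scaled))
print("correlation PCA angle, raw vs rescaled X:",
      math.degrees(angle_corr_raw), math.degrees(angle_corr_scaled))
```

With the covariance matrix the component swings towards whichever axis has the bigger numbers. With the correlation matrix and two positively correlated variables it sits at 45 degrees in standardised units whatever the original units were.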

Sometimes, once I have things visually, they are very easy, but it has probably taken the best part of twenty years for that visualization to coalesce.
