The second case is when the data are known to be highly correlated but the actual patterns are not known. The classic example is food intakes, not nutrient intakes: the actual food items such as bread, sugar, milk, eggs, etc. Nobody really believes there are factors underlying these, but there are patterns, and distinguishing the patterns should work. We normally have to standardize the variables, since most people eat a lot more bread than they drink fruit juice! It is also often a good idea to take logs before standardizing. In this case I would apply a rotation and then hope the resulting patterns make sense.
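A sketch of that recipe in Python, on entirely hypothetical food-intake numbers (the food names, scales, and sample size are made up for illustration; the rotation is the standard Kaiser varimax algorithm implemented directly):

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser's varimax rotation of a loading matrix (standard SVD iteration)."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        )
        R = u @ vt
        new_var = s.sum()
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ R

rng = np.random.default_rng(0)
# hypothetical intakes for 200 subjects, on very different scales
bread = rng.lognormal(4.0, 0.5, 200)   # grams/day
juice = rng.lognormal(1.0, 0.8, 200)   # glasses/day
milk = rng.lognormal(2.0, 0.6, 200)
eggs = rng.lognormal(0.5, 0.7, 200)
X = np.column_stack([bread, juice, milk, eggs])

# log, then standardize, so bread does not dominate the variance
Z = np.log(X)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

# PCA via the correlation matrix of the transformed data
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# loadings for the first two components, then rotate
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])
rotated = varimax(loadings)
```

Because varimax is an orthogonal rotation, the communalities (row sums of squared loadings) are unchanged; only the split of loading across the components moves, which is what makes the rotated patterns easier to read.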
Now, last week I was co-teaching a class for linguists with Nick Fieller. This was fun, partly because, oddly enough, Nick and I are fairly similar in our attitude to statistics (although he is by far the brighter guy), and partly because it was in an area of his expertise. He knew far more about MDA than me, but his use of PCA was interesting. What he suggested, and what I had not thought of, was to use PCA and plotting as a method of exploratory statistics before carrying out a Repeated Measures ANOVA. Let me explain: the big problem with Repeated Measures ANOVA is that there are few graphs that are actually useful, because the data have at least two levels of variation. I have therefore tended to take a suck-it-and-see approach. The principal components summarise the variation within subjects into a few dimensions. The graphs are really graphs for a statistician, but I will work through an example. Here I am looking at a variable, UMMA, measured at 5 time points. The first measurement was taken prior to medication; everyone was then given the same medication, the aim being to lower their UMMA. There were, however, two groups of subjects. So what we got was the following table:
Total Variance Explained (Raw)

| Component | Initial Eigenvalues: Total | % of Variance | Cumulative % | Extraction Sums of Squared Loadings: Total | % of Variance | Cumulative % |
|---|---|---|---|---|---|---|
| 1 | 2.972 | 80.789 | 80.789 | 2.972 | 80.789 | 80.789 |
| 2 | .454 | 12.345 | 93.134 | .454 | 12.345 | 93.134 |
| 3 | .109 | 2.971 | 96.104 | | | |
| 4 | .092 | 2.509 | 98.614 | | | |
| 5 | .051 | 1.386 | 100.000 | | | |
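The percentage columns in the table follow directly from the eigenvalues: each component's share is its eigenvalue over the total variance (the sum of all five eigenvalues). A quick check in Python, using the rounded eigenvalues from the table (so the percentages agree with SPSS only to rounding):

```python
# eigenvalues as printed in the table above (rounded to 3 decimal places)
eigvals = [2.972, 0.454, 0.109, 0.092, 0.051]

total = sum(eigvals)  # total variance across all components
pct = [100 * e / total for e in eigvals]  # % of variance per component
cumulative = [sum(pct[:i + 1]) for i in range(len(pct))]  # running total
```

The first component carries roughly 80.8% of the variance and the first two together about 93.1%, so two components are enough to summarise these five time points.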
Now when we look at the components we get:
Component Matrix (Raw)

| | Component 1 | Component 2 |
|---|---|---|
| umma_v1 | 1.249 | -.454 |
| umma_V2 | .692 | .204 |
| umma_V3 | .671 | .241 |
| umma_V4 | .537 | .328 |
| UMMA_V5 | .441 | .201 |
The first component is basically the mean, with some extra weighting towards the first time point. The second compares the first value, which is pre-treatment, with the rest. So a fairly nice interpretation. Let's have a look at the scatter graph of the first component against the second, this time with the two groups C and D marked.
The first thing you notice is that two members of group C are a lot higher than the rest. There are also about two values where the second component is much larger; these points are outliers. Apart from this, however, there does not seem to be much difference between Group C and Group D, which is precisely what we concluded in the Repeated Measures ANOVA.
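A sketch of how such a score plot can be built, on made-up UMMA-style data (the group sizes, baseline, and decline rate are all invented; the PCA is on the covariance matrix, matching the "Raw" SPSS output above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
groups = np.array(["C"] * 10 + ["D"] * 10)  # hypothetical group labels

# 5 time points per subject: a subject-level baseline plus a steady decline
base = rng.normal(10, 2, n)
X = np.column_stack(
    [base - 1.5 * t + rng.normal(0, 0.5, n) for t in range(5)]
)

# covariance ("raw") PCA on the centred data
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]

# scores on the first two components; one point per subject
scores = Xc @ eigvecs[:, :2]

# flag subjects whose second score is unusually far from the rest
z2 = (scores[:, 1] - scores[:, 1].mean()) / scores[:, 1].std()
outliers = np.where(np.abs(z2) > 2)[0]
```

Plotting `scores[:, 0]` against `scores[:, 1]`, coloured by `groups` (e.g. with matplotlib), gives exactly the kind of exploratory graph described above: group separation shows up as horizontal shifts on the first component, and outlying response patterns stand out on the second.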
So thanks, Nick: that is another tool in my kit for tackling one of the trickiest forms of analysis there is.