Thursday, 19 July 2012

Why we did not use Canonical Correlations in the end

Last week I wrote about learning to use Canonical Correlations. This week I am going to tell you why in the end we did not use them for the study.

Let me give you some background to the study first. It is a doctorate in modern languages that looks at the acquisition of English by students in Pakistan. The student has collected multiple data sources, among them two surveys, one of teachers and one of students. The student is hard-working and is really trying to link up the data. Both questionnaires had sub-questionnaires covering topics such as beliefs and classroom practice; the students' one also contained a questionnaire covering their own learning practice. Although these sub-questionnaires often had thirty or more questions, only forty teachers answered, while there were over three hundred students.

Firstly, I did actually spend quite a bit of time reading up on Canonical Correlation, which had persuaded me that I really did need to get the Canonical Correlation macro running in SPSS. This was not as straightforward as it sounds. There have been changes to the way SPSS handles files over the years and the macro had not kept up with all the iterations. This meant I had to sit down and actually work out why it was going wrong. In the end I kludged a solution so it ran; by kludged I mean I put in a fix so it would run in this particular case. It does not mean that I can pass this file to anyone else and it will run for them. It won't, and if I am honest I really should go back and revise it so it does. I got it working for a test set.

The second stage was to run it on the teachers' data. There were two ways we could run it: firstly with the raw questions, secondly with the factors that we had previously calculated from the data. Canonical Correlations did not work for the first approach, as the questionnaires are too long for the number of teachers, so the correlation matrix was singular and could not be inverted. You need at least one case per question, and strictly speaking more cases than questions. So we went on to the factors. This worked, but the results were uninterpretable because each canonical variate was an equation built from factors that had already been rotated.
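
To see why too many questions for too few teachers causes trouble, here is a small sketch in Python with NumPy. It is not the SPSS macro and it does not use the study data: the answers are random numbers, and the figure of sixty questions (two sub-questionnaires of thirty) is only an assumption for illustration. The point is that a correlation matrix computed from n cases can never have rank above n - 1, so with forty cases and sixty questions it cannot be inverted.

```python
import numpy as np

# Synthetic stand-in for the survey: NOT the real data.
rng = np.random.default_rng(0)
n_teachers = 40      # number of cases, as in the post
n_questions = 60     # hypothetical: two sub-questionnaires of 30 questions

answers = rng.normal(size=(n_teachers, n_questions))

# Correlation matrix between questions (columns are variables).
corr = np.corrcoef(answers, rowvar=False)

# Centring the data costs one degree of freedom, so the rank is
# bounded by n_teachers - 1 however many questions there are.
rank = np.linalg.matrix_rank(corr)

print("matrix size:", corr.shape)
print("rank:", rank, "(at most", n_teachers - 1, ")")
```

Because the rank is below the number of questions, the matrix is singular, which is what stops a Canonical Correlation routine that needs to invert it.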

So we did a rethink.

The factors were created using Principal Components Analysis with a Varimax rotation to aid interpretability. Now the nice thing with this analysis is that the factors you get out at the end are uncorrelated. So we had two sets of factors with no correlation within either set. So we very simply correlated one set of factors with the other. Remember that within each set every factor is uncorrelated with the others, so the results were easily interpretable.
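
The approach above can be sketched in Python with NumPy. This is an illustration under assumptions, not what we ran in SPSS: the data are random numbers, the block sizes (thirty questions each) and the choice of three components per set are hypothetical, and `varimax` and `rotated_component_scores` are helper names made up for this sketch. It shows that the rotated component scores are uncorrelated within a set, and that the analysis then reduces to the block of correlations between the two sets.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Return an orthogonal rotation matrix for the given loadings (Varimax)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag(np.sum(rotated**2, axis=0))
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated**3 - rotated @ col_ss / p)
        )
        rotation = u @ vt            # product of orthogonal matrices
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                    # no meaningful improvement
        criterion = new_criterion
    return rotation


def rotated_component_scores(data, n_components):
    """PCA on the correlation matrix, then Varimax; returns rotated scores."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)
    # Standardised component scores: unit variance, mutually uncorrelated.
    scores = z @ eigvecs / np.sqrt(eigvals)
    # An orthogonal rotation keeps the scores uncorrelated.
    return scores @ varimax(loadings)


# Synthetic stand-in for two sub-questionnaires answered by the same people.
rng = np.random.default_rng(1)
n_cases, k = 40, 3                               # k is a hypothetical choice
beliefs = rng.normal(size=(n_cases, 30))         # hypothetical block A
practice = rng.normal(size=(n_cases, 30))        # hypothetical block B

beliefs_scores = rotated_component_scores(beliefs, k)
practice_scores = rotated_component_scores(practice, k)

# Within a set the correlation matrix is the identity (up to rounding).
within = np.corrcoef(beliefs_scores, rowvar=False)

# Between sets: the off-diagonal block of the joint correlation matrix.
joint = np.corrcoef(beliefs_scores, practice_scores, rowvar=False)
between = joint[:k, k:]

print(np.round(within, 6))
print(np.round(between, 3))
```

Each entry of `between` is an ordinary correlation between one factor from the first set and one from the second, which is why the table can be read cell by cell.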

It's almost Canonical Correlation but not quite, and the student could write about it quite clearly. One happy student, one much more relaxed Statistician.
