Thursday, 26 July 2012

One of the highs of Research Support

This really should have been last week's blog, as it happened last week, but I needed to finish off the story I had already started, so it had to wait, and nothing better has happened this week. So I will tell you about an incident that is one of the highs of doing my sort of research support and keeps me passionate about it.

I have worked on and off with students from the Sheffield Kidney Institute. Last week one of them, whom I had seen before when he was doing his masters, came with another student. She is in the final year of her doctorate while he is only starting his. It was one of those cases where the data was not finalised yet. Actually that in itself is a good sign: a student who knows that they can talk with a statistician before they finish entering their data can save themselves an awful lot of time and effort later on.

Anyway it was good to have a play, and I was able to demonstrate some of SPSS's abilities, including drawing ROC curves, which show how well a test distinguishes signal from noise. This is important because their research is on finding ways to distinguish someone who is likely to deteriorate from someone who is going to maintain their current level of kidney disease. They are dealing with people who are at various stages of the disease from various causes.
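For anyone who wants to see what sits behind a ROC curve, here is a minimal sketch in Python rather than SPSS. The data are made up purely for illustration, the function names are hypothetical, and it assumes that a higher marker value points towards deterioration. Each cut-off gives one point on the curve, and the area under the curve summarises how well the marker separates the two groups (0.5 is no better than chance, 1.0 is perfect).

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Try every observed score as a cut-off, from strictest to most lenient.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezium rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up data: 1 = kidney function deteriorated, 0 = stayed stable.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
marker = [9.1, 7.4, 6.8, 5.0, 6.9, 4.2, 3.9, 3.1]

curve = roc_points(marker, outcome)
print(curve)
print(round(auc(curve), 3))
```

Running the same two functions on a second marker for the same patients gives a second curve and a second area, which is one simple way to put two tests side by side.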

The focus of the final-year student's doctorate had been to develop a new and better predictive test. Standard tests already exist. She only has a small part of the data in so far, but I did a ROC curve for the standard test and then chose one of the other options more or less at random. Well, I think it was first in the list, but apart from that there was nothing to tell it from any of the others. The curve came back, it was clearly performing better than the standard, and her face lit up.

She had spent the last three-plus years developing that test, and she had no idea whether it would be better than, as good as, or worse than the current test. What that one simple graph was doing was telling her that her hard work had paid dividends. No doubt the analysis for the thesis will be a lot more detailed and a lot more complex and a lot of hard work, but to know that in the end it worked somehow makes the statistics appear less of a chore.

Let's be clear, it's her hard work that has done it; none of my bits of wizardry would be any good without it. But being able to tell people they have succeeded in what they have been working hard at is a great buzz!

Yes, one of them is back already asking for further help.

Thursday, 19 July 2012

Why we did not use Canonical Correlations in the end

Last week I wrote about learning to use Canonical Correlations. This week I am going to tell you why in the end we did not use them for the study.

Let me give you some background to the study first. It is a doctorate in modern languages looking at the acquisition of English by students in Pakistan. The student has collected multiple data sources, and among them are two surveys, one of teachers and one of students. The student is hard working and is really trying to link up the data. Both questionnaires had sub-questionnaires covering such things as beliefs and classroom practice; the students' one also contained a sub-questionnaire covering their own learning practice. Although these sub-questionnaires often had thirty or more questions, only forty teachers answered, while there were over three hundred students.

Firstly, I did actually spend quite a bit of time reading up on Canonical Correlation, and that persuaded me that I really did need to get the Canonical Correlation macro running in SPSS. This was not as straightforward as it sounds. There have been changes to the way SPSS handles files over the years and the macro had not been updated through all the iterations. This meant I had to sit down and actually work out why it was going wrong. In the end I kludged a solution so it ran; by kludged I mean I put in a fix that would work in this case. It does not mean that I can pass this file to anyone else and it will run for them. It won't, and if I am honest I really should go back and revise it so it does. I got it working for a test set.

The second stage was to run it on the teachers' data. There were two ways we could run it: firstly with the raw questions, secondly with the factors that we had previously calculated from the data. Canonical Correlation did not work with the raw questions, as the questionnaires are too long for the number of teachers, so the correlation matrix was singular. You need more cases than questions. So we went on to the factors. This worked, but the results were uninterpretable because the canonical equations were built on factors that had already been rotated.
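A small sketch of why the raw questions cannot work when there are more questions than respondents. The answers below are made up, and the function names are hypothetical. The sketch uses the covariance matrix rather than the correlation matrix so that the arithmetic stays exact with fractions; the two have the same rank as long as no question has identical answers from everyone. With n respondents the rank can be at most n - 1, so a matrix for more questions than that cannot be inverted.

```python
from fractions import Fraction


def covariance_matrix(rows):
    """Sample covariance matrix (variables in columns) in exact arithmetic."""
    n, p = len(rows), len(rows[0])
    means = [Fraction(sum(r[j] for r in rows), n) for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def rank(matrix):
    """Rank by Gaussian elimination; exact because the entries are Fractions."""
    m = [row[:] for row in matrix]
    rank_found = 0
    for col in range(len(m[0])):
        # Look for a row at or below the current one with a non-zero entry.
        pivot = next((r for r in range(rank_found, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank_found], m[pivot] = m[pivot], m[rank_found]
        # Clear the entries below the pivot.
        for r in range(rank_found + 1, len(m)):
            factor = m[r][col] / m[rank_found][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank_found])]
        rank_found += 1
    return rank_found


# Made-up answers: 4 respondents (rows) to 6 questions (columns).
answers = [
    [1, 4, 2, 5, 3, 2],
    [3, 2, 5, 1, 4, 4],
    [5, 1, 3, 2, 2, 5],
    [2, 5, 4, 4, 1, 1],
]

cov = covariance_matrix(answers)
print(len(cov), rank(cov))  # matrix size, then its rank
```

Here the matrix is 6 by 6 but its rank is only 3, one less than the number of respondents, so it has no inverse and the canonical correlation calculation has nothing to work with.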

So we did a rethink.

The factors were created using Principal Components Analysis with a Varimax rotation to aid interpretability. Now the nice thing with this analysis is that the factors you get out at the end are uncorrelated. So we had two sets of factors with no correlation within either set, and we very simply correlated one set of factors with the other. Remember that within each set every factor is uncorrelated with the others, so the results were easily interpretable.
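As a rough illustration of that last step, here is a minimal Python sketch of correlating one set of scores with another. The scores and the names (beliefs, practice and so on) are invented for the example and are deliberately built so that the columns within each set are uncorrelated, as component scores after an orthogonal rotation would be. It is not the analysis that was run in SPSS, only the idea behind it.

```python
from math import sqrt


def pearson(x, y):
    """Ordinary Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cross_correlations(set_a, set_b):
    """Correlate every score in set_a with every score in set_b."""
    return {(name_a, name_b): pearson(a, b)
            for name_a, a in set_a.items()
            for name_b, b in set_b.items()}


# Invented scores for eight respondents; uncorrelated within each set.
beliefs = {
    "belief_1": [1, 1, 1, 1, -1, -1, -1, -1],
    "belief_2": [1, 1, -1, -1, 1, 1, -1, -1],
}
practice = {
    "practice_1": [2, 0, 2, 0, 0, -2, 0, -2],
    "practice_2": [0, 2, 0, 2, -2, 0, -2, 0],
}

table = cross_correlations(beliefs, practice)
for pair, r in table.items():
    print(pair, round(r, 3))
```

Because nothing within a set overlaps with anything else in that set, each entry in the table can be read on its own: a large correlation between one belief score and one practice score is not being propped up by another score in the same set.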

It's almost Canonical Correlation but not quite, and the student could write about it quite clearly. One happy student, one much more relaxed statistician.

Thursday, 12 July 2012

Getting my head around Canonical Correlations

This week has been easier in that I have actually found some time to do some development work. Not much, but some. Part of it was a session run by our Creative Media team, although if I want to hire any of the equipment I have to go to Audio and Visual as I am staff. That is not going to be a problem come the autumn as they will be working from the same building as me. There was also an introduction to Screenr, which may be a quick and dirty way to get how-to videos out to people. I still need to think about that.

The other thing I have been doing is looking into Canonical Correlations. I knew they existed beforehand: they were briefly mentioned in a Multivariate Analysis course I did as part of my Masters at Reading University over twenty years ago. But really we were introduced to a huge number of techniques in a very short time, and all I retained was that they existed and that there are some ways of carrying them out in SPSS.

At first glance they look a beguiling sort of solution to a lot of problems and a natural extension of the General Linear Models which most people use. Examples of General Linear Models include the t-test, ANOVA, Linear Regression and Multiple Linear Regression. The extension that includes them would also include Factor Analysis and Discriminant Analysis. They basically allow you to have multiple dependent as well as multiple explanatory variables in a regression. As such they seem to do for Regression what MANOVA does for ANOVA. This is reinforced in SPSS by the fact that the "easy" way to carry out this analysis is with the MANOVA procedure.

The problem is that the complexity of the results seems to make it difficult to interpret exactly what is related to what. I think there are three sets of equations: one is the "factors" from the dependent variables, another is the "factors" from the explanatory variables, and finally there is the relationship between these. I found a paper by Alissa Sherry and Robin Henson which gives a fairly gentle introduction, and am now working through Tabachnick and Fidell, which is a standard textbook for people like me. It teaches you all the things to worry about.
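Written out, the three pieces look roughly like this. The notation is an assumption made for this sketch, not taken from either reference: X is the set of explanatory variables, Y the set of dependent variables, and a_i and b_i are the weights for the i-th pair of "factors" (usually called canonical variates).

```latex
U_i = a_i^{\top} X, \qquad
V_i = b_i^{\top} Y, \qquad
\rho_i = \operatorname{corr}(U_i, V_i)
```

The first pair of weights is chosen to make \rho_1 as large as possible; each later pair does the same subject to being uncorrelated with the pairs before it. That leaves two sets of weights and one set of correlations to interpret at once, which is where the difficulty comes from.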

To some extent I am beginning to get there. The challenge is to use it in anger next week with real data, this time on learning English in Pakistan and see if it will now give us interpretable results.

Thursday, 5 July 2012

The Challenge of an introductory course

For good or ill, I have been teaching what I have called "An Introduction to NVivo" for the last two years. It was put on due to popular demand and at present I have to run it every two months to keep up with demand (technically I am not doing that).

The odd thing is that in many ways a two hour talking head course is NOT a good way to teach NVivo, it would not be my preferred way. It is meant really as a taster, something that can give people a flavour for NVivo and get them to explore it for themselves. As that it is barely acceptable and leaves me drained when I do it. The fact that I am doing it on out of date software does not help.

Yet this course is hugely popular and I regularly leave people enthused for the package. It is one of the ironies of life. The group I taught today was relatively easy, there was at least one other user and his enthusiasm was infectious. I think we might have even persuaded some of the people who were computer shy that it was worth a try.

What I have to remember for next time is to be quite deliberate about pointing people to other resources to help them. Even the guy who was a user was interested to hear of the two day courses where you could take along your own data and get help from an NVivo expert. The problem at Sheffield is that I am the NVivo expert and I am very aware of huge chunks where I just don't know about the package. The student who was overwhelmed by the course was guided by him to the help on the QSR website and was able to see that they had videos and tutorials that might really help her and which she could take in small bite size pieces. I think I have at least three more users this time which is all I aim to achieve.

If my aim with the SPSS course is to get people from being paranoid to where they feel it's a chore, then my aim with this course is to get people to the stage where they feel it is worth having a trial with NVivo. To do that I have to overcome a level of scepticism and persuade users that the program is not going to take over the analysis (and disappoint them that it won't produce p-values). I also have to persuade them that being computerised does not imply distance. On my last course one guy said "this is not about taking you away from your data, it's all about keeping you close to it". If I can get that across, no matter how tired and drained I am, then the course has succeeded.