Thursday, 28 June 2012

Transforms, Endpoints and fixing things

Statisticians are often slightly inconsistent, and I am going through one of those times just at present. Yesterday I was dealing with a data set in which the outcome variable was a count out of five, and I was quite happy ignoring my sensitivities about underlying distributions and just putting the numbers in. Well, that is not quite true: I was having difficulty getting SPSS to give me results that allowed for this scale (whether 0, 1, 2, 3, 4 or 5 plants had survived in a block), and when I got an analysis that was vaguely right it refused to give estimates. So I went back to "let's pretend it's normal and see what we get". Two things made me confident this was OK. First, we had three replicates, so each estimate was based not on a single plot of five plants but on three plots of five plants each, and totals built up from several counts like that are likely to be closer to normal. Secondly, the overall results were consistent with the more complex analysis.
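To see why totals over three plots behave better, here is a quick simulation sketch. It is entirely my own illustration, with a made-up survival probability rather than the real data, but it shows the central-limit effect I am leaning on:

```python
import numpy as np

rng = np.random.default_rng(42)
p_survive = 0.7      # hypothetical survival probability, not from the real data
n_blocks = 10_000

# One plot of five plants gives a 0-5 count; a block of three plots gives 0-15.
single_plot = rng.binomial(5, p_survive, n_blocks)
block_total = rng.binomial(5, p_survive, (n_blocks, 3)).sum(axis=1)

for name, x in [("single plot (0-5)", single_plot), ("block total (0-15)", block_total)]:
    skew = ((x - x.mean()) ** 3).mean() / x.std() ** 3
    print(f"{name}: mean={x.mean():.2f}, sd={x.std():.2f}, skew={skew:.3f}")
```

The block totals come out noticeably less skewed than the single-plot counts, which is the sense in which the data is "closer to normal".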

Today I am having problems with a different set of data. This has a 30-point scale, although nobody scores below 4; however, there are many people who score 30. When groups start hitting the top end of the scale, the scale becomes insensitive. There are ways of "fixing" this to some extent. The logit transform and the allied logistic transformation both help to a certain extent, by pushing further out the points that are bunched at the end of the scale. They are not perfect, because you have to keep everyone who scored 30 together, and you know that some of them really should have scored 31, 32, etc. The problem is you do not know which.
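For the record, here is a minimal sketch of the kind of logit transform I mean, in Python. The function name, the 0.5 offset and the example scores are all my own choices; the offset is one common way of keeping the endpoints of the scale finite:

```python
import numpy as np

def logit_score(x, lo=1, hi=30):
    """Map a bounded score onto the logit scale."""
    # Rescale to strictly inside (0, 1); the 0.5 offsets stop the
    # endpoints from mapping to infinity.
    p = (x - lo + 0.5) / (hi - lo + 1)
    return np.log(p / (1 - p))

scores = np.array([4, 15, 27, 28, 29, 30])
print(np.round(logit_score(scores), 2))
# The gaps 27-28, 28-29, 29-30 get progressively wider on the logit scale,
# but everyone who scored 30 still lands on exactly the same value.
```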

However, there is more of a problem with this data set than that. There is a condition, and people with that condition automatically score 27 or less; that is, normals/controls score 28, 29 or 30. Now I really start asking questions about whether this scale is measuring anything in controls, except perhaps what sort of a day a normal person is having. In other words, the sensitivity of the scale with normal people is so limited that I suspect it is swamped by other effects. They do have pretty big data sets, but there does not seem to me to be any evidence that there is anything else going on in controls. I do not think that the logit or logistic transform is doing anything here.

So there I am: with one dataset quite merrily ignoring what many would see as obvious problems, and with another worried to the extent that I wonder if there was any point in collecting the control data. Oh, the contrariness of being a statistician.


Friday, 22 June 2012

Struggles with getting things from the Web in NVivo 9

All right, NVivo 10 is out and it has web integration, but I have not tried it out except very peripherally. What I do have is a project I am collaborating on that is using the web. This is largely about static web pages, and it should be fairly easy to handle that, right? Well, my experience says otherwise.

I started off on the preliminaries for this using fileshot pro. You'd think it was ideal: all sorts of extra facilities beyond your normal PDF printer, specially designed for websites. You'd think. However, when we came to using it on commercial websites which listed the products we had searched for, we got completely black PDFs. Needless to say, I did not shell out for a copy when the trial period expired, and when I remember I will be uninstalling it from my browser.

So I went to Bullzip pdf printer, and this works fine, or at least it does on my machine: it produces pages made up of text and pictures which are simple enough to import successfully into NVivo, and I can use NVivo's in vivo coding facilities. When the student who is actually doing the hands-on work comes to use it, it produces pure pictures. Why? I do not know. I have not found any settings I can change that will allow her to produce PDFs like I do on my machine. CutePDF does exactly the same trick. This means I have to download all the webpages myself.

Then we come to problem number two: of the twenty or so pages we are actually looking at, three have a video on them. Having those videos would be nice. What is more, it is fairly simple to download the videos in mp4 format, and officially NVivo reads mp4. It should be straightforward, then, to include them. Only when we try, we get the message that NVivo does not recognise the format. Oh dear, now we have to deal with that. So we looked for a converter that would work. The first didn't, and we promptly uninstalled it, but FormatFactory does, and will convert these files to avi format, which NVivo 9 will read.
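FormatFactory is a point-and-click tool, and it is what we actually used. If you wanted to script the conversion instead, something like the following Python sketch would do it; ffmpeg is my suggestion here, not part of our workflow, and needs installing separately, and the folder name is made up:

```python
import subprocess
from pathlib import Path

# Convert every mp4 in a (hypothetical) "videos" folder to avi for NVivo 9.
for mp4 in Path("videos").glob("*.mp4"):
    avi = mp4.with_suffix(".avi")
    subprocess.run(["ffmpeg", "-i", str(mp4), str(avi)], check=True)
```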

Of course this is all technical stuff. It won't get mentioned in any paper or thesis we write, and it will not appear at all in the how-to books, but it has taken about three hours of work for two of us to get this far, plus many hours wasted by the student because the pieces of software were not working together. Maybe next week we will be making progress on the actual analysis.

Thursday, 14 June 2012

New Versions of Research Software are coming up

This week has been dominated by new versions of software. It never rains but it pours in this area, and all I need now is for Minitab to announce that they have a new release. There are new versions coming of both the major packages I support, SPSS and NVivo.

Firstly, I have been helping to beta test SPSS 21. That's right, there is another version coming soon; I expect it before Christmas, and I hope they are sticking to a once-a-year renewal. Some of the new bits are very nice. You can now easily get descriptives of any variable by just clicking on the variable row header and selecting the option. This is nice for people like myself who end up dealing a lot with other people's data: when something goes wrong in an analysis, quite often the first thing I want to do is look at the basic frequencies and descriptive statistics to see if the numbers are what I was expecting.

They have also integrated SQL terms into their ability to merge datasets. SPSS's "Add Cases" and "Match Files" commands pre-date SQL. I can remember a presentation back in around 1990 where someone from one of the big database companies was talking about how they had just introduced SQL into their product. Although I was relatively new to SPSS, its merging abilities had already been there for quite a time and were mature technology. They were admittedly a poor cousin to those in SAS (even then it was difficult to think of a way of reorganising data that SAS could not do), but they were there. If this works from the menus then it will be good, but at present there are definite glitches and I am not holding my breath. It also looks like, for the statistical techie forecasters, there may be some simulation algorithms attached. My guess, from CICS's point of view, is that SPSS 21 will be available for purchase about February 2013, with the possibility of an upgrade on the managed desktop for September 2013.
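For anyone unfamiliar with the SQL jargon, here is a rough analogy in Python's pandas; this is my own example, not SPSS syntax. "Add Cases" stacks rows like an SQL UNION, while "Match Files" combines files on a key variable like an SQL JOIN:

```python
import pandas as pd

# Two data sets with the same variables: stacking them is like
# SPSS "Add Cases", i.e. roughly an SQL UNION ALL.
wave1 = pd.DataFrame({"id": [1, 2], "score": [10, 12]})
wave2 = pd.DataFrame({"id": [3, 4], "score": [11, 9]})
stacked = pd.concat([wave1, wave2], ignore_index=True)

# A lookup table keyed on id: combining on the key is like
# SPSS "Match Files", i.e. roughly an SQL LEFT JOIN.
demog = pd.DataFrame({"id": [1, 2, 3, 4], "age": [34, 51, 28, 45]})
print(stacked.merge(demog, on="id", how="left"))
```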

Secondly, QSR are just in the final stages of launching NVivo 10. I have downloaded a trial copy and am having a play with it on my machine. It has quite a lot of things to work with the Web, including Evernote. Now, I have not used Evernote, but there are quite a few people promoting it for academics, and this will be useful if one of the research bids I am involved in comes off. The other thing is that it claims to handle more data and do it quicker. It is still fairly slow on my machine, which is a decent machine, and large datasets did cause problems with version 9; the problem is that QSR added functionality without realising the resource demands of large datasets when they include things like videos. So yes, I will be glad if this makes it more reliable. You can find out more on QSR's What is New in NVivo 10 page. If you used 9 and upgrade to 10, your knowledge is still applicable; the only thing you need to watch is that you do not revert to working in 9, as 9 cannot read NVivo 10 data sets. It is not on campus yet, but if you want to play you can download a trial version for thirty days. I expect there will be a wait for the full licence copy to come on site. We have just renewed for five years, so don't worry, it will come.

Then there are other bits and pieces. Research has been revolutionised by the web in ways it is difficult to comprehend. When I started twenty years ago, most researchers spent a fair amount of time physically searching for articles in the library. It meant that your information was very much controlled by what other people in your discipline were talking about, as you had to get a reference to a paper to find out that it existed. First came big databases of references that were searchable, so you could look for keywords; then came the likes of Google. Today if I want to know what is written on a topic my first port of call is Google Scholar, and only if I find I can't get something online do I start using the e-library, and only when that fails does a trip to the library follow. So it is with interest that I hear about scholr.ly, which uses Google but tries to present the results in a scholar-friendly way. I will give it a go and see if it adds anything.