Statisticians are often slightly inconsistent, and I am going through one of those times just at present. Yesterday I was dealing with a data set in which the outcome variable ran from 0 to 5, and I was quite happy ignoring my sensitivities about underlying distributions and just putting the numbers in. Well, that is not quite true. I was having difficulty getting SPSS to give me results that allowed for this scale (whether 0, 1, 2, 3, 4 or 5 plants had survived in a block), and when I got an analysis that was vaguely right it refused to give estimates. So I went back to "let's pretend it's normal and see what we get". Two things made me confident this was OK. Firstly, we had three replicates, which means the estimates were based not on five plants but on three plots each with five plants in it, so the data is likely to be closer to normal. Secondly, the overall results were consistent with the more complex analysis.
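For anyone who wants to see why the replicates help, here is a rough sketch rather than the analysis I actually ran. It assumes the plants survive independently with one common survival probability, and the value 0.6 is made up purely for illustration. Under those assumptions the number of survivors in one plot is Binomial(5, p) and the total over three plots is Binomial(15, p), which takes more distinct values and is less skewed.

```python
from math import comb, sqrt

def binomial_pmf(n, p):
    """Exact probabilities of 0..n survivors out of n plants."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def binomial_skewness(n, p):
    """Skewness of a Binomial(n, p) count; 0 would be perfectly symmetric."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

P_SURVIVE = 0.6  # hypothetical survival probability, not taken from the real data

single_plot = binomial_pmf(5, P_SURVIVE)   # one plot of five plants
three_plots = binomial_pmf(15, P_SURVIVE)  # three plots pooled, assuming independence

print("possible values, one plot:   ", len(single_plot))
print("possible values, three plots:", len(three_plots))
print("skewness, one plot:   ", round(binomial_skewness(5, P_SURVIVE), 3))
print("skewness, three plots:", round(binomial_skewness(15, P_SURVIVE), 3))
```

If the plots differ from one another, or the plants within a plot do not survive independently, the pooled count will not be exactly binomial, so treat this as an indication only.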
Today I am having problems with a different set of data. This has a 30-point scale, although nobody scores below 4. However, there are many people who score 30. Now when groups start hitting the top end of the scale, the scale becomes insensitive. There are ways of "fixing" this to some extent. The logit transform and the allied logistic transform both help to a certain extent, by placing the points that are hitting the end of the scale further out. They are not perfect, because you have to keep everyone who scores 30 together, and you know that some of them really should have scored 31, 32, etc. The problem is you do not know which.
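To illustrate what "placing the points further out" means, here is a minimal sketch of one common variant, the empirical logit. The 0.5 offset is my assumption for this example (it stops a score of 30 from going off to infinity), and the function name is made up. It is not necessarily the exact transform one would settle on for this data.

```python
import math

MAX_SCORE = 30  # top of the scale

def empirical_logit(score, max_score=MAX_SCORE, offset=0.5):
    """Log-odds of the score as a fraction of the maximum.

    The offset (an assumed 0.5) keeps the result finite at 0 and at max_score.
    """
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    return math.log((score + offset) / (max_score - score + offset))

# One raw point is worth much more near the ceiling than in the middle.
gap_middle = empirical_logit(16) - empirical_logit(15)
gap_top = empirical_logit(30) - empirical_logit(29)

print("15 -> 16 on the logit scale:", round(gap_middle, 3))
print("29 -> 30 on the logit scale:", round(gap_top, 3))
```

The transform stretches the top of the scale, but everyone who scored 30 still lands on exactly the same transformed value, which is the limitation described above.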
However, there is more of a problem with this data set than that. There is a condition, and people with that condition automatically score 27 or less. That is, the normal controls score 28, 29 or 30. Now I really start asking questions about whether this scale is measuring anything except perhaps what sort of a day a normal person is having. In other words, the sensitivity of the scale with normal people is so limited that I suspect that it is swamped by other effects. They do have pretty big data sets, but there does not seem to me to be any evidence that there is anything else going on in controls. I do not think that the logit or logistic transform is doing anything here.
So there I am, with one dataset quite merrily ignoring what many would see as obvious problems, and with another worried to the extent that I wonder if there was any point in collecting the control data. Oh, the contrariness of being a Statistician.