There was a time when I was ready to look at every new statistical package that came along, well almost. The longer I have been supporting statistical packages, the more cautious I have become. Let me be straight with you: even the big grown-up packages like SPSS get it wrong surprisingly often, and new packages aimed at those doing simple statistics get it wrong a lot more of the time.
Now, let me be honest, there are some superb newer statistics packages out there. I think of Stata for starters. What distinguishes them is that they are not trying to produce a simple tool for researchers but take statistical complexity seriously! These packages are not any easier to use than SPSS, but when used properly they are powerful tools. What really gets me is the people who think, "let's put together something simple for the researcher and forget about the rest of the statistics".
Today I had a first query on a package that is popular with the medical school. The person was doing a simple one-way ANOVA and then doing post hoc tests. He came to me with his data because he could not get p-values out for the Bonferroni post hoc tests. I suppose I could have explained that Bonferroni does not produce p-values but alters significance levels, and that was why he was not getting any results. He also had a significant F value and no significant post hoc comparisons.
Well, Bonferroni is highly conservative; even the more accurately calculated p-value of Sidak is less conservative. And yes, the difference between the two is that Sidak calculates the actual p-value (strictly, exact when the comparisons are independent) while Bonferroni uses an approximation. Fortunately the package gave t-values (but no degrees of freedom), so I was able to find LSD p-values and then carry out a Holm sequential test and calculate an inverse Sidak (many of the inverse Bonferroni values would have been greater than 1; I told you it was an approximation). All this is doable if you are willing to kludge it in Excel.
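For anyone who wants to see the arithmetic rather than take my word for it, here is a minimal sketch in Python of the three adjustments. It is not what I actually did (that was the Excel kludge), and it starts from unadjusted LSD-style p-values rather than from t-values. The four p-values are made up purely for illustration, and the function names are my own.

```python
# Illustrative only: these unadjusted p-values are invented, not the
# enquirer's data.
raw_p = [0.01, 0.04, 0.03, 0.5]


def bonferroni(pvals):
    """Multiply each p-value by the number of comparisons, capped at 1.

    Without the cap the product can exceed 1, which is the sign that
    this is only an approximation.
    """
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def sidak(pvals):
    """Sidak adjustment: 1 - (1 - p)^m, which never exceeds 1."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def holm(pvals):
    """Holm sequential adjustment (step-down Bonferroni).

    The smallest p-value is multiplied by m, the next by m - 1, and so
    on; a running maximum keeps the adjusted values in the same order
    as the raw ones.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


bonf_p = bonferroni(raw_p)
sidak_p = sidak(raw_p)
holm_p = holm(raw_p)

for name, values in [("Bonferroni", bonf_p), ("Sidak", sidak_p), ("Holm", holm_p)]:
    print(name, [round(v, 4) for v in values])
```

With these made-up numbers the Sidak values sit at or just below the Bonferroni ones, and Holm is never larger than Bonferroni, which is the ordering of conservativeness I was describing.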
What would have been more sensible is if it had done what packages like SPSS do, and that is give a variety of post hoc tests. He could have done a standard one such as Tukey and got p-values straight out. This is what I did once I had established that was what he wanted, and I also showed him how to draw the graph, using SPSS to estimate the terms and a graphics package to actually draw it (well, Excel in this case, but any graphics package does this). That was so simple he was delighted.
So the first thing against the package is that it does not even give the standard post hoc comparisons that many of the main packages do, instead using an old-fashioned (think over fifty years old) technique which has long been discarded unless there are no other options.
However, I saw more: to do this ANOVA the user had to put each group into a separate column. This immediately rings warning bells for me. The one thing I find nearly all these packages don't do is deal with levels of variation. When they come to ANOVA they basically treat a paired t-test as a group t-test. NOT GOOD. This leads to wrong interpretation. In fact the wrong presentation of within-subject comparisons is a major bugbear in the literature, and in this case confidence intervals have not been an improvement!
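To show why this matters, here is a small sketch in Python comparing the two t statistics on the same numbers. The data are invented for illustration (five subjects measured twice) and the function names are mine; it only computes t, not p-values, so it needs nothing beyond the standard library.

```python
from math import sqrt
from statistics import mean, stdev

# Invented data: the same five subjects measured before and after.
before = [10, 20, 30, 40, 50]
after = [12, 21, 33, 41, 53]


def independent_t(a, b):
    """Pooled-variance two-sample t, treating a and b as unrelated groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def paired_t(a, b):
    """Paired t, computed on the within-subject differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


t_group = independent_t(after, before)
t_paired = paired_t(after, before)

print("group t: ", round(t_group, 3))
print("paired t:", round(t_paired, 3))
```

Every subject goes up by a small, consistent amount, but the subjects differ a lot from one another. The group version buries the change in the between-subject variation, while the paired version compares each subject with themselves and sees it clearly. A package that only accepts one column per group pushes you towards the first calculation when your design calls for the second.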
To be polite, I will not be recommending this particular package for statistical analysis. I suspect that if you want to do a proper statistical analysis of all but the most constrained of research, then you REALLY do need to put in the time and effort to learn a proper statistical package. I am not fussed which; I will even allow SPSS*, but please do not bring me something that you say is easier. So far, all such packages I have seen, in their attempts to simplify, mislead researchers into doing WRONG analyses.
*When I was a postgrad, SPSS had a reputation for doing this sort of thing. Now, some twenty years on, SPSS has developed, and there are so many more very poor packages out there that I do not feel this reputation can be sustained.