Thursday, 14 February 2013

Testing for Normality, Power and The Central Limit Theorem

This week I am being the working statistician and having a grumble at a lot of the other statistical experts and their idea of what is important when carrying out parametric tests. Let's be clear: parametric tests are tests that depend on the parameters of the model following an assumed distribution. For a number of common tests the assumption is that the estimated values follow a normal distribution and that the standard deviation is constant throughout the value range, although not all values may have the same mean. The tests that assume this are the t-test, ANOVA and linear regression, or for the more statistically literate, all that come under the class of general linear models (please note that is general, not generalised).

Now let me also explain about power. Power is the ability of a test to detect a difference from the null hypothesis. Those who are used to doing medical research will be used to doing power calculations to work out how many units you need to detect a specific difference. However, the concept of power does not stop with designing experiments. Any test that you do has a level of power, which you could calculate in advance if you knew at what level a result became interesting; it would tell you what chance you had of finding for the result rather than for the null if the true value really was that different.
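To make this concrete, here is a little simulation sketch (the function name and all the numbers are my own illustration, not from any package): the power of a two-sided one-sample z-test with known standard deviation is just the proportion of simulated experiments that reject the null.

```python
import math
import random
from statistics import NormalDist

def z_test_power(effect, n, alpha=0.05, sims=2000, rng=None):
    """Estimate by simulation the power of a two-sided one-sample z-test
    (sigma known and equal to 1) to detect a true mean shift of 'effect'."""
    rng = rng or random.Random(42)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(sims):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)    # z statistic with sigma = 1
        if abs(z) > crit:
            rejections += 1
    return rejections / sims

# The same true effect is far easier to detect with more data:
p_small = z_test_power(0.5, n=10)
p_large = z_test_power(0.5, n=100)
print(p_small, p_large)
```

The only thing that changed between the two calls is the sample size, which is the whole point: power is a property of the design, not just of the effect.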

Now when testing normality, you actually have a null that the data is normally distributed, and you are only interested if it differs from this. Of course with a p-value you get a 1 in 20 chance of concluding a difference even when it is not there, but let's leave that to one side. What you need to take from this is that there is one rule with power: the larger the sample size, the smaller the difference from normality the test can detect.

So let's consider the often-suggested approach of plotting a histogram of your raw data and then running a Kolmogorov-Smirnov test. This approach is WORTHLESS. There is no relationship between the normality of the original data and your tests unless there is nothing going on in your data. In other words, if you have a significant result your raw data will not be normally distributed!
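You can see this in a toy simulation (everything here — names, group sizes, the 5 SD separation — is my own illustration): two groups with a genuinely large treatment difference give pooled raw data that is bimodal and clearly non-normal, while the residuals around the group means are as normal as you could wish. The distance measure below is the Kolmogorov-Smirnov statistic against a normal fitted to the data.

```python
import random
import statistics
from statistics import NormalDist

def ks_distance(data):
    """KS distance between the sample and a normal distribution with the
    sample's own mean and standard deviation."""
    fitted = NormalDist(statistics.fmean(data), statistics.stdev(data))
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        c = fitted.cdf(x)
        d = max(d, abs(c - i / n), abs(c - (i + 1) / n))
    return d

rng = random.Random(1)
# Two groups with a large real difference in means (5 SDs apart):
group_a = [rng.gauss(0.0, 1.0) for _ in range(300)]
group_b = [rng.gauss(5.0, 1.0) for _ in range(300)]

raw = group_a + group_b  # pooled raw data: strongly bimodal
residuals = ([x - statistics.fmean(group_a) for x in group_a] +
             [x - statistics.fmean(group_b) for x in group_b])

d_raw = ks_distance(raw)
d_resid = ks_distance(residuals)
print(d_raw, d_resid)  # the raw data is far less normal than the residuals
```

The stronger the real effect, the less normal the raw data looks, which is exactly why testing the raw data tells you nothing about the assumption.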

Let me go back to the assumptions:
1) the estimated values are normally distributed
2) there is constant variation

The central limit theorem (it is not a guarantee for any finite sample, but it holds in the vast majority of cases) says that if you have enough cases then your estimates will be close to normally distributed regardless of the underlying distribution. The problem is knowing when you have enough data, and that is determined by the distribution of the residuals. That's right: not the raw data, but the actual value minus the estimated value. You can only find this value once you have carried out the test! Basically, the closer these are to normality, the fewer cases you need to assume normality of the estimates.

So statisticians plot histograms and normal probability graphs of the residuals.

We could carry out Kolmogorov-Smirnov or Shapiro-Wilk tests, but there is a snag and it is down to power. Basically, the vast majority of the time these are either not sensitive enough or too sensitive. Remember, the larger the sample size, the smaller the difference from normality these tests can detect; but the larger the sample size, the less we are interested in small differences from normality. Basically, the test is sensitive to differences from normality when you are not interested in them and not sensitive enough when it really matters.
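Here is a small illustration of that power effect (the set-up is entirely my own: uniform data scaled to mean 0 and SD 1, tested against a standard normal using the asymptotic 5% KS critical value). The underlying deviation from normality is identical at both sample sizes; only the sample size changes how often the test flags it.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def ks_stat_vs_std_normal(xs):
    """One-sample KS distance between the data and a standard normal."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        c = STD_NORMAL.cdf(x)
        d = max(d, abs(c - i / n), abs(c - (i + 1) / n))
    return d

def rejection_rate(n, reps=200, rng=None):
    """How often the 5%-level KS test flags uniform data (scaled to
    mean 0, SD 1) as non-normal at sample size n."""
    rng = rng or random.Random(7)
    crit = 1.358 / math.sqrt(n)  # asymptotic 5% critical value
    lim = math.sqrt(3)           # uniform(-sqrt(3), sqrt(3)) has SD 1
    hits = 0
    for _ in range(reps):
        data = [rng.uniform(-lim, lim) for _ in range(n)]
        if ks_stat_vs_std_normal(data) > crit:
            hits += 1
    return hits / reps

r_small = rejection_rate(50)    # same deviation from normality each time...
r_large = rejection_rate(2000)  # ...but detection depends on sample size
print(r_small, r_large)
```

At the small sample size the test often misses a genuinely non-normal distribution; at the large one it flags it almost every time, even though (as the uniform demo later shows) the deviation matters less and less there.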

A while ago I developed a rule of thumb based on a hunch I have, and I am calling it nothing more than that. What I tend to use is the number of parameters estimated for the alternative hypothesis divided by the residual degrees of freedom. My feeling is that:
  • If the ratio is > 0.2 then you should cite other studies to establish normality of the data. It is going to be rare for either a Kolmogorov-Smirnov test or a Q-Q plot to be informative.
  • If the ratio is < 0.2 and > 0.05 then do both the Kolmogorov-Smirnov test and the Q-Q plot.
  • If the ratio is < 0.05 then do a Q-Q plot and only consider transformations when there is a very clear deviation from a straight line.
So it is only in fair-to-middling-sized studies that a test is sensible.
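For what it is worth, the rule of thumb is trivial to write down in code (the function name and the wording of the advice strings are just my labels for the three branches above):

```python
def normality_check_advice(n_parameters, residual_df):
    """Rule-of-thumb advice from the ratio of parameters estimated under
    the alternative hypothesis to the residual degrees of freedom."""
    ratio = n_parameters / residual_df
    if ratio > 0.2:
        return "cite other studies to establish normality"
    elif ratio > 0.05:
        return "do both a Kolmogorov-Smirnov test and a Q-Q plot"
    else:
        return "Q-Q plot only; transform only on a very clear deviation"

# e.g. a two-group comparison (2 parameters) with 12 residual df:
print(normality_check_advice(2, 12))  # ratio ~0.17: test plus Q-Q plot
```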

However, there is more: certain distributions that are clearly not normal tend to normality quicker than others, so plotting a histogram is a good idea. Firstly, the big bugbear is extreme outliers. If you want a crude estimate of this, take the biggest absolute residual, whether positive or negative, and divide it by the SIQR (semi-interquartile range) of the residuals. The bigger this value, the more cases you need to assume normality. More importantly, bounded distributions (ones that have set minimum and maximum values) tend to normality more quickly than unbounded ones! This is counter-intuitive, as the normal distribution is not bounded!
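That crude outlier gauge takes only a few lines to compute; here is a sketch (the function name is mine, and the quartiles come from the standard library's statistics.quantiles):

```python
import statistics

def outlier_ratio(residuals):
    """Largest absolute residual divided by the semi-interquartile range
    (SIQR) of the residuals; larger values mean nastier outliers."""
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    siqr = (q3 - q1) / 2
    return max(abs(r) for r in residuals) / siqr

well_behaved = [-3, -2, -1, 0, 1, 2, 3]
with_outlier = [-3, -2, -1, 0, 1, 2, 30]

ratio_clean = outlier_ratio(well_behaved)
ratio_out = outlier_ratio(with_outlier)
print(ratio_clean, ratio_out)  # 1.5 versus 15.0
```

Because the SIQR barely moves when one value goes wild, the ratio jumps, which is exactly what you want from a crude outlier flag.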

The other thing is that symmetric distributions become normal quicker than skewed distributions. If you want to test this out, the Online Statistics Textbook has a good demo in which you can make as horrible a distribution as you like. For instance, a uniform distribution looks nothing like a normal, but it is both symmetric and bounded, so taking the mean of just five cases and running it 1000 times gives the following graph, which to me looks pretty normal!
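The demo is easy to reproduce in a few lines (my own sketch, not the textbook's code): the CLT predicts the means should behave like a normal with mean 0.5 and SD sqrt(1/60), roughly 0.129, and even the crude "68% within one SD" check comes out right.

```python
import random
import statistics

rng = random.Random(2013)

# 1000 means, each of just five uniform(0, 1) draws:
means = [statistics.fmean(rng.random() for _ in range(5))
         for _ in range(1000)]

mu = statistics.fmean(means)
sd = statistics.stdev(means)

# Crude normality check: about 68% of a normal lies within 1 SD of the mean
within_1sd = sum(abs(m - mu) < sd for m in means) / len(means)
print(mu, sd, within_1sd)
```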

The other thing is that there is a problem: so far we have assumed equal variance all the way through. All the investigations show that these tests are far more sensitive to not meeting that criterion than they are to lack of normality, but that is much harder to test for.
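A first-pass look at that assumption can be as simple as the ratio of the largest to the smallest group variance (this sketch, including the name variance_ratio, is mine; it is only a rough screen, not a formal test like Levene's):

```python
import random
import statistics

def variance_ratio(*groups):
    """Crude equal-variance check: ratio of the largest group variance to
    the smallest. Values well above 1 are a warning sign."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

rng = random.Random(3)
# Three groups with equal spread, then three where one SD is tripled:
equal = [[rng.gauss(0, 1) for _ in range(100)] for _ in range(3)]
unequal = [[rng.gauss(0, s) for _ in range(100)] for s in (1, 1, 3)]

r_equal = variance_ratio(*equal)
r_unequal = variance_ratio(*unequal)
print(r_equal, r_unequal)
```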

In other words when it comes to dealing with assumptions for these tests, we are like the drunk who is looking for his keys under the street lamp because the light is better there than where he dropped them! We are spending a lot of effort on something that has some fairly simple approaches and not looking where the problems may really lie.
