Friday, 11 January 2013

Fiddle Factors and Adjusting logged data

I have had a query about what sort of offset you should use when taking logarithms of variables that contain zeros. I have spent quite some time searching for papers on this without turning up anything, so today I decided to have a go and see what would happen in a situation where a log is normally the suggested transform.

Thus I decided to look at generated Poisson data and see which fiddle factors actually behaved best. My hunch was that the smaller the mean, the smaller the fiddle factor that would work best.

So I generated eight samples of Poisson data with means 0.1, 0.2, 0.5, 1, 2, 5, 10 and 20. I then looked at taking log transforms with offsets of 0.05, 0.1, 0.2, 0.5 and 1, as well as the raw data. I then calculated the Shapiro-Wilk statistic for each of these transformed data sets, recording the untransformed data set as offset zero. The results are not as expected:
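The experiment above can be sketched roughly as follows. The sample size (n = 1000) and the random seed are my assumptions, as the post does not state them:

```python
# Sketch of the experiment: generate Poisson samples at a range of
# means, then compare the Shapiro-Wilk W statistic of log(x + offset)
# for several offsets against that of the raw data.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)  # seed is an assumption
means = [0.1, 0.2, 0.5, 1, 2, 5, 10, 20]
offsets = [0, 0.05, 0.1, 0.2, 0.5, 1]  # 0 = untransformed raw data
n = 1000  # sample size is an assumption

for mu in means:
    x = rng.poisson(mu, size=n)
    row = []
    for c in offsets:
        # offset 0 stands for "leave the data alone"
        y = x if c == 0 else np.log(x + c)
        w, _ = shapiro(y)  # W close to 1 means closer to normal
        row.append(f"{w:.3f}")
    print(f"mu={mu:>4}: " + "  ".join(row))
```

Higher W (closer to 1) indicates a better fit to normality, so you can read across each row to see which offset, if any, beats the raw data.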
Right, the first thing to say is that a lot of the time the untransformed data performs as well as any of the log transforms. The only case where the transform seems to be effective is for Poisson means around 0.5 and 1, i.e. those with a fairly high level of skewness. Beyond this it does nothing to improve the data, and when the data is less skewed the logarithm actually performs worse than leaving the data untransformed.

These are very preliminary findings, but my advice given them would be:
  • If the majority of your data are zeros, consider using a binary logistic model or some other more complex way of modelling the zeros.
  • If the data is reasonably normal then taking a log is unlikely to add anything and may actually make things worse.
  • It is only in cases where the data is skewed and only a minority are zeros that a transform is worth doing. In these cases, a fiddle factor equal to the lowest non-zero score may well be the best option.
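A minimal sketch of the last bullet, under my reading that "the lowest score" means the smallest non-zero value observed in the sample (the helper name is mine):

```python
import numpy as np

def log_with_min_offset(x):
    """Log-transform x using the smallest non-zero observed value
    as the fiddle-factor offset. Assumes x has at least one
    non-zero, non-negative entry."""
    x = np.asarray(x, dtype=float)
    offset = x[x > 0].min()  # data-driven fiddle factor
    return np.log(x + offset)
```

For example, `log_with_min_offset([0, 1, 3, 0, 7])` uses an offset of 1, so the zeros map to log(1) = 0 rather than minus infinity.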
