So a dummy variable is a simple way to assign values to particular groups. Let's suppose we have a choice of lunch boxes and we want to know what role colour and menu play in children's choices. We have red, blue and green lunch boxes, and three menu options: healthy, called "good" in the tables below (wholemeal bread, low-fat spread, tuna and sweetcorn sandwich with a plain low-fat yoghurt and some grapes); normal (brown bread ham sandwich with a fruit yoghurt); and bad (white bread jam sandwich with a chocolate mousse pudding). Kids are given a lunch box at random and asked to rate it. We also want to compare girls with boys.
| | contrast |
| Boys | 0 |
| Girls | 1 |
| | contrast blue | contrast green |
| Red | 0 | 0 |
| Blue | 1 | 0 |
| Green | 0 | 1 |
| | contrast normal | contrast bad |
| good | 0 | 0 |
| normal | 1 | 0 |
| bad | 1 | 1 |
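As a rough sketch of how the three tables above turn into dummy variables, here is a small Python example. The dictionary and function names (`GENDER`, `COLOUR`, `MENU`, `main_effects`) are made up for illustration; only the 0/1 codes come from the tables. Note that with the menu coded this way, in a model with an intercept "contrast normal" picks up the step from good to normal and "contrast bad" the further step from normal to bad.

```python
# Dummy (contrast) codes copied from the tables above.
# The names are illustrative assumptions, not part of any package.
GENDER = {"boy": (0,), "girl": (1,)}
COLOUR = {"red": (0, 0), "blue": (1, 0), "green": (0, 1)}
MENU = {"good": (0, 0), "normal": (1, 0), "bad": (1, 1)}


def main_effects(gender, colour, menu):
    """Return the five main-effect dummy variables for one child.

    Column order: girl, blue, green, normal, bad.
    """
    return GENDER[gender] + COLOUR[colour] + MENU[menu]


# A girl who was given a blue lunch box with the bad menu.
print(main_effects("girl", "blue", "bad"))  # (1, 1, 0, 1, 1)
```

The reference groups (boys, red, good) are the ones that come out as all zeros.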
The interaction between colour and menu is calculated by simply multiplying each colour contrast by each menu contrast or, if you prefer, by reading it off the matrix below.
| colour | menu | blue-normal | blue-bad | green-normal | green-bad |
| red | good | 0 | 0 | 0 | 0 |
| red | normal | 0 | 0 | 0 | 0 |
| red | bad | 0 | 0 | 0 | 0 |
| blue | good | 0 | 0 | 0 | 0 |
| blue | normal | 1 | 0 | 0 | 0 |
| blue | bad | 1 | 1 | 0 | 0 |
| green | good | 0 | 0 | 0 | 0 |
| green | normal | 0 | 0 | 1 | 0 |
| green | bad | 0 | 0 | 1 | 1 |
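The multiplication that produces this matrix can be sketched in a few lines of Python. Again, the names (`COLOUR`, `MENU`, `NAMES`, `interactions`) are my own for illustration; the codes are the ones from the colour and menu tables.

```python
from itertools import product

# Contrast codes from the colour and menu tables.
COLOUR = {"red": (0, 0), "blue": (1, 0), "green": (0, 1)}
MENU = {"good": (0, 0), "normal": (1, 0), "bad": (1, 1)}
NAMES = ("blue-normal", "blue-bad", "green-normal", "green-bad")


def interactions(colour, menu):
    """Multiply every colour contrast by every menu contrast.

    product() pairs them in the order blue-normal, blue-bad,
    green-normal, green-bad, matching NAMES.
    """
    return tuple(c * m for c, m in product(COLOUR[colour], MENU[menu]))


# Rebuild the matrix row by row.
for colour, menu in product(COLOUR, MENU):
    print(colour, menu, interactions(colour, menu))
```

Each printed row should match the corresponding row of the matrix, for example `blue bad (1, 1, 0, 0)`.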
This sort of recoding is time-consuming, and most statistics packages now do it for you! That's right: the recoding still happens, but the computer does it so you don't need to think about it. The packages can choose contrasts with nice properties and then encourage you to use multiple comparison tests to find where the differences lie. This is even true of R. However, if you really want to, you can always build the dummy variables yourself rather than trusting the computer.
But what happens when you have structural zeros, that is, combinations that do not occur? Packages like SAS and SPSS do the thinking for you: they set the contrast to zero and take it off the degrees of freedom. Others just say "sorry, we can't fit this".
Well, you can fit it, because you just have to calculate the dummy variables yourself. Then you run a frequency count on each dummy variable; those that contain only 0s get dropped, and you fit the model with the remaining terms. Simple really, isn't it? At some stage I need to write the R code to demonstrate this!
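In the meantime, here is a rough sketch of the idea in Python rather than R. It assumes, purely for illustration, that the green lunch box never comes with the bad menu; that missing cell and all the names in the code are hypothetical. It only covers the case described above, where a dummy variable comes out as all zeros.

```python
from itertools import product

# Contrast codes from the colour and menu tables.
COLOUR = {"red": (0, 0), "blue": (1, 0), "green": (0, 1)}
MENU = {"good": (0, 0), "normal": (1, 0), "bad": (1, 1)}
NAMES = ("blue-normal", "blue-bad", "green-normal", "green-bad")

# Hypothetical structural zero: green never occurs with the bad menu.
observed = [cell for cell in product(COLOUR, MENU) if cell != ("green", "bad")]


def interactions(colour, menu):
    """Interaction dummies: each colour contrast times each menu contrast."""
    return tuple(c * m for c, m in product(COLOUR[colour], MENU[menu]))


rows = [interactions(colour, menu) for colour, menu in observed]

# "Frequency" on each dummy variable: how many rows have a 1 in that column.
counts = {name: sum(column) for name, column in zip(NAMES, zip(*rows))}

# Dummies that are 0 everywhere are dropped; the rest go into the model.
kept = [name for name, n in counts.items() if n > 0]
dropped = [name for name, n in counts.items() if n == 0]

print(counts)
print("keep:", kept)
print("drop:", dropped)
```

With one row per observed combination, `green-bad` is the only column with no 1s, so it is the term that gets dropped. With real data you would build one row per child instead, but the frequency check is the same.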