Now I tend to handle these in SPSS, and until recently SPSS was reliable in handling them. That is, if the files got corrupted while running a procedure, 99% of the time it was me who had messed up! Most of the time when I merged files, if there was a problem it was one of two things:
- I had not sorted the data beforehand
- Unexpected duplicate cases: either extra blank cases at the bottom of the file, or a duplicate key where there should not be one.
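In SPSS syntax, checking for that second problem before a merge looks roughly like this. This is only a sketch: `id` is a hypothetical key variable, and your own key may be several variables.

```spss
* Sort on the key first (duplicates will then sit next to each other).
SORT CASES BY id (A).
* Count how often each id value occurs, attaching the count to every case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /n_id=N.
* Any case with n_id > 1 is a duplicate; stray blank ids show up here too.
FREQUENCIES VARIABLES=n_id.
```

The menu equivalent is Data > Identify Duplicate Cases, but the syntax version leaves an audit trail in the journal.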
Sorting data sets, especially large ones, before you merge is tedious, but that is all it is. Yes, I have been in the "make a cup of coffee and see how much you drink while SPSS is sorting" crowd. I handled data sets with over 600,000 cases almost two decades ago on PCs. The machine used to sit in the office chuntering away as it processed the data while I got on with other work. I would check it every time I wanted a coffee, but normally it ran quite happily on its own. Due to the way that analysis worked, I had to invert the file at one stage by sorting in descending order.
Now SPSS decided that forcing people to do this sort of tedious work and wait around was not on, so it developed "star sort". Not only that, they made it the default option in the SPSS menus. It may work for small datasets with only a few cases; in my experience it does not work for large ones. The result is that instead of a tedious sort command beforehand, I now have to either:
- Run a frequency and other checks to confirm the data is as expected, or
- Sort the data as I always did, and remember to uncheck the boxes so the Sort dialogue actually uses the old method that worked.
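The second workaround can be written directly in syntax, which sidesteps the dialogue defaults entirely. Again a sketch only: the file names and the key variable `id` are made up for illustration.

```spss
* Sort the active file the old-fashioned way and save it.
SORT CASES BY id (A).
SAVE OUTFILE='main_sorted.sav'.

* Do the same for the file being merged in.
GET FILE='lookup.sav'.
SORT CASES BY id (A).
SAVE OUTFILE='lookup_sorted.sav'.

* Merge; /BY requires that both files are already sorted on id.
MATCH FILES
  /FILE='main_sorted.sav'
  /TABLE='lookup_sorted.sav'
  /BY id.
EXECUTE.
```

Because `MATCH FILES` with `/BY` simply assumes sorted input, doing the `SORT CASES` explicitly means nothing depends on what the dialogue box decided behind your back.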