Thursday, 28 March 2013

R and ANOVA

I have been spending time preparing a course on ANOVA. I know the material I am teaching, so that is not the problem. The problem is that there are two approaches and I am wondering which is better. There is the traditional approach, which clearly distinguishes ANOVA from regression, and there is the more technical approach, which sees them as two versions of the same thing. The difficulty is that R has both approaches programmed in.

aov is the traditional approach, and a lot of the specific things that people who use ANOVA like to have are readily available only for it. So there is an easy way to do multiple-comparison testing and also to draw inspection graphs. It is therefore easier for a beginner.

lm is the technical approach and has the power of being generalisable to a wide range of situations, but a lot of the things that are implemented for aov are not as easily available for lm. So I end up writing a lot more, and more complex, code.

I think I am going to write two scripts, one that uses the aov approach and one that uses lm. This will let people choose, but in the course itself I will just run through the aov version.
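A minimal sketch of the two routes side by side, assuming the built-in PlantGrowth data set as a stand-in for the course data (the object names are only illustrative). It fits the same one-way model with aov and with lm, and checks that the F statistic comes out the same either way:

```r
# One-way ANOVA on the built-in PlantGrowth data, fitted both ways.
fit_aov <- aov(weight ~ group, data = PlantGrowth)
fit_lm  <- lm(weight ~ group, data = PlantGrowth)

# The traditional ANOVA table, and the same table derived from the linear model.
print(summary(fit_aov))
print(anova(fit_lm))

# Multiple comparisons are available directly for aov objects.
print(TukeyHSD(fit_aov))

# Pull the F statistic for the group term out of each fit to compare them.
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
f_lm  <- anova(fit_lm)[["F value"]][1]
print(all.equal(f_aov, f_lm))
```

TukeyHSD only has a method for aov fits, which is one concrete case of the convenience gap described above.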

Thursday, 21 March 2013

Have You Checked Your Software is Up to Date?

I had a frustrating day yesterday. Don't get me wrong, I got quite a bit of work done, but there were a lot of glitches along the way. I was working on two very different projects and both ran into problems because I had not done one thing. That thing was to check that I was using the latest version of the software.

Let me start with the first one. I spent some time on Tuesday working in R. In particular I wanted to use the car (Companion to Applied Regression) package, which does some pretty basic things like automatically calculating Type II and Type III sums of squares. These are pretty important if you do not want your ANOVA results to vary according to the order in which you specify the terms of the model. Although I had managed to load the package the previous week, it was now refusing to load. To be more exact, I had installed it earlier, but because I was writing a script for others to use I had to make sure the script contained the code to install and load it. I was wondering whether it was a quirk of RStudio, which I like to work in rather than the basic R interface. It does sensible things like not overwriting graphs that I have created!
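To show why the type of sums of squares matters, here is a small sketch using made-up, deliberately unbalanced data (the numbers and names are invented for illustration). The anova function on an lm fit gives sequential (Type I) sums of squares, so the value reported for a factor changes with the order of the terms. The car call at the end is guarded so that it only runs if the package is installed.

```r
# Small unbalanced two-factor data set (made-up numbers for illustration).
d <- data.frame(
  a = factor(c("x", "x", "x", "x", "y", "y", "y")),
  b = factor(c("p", "p", "p", "q", "p", "q", "q")),
  y = c(4.1, 5.0, 4.6, 6.2, 5.9, 7.4, 8.0)
)

# Sequential (Type I) sums of squares: the entry for `a` depends on
# whether `a` is entered before or after `b`.
ss_a_first  <- anova(lm(y ~ a + b, data = d))["a", "Sum Sq"]
ss_a_second <- anova(lm(y ~ b + a, data = d))["a", "Sum Sq"]
print(c(a_first = ss_a_first, a_second = ss_a_second))

# Type II sums of squares from car do not depend on the term order.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(lm(y ~ a + b, data = d), type = "II"))
}
```

With balanced data the two orderings would agree, which is why the issue is easy to miss on textbook examples.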

Well, I slept on it, and the error message I kept getting started to float to the surface. It seemed to say that the R I was using was an earlier version than the one car had most recently been built for. So I decided to get the latest version of both R and RStudio. The result was that the code I had been struggling with the day before suddenly worked. Now this seems problematic to me, as it looks as if R packages are reliant on particular versions of R. The only way I can see to be safe, therefore, is to download the latest R and all the packages you want to use, apart from the core ones, every time you work in R.
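For a script that other people will run, one way to handle the install-and-load step is sketched below. The helper name ensure_package is hypothetical (it is not part of R or car), and the CRAN mirror URL is an assumption that can be changed. It installs a package only when it is missing, then loads it, and reports the running R version so that a mismatch is easier to spot.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
# The default mirror is an assumption; any CRAN mirror would do.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Report the R version in use, since packages can require a minimum version.
message("Running ", R.version.string)
```

At the top of the shared script this would be called as ensure_package("car") before any call to Anova. It does not upgrade R itself, so an out-of-date R installation still has to be updated by hand.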

The second was equally annoying. I was drawing graphs for a publication using SigmaPlot. Let me be clear, SigmaPlot is not the best graphics package in the world, and I have even on occasion used Excel in preference. In this case, though, the graphs were straightforward, and what I wanted above all was the ability to export them in a wide range of formats. I have run into so much hassle recently with getting graphs ready for publication that I wanted something that at least did EPS and several other graphics formats. I exported the graphs once and thought that was it.

Later I came back to tidy them up, aligning them and getting them evenly spaced. This takes time and is fussy. It is not the sort of thing I like to do. I then noticed that whereas more than the page length had been exported, the full width had not. So I went and tried to change the export parameters. I could change the size, but if I altered the width it altered the length proportionately and vice versa. In the end I fell back on the computer technician's old standby and Googled it. That immediately led me to the fact that the software had bugs which had been fixed in a later release. So I uninstalled the old version and installed the newer one. Then I exported the graphs again, and this time they came out correctly.

So the moral is, if you are doing a major piece of work and relying on software, please check that your version is up to date! Do not rely on software producers to tell you when to update. Most of them do not (QSR are the one exception I know).

Wednesday, 6 March 2013

How many researchers are using software to analyse data?

I got asked that question by a colleague this morning. I did not give him the answer, but I gave him a method for arriving at a guesstimate.

There are problems with estimating this apart from the fact we never thought about trying to answer the question so never kept proper records. Let me start unpicking the question and you will see the problems. I will work through them in the order they come in the question.

What do we mean by researchers?

This is a research-intensive university, but I bet you that nobody in the entire institution could tell you how many researchers we actually have. There are some numbers that are fairly easy to estimate. Academic staff are divided into Research Specialists, Research and Teaching, and Teaching Specialists. So we just add up the Research Specialists and those employed in Research and Teaching.

Well, given current employment data we might manage that box, but then there is a whole lot of staff who are crucial to research but are not on Academic grades: the lab technician who keeps the equipment running, people like me who are on Administrative grades but work with researchers, and so on. I am not sure that the University even knows how many of them there are. Then there are also emeritus staff.

Now those are staff, but they are not all the researchers. There are also students. Every doctoral student of the University is also a researcher, and so is anyone doing a Research Masters. However, what of taught postgraduates and fourth-year students doing research projects, are they researchers? Given that some PhD students are also employed by the University, we need to be careful about double counting. While there are groups that are clearly researchers, there is a huge number of people who may or may not be researchers.

What do we mean by software?

Many of you will have automatically thought of that as research software. There are some packages that easily spring to mind: SPSS, Matlab, NVivo etc. These packages are used to deal with specific problems and to analyse data. These are the ones for which we have some idea of the number of users.

However, there are also probably hundreds of packages that are used to analyse data in specialist fields. They are probably not widely used, but when you are talking about four or five people using each of a hundred packages, you are talking about between 400 and 500 users. Then there are packages which we would have no way of counting, such as R, which is free software and one of the top statistical packages.


Those, however, are the obvious packages. What about the not so obvious ones, such as Excel? (There is some pretty serious data analysis done at this University using Excel.) The problem is that I would guess that more people use Excel than all the other packages put together. You have some simple data and you do not want to do much, so you put it in an Excel spreadsheet and draw a graph. That is data analysis. Or take another example, Mendeley. It claims to be a PDF and reference manager, but it has the tools in there to do a rudimentary analysis of texts. You can comment on, tag, rate and organise your PDFs into folders. It certainly is possible to analyse your papers for your literature review. What's that, papers are not data?

What do we mean by data?

The spreadsheet of data collected from a series of experiments, from conducting a survey or by some other measurement technique is what people usually think of as data.

However, a huge amount of the data that gets used does not fit that simple idea. A set of newspaper articles is a dataset, and so are video and audio recordings of interviews, samples from corpora (collections of texts), pictures (both photographs and drawings) and maps. All this is non-traditional data.

Here computers may even help us, as they give a very neat definition of data: data is anything that can be stored on a computer. Actually that is not quite correct, since the data is the underlying computer coding, but it gives a picture. This of course only deals with computer data, but the idea can be extended to real-world data. Detailed handwritten notes of performances of pieces of music are data, and data that could hardly be captured in any other way.

Basically all researchers handle data.

Applying this to a real example

I am writing up my thesis for another university, and yes, I am a doctoral student, and yes, I would be counted as a user of a couple of tools that have been subsidiary analytic tools for my thesis. But the main one? That is simple: it is Word. The analytic part of my thesis has been through a process of writing and rewriting until I had something clear to say. There is nothing particularly unusual about this; it is a common process in the arts and humanities. Perhaps I was slightly unusual because the doctorate is anthropological, and that means that the basis for data collection is observation. So I knew I had collected data, but data which was only in part recorded on paper, audio recording, video and photograph, and for which I relied heavily on my memory. So how many researchers use Word as the tool for carrying out the analysis of their data?

Friday, 1 March 2013

From first ideas to first research paper

I am rarely in at the start of a research process. Often researchers come to me when they get stuck and need a specialist to help them out of a situation; sometimes they come when they have collected their data and want someone to guide them through the analysis; and some even come at the design stage of the research. When the last of these starts happening, I know I have a good research relationship. However, my being there at the germ of a research idea is rare.

So here is the first one where I was. I have for a while attended the Broomhall Breakfast, a community breakfast held every Friday that cooks a breakfast for all who come. There is no charge but donations are welcome, and as a result it has a mixed clientele. There are alcoholics, artists, police officers, homeless people and asylum seekers. Storying Sheffield also made a video about it, if you want to know more. Well, it occurred to me that the Breakfast would be interested in knowing more about what people were eating and how they might improve the breakfast. They already provide bananas and cereal bars for people to take away and tend to be quite fussy about what they cook. So, as one of the people I regularly work with was in Human Nutrition, and they run a masters course, I asked whether she would be interested in investigating and she said yes.

We got a student, indeed we were fortunate and got a very bright student who managed to cope with both analysing food diaries and also conducting qualitative questionnaires. The focus changed once she started going to the Breakfast and realised how many different sorts of people go there. She decided to focus on those who are homeless or temporarily housed. This meant that actually with the mix of the people at the Breakfast, there were not enough people and she needed to find another place to recruit. She found that in the Archer Project which really does specialise in providing food for people on the margins. This gave her the ability to recruit enough members to actually get some idea what homeless and vulnerably housed people were doing for food in Sheffield.

Importantly, it also meant she began to get ideas about food aspirations and patterns of eating. For instance, one of the crucial findings was that although there were quite a few places to get food on a normal working day, weekends and particularly bank holidays were another story altogether. That meant that sometimes these people were going for almost forty-eight hours without food.

This got written up as a paper and eventually submitted to the Journal of Human Nutrition and Dietetics, and today we heard it has been accepted. However, this is only the start. There are still big questions about how we go about improving the diet, given the resources and restrictions that so many of the charitable places providing food have. There is also work to be done on what motivates and enables the people who create these places to function.