Wednesday, 29 August 2012

Playing with Google Scholar Profile

Now I know I am academic-related staff, but because I have been a statistician for a while I have quite often been a minor author on a number of academic papers. There are advantages to this for the other authors: statistical reviewers are far less likely to pick a fight over statistical niceties if they know you have consulted a statistician, and if a statistician is there as an author then that is clearly the case. Mind you, I can think of at least one case where I had to redo tables because, although the authors used me for the statistical analysis, which was all carefully adjusted, they then went and produced the tables from the raw data without any adjustment.

Well, a few weeks ago I found out that Google now has Google Scholar profiles. These actually keep tabs on the times your papers are cited. So I decided to have a play and see what I could turn up. I am using my work email, which is NOT recommended by Google, as academics move around and universities tend to close email accounts when staff leave. Remember I am academic-related, not an academic, and therefore I am not playing the same games as most academics; my performance is not really assessed by my citation index. Secondly, when I am doing my own research it rarely has any relationship to any of my papers. I suspect I will at some stage have to put together my other research papers, but as at present that amounts to exactly one paper in a very small journal, I am in no hurry.

So what do I get? Well, you can view my Citation Profile, which will give you an idea of the papers I have written, and yes, I really am second author on that first paper. It was my first ever paper and I was a very junior statistician who happened to be good at writing databases. That study needed ongoing statistical and database support, so I was allocated by my boss. It was also very much a one-person team, and A Webb really is the sole researcher.

Now here are a couple of niceties. When I go onto Google Scholar logged in on that account I get the following screen:
[Screenshot: Google Scholar page]
Now there are some things you should note: firstly the bit that says "New! Scholar Updates Recommended articles for you", which links to this entry on the Google Scholar blog, and secondly the "My Updates" at the top, which actually links to a list of papers I might be interested in.
Now I do not mind showing you this; it will not tell you anything about my research interests, and it should be unfocussed, as I work with a number of research teams. I really can't see myself using this regularly, although I must admit I can see some areas of interest in some of the studies, which one research group I work with may develop. However I suspect that when the next paper is out things will change again, as it suddenly starts thinking I am working in another direction. However, if I were an established researcher in a single field it might well be a way to keep up to date with who is working in my field.

Thursday, 16 August 2012

Cautions over comparing coding in NVivo

This is a blog in two halves. The first is totally practical and will tell you some things you must note if you are going to compare codings. The second is theoretical, and I want to sound a warning over the use of a particular statistic in qualitative analysis for comparing codings by different raters.

Right, the first part is probably best told as a cautionary tale. A student turned up this week wanting to compare his coding with that of another student. The problem was that when he imported it into NVivo 9, it saw the documents in the two files as different documents, making it very difficult to compare coding schemes. It took us a while to find out why, and in the end it was because he had made minor edits to the files after the copy was taken for his friend to code on. That meant he had to go away and transfer all the coding from his friend onto his data files before he could do the comparison. There are ways to avoid this.

Firstly, if you have coded your data and are now going to take a clean copy for a friend to continue coding on, I suggest you also take a backup copy of your set to compare against, just in case you get itchy fingers.

The second is a tip for getting around this in NVivo by making sure NOBODY edits the files, including yourself: print them to PDF before you import, and only import the PDFs. To do this you will need to download a free print-to-PDF utility such as CutePDF and use it to produce the PDFs. Once these are imported into NVivo you cannot edit them, so there should be no problem with anyone editing them.

However, now my concern. He went on to say he was going to calculate a Cohen's Kappa, and was surprised when I knew what it was. I know it because it came up regularly in my statistics degree and I have calculated it a fair number of times. There has nearly always been debate about its applicability, and when I was a junior statistician it was usual to report percentage agreement as well. The thing is, I have always used it for agreement on categorical data where I have clearly defined units, such as children graded by two teachers, or whether two radiologists classified tumours in the same way from x-rays. Child and x-ray are clearly defined coding units, and the grades are normally mutually exclusive.
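For clearly defined units like these the calculation itself is straightforward. Here is a minimal sketch in Python (the teacher-grading data are invented purely for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters over the same clearly defined units."""
    n = len(rater_a)
    # Observed agreement: proportion of units both raters put in the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal category proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Two teachers grading the same six children: the coding unit (a child) is unambiguous.
teacher_1 = ["high", "high", "low", "low", "high", "low"]
teacher_2 = ["high", "low", "low", "low", "high", "low"]

percent_agreement = sum(a == b for a, b in zip(teacher_1, teacher_2)) / len(teacher_1)
kappa = cohens_kappa(teacher_1, teacher_2)
# percent_agreement is 5/6; kappa works out at 2/3
```

Reporting the raw percentage agreement alongside kappa, as we used to, at least shows how much of the agreement kappa has discounted as chance.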

Now the big problem I have is that there is no nicely defined coding item, and it quite often happens that the same bit of text is coded to two different codes. Think about it: you might want to code the short sentence "John wore a red dress to the party" to both cross-dressing and the use of red as a colour. This is perfectly sensible qualitative coding. One might simply look at presence and absence of coding on a sentence, but then there is the problem of the un-coded units. That is easy enough to deal with if the coding unit is defined as a sentence, but if one rater has "red dress" coded to red and the other has "John wore a red dress", then what is coded in both instances is less than a sentence. Equally there may be times when a code covers several sentences. And if the coding unit is simply whatever gets coded, how do we know how many coding units there are that neither rater has found? I don't have a clue.
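To see how slippery the units are, here is a rough sketch (entirely my own illustration, not an established method) that dodges the problem by treating every character of the text as a coding unit, so overlapping spans of any length can still be compared:

```python
text = "John wore a red dress to the party"

# Each rater's application of the code "red", as sets of character positions.
# Rater A coded just "red dress"; rater B coded "John wore a red dress".
span_a = text.index("red dress")
coder_a = set(range(span_a, span_a + len("red dress")))
coder_b = set(range(len("John wore a red dress")))

every = set(range(len(text)))
both_coded = coder_a & coder_b
both_uncoded = every - (coder_a | coder_b)

# Simple character-level agreement: positions where the raters made the same decision.
agreement = (len(both_coded) + len(both_uncoded)) / len(every)
```

Even this dodge leaves the kappa question open: what on earth would a chance agreement on a single character mean?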

So I checked the literature and came across the paper "Content Analysis Research: An examination with Directives for Improving Research Reliability and Objectivity" by Richard H Kolbe and Melissa S Burnett, which says:
"However, the use of kappa is difficult in content analysis because a key value, the number of chance agreements for any particular category, is generally unknown."
which is basically what I am getting at: the underlying fluidity of the coded unit means I would have grave reservations about using Cohen's Kappa. You can read the paper to see what alternatives there were at the time, and I suspect more have come about since. I think I probably need to do some serious literature searching on this when I have the time, so that I can give accurate advice.


Thursday, 9 August 2012

Finishing an analysis

What does it mean to finish an analysis? There are perhaps three different times when you finish an analysis, and I have had them all this last fortnight.

When you stop analysing: This is really the last one to happen this week, but it is when you draw a line under the analysis and say I am not going to explore anything more, no matter how interesting it appears. I had this with someone where I just have to do the confirmatory factor analysis for them and write it up; that will be the last stage of an analysis that has covered everything from simple descriptives to principal components analysis, correlation and so on. I am not writing this up, for which I am truly grateful, as it is a massive sprawling analysis and I am not sure it tells us anything particularly new.

When you write the report: This is different; in some sense I got to the previous stage before, and all I was doing was repeating it, making sure I dotted the "i"s and crossed the "t"s. I know the story the data tells; now all I am doing is thinking how to present it so that other people can incorporate the analysis I have done into their work. Almost essential for writing publications. This I do only when I am analysing the data for someone else. I had this with a dataset I got about four months ago; it just needed me to do things with a clear head.

When you do the bits after a journal review: When you make the alterations to a paper after a review. The report was written months ago, the academic (in this case a prof) has written it up and submitted it to a journal, and the journal has come back with reviewers' comments, some of which involve the stats. Normally this is the point at which you discover the authors decided to present the basic data themselves, because you had not included the nice tidy table that was wanted and rather than trouble you they went and produced it. This means that you have to sit down and reproduce the results exactly as they came from the analysis. If you are organised you have all the analysis neatly put together in a folder. Unfortunately all I could find was output, and not the exact output, so I had to recreate it. This is always risky, as there are so many nuances around an analysis.

So which is the end? I hope the first, but often it takes several times through the third before you can put the data away safely.

Anyway, I also started one analysis this week, and I am quite sure there will be more along shortly.

Thursday, 2 August 2012

When Answering a Query is not straightforward

This is the story of what should have been a straightforward query, which still has not been answered well over a week after it was asked.

Way, way back (many centuries ago, well not quite, but well over a decade), Microsoft Windows included a technology called ODBC (well, if I believe the link it is actually a C language API, but it came with Windows). If a program in Windows could use ODBC then it could access all sorts of other data, particularly, for the current query, Microsoft Access databases. It allowed you to write SQL queries to a database without being in that database's native environment. This was great for getting data into SPSS when the driver existed, but there were some awkward bits to it. It was definitely the sort of programming that you tended to leave to programmers. I used to have notes on how to do it.

So a query came in saying this was not working for them. So I think: go into SPSS, have a play, and see if I can remember what the problem is. First the good news: it is easy to duplicate the fault. Now the bad news: I find the workaround isn't working because the original drivers are not installed.

Next step: try installing the drivers again. As far as I can recall the drivers came on an SPSS installation disk. Only they do not come with the latest one, nor do they appear to have come with the previous version. Hmm. It is time for me to go and see if I can download them.

Then I hit trouble, big time. Let me explain: I am the basic support person for this university for SPSS, but the person who looks after the license (because that is finance) is different. When sorting out the transfer to IBM from SPSS, the transfer was done totally in the name of the finance person, who also downloads the software for distribution. All would be fine, but I am on the IBM SPSS system as a beta tester. The result is confusion in the system: I am there, but I am not there. Added to which, the support system is complex and I am never sure where I am supposed to go and where not.

After two days and numerous phone calls involving a whole range of people, both here and at IBM, I am able to download the drivers. I still have to install them and then see if I can get it to work. Once I have got there I can then try and sort the query.

Just a thought: I wonder if they can get the query out as comma- or tab-delimited from Access and then read it into SPSS. When in doubt, the kludge often works.
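The shape of that kludge can be sketched with Python's standard csv module (the file contents and column names here are invented, and I simulate the exported file in memory rather than opening a real one):

```python
import csv
import io

# Stand-in for a tab-delimited export from Access; in practice you would
# use open("export.txt") on the file Access produced.
exported = io.StringIO("id\tscore\n1\t42\n2\t17\n")

# DictReader picks up the column names from the header row.
rows = list(csv.DictReader(exported, delimiter="\t"))
# rows is now a list of dicts, easy to eyeball before loading into SPSS
```

A quick read like this lets you check the export looks sane before trusting SPSS's own text import wizard with it.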