Beki Grinter

CHI 2012: Reviewing

May 15, 2012

I attended a few sessions devoted to discussing reviewing for CHI.

In the end, I feel that there are two “camps” of ideas for improving the reviewing process, and I do not think that they are reconcilable.

One set of suggestions I heard was to conduct experiments with papers and reviews. Several were mentioned. For example, take papers and their reviews, have other people review them, and see whether you come up with the same set of reviews. Another set of ideas centers on generating reviewing metrics: how long a reviewer’s reviews are, how timely they are, and so on, with the goal of creating a record of their behavior that can be used in the future to assess their reviewing ability. Behind these and other experiments lies, it seems to me at least, a firm belief that reviewing should be treated as a quantifiable science.

But, then there are counter arguments.

For example, Danyel Fisher made the very astute observation that averages are a relatively meaningless concept in reviewing, even though we make use of them. As he put it, a score of 3 is not the same as a score of 5 and another score of 1, yet when we average, that is exactly what we turn those scores into. He made me reflect on how we can and do talk about scores in this way…
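The arithmetic behind Danyel’s point is easy to see in a tiny sketch. The Python below is purely illustrative: the 1 to 5 scale and the scores are made up, and `summarize` is a hypothetical helper, not part of any real CHI reviewing system. It shows a paper with two middling reviews and a paper with one champion and one detractor receiving the same mean, while their maximum scores differ.

```python
from statistics import mean


def summarize(scores):
    """Return (mean, max) for one paper's reviewer scores.

    Hypothetical helper for illustration only.
    """
    return mean(scores), max(scores)


# Made-up scores on an assumed 1-5 scale.
middling = [3, 3]  # two reviewers who both find the paper merely acceptable
divided = [5, 1]   # one enthusiastic champion, one strong detractor

for label, scores in [("middling", middling), ("divided", divided)]:
    avg, top = summarize(scores)
    print(f"{label}: mean={avg}, max={top}")
```

Both papers come out with a mean of 3, which is exactly the information loss Danyel describes. Taking the maximum instead separates them, but it simply discards the detractor’s score, so any single aggregate still flattens a disagreement that expert judgement would treat as meaningful.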

Jeffrey Bardzell makes an equally compelling case that reviewing is not a science in a comprehensive and fantastic series of articles (1, 2, and 3), in which he argues that it is a process of providing expert judgement. Danyel and Jeff are both, in my mind, getting at the same thing: reviewing is a subjective act based on expertise, and as such both its processes and its outputs should be understood and treated in those terms.

And it doesn’t stop with reviewers and ACs. Being a Program Chair is also a matter of expert judgement: assigning papers to ACs and reviewers, and deciding how to compose the program committee, are matters not of science but of judgement.

I think the reviewing-as-science model is doomed to failure, and along the way it will create more work for everyone involved as we pursue a set of metrics that do not accurately characterize the work we do but instead become a substitute for it, with all the problems that can bring. I think we need to take up more seriously the question of how we come to think of ourselves, and practice, as critical reviewers: experts who are not participating in a scientific process. And we need to ask what it means to handle not just the process but also its products in those terms.

  1. I agree with many of your points but not the claim that the two perspectives are irreconcilable. Indeed, reviewing is about expert judgement. But we can work as scientists to figure out the best way to get the necessary expert judgements with the minimum effort. For example, data suggests that we can get the *same* judgements while using fewer reviewers per paper. Also, “equations” doesn’t automatically mean “science”: I’ve always felt that averaging scores is a terrible idea, but that doesn’t mean we should give up on experimenting scientifically to identify a better scoring method, for example taking the maximum score, which I think would favor papers of great interest to *someone* over papers merely acceptable to everyone. And finally, I think it’s perfectly reasonable to use data to assess whether certain individuals are helping or hindering the development of expert judgements, i.e., to measure people’s quality as reviewers.

    • Maybe I was being a bit more provocative than is warranted. BUT I think we have a tendency to let one approach crowd out consideration of others. I’m also very opposed to metrics for reviewers, particularly if they are kept over the long term. I think the risks outweigh the advantages.

