Another day and more PGR silliness from Leiter about protecting his Camelot from hordes of SPEPers–proletariat rabble, Nietzsche’s last men–seeking to undermine the good, the true, and the beautiful. But in all seriousness, I can’t fathom how people who claim that what they admire most in philosophy is its rigor can simply look the other way when Leiter refuses to submit the PGR to standard peer review. I can speculate about what motivates this sort of head-in-the-sand behavior, but first let’s look at some basic facts.
It’s standard procedure to have instruments used for evaluation checked by experts, in the plural, because, well, that’s how science should operate–community of investigators, etc. This is all the more true if there are genuine concerns about biases and methodological problems in a study. Leiter has not done this. He is satisfied with relying on one sociologist, Kieran Healy, who, according to Leiter, has given his stamp of approval to the PGR. But if one looks at Healy’s posts on Leiter’s Blog, his analyses deal with correlations internal to the survey, for example, between various groupings of evaluators, who were hand-picked by Leiter and the Advisory Board. There is no discussion about whether this pool is representative of the profession–and Healy has a disclaimer of sorts: I am a sociologist–or whether evaluators are given detailed and coherent criteria for evaluation. The PGR cannot be defended on the basis of Healy’s work against trenchant criticisms that have been brought to bear, and I don’t believe that it was intended to do so. Healy would have had to respond to a whole list of problems that he doesn’t touch, at least not as a guest blogger on Leiter’s site. In any case, he is one person, and this is not the way a controversial survey should be defended. At minimum, it should be reviewed by an impartial panel of survey specialists, and the results should be published. There is no scientific basis for accepting the PGR. It’s like creationism: all appeal to authority and no science.
However, Healy occasionally provides useful analysis that should be an eye-opener to believers and non-believers alike. Let’s look at one of Healy’s observations.*
Respondents love rating departments. A small number of respondents rated 25 departments or fewer, but the median respondent rated 77 departments and almost forty percent of raters assigned scores to 90 or more departments of the 99 in the survey.
In another post Healy says that the median for U.S. evaluators is 81. So almost 40% of the evaluators feel comfortable ranking 90 or more departments, and in the U.S. half ranked 81 or more. “Respondents love rating departments.” True words! But unless people are spending their careers studying the virtues and vices of other philosophy departments, I don’t know how they can claim comparable knowledge of 90 departments. That is what they would need in order to weigh those departments fairly and judiciously against one another, which, in turn, is what a ranking is about. Has anyone bothered even to examine whether those who rank so many departments have the knowledge to do so? Has anyone done a survey of the surveyors? But the problems don’t end here.
Two other preliminary points on Healy. First, he repeatedly uses the term “reputation” when discussing PGR-evaluated departments, acknowledging that it’s a reputational survey, which is exactly the way many of us see it. We who are skeptical about the PGR don’t believe that reputation equals quality–Leiter thinks it does–especially when so small a slice of the profession does the evaluating. (Leiter from the PGR site: “This report ranks graduate programs primarily on the basis of the quality of faculty.”) Second, it’s worth noting the source of Healy’s data.
The data I’ll be relying on come partly from information available on the PGR website itself, and partly from rater-anonymized versions of the 2004 and 2006 waves provided to me by Professor Leiter.
Shouldn’t a scientist require some independent verification of the data being provided? I won’t pursue this here, although with the latest iteration of the PGR, and given all of the controversy, transparency should be Leiter’s middle name.
As many of us have pointed out, the criteria for evaluating are neither rigorous nor well-defined. Leiter himself acknowledges that people have used different philosophies of evaluation. Let’s take an especially timely example. Yesterday we heard from one evaluator, Eric Schliesser, about the criteria he used this year for evaluating departments in his area of specialization. “What did I rank? Well, some vague mixture of (a) quality of work; (b) personality; (c) sense/evidence of advising capability.”
This “vague mixture” is idiosyncratic, or, better, unknowably idiosyncratic, because, once again, evaluators have so much leeway to choose their own criteria. But let’s suppose that Schliesser’s list is acceptable for departments in your area of specialization. How would it possibly work for the overall rankings with up to 99 departments to rank? Does Schliesser provide additional information about the criteria he used for the overall evaluations? No. Instead, he mentions some items he thinks worth considering while making overall evaluations:
Finally, on the department-wide scores: part of me still strongly thinks it is an extremely dubious exercise — my knowledge of the give-or-take-1000+-names is superficial –, I also came to think that it was not so hard to distinguish between, say, the obviously understaffed or very narrow departments from the departments filled with reasonably well known and good even excellent people. What I found most difficult, in fact, was to rank the departments that I know really well (because of a recent visit or stay) and that have some terrific people but that are also uneven in various ways (coverage, quality of members, known-to-me-predators on faculty, etc.).
This is honest, and I appreciate Schliesser’s candor. I wish other evaluators would tell us about all of the different ways that they approached the process. There may be a unique set of criteria for each evaluator; if so, these reports would take on the character of independent surveys. Nevertheless, my immediate response to Schliesser is this: man, if you felt this way, why did you do it?
I am not seeking to single Eric out here, but to reinforce a point that has been made many times: the PGR is not a rigorous survey, not even close. It’s just the opposite–and intentionally so. The same star-struck mentality that its founder exhibits when talking about “major faculty moves” infects the whole process of evaluation. Instead of having defined criteria, we get a religion of (certain kinds of) well-known figures in the profession. Why would anyone play buzz-kill and insist on explicit criteria for evaluation? Not Brian Leiter. The devotional culture must be catered to, respected, and cherished, because–who knows?–its members might decide not to participate if tasked with following specific instructions and using standardized, definite criteria.
Now to the question I raised at the start: why would people committed to rigor participate in this unscientific affair? Of course I don’t have a single, definitive answer. People do things for different reasons, including their own job worries. But given that the PGR’s supporters have not called for the survey to be objectively and rigorously examined, and given how much they profess admiration for rigorous analysis, including scientific analysis, it’s clear that something extraneous to the PGR’s methodological merits is a factor. My guess: peer pressure plays a role, even if indirectly. If you are part of what has come to be known as the PGR ecology–perhaps supporters use the phrase hoping that it will be treated as a protected species/ecology–it’s not easy to opt out if other members of your club, or your tribe, are in. You fear people will see you as a less-than-fully committed member of the club, which means that your work may not receive the same kind of acknowledgment as your more enthusiastic clubmen (and women). In a culture given to the commodification of recognition, this must be a concern, a deep one for many people.
*Here is another Healy observation that should raise some eyebrows.
It’s clear that not all specialty areas count equally for overall reputation. In 2006, NYU and Rutgers were weak or had very little reputation to speak of in a couple of areas, but still outranked Oxford. Similarly, the other top departments all have gaps in their coverage.
As many of us have suspected, not all specialty areas are created equal. Further, it seems that by getting more votes your Department’s score rises. Being more popular, that is, more “worthy” of votes being cast in your favor, correlates with a higher score. (We can speculate here about the effects of the halo phenomenon and name recognition in reputational surveys.) I should also mention something peculiar. Healy typically discusses the charts that he presents. There is no discussion here, except to say that the correlation is much tighter than that found in the previous discussion on the question, “Might it be the case that how many votes a rater casts is related to the PGR score of their home department?” Why wasn’t there a discussion? This is certainly an interesting finding.
“Further, it seems that just by getting more votes your Department’s score rises. Being more popular, that is, more “worthy” of votes being cast in your favor, correlates with a higher score.”
Correlation is not causation. (And I think Healy suggests the opposite causal direction: people are more likely to rate departments that they clearly regard as good.)
David, thanks for your comments. Of course correlation isn’t causation. I use the word “correlates.” Some context here. Before Healy introduces the chart that I reproduced, he has a fairly long discussion revolving around the question, “Might it be the case that how many votes a rater casts is related to the PGR score of their home department?” After this discussion he has one line introducing the chart, “By contrast, consider the association between the number of votes a Department receives and its PGR score in 2006:” and then the chart, followed only by the words, “Much tighter, as you can see.” He is referring to the chart (compared with the results of the previous discussion). This would have been the natural place to engage in a discussion of the point you mention. But he doesn’t do it, and yet this is clearly worthy of some discussion. I was bringing my readers’ attention to the correlation. Whether people are more likely to rate departments they think are good is an interesting question, to be followed by another: why do they think they are good? People who analyze reputational surveys will tell you that the halo effect and name recognition skew results, even when people are unaware that they are being influenced. Hence my phrase, “more ‘worthy’ of votes,” with “worthy” in scare quotes. However, my use of the word “just” may be misleading. I will edit it out.