I have heavily revised this post.  The new version appears on New Apps on November 2, 2014 under the same title.


Dear Colleagues,

According to a recent post on Brian Leiter’s blog, you will soon be receiving surveys to fill out for the PGR.

My co-editor Brit Brogaard (Miami) and her RA have done a great job finishing the evaluator and faculty list spreadsheets, and the IT professionals here should have a testable version of the survey ready for us to try out over the weekend.  If all goes well, Brit will send out the invitations to evaluators early next week (Monday or Tuesday is our goal).  We agreed to a somewhat shorter window for responses (two weeks, rather than three weeks) due to the late start date this year and our goal of getting the results out in time for students applying in the current cycle.

UPDATE:  The IT folks are still working out certain bugs in the survey program, so we won’t be able to test it before Monday.  That means, at the soonest, Prof. Brogaard will be sending out invitations on Tuesday or perhaps Wednesday of next week (Oct. 21 or Oct. 22).

I am sure that you must be aware of the controversy surrounding the PGR’s rankings, which appear legitimate because philosophers are responsible for them. There have been many persuasive pieces written about the biases inherent in surveys of this sort. I write as someone convinced that rankings do more harm than good. A comprehensive informational web site with a sophisticated search engine would be my personal preference. But I will not try to convince you of this here. I write to run some numbers by you and ask that you consider them before filling out this year’s survey. I am not claiming my concerns are original.  But I do want to highlight some of them as you consider whether to fill out the survey. Many philosophers do not fill out the survey when they receive it, and there are good reasons for you to take a pass on it this year.  Here’s why.

According to Leiter, he is currently working from a list of 560 nominees to serve as evaluators for the 2014-2015 PGR. During the last go-around in 2011, 271 philosophers filled out the part of the survey dealing with overall rankings, and a total of 300 filled out the overall and specialty rankings.   Leiter claims that in 2011 the on-line survey was sent to 500 philosophers. So a good number of philosophers decided NOT to fill it out even after receiving it.

Let’s consider some of the numbers. Three hundred may seem to be a reasonable number of evaluators, but the headline total obscures crucial details, and one doesn’t need any sophisticated statistical analysis to judge how problematic those details are. If you look at the thirty-three specializations that are evaluated in the PGR, slightly more than 60% have twenty or fewer evaluators. That’s right, twenty or fewer. Please think about this for a moment: twenty or fewer philosophers (in one case as few as three) are responsible for ranking 60% of the specializations found in the PGR, which many consider to be its most important feature.

But it is actually worse than this.   There are certain areas that have many fewer evaluators than other areas. For example, the PGR lists nine specializations under the History of Philosophy rubric. Six of the nine have twenty or fewer evaluators. And one of the specializations, American Pragmatism, has only seven. As a matter of fact, the only general category to have the majority of specializations with more than twenty evaluators is “Metaphysics and Epistemology.” Five of its seven specialties have more than twenty.   But none of the others–Philosophy of Science and Mathematics, Value Theory, and the History of Philosophy—have a majority of specializations with more than twenty evaluators. And in the three specializations outside of these rubrics we find: eleven evaluators for feminism, three for Chinese, and four for philosophy of race. (Yes, the PGR actually provides rankings for Chinese Philosophy with three evaluators.)
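To make the tally concrete, here is a minimal sketch (in Python, purely for illustration) of the count being described. Only the evaluator numbers explicitly cited above are filled in; the remaining 2011 figures would have to be copied in from the survey pages.

```python
# Hypothetical sketch of the tally described above. Only the evaluator
# counts explicitly cited in the post are filled in; the rest of the
# thirty-three specializations would need to be added from the 2011 survey.
evaluator_counts = {
    "American Pragmatism": 7,
    "Feminist Philosophy": 11,
    "Chinese Philosophy": 3,
    "Philosophy of Race": 4,
    # ... the other specializations would go here
}

def share_thinly_ranked(counts, threshold=20):
    """Fraction of specializations rated by `threshold` or fewer evaluators."""
    small = sum(1 for n in counts.values() if n <= threshold)
    return small / len(counts)

print(f"{share_thinly_ranked(evaluator_counts):.0%} of the listed areas "
      "have twenty or fewer evaluators")
```

With the full 2011 counts filled in, this is the computation that yields the "slightly more than 60%" figure.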

But don’t take my word for this problem. Here’s what Leiter says on the 2011 survey site.

Because of the relatively small number of raters in each specialization, students are urged not to assign much weight at all to small differences (e.g., being in Group 2 versus Group 3).   More evaluators in the pool might well have resulted in changes of .5 in rounded mean in either direction; this is especially likely where the median score is either above or below the norm for the grouping.

I’m sorry. The urging of students “not to assign much weight at all to small differences” does not solve the problem. No weight should be assigned to specializations ranked by so few people. This is not rocket science. This is common sense. You can’t evaluate the quality of specializations that have so many facets with so few people, who themselves were selected by another small group of people, the Board, which clearly favors certain specializations given the distribution of evaluators. (This is especially true when there hasn’t even been a public discussion about what should constitute standards for rankings of specializations in philosophy.) Yet Leiter’s advice makes it appear that one should take the specialization rankings seriously, that is, if one just doesn’t assign too much weight to small differences.  This is a shady rhetorical move.
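Leiter’s own caveat about swings of .5 in the rounded mean is easy to see with a rough simulation. The sketch below assumes nothing about the PGR’s actual data: scores, the 0-5 scale’s spread, and the panel sizes are all made up. It simply draws hypothetical scores from one fixed distribution and shows that the mean’s variability shrinks roughly as one over the square root of the number of evaluators, so a three-person panel is far noisier than a pool of hundreds.

```python
# Rough simulation (NOT the PGR's data) of how unstable a mean score is
# when only a handful of evaluators produce it. mu and sigma are assumed
# values for illustration only.
import random
import statistics

random.seed(0)

def spread_of_mean(n_evaluators, trials=5_000, mu=3.0, sigma=1.0):
    """Standard deviation of the panel mean across many simulated panels."""
    means = [
        statistics.mean(random.gauss(mu, sigma) for _ in range(n_evaluators))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

for n in (3, 7, 20, 271):
    print(f"{n:>3} evaluators: typical swing in the mean ~ {spread_of_mean(n):.2f}")
```

On these assumptions a three-person panel’s mean wobbles by roughly half a point from sample to sample, which is exactly the magnitude of swing Leiter warns about.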

I honestly don’t know how one could fill out the survey in good faith knowing that so few people are participating in ranking so many specializations. When you fill out the survey you are making a statement. You are providing your expertise to support this enterprise. The fact that you might be an evaluator in M & E, with more evaluators than the other areas, doesn’t lift the responsibility of involvement. At minimum, you are tacitly endorsing the whole project.

Ah, you say, but perhaps this year’s crop of evaluators will be more balanced. However, the way that the PGR is structured undermines this hope. The evaluators are nominated by the Board, which has roughly fifty members. Most of the same people are on the Board this time around as last time. But here’s the kicker: Brian asks those leaving the Board to suggest a replacement.   The obvious move for a Board member here would be to nominate a replacement in his or her own area, probably from his or her own circle of experts. In Leiter’s words, “Board members nominate evaluators in their areas of expertise, vote on various policy issues (including which faculties to add to the surveys), serve as evaluators themselves and, when they step down, suggest replacements.” So there is no reason to believe that the makeup of the pool of evaluators has markedly changed since the last go-around.

The 2014-2015 PGR survey will be in place for at least the next two years, maybe more, given the difficulties that the PGR faces. There are a lot of young people who will be influenced by it. Please consider taking a pass on filling out the survey. If enough of you do so, the PGR will have to change or go out of business.  Given the recent and continuing publicity surrounding the PGR, we should try to avoid embarrassment, which is likely to occur when those outside of philosophy, especially those who know about survey methods, discover our support for such a compromised rating system.


Three disclaimers:

1) I purposely sought to keep the statistics as simple and as straightforward as possible in this post in order to raise basic questions about imbalances and sample size in the current PGR.  Based on these and other considerations I ask prospective evaluators to reconsider filling out the survey.   Gregory Wheeler has a nice series on some of the more in-depth statistical work in “Choice & Inference.”  See the series and its concluding piece, “Two Reasons for Abolishing the PGR.”

2) If there is public content regarding changes to the PGR that is available, and that I somehow missed, I would appreciate being informed about it. As far as I know, no fundamental change is taking place in this year’s PGR.

3) I counted the number of evaluators in the different categories. Of course I could have made an error in the count somewhere. But the numbers are certainly correct enough to back up my concerns.

21 thoughts

      1. I would like to see this too, and see Noesis built. Genuine metrics could well be made available that are evidence-based and inclusive. The time is right. So, who should support such an initiative? The APA? At one point the NEH was not permitted to fund projects that could alter the shape of a profession out of concern for apparent political motives, or something to that effect. But you are right, Mitchell, in your supposition that such an engine is possible. I have no doubt that it is possible.

  1. It *is* worrisome that some specialities are ranked by so few people. But of course, there aren’t too many experts in, e.g., Chinese philosophy. And sometimes worries come to nothing. Do you have any actual evidence that the 2011 rankings in some of these speciality areas were egregiously wrong? That would be a *problem*, not just a worry. And of course, there are a variety of ways to solve that problem that do not involve fewer people participating in the survey.

    1. Jane, I’m not sure how anyone could provide evidence that the rankings were “wrong”. The rankings are the averaged opinions of those who do the ranking. Your question seems to suggest that there is some “correct” ranking that, given enough information about the discipline, everyone would agree upon. This, of course, is almost certainly not the case. And there are NOT a variety of ways to solve that problem.

      1. Hi Emily! I certainly don’t think that there is some ranking that everyone, given enough information, would agree upon! Philosophers can’t agree about anything. But I still think there are rankings that are better than others. You seem to think that there is no correct ranking. The concern in the post, however, is that there aren’t enough evaluators for the rankings to have a decent chance of being right, or at least good enough. That is what I was responding to. If the rankings aren’t any good at all, then we should be able to give arguments to that effect. I think I could give a compelling argument that a ranking that was the inverse of the PGR would be a very bad ranking of philosophy programs indeed. I also think that I would be able to give powerful arguments that most random rankings of Ph.D. programs were awful rankings.

      2. What about a rating system that would allow users to set their own criteria for ranking and evaluation, depending on the type of program they were looking for? This is certainly possible.

      3. Jane,
        Hi. Thanks for your comments. I want to clarify a point about my post. I wasn’t attempting to make a general argument against rankings or the PGR here by raising issues regarding the lack of evaluators and the imbalances in areas that are evaluated. (I do believe that such a case can be made and has been made, namely, that the harms of rankings outweigh any goods. Gregory Wheeler’s series, linked on my post, is a good place to review some of the problems with the PGR in particular, not only because of the issues that he raises but because there are links in his series to other discussions.) The point of my post is that evaluators should take a pass on participating this year because the procedures in place cannot ensure a fair and accurate assessment of departments. This is based not only on the number of evaluators but the manner in which they have been selected, as I note in the post. The bottom line for me: the alleged “objectivity” of the PGR is compromised and we shouldn’t proceed with business as usual. We don’t like sloppiness in our thinking as philosophers. The PGR is “sloppy” as it stands.

      4. Hi Mitchell,

        Thanks for your reply. I agree that there are many different concerns about the PGR, but on the “sloppiness” concern, which I took to be your main one, I am asking for evidence of the effects of sloppiness. I mean, someone might claim, or have evidence, that a philosopher is sloppy–she just writes things quickly, doesn’t scrutinize her arguments before sending them out, etc. But if her work was all impeccable, it would (a) seem sort of out of place to chastise her for producing impeccable work via a sloppy procedure, and (b) provide evidence that maybe her procedure was less “sloppy”, at least in any objectionable sense, than we might have thought.

        Now, I don’t claim, and don’t believe, that the PGR is impeccable. That would be an unrealistic standard. But I think the best evidence that the procedure for producing it is grievously flawed (sloppy) would be grievous flaws in the outcome. Of course, if you don’t think there could be grievous flaws in the outcome (since there aren’t better or worse ways to rank departments), then there couldn’t be such evidence. But if there aren’t better or worse ways to rank departments, then the number and makeup of PGR evaluators, and the sloppiness of producing the ranking, are just red herrings, no?

        Does that make sense? Sorry if I’m misunderstanding something, and thank you for hosting a discussion of this important issue.

      5. Jane,

        I do hope that we are not talking past each other. There are at least two questions on the table: 1) are the PGR’s deficiencies sufficiently problematic that we should step back from another round of evaluations? and 2) should we be ranking at all if there are alternatives to rankings? I have addressed the latter question in posts on this site, and so have many others over the years. A comprehensive information site with a sophisticated search engine is doable, so it seems foolish to continue rankings for the alleged benefit of graduate students given the numerous problems that “quality” rankings create: the marginalization of those working in areas and traditions outside mainstream analytic fields such as metaphysics and epistemology; the creation and perpetuation of a halo effect that biases hiring decisions; and the reliance on so many evaluators who hold degrees from the same schools, leaving at minimum the impression that we have an old-boys network operating. I think we can do better with an information system.

        Regarding the question of better or worse ways to rank departments: for those inclined to rank departments, I do think there are better and worse ways, that is, there are less sloppy ones. And since I was addressing myself to those who might be believers in rankings, I was saying: hey, before you proceed, consider if you want to support the system as it is. Here I can defer to those who have spent time on the methodological issues. The following is a quote from Richard Heck, which follows a list by him of problems with the PGR.

        “Anyone with any experience conducting serious studies that rely upon such surveys—and yes, I’ve talked to several such people—would know how dangerous, even potentially crippling, such flaws are. I’m still as puzzled as I always have been why such glaring methodological flaws are tolerated by people—Leiter, by his own account, anyway, and members of the Advisory Board—who claim to have only the best interests of undergraduates at heart. Frankly, the oft-trumpeted fact that some students are so hungry for information that they would take to rejoicing when even a scrap of crust fell from the table doesn’t much impress me. Most defenses of the Report come down to “It’s better than nothing”. Well, maybe it is, and maybe it isn’t. But either one cares about providing reliable information or one does not, and the apparent lack of concern about the sorts of problems just mentioned makes me wonder.” From “About the Philosophical Gourmet Report.”

        I was asking those who consider it to be better than nothing to reconsider. I am asking if they want to provide an imprimatur for a model that is flawed. I am asking if they might want to take a deep breath and consider the options before signing on to the survey this year. Do they really want to support another round of a flawed ranking system, one that has exacerbated divides in our profession? (In one post I made a simple plea: let’s have a real public debate on this issue before we proceed with any more rankings. Philosophers who respect public argumentation should engage in a debate on this topic and not force others to accept their view of the profession and the worth of rankings by de facto establishing them.)

      6. Hi Mitchell,

        Thanks for your reply. Maybe I was confused about how the argument in your original post was supposed to work. I’ll present what I took to be your argument and then explain why I don’t find it compelling (and don’t think anyone else should either).

        1. The PGR ranks specializations with as few as three evaluators.
        2. “No weight should be assigned to specializations ranked by so few people.”
        3. Therefore, no weight should be assigned to those PGR specialization rankings.
        4. If no weight should be assigned to those PGR specialization rankings, they are worthless, or worse, and a waste of time to produce.
        5. Therefore, those PGR specialization rankings are worthless, or worse, and a waste of time to produce.
        6. We shouldn’t help produce things that are worthless, or worse, and a waste of time.
        7. Therefore, we shouldn’t help produce the PGR (or at least certain specialization rankings).

        I’m objecting to (2). There are at least two ways of thinking about what those rankings are supposed to reflect: actual quality, or perceived quality. If the rankings are supposed to reflect actual quality, then whether we should assign any weight to those rankings depends, primarily if not exclusively, on how good the rankings are. I certainly agree that it is reasonable to worry that those rankings will not be any good, given how few evaluators there are. And I’m sure there *are* PGR rankings that are not perfectly correct. But if the PGR rankings were drastically off–such that, e.g., the bestselling authors on Chinese philosophy who publish with the best presses are not at the best PGR ranked schools–that should be relatively easy to establish. But I have read a decent amount about this (although not everything on your blog), and I have not seen Heck, or Velleman, or Ernst, or anyone else give any sort of substantial argument that the PGR rankings, speciality or otherwise, are significantly wrong. (If such arguments exist please point me to them!)

        If, however, the rankings are just supposed to reflect perceived quality, then the mere low number of evaluators seems more relevant. Of course, those three evaluators might be a representative sample of expert opinion (where maybe ‘expert’ just means influential). But the fact of the matter is–it is a fact that many lament–the PGR influences our perceptions of quality. If a student wants to be perceived as having gone to a good program in, e.g., Chinese philosophy, it seems like a pretty good idea for her to go to a PGR top ranked school in Chinese philosophy, even if, were the PGR not to exist, those schools’ programs in Chinese philosophy would not be top regarded. Again, maybe the best choice for her would be to go to #2 or #3 rather than #1 or whatever. But if the PGR is giving *drastically* wrong rankings in terms of perceived quality, such that going to a top PGR school would be a bad idea, that should be relatively easy to establish. But again, I haven’t seen Velleman, or Heck, or Ernst, or anyone else do that. (If such arguments exist please point me to them!)

        Anyway, that’s my worry about your argument. Sorry if I’ve misunderstood it, and for any earlier lack of clarity.

      7. Jane, Hi.
        I am afraid that we are not going to agree here. First, under the principle of charity, I would have assumed that you had read my other posts on the topic, which cross-reference each other. Second, you failed to quote the rest of the passage in which I mentioned the weight issue, your #2. Here is what I said, “No weight should be assigned to specializations ranked by so few people. This is not rocket science. This is common sense. You can’t evaluate the quality of specializations that have so many facets with so few people, who themselves were selected by another small group of people, the Board, which clearly favors certain specializations given the distribution of evaluators. (This is especially true when there hasn’t even been a public discussion about what should constitute standards for rankings of specializations in philosophy.)”

        My argument was never solely about the number of evaluators, although I do believe that this is a serious issue. But we must look at why it is serious. As you can see from the rest of the passage, I mention several points: the complex object we are ranking (many facets), the manner in which the evaluators are selected, the bias in favor of certain specializations, and the lack of public discussion about what standards we should have for evaluating. Other commentators on this thread have raised the problem of standards, but your response appears to be, you can’t prove that what we have is significantly wrong, while at the same time acknowledging, “I’m sure there *are* PGR rankings that are not perfectly correct.” However, you have no way of saying why a program is good or bad, except to appeal to proxies (e.g., best-selling authors), but the proxies themselves are what people are divided about. (And there is also a lack of information about proxies, e.g., we don’t yet have good, comprehensive placement records for all graduate programs. And even if we did, we would still have to decide on how to value certain placements. Is it obvious that a placement at a Research I institution should be more valued, ranked higher, than a placement at an Amherst, Williams, or a Smith? Why and who decides, etc.?) You are assuming what needs to be proved, that is, that we have standards for evaluation and you know what they are, certain proxies and how to value them. And Leiter implicitly if not explicitly recognizes the standards problem because he is so vague on his “Criteria and Methods” page about spelling out definitive criteria. This is not an accident. He can’t define them without creating a great deal of debate, so instead he allows the evaluators tremendous room in deciding on how to evaluate, even suggesting that they have different philosophies of evaluation. This is a very poor survey method. There should be definitive criteria and clear guidelines for judges regarding their roles. Instead the two are run together by the PGR.

        Since the issue of standards, and how we define quality, has yet to be resolved, or even adequately addressed, I do think we had better err on the side of caution, not only for the benefit of students but for the sake of having a democratic public process to discuss these questions.

        I will also say that you are defining the harms too narrowly. There are ways in which the PGR has an impact on the profession that cannot be easily dismissed, including creating biases in hiring and unnecessary divisions. (I address the bias issue in one of my posts, but other people have as well, and from different angles.) Because the process is lax it gives rise to hasty generalizations. (Just look at how you were willing to comment on Chinese Philosophy without knowing much about such programs, according to your own comments. Without strict criteria and a guide for the judges, this sort of thing is bound to happen in the PGR’s survey process.)

        Finally, I will say, I don’t think that the burden of proof is on those who are advocating stepping back and looking at the PGR’s methods and outcomes before proceeding with another set of rankings. Too many serious people with good arguments have raised hard questions about the PGR. Those who want the evaluators to proceed right now need to provide evidence that it’s a genuine good, one that outweighs potential harms, and they must do so in a non-circular fashion. Anything less can be read as special pleading.

  2. Mitchell, Thank you for the efforts to improve public information about all PhD programs. I stepped down as an evaluator in July. There are numerous problems with the PGR, not the least of which are that the evaluators are self-selecting, that many programs are not even among those considered worthy of ranking, and that the board decides which fields are worthy of evaluation – they recently voted that X-phi was not ready for primetime. The PGR is neither science nor journalism. It is backslapping.

    1. “In this connection, I cannot resist the need to mention here The Philosophical Gourmet, compiled by Brian Leiter and, hence, also known as the Leiter Report, a website that purports to provide metrics on “faculty quality and reputation” for Ph.D. granting philosophy departments in the English-speaking world. The fact that the report has become the standard metric of overall departmental ranking, despite its claims to the contrary, in just a short time attests to the fact that metrics are needed. At the same time, it demonstrates how quickly the Internet can act in getting information out that can restructure our profession, since it is no longer uncommon to hear professionals speak of “Leiter rankings” in the hiring of faculty or to find graduate students acknowledge that the Leiter report weighed heavily in their selection of a graduate program.

      Reports such as these may present themselves as descriptive, but their use makes them prescriptive. If philosophy has anything to offer the success of our species and the well-being of its individuals, it is imperative that we represent it adequately, optimally, fairly, and accessibly. Given that how philosophy is represented determines in no small measure what philosophy is, both in theory and in practice, and what it will become, attempting an accurate and adequate representation might be something of a moral imperative as well, and this applies no less to metrics about faculty quality and reputation than it does to other aspects of the profession. Such reports therefore should invite caution and careful scrutiny. In the case of The Philosophical Gourmet, for instance, 95% of the advisory board for the 2009 Leiter report work at Leiter ranked schools, and 57% of them work at schools that are ranked in the top ten for their respective countries. More important, however, is that 80% of the respondents surveyed to determine the rankings have connections to schools that ranked in the top ten for their respective countries, while a full 97% of them have connections to schools somewhere in the rankings. 81% of the respondents work at Leiter ranked schools, and 93% of them went to ranked schools. Furthermore, the institutions to which respondents are connected are distributed across the top of the rankings. I just noted that 80% have connections to top ten schools. But 15% of them have connections to second-ranked (overall) Oxford, 10% to fourth-ranked Princeton, 8% to third-ranked Rutgers and another 8% to seventh-ranked Harvard. Even though respondents are not allowed to rank their home institutions or those from which they received their degree, it is still quite clear that people from a handful of institutions are handing out high marks to other institutions in that same handful to the neglect of several institutions that are not presented for assessment in the first place. Leiter explains:

      The survey presented 99 faculty lists [from institutions up for assessment], from the United States, Canada, United Kingdom, and Australia and New Zealand. Note that there are some 110 PhD-granting programs in the U.S. alone, but it would be unduly burdensome for evaluators to ask them to evaluate all these programs each year. The top programs in each region were selected for evaluation, plus a few additional programs are included each year to ‘test the waters’. reportdesc.asp

      Top programs are pre-selected to determine which of the existing programs are in the top. Certainly, something smells a little fishy, and even an undergraduate in a critical thinking class would be tempted to see several fallacies operating here. Notwithstanding the possibility that Leiter may have hit on a heuristic of some sort that does in fact track faculty quality and reputation across the profession, the fact remains that his anecdotal survey approach can only leave us guessing about this possibility. We need some way either to verify or debunk the report, and here is one place where Noesis may be able to help. Furthermore, given the wealth of information about the profession that is available online and what is at stake, soon there will be simply no need to fall back on such simple anecdotal measures in any case. I suspect that we will find they are not worth much, or perhaps more so, that genuine evidence may provide us with better intuitions about the profession and reshape our anecdotes to fit better with the actual state of affairs.”

    1. Oops. I’m sorry I was unclear. What I meant is: the proportion of evaluators for Chinese philosophy to experts in Chinese philosophy in Anglophone philosophy departments with grad programs is at least roughly the same as the proportion of, e.g., M&E evaluators to experts in M&E in Anglophone philosophy departments with grad programs.

      That’s what I meant, but I should admit now, since this claim is under scrutiny, that I didn’t do any research before I made it. But of the philosophers and philosophy departments with which I am familiar, I would think that there are easily 8x as many doing M&E as doing Chinese philosophy. Of course, my experience is probably not representative, and maybe my claim was incorrect. I’m sure you’ll let me know if it was.

  3. Jane, your comments seem prima facie reasonable. The problem with your argument, as I see it, is that it requires us to have an independent standard for the quality of the work available to judge whether the sloppiness is harmful or not. In the case of the PGR specialty rankings, we don’t have such a standard (yet?), and I think that means that we have to rely on methodology alone.

    (There’s also a worry here about what an independent standard would be, I think, and what question is being answered by the survey.)

    1. I can’t stress this enough – there are computer mechanisms that can allow user-defined criteria to rank programs according to the needs and desires of the user. There need not be a one-size-fits-all master ranking scale. What is needed is some way to determine who is best or better at what, where “what” is defined by the person using the rankings. No independent standard is necessary.
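The mechanism the comment describes can be sketched in a few lines. The program names, attributes, and scores below are invented purely for illustration; the point is only that the ordering falls out of weights the user supplies, with no master ranking scale needed.

```python
# A minimal sketch of a user-defined ranking: the student supplies her own
# weights over whatever attributes she cares about, and programs are
# ordered by the resulting score. All names and numbers are invented.
def rank_programs(programs, weights):
    """Sort (name, attributes) pairs by the user's weighted criteria."""
    def score(attrs):
        return sum(weights.get(k, 0.0) * v for k, v in attrs.items())
    return sorted(programs, key=lambda p: score(p[1]), reverse=True)

programs = [
    ("Program A", {"placement": 0.9, "funding": 0.5, "chinese_phil": 0.1}),
    ("Program B", {"placement": 0.6, "funding": 0.8, "chinese_phil": 0.9}),
]

# A student focused on Chinese philosophy gets one ordering...
print(rank_programs(programs, {"chinese_phil": 1.0, "placement": 0.3})[0][0])  # Program B
# ...while one focused on placement gets another.
print(rank_programs(programs, {"placement": 1.0})[0][0])  # Program A
```

Two users with different priorities get different orderings from the same underlying data, which is the whole point: no single scale has to be imposed on everyone.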

      1. As you know, I very much want to see such a search engine. Hopefully, we can make it happen.

    2. I think this is a fair argument, but I object to the premise. I think there are a bunch of pretty reliable proxies for quality and/or perceived quality: publications in top journals, how well one’s books sell, the publishers who publish one’s books, citations, just straight-up evaluation of the philosophy being done. (Even when I read outside my field, I can often tell if papers/arguments have obvious holes or flaws in them. If the work done by philosopher X has no obvious holes or flaws, while the work done by philosopher Y does, that’s a prima facie reason to think that philosopher X’s work is better than philosopher Y’s. Maybe it isn’t–maybe X’s work is shallow and uninteresting and unoriginal, and Y’s is deep and fascinating and groundbreaking, and I’m insensitive to that because I don’t work in their area. But it is still *some* evidence, and I think some pretty good evidence.)
