I have always been a supporter of student evaluations of instructors. I consider mine public property, I worked for more consistency across campus in their treatment, and I pledged my support publicly (in a letter published in the Imprint) when a Federation of Students executive member tried to arrange for them to be publicly available. Nonetheless, I think they are only one aspect of instructor evaluation, and one which in its current implementation is not as effective as it could be.
Evaluations in the Math Faculty started out, in the mid-'70s, as a MathSoc/mathNEWS offshoot called the Anti-Calendar, or AntiCal for short. This was a survey taken by setting up a table and having students fill out forms, published in a format similar to mathNEWS. It published both numerical ratings and anonymous comments, and some students started making their comments more outlandish in a bid to get published (and, perhaps, to inflict more damage on certain professors). The targets of such comments threatened to sue for libel, and tempers rose.
The Dean's Office stepped in to mediate, and suggested to MathSoc that the Dean's Office could take over administration of the evaluations, keeping the potentially libellous comments private to the instructor, but making the numerical results public, and (probably the most convincing argument they made) having the numbers play a role in the tenure and promotion process. ``Public'' in this case means there is a printout of reams of numbers available in the MathSoc office and in the Dean's Office. I have suggested to every new Associate Dean that this information be put on the Web or otherwise more widely disseminated, and suggested to every new MathSoc regime that they just scan the printouts, put them through OCR, and put the results on the Web. I have also suggested to MathSoc that they negotiate to have a high-quality long-term summary of evaluations with suitably chosen comments (such as the Harvard Course Evaluation Guide) published. No one seems that interested in changing the status quo.
So what are we left with? Mark-sense forms filled in by students in the second-last week of class, with written comments on the back. The quality and quantity of these comments have been going down from year to year, as anyone who cares to paw through my accumulated pile going back to 1988 can verify. I get a small number of actually useful comments, a few more hastily dashed-off ones like ``Good job'' or ``Lectures were boring'', and a lot of completely blank ones. (I don't get the comments from this term's class until some time in January, so I don't know whether or not they've bucked the trend.) My numbers on the crucial question, ``Rate the overall effectiveness of this instructor'' (where lower numbers are better), have ranged from a high of 1.05 the first time I taught CS 492 to a low of 3.41 in one section of CS 341 in fall 2002 (as opposed to 2.30 in the other section, which I also taught), the latter almost entirely due to my refusal to adjust the midterm marks right away instead of after the final exam. (The average rating on this question for CS instructors is something like 2.3.)
And do the numbers really have any effect? For tenure and promotion decisions, the numbers are certainly there in the dossier, but so is a lot of other stuff, much of which comes into play. My sense of it is that bad numbers may pull a marginal case down, but that good numbers aren't going to pull a marginal case up, and that bad numbers aren't going to affect a really good case (in research terms, which is where the weight of the decision really lies). For annual reviews, the numbers are used by the chair/director, in a vague and nonformulaic way, to come up with a numerical rating between 0 and 2 (of which the only permitted values are multiples of 0.25) for teaching, which is then averaged with the ratings for research and service to come up with a weighted share of the Faculty's merit pool. Conclusion: the numbers have a little effect, but not much.
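To make the merit-pool arithmetic concrete, here is a minimal sketch in Python. Only the 0-to-2 scale and the quarter-point increments come from the paragraph above; the 40/40/20 weights and the proportional division of the pool are hypothetical placeholders, since the actual process is, as noted, vague and nonformulaic.

    # Sketch of the annual-review arithmetic described above.
    # Only the 0-2 scale and the quarter-point increments are from the text;
    # the weights and the pool-division step are assumed for illustration.

    def merit_score(teaching, research, service, weights=(0.4, 0.4, 0.2)):
        """Weighted average of three ratings, each a multiple of 0.25 in [0, 2]."""
        for r in (teaching, research, service):
            assert 0 <= r <= 2 and (4 * r) % 1 == 0, "ratings are multiples of 0.25"
        w_t, w_r, w_s = weights
        return w_t * teaching + w_r * research + w_s * service

    def merit_share(score, all_scores, pool):
        """Hypothetical: divide a merit pool in proportion to combined scores."""
        return pool * score / sum(all_scores)

    # Example: 1.25 for teaching, 1.75 for research, 1.0 for service
    # gives a combined score of 1.4 under the assumed 40/40/20 weights.
    print(round(merit_score(1.25, 1.75, 1.0), 2))  # 1.4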
That's probably not something we want to increase without also addressing some of the concerns with student evaluations. The optimum instructor strategy for maximizing evaluations is probably to teach just enough, and design tests just nontrivial enough, so that students don't feel their time and money are being wasted, but not so much that they feel their averages are being threatened or that they aren't being treated fairly with respect to other students (as in ``It's not fair to us that they learned less than we are learning''). There are studies showing that attractive instructors score higher, and that men and women are treated differently. Women who are perceived as being ``nurturing'' or ``caring'' score higher; men who project authority (through dress or manner) score higher. (As someone who spends a lot of time trying to deconstruct authority, and to deny its automatic conferral, I am not happy about this.)
A student evaluation done in the second-last week of class should be only one aspect of instructor evaluation. There should also be peer review (instructors assessing other instructors), graduation interviews, and alumni interviews. Students need to be given more information on the roles and responsibilities of instructors, and on what to look for in an instructor; above all, they need to be given the sense that evaluation is an important responsibility to take seriously, and that can't be done without convincing them that what they're doing really matters. All of this takes work, and thought, and actual change in the way people think. It's simpler to just hand out forms, churn out a bunch of numbers, and argue that this provides accountability.
But at least we have that much. In the largest faculty on campus, the Arts Faculty, evaluations are optional, at the discretion of the instructor, and the Faculty Association (an organization which I support, but which unfortunately has to represent the views of faculty members even when they are idiotic) will fight hard to prevent any increased weight being given to evaluations. So students have to get their information where they can, which usually means through street talk and unsubstantiated rumours, or from commercial sites which are of even lower quality and responsibility than the AntiCal was so many years ago. --PR
(Adapted from a blog posting made November 21, 2003.)