Grade Inflation

Norm- versus Criterion-Referenced Grading

Norm-referenced grading measures performance relative to other students; consequently, only a few students receive 'A's, regardless of how many others successfully complete the assignment.

Problems with Norm-referenced Grading

  • Norm-referenced grading tells us nothing about standards. Since students are ranked from best to worst, the top student receives the A+, even if s/he only achieved 35% in the course. (For example, the only reason I passed my initial statistics course was that the instructor 'curved' the grades, so I saw my failing grades turned magically into a pass, because the other students were even worse than I. But the truth is, I had mastered frighteningly few of the skills until I retook statistics in grad school...)
  • Students' actual achievement may not be acknowledged. We are all familiar with the horror stories of students receiving 95% on an examination but nevertheless being ranked a 'B' because the A+ went to the student who received 99% and the 'A's all went to the five students who got 97s and 96s. Since it is highly doubtful that the assessments designed by faculty (few of whom have any formal assessment training) can be considered statistically reliable, the differences between 97% and 95% are not meaningful. The grading here therefore reflects luck rather than achievement.
  • Students do not know where they stand until the course is over. If students receive 90% on every examination, and so reasonably conclude that they are doing well in the course, they may still end up with a 'D' if it turns out that others received yet higher grades. This is particularly problematic for students trying to prioritize where to put their energies during examination week, etc.
  • Norm-referencing encourages poor evaluation practices. As long as instructors obtain a normal distribution of grades, they believe their assessment is accurate, but the truth is, one could as easily rank-order students using their locker numbers as with some of the examinations I have reviewed in my research on post-secondary evaluation.
  • Norm-referencing encourages adoption of the talent-hunt model. In the talent-hunt model of evaluation, the goal is to identify the one or two students per semester with the potential to go on to become a professional _____ (fill in the blank with any hard science or esoteric specialty), and the rest are regarded essentially as chaff. Consequently, these instructors often include 'trick questions' on their examinations that the majority of students have no chance of answering (because they require out-of-course knowledge), but which the instructor feels justified in including as long as one or two 'specially talented' students are able to answer. Unfortunately, the research suggests that these students are merely luckier, not more talented, and the approach violates basic principles of fair and accurate assessment. Nevertheless, instructors who subscribe to the talent-hunt model often appear to take great pride in the numbers of students they fail, believing that this demonstrates their high standards. In reality, the approach usually results in lower standards, because the pool of recruits is too quickly diminished, the majority of students being actively discouraged from pursuing further studies in the field. If one's job is to teach students a particular subject, and one is given a class of students who meet the entrance requirements for that course, the question must be why one has failed to bring the majority of qualified students up to the required standard.

Criterion-referenced grading measures performance against defined criteria, so every student who successfully meets the criteria may achieve an 'A'.
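The difference between the two schemes can be made concrete with a small sketch. The student names, the quota of one A+ and five 'A's, and the cut-off marks below are all assumptions chosen for illustration; they are loosely modelled on the 95% and 35% examples above and do not describe any real institution's policy.

```python
# Hypothetical marks, loosely modelled on the 95% "horror story" above.
strong_class = {"Ann": 99, "Bo": 97, "Cy": 97, "Di": 96, "Ed": 96, "Flo": 96, "Gus": 95}
# Hypothetical weak class in which the top student earned only 35%.
weak_class = {"Hal": 35, "Ida": 30, "Jo": 22}

# Assumed quota for norm-referenced grading: 1 A+, 5 As, everyone else a B.
QUOTAS = [("A+", 1), ("A", 5)]
# Assumed cut-offs for criterion-referenced grading (minimum mark, label).
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def norm_referenced(marks):
    """Grade by rank alone: the labels are handed out from the top down."""
    ranked = sorted(marks, key=marks.get, reverse=True)
    labels = [label for label, count in QUOTAS for _ in range(count)]
    return {name: (labels[i] if i < len(labels) else "B")
            for i, name in enumerate(ranked)}


def criterion_referenced(marks):
    """Grade against fixed cut-offs: rank within the class is irrelevant."""
    def grade(mark):
        for cutoff, label in CUTOFFS:
            if mark >= cutoff:
                return label
        return "F"
    return {name: grade(mark) for name, mark in marks.items()}


print(norm_referenced(strong_class)["Gus"])       # rank-based label for a 95%
print(criterion_referenced(strong_class)["Gus"])  # cut-off-based label for a 95%
print(norm_referenced(weak_class)["Hal"])         # rank-based label for a 35%
print(criterion_referenced(weak_class)["Hal"])    # cut-off-based label for a 35%
```

Under these assumed numbers, the student with 95% is ranked a 'B' by the quota but earns an 'A' against the cut-offs, while the student with 35% tops the weak class under ranking yet fails against the cut-offs.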

Problems with Criterion-referenced Grading

  • Standards may be set too low. If everyone in the class is receiving an A+, then the bar may be set too low (though the criterion should ultimately reflect required professional standards). The toughest requirement for a Canadian driver's license, for example, is the ability to parallel park. In Sweden, applicants have to negotiate specially designed skid pads on the test road that simulate severe black ice conditions.
  • Criterion-referenced assessment can encourage "template learning". Since good criterion-referenced assessment requires explicit targets, poorly designed rubrics can sometimes become templates that weak students complete with little thought, effort, or ownership. (This is not a problem with well-designed assessments.)
  • Criterion-referenced assessment may inflate grade labels such that 'A' becomes a minimal pass. In some subjects, students may either "get" a skill or concept, or not. Such courses should be pass/fail, but the need to generate grades for competitive scholarships, graduate faculties, and jobs means that few instructors are willing to make their courses pass/fail. Consequently, 'A' becomes a 'pass'.


© Robert Runté 2005. This site last updated: May 3, 2005