Harvard, If You're Listening...

May 30, 2026

Grade inflation is among the oldest running complaints in American higher education, and for nearly as long, it has been understood as a problem concentrated at elite private universities. Data assembled by Stuart Rojstaczer and Christopher Healy in 2011 and updated in 2016, drawn from hundreds of institutions, document the ongoing problem. Rojstaczer, who made a career researching this issue, speculated about the reasons for this phenomenon on his website:

“Faculty attitudes about teaching and grading underwent a profound shift that coincided with the Vietnam War. Many professors, certainly not all or even a majority, became convinced that grades were not a useful tool for motivation, were not a valid means of evaluation and created a harmful authoritarian environment for learning.”

Harvard has long occupied a seat at the center of the story. In 2001, the Boston Globe reported that more than 90 percent of Harvard’s students graduated with honors, and an attempted correction left the mean GPA essentially unchanged at about 3.41. Rojstaczer, who studied grade inflation for 35 years, reported that Harvard’s 1999 average GPA was 3.42.

The trend has only steepened in the 2020s. At Yale, a faculty report found that 78.97 percent of undergraduate grades in 2022–2023 fell in the A-range, with the mean GPA rising to 3.70. At Brown, 67 percent of grades were A’s in 2020–21, up from 39 percent in 1993. And at Harvard, Undergraduate Dean Claybaugh’s (2025) internal report found that 60 percent of grades were A’s by 2024–25, up from 25 percent two decades earlier, a finding that prompted the cap on A grades in undergraduate courses, to be implemented in fall 2027.

Some in the media think that concerns about the high density of A’s at Harvard are foolish: Harvard students, they argue, are high achievers whose high school records prove they merit high grades. Others counter that this assumption is complicated by K–12 grade inflation and declining academic standards as measured by standardized tests.

The 2024 Nation’s Report Card found that twelfth-grade average reading scores fell to their lowest level on record—3 points below 2019 and 10 points below the first such assessment in 1992—even as more seniors than before reported being accepted to a four-year college while fewer were apparently academically prepared for it.

The decline, however, was not evenly distributed: scores fell across most of the distribution but held steady among the highest-performing students at the 90th percentile, even as a record 32 percent of seniors scored below NAEP Basic in reading. It’s likely that Harvard enrollees are still among the strongest readers available.

These are national figures, keep in mind, not a measure of Harvard’s particular intake; they describe the pipeline, not the matriculants. Indeed, Harvard’s own report gestures at the same erosion, observing that undergraduates “struggle with readings that students completed with ease just ten years ago.” Perhaps.

Much of the argument among professors about the wisdom or the folly of “fixing” the grade inflation problem by raising the bar on what it takes to earn an “A” in college is shrouded in emotion. Faculty discussions often spill over into online commentary. To illustrate the point, I use one threaded online discussion from the mid-2010s, selected because it invited contested viewpoints and heated debate. I make no claim that it is representative of the larger debate beyond its emotional tenor.

The reasoning is personal and affective throughout, and the nostalgia register dominates. One commenter recalls his motto at his “academically demanding private college in the early 80’s” (“Two-point-oh; good to go!”) and contrasts it with students who feel they’ve “failed their professor, their parents and themselves” for earning less than an A. Another professor reminisces that in “the distant past” only “the top 2–3 students” earned an A, treating his own memory of a bygone norm as the standard betrayed.

The resentment register is equally pronounced. The framing of students as entitled, as in the claim that “almost every student believes they can ‘earn’ or ‘deserve’ an ‘A’”, carries evident irritation, and one commenter’s image of the indulged undergraduate (“An A for you! And an A for you!”) is pure affect doing the work an argument should.

Crucially, the one commenter who argues from a criterion—a strong student advocate, who insists that a student meeting the syllabus’s stated requirements has earned the A and that calling this inflation is the instructor confessing he set the bar too low—is treated as naive and brushed aside. The norm-referenced nostalgia carries the day not by evidence, but by shared feeling.

And where data surfaces, it’s deployed casually and then overridden by sentiment. When one commenter pulls actual Maryland accounting figures (57% A’s in one course), another simply declares the number “WAY too high”—a verdict with no benchmark, no account of the work, just the gut sense that it feels like too many. Even Rojstaczer’s own careful findings enter the thread mostly as fodder for the participants’ prior convictions about whether students are smarter or lazier.

At some point, university professors may be forced to confront the core problem without emotion: The letter grading system is irreparably broken as it currently functions for the same reason the standardized test system is broken, and neither is showing signs of revitalizing education in America.

For that, professors need criterion-referenced assessment designed and implemented locally.

The Solution Dean Claybaugh is Circling

Dean Claybaugh at Harvard deserves more credit than the headline reforms suggest. Buried in her Re-Centering Academics (2025) report are the pieces of a far more ambitious and promising solution than the simple mechanical cap on A’s, a solution the report circles without quite naming. If the problem is as serious as Claybaugh argues it is, the fact that a genuine solution may require reallocation of funds shouldn’t be a problem.

Her report documents that grade inflation is driven by the instructor’s position rather than by qualities of students’ work. The Q-score pressure from student evaluations of teachers, the anxiety teachers experience over low course enrollment, and the bind of the junior faculty member who cannot grade honestly without losing students will all either intensify or go underground and fester as the capped-A game unfolds.

The report concedes that the standard itself has gone unspecified for students, observing that students “don’t know what constitutes ‘excellent’ or ‘extraordinary’ work in a discipline” and that only one surveyed student spoke of grades in terms of mastery at all. Dean Claybaugh is close to another truth: Faculty themselves may not have a clear image of what extraordinary looks like.

The report already endorses distributed, calibrated grading, noting approvingly that some courses have teaching fellows “grade exams collectively” or “norm their grading on papers,” and that such strategies “insulate TFs from the complaints of students disappointed by their grades.” And it assigns the Bok Center for Teaching and Learning a major role in “the process of course and assessment design.” The Bok Center may need additional funding to carry the load.

Let me stipulate at the outset what the fix I propose is not. It is not ungrading. Relying on a feel for faculty views on grades that I developed as an assessment coordinator at a state university in California, I am confident that Harvard is not ready to leap the chasm from grading to ungrading, virtuous as that leap might be, and it may never find the opportune time to completely shake its addiction to having to be better than Yale.

The pragmatic path is not to abolish the grade, but to change what the grade refers to, to keep the letter while shifting its purpose from ranking students against one another to certifying their performance against a clearly specified standard. The grade stays; its referent moves from scarcity to mastery. Who cares if everybody gets an “A” that was demonstrably earned?

Here is my recommended mechanism, plainly. I give Harvard permission to use this idea without personal remuneration or even citation.

Every General Education course carries a capstone assessment—not a separate course, but a required component within each GE course. The capstone is not an exam. A timed written or oral examination, however rigorously graded, still models learning as performance; it asks what a student can produce under pressure, in a quiet room, proctored, on a single occasion, from memory.

That is precisely the century-old conception of learning that has been roundly indicted by learning science—the mind as a bank to be filled and drawn down on demand. To certify mastery by written or oral examination would reproduce the most egregious unintended consequence of grading at the very moment of trying to correct it.

And in the age of large language models, the timed exam is the format most warped by AI panic, herding faculty back toward a surveillance-and-recall regime that crowds out the learning it purports to measure.

The capstone is instead an occasion to learn: drawing on the substance of the course, a student researches a question, an idea, or a problem in greater depth, producing creative multimodal work that is itself an act of inquiry.

We keep faith with Dewey, who believed that assessment ought to be a learning experience rather than a verdict appended after it, and with the criterion principle, since the standard is what the work reveals about the student’s grasp of the course’s core content, judged against a public rubric and not against the performance of classmates.

The capstone is heavily weighted in the final-grade formula—weighted enough that it, and not the teacher’s accumulated coursework marks, effectively determines the grade. And it is not graded by the teacher of record, nor by the teaching fellow of record. This is the heart of the design.

Because the decisive component of the grade is assessed by someone with no Q-score and no enrollment stake in this particular student, the inflation engine the Claybaugh report identifies is severed at its source. Her report itself points toward this mechanism when it observes that collective grading “insulate[s] TFs from the complaints of students disappointed by their grades.” This proposal takes that insulation and makes it structural, moving it from within a single course to across the General Education curriculum.

The raters are the center of gravity of this whole proposal, and they must be treated as such—not as an administrative afterthought, not as adjunct labor scraped together at the end of term. They are the people who will pull Harvard back from the cliff the cap on A’s is walking it toward. The fallout after this cap experiment is going to be brutal, mark my words.

They should be faculty who have taught the course before, or who at minimum hold genuine expertise in its learning outcomes; only such a person can judge inquiry against a standard rather than against a checklist. They should be a stable corps, not a rotating convenience, with the rating work written into their assigned workload and recognized as real academic labor rather than donated time—money well spent to solve the problem graduate admissions officers are complaining about.

And they should debrief at the end of each semester—convening to compare what they saw, to surface where the standard held and where it frayed, and to report back to the General Education faculty what the student work revealed about the teaching, the rubric, and the courses themselves.

So they become more than graders. They are the institution’s eyes on its own standard, the feedback loop by which a criterion stays meaningful instead of drifting, and by which the GE faculty learn, semester over semester, what their courses are actually producing. A reform that buries these people in a footnote will fail. A reform that honors them and uses them is the reform.

The standard they grade against is a shared GE rubric, written and calibrated collectively by faculty with the Bok Center as facilitator, a task that is harder and more valuable than any grading logistics. It must be public and prior, visible to students before they begin, so the capstone is something they can aim at rather than a complication sprung on them after the fact. That visibility is what makes it a standard to be met rather than a cohort of peers to be beaten in competition.

A still richer version of the regulatory artifact replaces the single capstone with an electronic portfolio. Across the course, the student assembles work and reflects on their own growth, so that the record captures the studying-in-progress—the verb ‘to learn’—rather than the frozen noun of a final letter.

Scaled to General Education, this portfolio becomes an architecture students rely on to create coherence in their GE work. A GE portfolio with a compartment for each course means that on completing the requirement a student holds onto two things: the transcript, and an online repository of their actual work across every GE course.

The transcript abstracts; the portfolio shows.

An admissions committee can read what a student can do in their own words, instead of inferring it from a letter whose meaning is doubted. The signal-collapse problem the Claybaugh report worries over gives way to an enriched signal. Where Harvard proposes to restore the signal by making the letter scarce again, the portfolio makes the evidence itself available, so the letter no longer has to carry the whole communicative burden alone.

The portfolio is the criterion made visible.

Alfie Kohn, whom some credit with fomenting the ungrading movement, endorses portfolios, but only when they replace grades rather than generate them, warning that a portfolio “…dominated by worksheets so that every portfolio looks the same” is worthless.

My design threads that needle. The portfolio illuminates a criterion-referenced grade rather than serving as a device for manufacturing one, and its value depends on the GE standard specifying genuine inquiry rather than uniform deliverables. Named openly, that distinction removes the powerful objection that portfolios-plus-grades, portfolios-for-show, combine the weaknesses of both.

For evidence that criterion-referenced external grading of extended inquiry can work at scale, there is one clean precedent: the International Baccalaureate. Its moderation system grades the Internal Assessment—an extended research investigation, not a timed exam—by having the teacher mark first and an external examiner check the marks against an absolute standard. If the financial cost of dedicated faculty assessors is too great, something like IB moderation could work.

The shape of the alternative is now clear. Ungrading has the right anger but the wrong reality; Harvard kept the grade and doubled down on it. The third path keeps the grade and changes its referent—certifying mastery against a public standard, through inquiry rather than examination, judged by a stable corps of expert raters who answer to the standard rather than to the student, with the student’s actual work preserved for anyone who wishes to see it.

Dispersion of grades across the range, if it comes, arrives as a symptom of a demanding standard reliably applied to a valid culminating task, never as the goal. The aim is not to make the A rare. It is to make it mean something again and to let the work itself say what.

Learning to Read, Reading to Learn
