What We Forgot About Teaching Reading Comprehension and How Remembering Can Help with LLMs

May 18, 2026

In this era when knowledge matters once again in the sense E.D. Hirsch theorized many years ago, when the loudest voices in literacy teaching and learning on the web are insisting that children must be taught to summarize passages as a way to comprehend them, it’s worth scrutinizing how teachers are being instructed to teach summarizing. We’ll begin with a focus on advice from Reading Rockets, Lexia Learning, and Timothy Shanahan.

Why These Three?

The three sources are not a random selection. I chose them because, taken together, they cover the institutional terrain over which advice to teachers actually travels in 2026. If you are a third-grade teacher in a Title I school trying to figure out how to teach summarizing, the path from your search bar to a printable anchor chart runs through one of these three kinds of places.

Each occupies a distinct position in the ecosystem, and the fact that they all say the same thing makes the convergence diagnostic rather than incidental.

Reading Rockets is the nonprofit clearinghouse funded by WETA, the Washington-area public broadcasting affiliate, with additional federal and foundation support, and it has been the closest thing American elementary literacy has to a neutral authority since the early 2000s.

Teacher-prep programs assign its strategy library. School-based reading specialists send links to colleagues. Parents land on it when they Google why their child can’t summarize a chapter. It is the public-square version of the field’s settled wisdom, presented without a sales pitch and without an individual author’s voice. When Reading Rockets characterizes summarizing in a particular way, for many teachers, that characterization carries the weight of the field’s consensus.

Lexia Learning is a commercial vendor, one of the largest providers of structured-literacy programs to American school districts, owned by Cambium Learning Group, with installations in tens of thousands of schools. Districts adopt Lexia, and Lexia’s framing of what summarizing is and how to teach it shapes the in-service training that comes bundled with the contract.

The vendor layer is where the field’s consensus gets operationalized into purchasable, scriptable, scalable products. If Reading Rockets tells you what summarizing is, Lexia tells you what summarizing is for sale as.

Timothy Shanahan is an eminent researcher. He directed reading for the Chicago Public Schools, served on the National Reading Panel, chaired the National Early Literacy Panel, and is a past president of the International Literacy Association. His blog, Shanahan on Literacy, is read by curriculum directors and literacy coaches the way clinicians read up-to-date guidelines from a specialty society.

When Shanahan endorses a practice, he is not just giving an opinion; he is signaling that the practice has research warrant. The researcher layer is where the field’s consensus gets its claim to scientific authority.

What is this online expertise actually telling teachers to do?

Digging In

Reading Rockets

Reading Rockets is the closest thing American elementary-level literacy instruction has to a clearinghouse. It is funded by WETA, the public broadcasting affiliate, and it sits behind a great many in-service workshops and graduate-school syllabi. Its page on summarizing opens this way:

“Summarizing teaches students how to identify the most important ideas in a text, how to ignore irrelevant information, and how to integrate the central ideas in a meaningful way. Teaching students to summarize improves their memory for what is read. Summarization strategies can be used in almost every content area.”

The student-facing definition follows, and the move it makes is worth slowing down for: “In student-friendly terms, summarizing is telling the most important parts of a text, in your own words, in a much shorter way.” Summarizing is a thing the student does to the text. The text holds the important parts. The student finds them and tells them back, only shorter. The student does not assign importance to text parts.

Lexia Learning

The Lexia Learning version of the advice is more frankly commercial — Lexia sells a structured-literacy program to districts — but the underlying ontology is identical and, if anything, more explicit about its appeal.

Lexia presents “five summarizing strategies” as a “framework,” each with an acronym. The most prominent for summarizing narrative structures is SWBST: Somebody Wanted But So Then. The site says of it:

“SWBST is one of the most popular summarizing techniques for students, especially in elementary grades. It is easy to remember and uses an acronym to uncover key story elements so early readers and struggling readers can easily identify the main characters, what they want, and the outcome.”

The example Lexia offers is The Cat in the Hat. The student writes: Somebody — the children. Wanted — to have fun. But — their mother was gone and the Cat made a mess. So — Thing One and Thing Two helped clean up. Then — the Cat left before mother returned.

Professor Timothy Shanahan

Timothy Shanahan is the one to take seriously, because he is not a vendor and not a content site. He is one of the most cited literacy researchers in the United States. When Shanahan tells teachers how to teach something, teachers — and curriculum directors, and superintendents — listen.

Shanahan’s post on summarizing opens by chastising his own field for spending too much time arguing about whether to teach comprehension strategies and not enough on how. He affirms that summarizing has, in his words, “a big payoff,” and that “of all of the literacy activities that you could have focused on, summarizing is the most powerful for elementary students.”

He then turns directly to method. He cites the foundational research — pay attention to this because we’ll come back to it — by writing: “Brown [Ann Brown, another professor] and company reasoned [in 1983] that summarization required six basic steps.” Those steps, he explains, come down to “delete what isn’t necessary, collect into groups ideas that fit together, and then find or compose a sentence that describes the important ideas that are left.”

Three voices speaking from three institutional positions say the same thing structurally. Summarizing is a procedure, the procedure has steps, and the steps can be taught. The student becomes a summarizer by executing the steps on progressively harder texts until the execution is fluent. The teacher’s job is to demonstrate the steps, scaffold their use, and assess the product against a rubric.

Where the steps came from

The “Brown and company” Shanahan cites is Ann Brown, working with Jeanne Day, in a 1983 paper in the Journal of Verbal Learning and Verbal Behavior titled “Macrorules for Summarizing Texts: The Development of Expertise.” The rules Shanahan compresses into three are six in the original. They are worth stating in Brown and Day’s own framing, because they really are the source code for everything Reading Rockets, Lexia, and Shanahan are now telling teachers to do.

The rules are, in order:

First, delete trivial material. Anything unnecessary to understanding the passage gets cut.

Second, delete redundant material. If something has already been said, don’t say it again.

Third, substitute a superordinate term for a list of items. If the text says “robins, sparrows, and cardinals,” the summary says “birds.”

Fourth, substitute a superordinate term for a list of actions. If the text says “she got up, washed her face, brushed her teeth, and got dressed,” the summary says “she got ready.”

Fifth, select a topic sentence. If the author has already provided a sentence that captures the gist of a paragraph, use it.

Sixth, invent a topic sentence. If the author has not provided one, the summarizer constructs one.

Brown and Day found, across a study of fifth-graders, seventh-graders, tenth-graders, and college students, that the deletion rules came early and easily, that the superordinate-substitution rules came later and unevenly, and that the topic-sentence rules, particularly the invention rule, were the latest to develop and the hardest to do well. Expertise in summarizing, on their account, looked like a developmental sequence of rule acquisition, with the higher rules building on the lower ones.

The paper was enormously consequential. It gave the field something it had badly wanted: a characterization of what good summarizing consists of at the level of cognitive operations, with developmental data showing how the operations evolve. The rules were teachable and assessable, and they looked exactly like the kind of thing instruction could target.

The six rules — sometimes compressed to four or five, sometimes dressed up with an acronym, sometimes printed on an anchor chart with cartoon illustrations — became the operational core of summarizing instruction in American elementary schools, and remain so.

Why the steps failed and continue to fail in 2026

Knowing what the mechanisms are is not the same as knowing how to teach a reader to perform them in real time, on an unfamiliar text, for reasons of her own. Summarizing is not paint-by-numbers. It is not do this, then this, then this, then voilà. The macrorules described what happens when summarizing goes well. They did not — and could not — describe how a struggling seventh-grader gets from where she is to the place where those operations happen fluently inside her own reading.

These are two different questions. The first is a question about cognition: what are the operations? Deletion and compression are reasonable descriptors. The second is a question about pedagogy: how does a person learn to do them, in the wild, on texts that matter, when no one is watching? Brown and Day’s 1983 paper answered the first question elegantly. Palincsar’s 1984 paper with Brown was an attempt to answer the second, and the answer, when it came, required leaving the laboratory entirely.

The 1984 paper

By the early 1980s, Annemarie Palincsar was a doctoral student working with Ann Brown on a problem the field could not solve. Seventh-graders who decoded adequately but comprehended poorly had been the subject of strategy-training studies for a decade.

The students learned the trained operations but did not transfer them, did not maintain them, did not recruit them when reading on their own. The cognitive science was sound, but the instruction it seemed to point to was inert. Children executed the procedures during the lesson and abandoned them the moment the lesson ended.

Palincsar’s instinct was that summarizing had been separated unnaturally from real reading. A reader who summarizes well is not just executing the macrorules. She is also asking herself questions as she goes, noticing when something doesn’t make sense and stopping to clarify it, anticipating where the text is heading.

Summarizing is one move among several that together constitute what skilled readers do when they comprehend, and a written or verbal “summary” is an artifact of comprehension. Pulling it out of that process, holding it up as the point of reading a passage, and drilling it in isolation was, Palincsar suspected, part of why it would not stick. So she put it back in its active context. She identified four moves that skilled readers make in concert — questioning, summarizing, clarifying, predicting.

Instead of teaching the four comprehension strategies as procedures the student performed on a text, she taught them as moves in a conversation about a text, a conversation that unfolded as the text was read. An adult and a student read a passage together. They took turns being the discussion leader.

Whoever was leading asked a question, gave a summary, clarified what was confusing, predicted what was coming next. Then leadership passed to the other person. That was the whole procedure. No worksheets. No graphic organizers. No anchor charts. No rubrics. Just a structured conversation about a real text, with an expert participant who could take the turns the novice could not yet take.

The theoretical apparatus Palincsar and Brown brought to this was Vygotskian, and it reorganized what was being taught. Skilled comprehension, on this view, was not a set of operations performed silently inside an individual mind. It was an internalized conversation — the kind of running dialogue a mature reader carries on with herself about a difficult text.

What does this paragraph actually say? What part didn’t I get? Where is this going? Children who could not yet hold that conversation internally needed to be hosted inside it externally, with an expert taking the turns the child could not yet take, and handing the turns back as soon as the child could take a little more.

What got internalized was not a procedure but a role — the position of the person who, at this point in the conversation, summarizes. The four strategies were not the content of the instruction. They were the recognizable utterance types that gave the conversation its structure, the scaffolding that made it possible for a novice to find her way into a discursive practice she had never been inside before.

The 1984 paper made its argument largely in transcripts. Palincsar and Brown reproduced extended exchanges between teachers and individual case-study students across days of the intervention, and the transcripts did work the prose could not. They showed the scaffolding fading in real time.

On Day 2, a student named Sara, asked to summarize a passage about why snakes are flexible, offered: “Like, if a snake is turning around, he wouldn’t break any bones because he is flexible.” Not a summary. The teacher did not correct her. She prompted the next move: “And the reason he is so flexible is...”

When Sara tried again and still missed, the teacher named what this summary would need to do — explain the mechanism — and eventually modeled one, crediting Sara for the part she had contributed.

By Day 11, Sara was producing summaries that worked. The teacher’s role had shrunk to praise. The same student, with the same cognitive equipment, had moved from outside to inside the practice.

The instructional principle Palincsar and Brown articulated was deceptively simple. The teacher relinquishes the task to the novice only at the level the novice can negotiate at this moment. As the novice becomes more competent, the teacher increases her demands — one stage further into the zone of proximal development.

The instruction is not a sequence laid out in advance. It is a responsive practice, tuned moment to moment to what the student has just produced. The teacher’s professional judgment is not the noise in this system. It is the signal. No script can replace it, because no script can anticipate what this child, on this text, in this moment, requires.

The 1984 paper made two claims, and they need to be separated because the field has mostly absorbed the first and ignored the second.

The first claim was about ontology. Comprehension is not a property of an individual mind operating on a text. It is participation in a particular kind of discourse about a text, eventually internalized as the inner conversation a mature reader has with herself. Teaching comprehension means hosting students inside that discourse, not delivering its operations to them.

The second claim was methodological, and it was the seed of everything Brown would do for the next decade and a half. If comprehension is a practice rather than a mechanism, then the research that characterizes it cannot be done in a laboratory. The phenomenon lives in classrooms, between people, over time. The researcher who wants to study it has to be there.

The Lethal Mutation

The 1984 paper was a hit. Reciprocal teaching traveled. It traveled into research labs, into journal articles, into meta-analyses, into teacher-prep textbooks, into in-service workshops, into curriculum materials, into the What Works Clearinghouse. By the late 1990s, every literacy specialist in the country had heard of it. By the early 2000s, you could buy a reciprocal teaching binder with reproducibles and a fidelity checklist. By 2010, the WWC had reviewed it and pronounced its effects “mixed.”

Mixed. After a decade of large effect sizes in the original studies and Rosenshine and Meister’s 1994 meta-analysis showing reliable gains, the federal clearinghouse that summarizes “what works” for American teachers concluded that the effects of reciprocal teaching on adolescent comprehension were mixed. The range of student-level improvement indices across the studies the WWC reviewed ran from minus 23 percentile points to plus 42.

Something had happened on the way from the 1984 paper to the 2010 review. Brown and her longtime collaborator Joseph Campione gave it a name in a 1996 chapter in Innovations in Learning: a “lethal mutation.”

As I understand it, the phrase originates with Ed Haertel at Stanford, who coined it in conversation with Brown and Campione, and the three of them put it into the literature together. It names the failure mode that besets any complex educational innovation as it travels from its developers to the field.

A mutation is a change. Most changes to an instructional design are tolerable. Some are improvements. A lethal mutation is a change that preserves the surface features of the original — the parts that are visible, namable, and easy to operationalize — while destroying the interior mechanism that made the original work. The mutated version looks like the thing, but superficially. And because it looks like the thing, the people implementing it cannot tell that they have lost what mattered.

Reciprocal teaching, as it scaled, mutated lethally. The lethal mutation is visible in three steps.

The first step was the operationalization of the four strategies. The 1984 paper had treated questioning, summarizing, clarifying, and predicting as moves in a conversation — utterance types that structured a developing discourse between teacher and student.

In the scaled versions, the four strategies became four things students do. Each got an anchor chart. Each got a graphic organizer. Each got its own worksheet. Students filled in a box labeled “Question,” then a box labeled “Summary,” then a box labeled “Clarify,” then a box labeled “Predict.” The conversation disappeared. The boxes remained.

The second step was the reintroduction of fidelity. The 1984 paper had treated the teacher’s responsiveness as the active ingredient. The whole point was that the teacher took the turns the student could not yet take and handed them back as soon as the student could take a little more.

This required moment-to-moment judgment that no protocol could specify. In the scaled versions, the teacher’s improvisational responsiveness was treated as the problem. It made the intervention unreproducible. So curriculum designers replaced it with a script.

The teacher now had a printed lesson plan with sentence stems. The students had role cards: Questioner, Summarizer, Clarifier, Predictor. The roles rotated on a schedule. The teacher’s job was to ensure that the rotation happened and that each student executed the assigned role.

The third step was assessment. The 1984 paper had assessed the intervention by showing students moving from outside a practice to inside it — Sara on Day 2 unable to summarize, Sara on Day 11 producing summaries that worked. In the scaled versions, assessment moved to product. Was the summary correctly formed? Did the question target a main idea? Was the prediction justified by the text?

A rubric scored each move. The student who hit the rubric was a successful reciprocal-teaching participant. The student who didn’t was sent for remediation.

By the end of the three steps, what was happening in classrooms labeled “reciprocal teaching” had no functional resemblance to what Palincsar and Brown had done in 1984. The surface features were all present. The four strategies were taught and practiced. Students took turns. The teacher modeled and faded. The lesson plan said “gradual release of responsibility.” A walk-through observer with a clipboard could check every box.

But reciprocal teaching was gone. Instead of a conversation, there was a sequence of role performances, evaluated against a rubric. There was no responsiveness, because the teacher was following a script. There was no internalization of a discourse the student was being inducted into, because there was no discourse, just a set of moves to perform when one’s turn came up.

The reciprocal teaching being implemented was a fluent execution of the observable features of the 1984 study, conducted in the complete absence of the thing the 1984 study was about.

This is what makes the mutation lethal rather than merely incomplete. An incomplete implementation of reciprocal teaching would produce smaller effects than the original. A lethal mutation produces no effects, because the active ingredient has been removed and replaced with something that looks like it.

The WWC’s “mixed effects” finding is what a lethal mutation looks like in the aggregate data. Some classrooms still had teachers doing the responsive work the original required, and those classrooms got results. Other classrooms had teachers faithfully executing the scripted version, and those classrooms got nothing. The two populations averaged into “mixed.”

Every educational innovation that requires teacher judgment as its active ingredient is vulnerable to the same mutation, because the same forces drive it. The forces are not malicious, and a non-expert may not even notice the switch.

An innovation that works has to scale, scaling means going to teachers who were not part of its development, those teachers need materials, materials require specification, specification favors the observable over the interior, the observable features get codified, the codification becomes the program, and the program is what fidelity is measured against.

At every step, the people involved are doing their jobs competently. The lethal mutation is the cumulative consequence of competent work in a system that cannot encode what made the original work.

The macrorules of 1983 were never at risk of lethal mutation, because the macrorules were operationalizing a cognitive mechanism. There was nothing to lose. The mechanism was the procedure. When Reading Rockets and Lexia and Shanahan recommend teaching the macrorules in 2026, they are recommending a thing that survives scaling because there is nothing in it that scaling can destroy.

Reciprocal teaching had something to lose, and the field lost it. What got scaled, under the name “reciprocal teaching,” is closer to the 1983 macrorules with four operations instead of six. The students fill in four boxes instead of one. They produce four observable artifacts per session instead of one summary.

The walk-through is easier to conduct, because there are more checkboxes. And the active ingredient, the responsive hosting of a novice inside a discourse she is gradually internalizing, is AWOL.

Brown and Campione understood this clearly by the mid-1990s, and the lethal-mutation concept was their attempt to name what they were watching happen to their own work. It is also, not coincidentally, the diagnosis that pushed Brown into the methodological territory that became design-based research.

If your intervention can be lethally mutated by the system that scales it, then the unit of analysis for understanding educational change cannot be the intervention alone. It has to include the system. The researcher cannot just hand off the design and walk away. The researcher has to stay in the classroom, watch the design encounter the conditions of real practice, and study the encounter itself.

Reciprocal Teaching and LLMs

Teachers who want to help students find value in LLMs without surrendering their agency to them might experiment with some version of what Palincsar and Brown built in 1984. Not the scripted version. The version with the conversation still in it.

The moves transpose almost without modification. First, the LLM becomes an object of conversation, just another text, and the point is to get better at making it work in useful and interesting ways. Second, the teacher is not the source of the right answer, but a participant in the discourse. Third, the learners take the lead and have the authority to ask the questions and judge the output. Fourth, there is no rubric; there is free-flowing discussion.

Question. What do we want from the model? Not “what prompt should I type” but the prior question — what are we actually trying to find out, make, decide, or check? The prompt is the artifact. The question is the move. A student who can articulate what she wants before she asks for it is a student who can evaluate what comes back.

A student who cannot is a student who will accept whatever the model produces, because she has no internal standard against which to judge it. Writing the actual prompt comes after the question and is evaluated against the robust discussion that preceded it. Students can predict how they think the LLM might respond.

Summarize. What did we get, and what does it mean? Not “did the model answer the question” but what the response actually says, in the student’s own words, with the parts that matter foregrounded and the filler set aside.

This is exactly the move Sara was learning to make on Day 11 — except now the text is a model output, generated three seconds ago, and the summary is a check on whether the student understood what the model just told her. The summary is diagnostic. If she can’t produce one, she didn’t understand the output, and any decision she makes downstream of it is uninformed.

Clarify. Where does the output make sense, and where does it not? Which parts are claims the student can verify against what she already knows? Which parts are claims she has no way to check? Which parts contradict each other? Which parts are hedged in ways that mean the model isn’t sure?

Which parts are stated with confidence the student has no reason to share? The clarifying move is where the student’s own knowledge enters the loop. It is also where the student notices that she does not know something and decides whether to find out.

Predict. What if we ask it this way instead? What if we push back on this claim? What if we ask for sources? What if we ask the same question in a different domain to see whether the answer generalizes? The predicting move is what turns a single exchange into an investigation. A student who can predict how the model will respond to a variation has begun to model the model, that is, to develop a working theory of what it does well, what it does badly, and where its outputs need to be treated with care.

And then question again. Refine the prompt, or abandon the line of inquiry, or pivot to a different source entirely. The cycle closes and reopens. The student is now doing with an LLM what a mature reader does with a text: holding a running conversation with it, asking herself what it just said, noticing when something doesn’t fit, anticipating where the exchange is going, deciding whether to continue.

Notice what this is not. It is not a prompt-engineering curriculum. It is not a set of rules for using AI responsibly. It is not a worksheet with boxes labeled Question, Summarize, Clarify, Predict. The moment it becomes any of those things, it has been lethally mutated, and we are back at the macrorules with a different label on the anchor chart.

The active ingredient is the teacher’s responsiveness. No script can specify when to push a student to clarify, when to let a confident-sounding wrong answer ride for a minute to see if a classmate catches it, when to take the turn herself to model what skeptical engagement with model output looks like.

The teacher who can do this work is doing the same work Palincsar’s teachers were doing with Sara in 1984 — hosting a novice inside a discourse she is gradually internalizing, increasing the demands one stage further into the zone of proximal development as the novice becomes capable of more.

This version will not arrive in a binder. It will not be scriptable. It will not be evaluable on a walk-through. It will require teachers who are confident enough in their own judgment to host a conversation whose content they cannot fully anticipate, and school systems that recognize that confidence as the professional knowledge it is rather than as the noise it has been taken to be for the last twenty years.

The teachers who are waiting for the binder will not get one. The teachers who recognize what Palincsar and Brown were actually doing in 1984 already have everything they need. The pedagogy is forty years old. The text it is being used on is new. The work of hosting students inside a thoughtful conversation about something difficult has not changed at all.

References

Brown, A. L. (1992). Design experiments: Theoretical and methodological challenges in creating complex interventions in classroom settings. The Journal of the Learning Sciences, 2(2), 141–178.

Brown, A. L., & Campione, J. C. (1996). Psychological theory and the design of innovative learning environments: On procedures, principles, and systems. In L. Schauble & R. Glaser (Eds.), Innovations in learning: New environments for education (pp. 289–325). Lawrence Erlbaum Associates.

Brown, A. L., & Day, J. D. (1983). Macrorules for summarizing texts: The development of expertise. Journal of Verbal Learning and Verbal Behavior, 22(1), 1–14.

Hirsch, E. D., Jr. (1987). Cultural literacy: What every American needs to know. Houghton Mifflin.

Lexia Learning. (2026, March 31). Summarizing strategies for student reading comprehension. https://www.lexialearning.com/blog/summarizing-strategies-for-student-reading-comprehension

Palincsar, A. S., & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering and comprehension-monitoring activities. Cognition and Instruction, 1(2), 117–175.

Reading Rockets. (n.d.). Summarizing. WETA. https://www.readingrockets.org/classroom/classroom-strategies/summarizing

Rosenshine, B., & Meister, C. (1994). Reciprocal teaching: A review of the research. Review of Educational Research, 64(4), 479–530.

Shanahan, T. (2019, July 13). How to teach summarizing, Part I. Shanahan on Literacy. https://www.shanahanonliteracy.com/blog/how-to-teach-summarizing-part-i

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds.). Harvard University Press.

What Works Clearinghouse. (2010, September). Reciprocal teaching (WWC Intervention Report). U.S. Department of Education, Institute of Education Sciences. https://ies.ed.gov/ncee/wwc/

Matt Renwick

Terry, you have taken one particular approach to teaching/reading and demonstrated how instructional practices in general are difficult to spread and scale without losing their effectiveness. Good food for thought for teachers and leaders.

Sam Wineburg

Dear Terry, What clear and thoughtful writing. I wish more academics could write penetrating syntheses of research, pulling out the most important points, as you have done here. I worked with Ann both on the Community of Learners work (and classrooms that used reciprocal teaching) and on the "How People Learn" committee she chaired. You explain the idea of lethal mutations better than I've ever come across. Bravo!
