Generative Conjectures: Learning Science and Teaching Our Children Well

May 19, 2026

Much public talk about education research conflates two research traditions. The Science of Reading’s argumentative grammar operates on pipeline logic: inputs (phonological awareness), throughputs (grapho-phonic analysis), outputs (oral fluency). Develop these and children acquire the capacity to read a language with an alphabetic script. One would be on shaky ground to argue otherwise.

The cognitive and neural mechanics of reading are real, I’m convinced; the evidence is good even without brain imaging, and the Science of Reading is warranted in making claims about their importance in reading theory. Children need to be able to decode words. But evidence about what happens in the brain cannot, on its own, warrant claims about what should happen in the classroom.

Failure to respect the limits of cognitive science has created a cargo cult. Wearing the robes of cognitive science (citing the right studies, performing the right rigor against gold-standard p-values) and assuming that the authority of brain-level evidence transfers to classroom-level claims it was never designed to warrant, the Science of Reading (SoR) movement sold scripted phonics programs that now consume the better part of first grade in some states, while the public reads NAEP scores as the verdict on whatever policy was last enacted.

No well-prepared reading teacher skips phonics in first grade. The question is not whether to teach phonics. The question is how — under what classroom conditions, with what task and participant structures, supported by what discursive practices, sequenced over what trajectory, for which children in which settings?

Those questions are pedagogical, not cognitive, and the evidence that warrants answers does not come from brain imaging or laboratory studies of word recognition. It comes from a different research tradition entirely. Unfortunately, the Science of Reading has captured that ground with state-level legislation.

Learning science is that tradition. It is ecological, organic, committed to generative designs in real classrooms — committed, that is, to the proposition that classrooms are sites of cultural reproduction and contestation, and that any intervention lives or dies by how it thrives in that soil.

The school-to-prison pipeline and deficit framings of “remediation” are not bugs in such a soil; bugs would be good. They are the soil. Design-based research emerged in the early 1990s as a methodological response to this problem, and over three decades it has developed a stringent research methodology.

The Science of Reading makes classroom prescriptions that only design-based research is equipped to warrant. The root issue is not experimentalism; empirical evidence is necessary in the lab and in the classroom. The issue is the alignment of claims to warrants. It’s one thing to claim that weak phonemic awareness co-occurs with difficulty in learning to read. It’s another to say, “Do this phonics activity every day in your classroom for thirty minutes.”

Three texts anchor this welcome tradition, which has emerged into the mainstream in recent years; two of them appeared in The Journal of the Learning Sciences. Allan Collins (1990), working in parallel with Ann Brown, names the unit of analysis, the design experiment, and identifies the variables, the grain sizes, and the theoretical ambitions that distinguish principled classroom research from the ad hoc innovation studies that preceded it.

Daniel Edelson (2002) refines the question by specifying what such research produces. Learning science generates both knowledge about how classrooms work and guidance about how to design learning environments, and Edelson insists these are different kinds of claim with different evidentiary requirements — descriptive domain theories on one hand, prescriptive design frameworks on the other.

William Sandoval (2014), working from Kelly’s (2004) critique that design research lacked an argumentative grammar, makes the inferential structure of the field explicit through conjecture mapping — design conjectures linking embodied features of a learning environment to mediating processes, theoretical conjectures linking those processes to desired outcomes. Read together, the three articulate a tradition that holds its claims to their warrants and promises improvements in pedagogy that are both scalable and sustainable.

Methodological Humility

Collins’s 1990 technical report — the precursor to the published 1992 chapter — does the founding work for design-based research by naming what had not yet been named. Before 1990, classroom studies of educational innovations existed in abundance, but they shared a set of problems Collins is unsparing about.

They were conducted by the designers of the innovations themselves, who had a stake in seeing their work succeed; they typically tested a single design rather than comparing alternatives; they reported significant effects without specifying the conditions under which those effects held; and they proceeded without theory, leaving their results “…uninterpretable with respect to constructing a design theory.”

What Collins proposes is a “systematic methodology” (p. 5) — a way of conducting design experiments in live classrooms that would produce findings worth aggregating across studies. Evidence-based research gains in power when many researchers are deploying the same methodology.

Collins separates the design experiment as a methodological instrument from the theoretical insights those experiments are meant to build. A “design theory” (p. 5) should identify the variables governing the success or failure of an innovation and specify their critical values and combinations. The methodology produces the theory; the theory is not the methodology. That distinction sounds elementary, but it is subtle: the SoR routinely treats the randomized trial as if the experimental method itself were a theory of reading instruction.

Two of Collins’s methodological commitments deserve particular attention because they correct the failures he names. The first is “working with teachers as co-investigators” (p. 5). This is not a respectful gesture toward inclusion but a methodological corrective. Teachers know things about their classrooms that researchers do not, and a study that ignores that knowledge cannot characterize the conditions under which an innovation works.

The second commitment follows from the first: the study must be conducted with “no vested interest in the outcome” (p. 5). Designers studying their own innovations are structurally incapable of seeing what they need to see. Collins’s methodology asks the researcher to relinquish the position of advocate and take up the position of investigator.

The most consequential concept in the report, for our purposes, is “grain size” (p. 7). Collins distinguishes among studies conducted at the level of the classroom, the grade, the school, and the district. This distinction determines which variables a study can even examine.

At the classroom level, a researcher can manipulate the technologies in use, the configuration of equipment and materials, the roles students and teachers play, and the organization of time. At larger grain sizes, other variables become available — variables Collins names directly: “cooperation between teachers, length of class period, peer tutoring across grade levels, relations of community to school” (p. 7).

“Relations of community to school” is the soil. It is what determines whether a phonics program designed at the curriculum-developer’s desk will survive contact with a particular neighborhood, a particular history of mistrust, a particular set of families’ relationships to the institution asking their children to perform decoding fluency on demand.

Collins acknowledges, almost in passing, that this variable exists and that it matters — and that classroom-level studies cannot reach it. The methodological humility is striking and, in retrospect, prescient. Collins is telling us in 1990 what the design-based research community would spend the next thirty years working out: that classroom-level studies cannot address the conditions that determine whether innovations live or die.

Different Sorts of Claim

Collins promised a design theory, but Daniel Edelson (2002) disaggregated it. Design research, he argued, produces three different kinds of theory, each tied to a different element of the design process, and each making a different kind of claim. Two of the three matter most for our argument.

The first is the domain theory. Edelson defines the domain theory precisely:

“A domain theory is the generalization of some portion of a problem analysis. Thus, a domain theory might be about learners and how they learn, teachers and how they teach, or learning environments and how they influence teaching and learning… Even though a domain theory in design research is developed through a design process, it is a theory about the world, not a theory about design per se. As such, it is descriptive, not prescriptive” (p. 113).

A domain theory generalizes from a particular problem analysis and tells us how some portion of the world is. It does not tell us what to build.

The second is the design framework, which is “a generalized design solution” for a problem in a domain. Whereas domain theories are descriptive, design frameworks are prescriptive: they specify the characteristics a designed artifact must have to achieve a particular set of goals in a particular context.

A design framework is “a collection of coherent design guidelines for a particular class of design challenge” (p. 113). The framework is not a recipe. It is a coherent set of guidelines for a class of challenges, goal-directed and context-bound. It does not tell a teacher exactly what to do on Tuesday morning. It specifies the characteristics a designed environment must have to produce a particular kind of learning in a particular kind of setting.

Reciprocal teaching, for example, must involve genuine transfer of cognitive responsibility from teacher to students as instruction proceeds; it must retain the qualities of collaborative dialogue; it must not devolve into round-robin reading with comprehension questions.

A theory about how children’s phonemic awareness develops is a domain-theoretic claim, warranted by cognitive science. A theory about how a phonics curriculum should be sequenced across the year to support that development in a specific population of first graders is a design-framework claim, warranted by classroom-based design research.

The two are connected — the design framework draws on the domain theory — but they are not the same claim and they do not require the same evidence. This is the distinction Edelson makes methodologically tractable.

A design framework for phonics instruction would have to address a long list of questions the scripts do not. How to sequence instructional activities across days and weeks. How to calibrate dosage. Which children, with which prior knowledge, need which sequences. How to sustain children’s enjoyment in reading rather than grinding it down. How to guard against the self-defeating self-judgments that can harden into “I am bad at reading” by April of first grade.

How to draw on other children as resources rather than treating peers as distractions. How to incorporate aesthetic experiences — poetry recitation, songs, choral reading — that root decoding in the rhythms of language children already love. How to integrate authentic writing activities so that letters and sounds connect to children’s own purposes for putting words on a page.

Each of these is a design question, not a cognitive question. Each requires evidence generated in classrooms, with teachers as co-investigators, at appropriate grain sizes. Design research as a professional learning strategy — teachers and researchers refining a framework across years of iterative practice — would pay much bigger dividends than scripts.

But Edelson’s typology, for all its clarity, left open the question of how a researcher would know, empirically, whether a design framework had done its work.

Holding Claims to Their Warrants

William Sandoval (2014) writes two decades after Collins and twelve years after Edelson, and he writes into the problem the prior two texts left open.

Edelson gave the field a typology of what design research produces — descriptive domain theories about how some portion of the world works, prescriptive design frameworks about how to build learning environments — but did not specify how researchers trace a particular finding back to a particular kind of claim.

A domain theory of phonemic awareness motivates the development of design frameworks for phonics instruction, but warranting the claim that the framework, enacted without lethal mutation, reliably produces the intended learning outcomes is a problem Edelson’s typology names without solving.

Kelly (2004) named this gap directly: design research, he argued, lacks an “argumentative grammar,” which Sandoval quotes as “the logic that guides the use of a method and that supports reasoning about its data” (p. 19, citing Kelly). Conjecture mapping is Sandoval’s answer.

A conjecture map starts with a “high-level conjecture” (p. 22) — a theoretically principled idea about how to support some kind of learning in some kind of context. The conjecture is too abstract, on its own, to determine design. It becomes determinate only through “embodiment” (p. 22) in the features of an actual learning environment.

Sandoval specifies four kinds of features that can embody a conjecture: “tools and materials, task structures, participant structures, and discursive practices” (p. 22). These four categories are where Collins’s “independent variables” go to become authentic material.

Collins listed the technologies, the configurations, the roles, the time organization; Sandoval organizes those listings by function — what each kind of feature is supposed to do in the learning environment. The shift from enumeration to function is the methodological gain.

The embodiment is hypothesized to generate “mediating processes” (p. 22) — observable interactions among participants and the designed environment, or artifacts produced through learning activities. The mediating processes, in turn, are hypothesized to produce desired outcomes. The map reads left to right: high-level conjecture → embodiment → mediating processes → outcomes.

What makes the map an argumentative grammar rather than just a diagram is the distinction Sandoval draws across its arcs. The connection from embodiment to mediating processes is a design conjecture, which takes the general form: “if learners engage in this activity (task + participant) structure with these tools, through this discursive practice, then this mediating process will emerge” (p. 24). The connection from mediating processes to outcomes is a theoretical conjecture: “if this mediating process occurs it will lead to this outcome” (p. 24).

An example from a different context sharpens the distinction. A design conjecture about reluctant ninth-grade writers might read: if students arrange the main topics of an essay in a visible organizer before drafting, they will organize their ideas before they begin to write. The corresponding theoretical conjecture would read: if students organize their ideas before drafting, they will write longer, better-organized essays with less frustration, because cognitive effort expended to organize ideas prior to expression supports thoughtful elaboration and provides internal scaffolding.

The design conjecture is about whether the visible organizer, as a tool embedded in a task structure, generates the mediating process of pre-drafting organization. The theoretical conjecture is about whether that mediating process, when it occurs, actually produces the outcomes we care about.
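For readers who think in data structures, the map can be sketched in a few lines of Python. This is an illustration only, not Sandoval’s notation: the class, field, and method names (`Embodiment`, `ConjectureMap`, `chains`, and so on) are hypothetical, and the labels filled in for the writing example are paraphrases rather than data from any study. The one design choice worth noting is that a design conjecture may only link an embodied feature to a mediating process, and a theoretical conjecture may only link a mediating process to an outcome, so an arc that skips a step is rejected.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Embodiment:
    """Designed features, grouped by the four kinds Sandoval lists."""
    tools_and_materials: list[str] = field(default_factory=list)
    task_structures: list[str] = field(default_factory=list)
    participant_structures: list[str] = field(default_factory=list)
    discursive_practices: list[str] = field(default_factory=list)

    def features(self) -> set[str]:
        return set(
            self.tools_and_materials
            + self.task_structures
            + self.participant_structures
            + self.discursive_practices
        )


@dataclass
class ConjectureMap:
    """Hypothetical container: conjecture -> embodiment -> processes -> outcomes."""
    high_level_conjecture: str
    embodiment: Embodiment
    mediating_processes: list[str]
    outcomes: list[str]
    design_conjectures: list[tuple[str, str]] = field(default_factory=list)
    theoretical_conjectures: list[tuple[str, str]] = field(default_factory=list)

    def add_design_conjecture(self, feature: str, process: str) -> None:
        # Design conjecture: embodied feature -> mediating process.
        if feature not in self.embodiment.features():
            raise ValueError(f"unknown embodied feature: {feature}")
        if process not in self.mediating_processes:
            raise ValueError(f"unknown mediating process: {process}")
        self.design_conjectures.append((feature, process))

    def add_theoretical_conjecture(self, process: str, outcome: str) -> None:
        # Theoretical conjecture: mediating process -> outcome.
        if process not in self.mediating_processes:
            raise ValueError(f"unknown mediating process: {process}")
        if outcome not in self.outcomes:
            raise ValueError(f"unknown outcome: {outcome}")
        self.theoretical_conjectures.append((process, outcome))

    def chains(self) -> list[tuple[str, str, str]]:
        """Full left-to-right paths: (feature, process, outcome)."""
        return [
            (feature, process, outcome)
            for feature, process in self.design_conjectures
            for via, outcome in self.theoretical_conjectures
            if via == process
        ]


# Illustrative labels for the ninth-grade writing example (assumed wording).
organizer_map = ConjectureMap(
    high_level_conjecture="Organizing ideas before drafting supports reluctant writers",
    embodiment=Embodiment(
        tools_and_materials=["visible organizer"],
        task_structures=["arrange main topics before drafting"],
    ),
    mediating_processes=["pre-drafting organization"],
    outcomes=["longer, better-organized essays", "less frustration"],
)
organizer_map.add_design_conjecture("visible organizer", "pre-drafting organization")
organizer_map.add_theoretical_conjecture(
    "pre-drafting organization", "longer, better-organized essays"
)
```

Calling `organizer_map.chains()` returns the single path from the organizer, through pre-drafting organization, to longer and better-organized essays. Trying to link the organizer directly to an outcome raises a `ValueError`, which is the point: the map forces the claim to pass through a mediating process.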

The two conjectures fail in different ways and require different evidence. A design conjecture fails when the embodied features do not produce the mediating processes they were supposed to produce: students use the organizer but don’t actually organize their thinking, say, or refuse to use it at all. A theoretical conjecture fails when the mediating processes occur but do not produce the desired learning: students organize their ideas thoughtfully but still write short, frustrated essays.

This distinction is what Edelson’s descriptive/prescriptive split implied but did not formalize. Edelson told us that a design framework is prescriptive and a domain theory is descriptive; Sandoval tells us how to test each kind of claim against evidence and how to know, when something fails, which kind of claim has failed.

A scripted phonics program that does not produce long-term reading gains could be failing in either of two distinct ways. The activities may not produce the cognitive engagement they were designed to produce — a design conjecture failure, in which the embodiment is wrong. Or the cognitive engagement may occur but may not actually produce reading — a theoretical conjecture failure, in which the underlying theory of how decoding develops into reading is impoverished. The script as delivered to teachers cannot tell anyone which is happening. A conjecture map can.
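The diagnostic logic of that paragraph can be written out as a small decision rule. The sketch below is a deliberate simplification under stated assumptions: it treats evidence as two yes-or-no observations (did the mediating process occur, did the outcome occur), whereas real classroom evidence is graded and contested. The names `Diagnosis` and `diagnose` are hypothetical, and the fourth case, an outcome that appears without the hypothesized process, is an addition for completeness rather than something the paragraph above discusses.

```python
from enum import Enum


class Diagnosis(Enum):
    CONSISTENT = "evidence is consistent with both conjectures"
    DESIGN_FAILED = "design conjecture failed"
    THEORY_FAILED = "theoretical conjecture failed"
    OUTCOME_UNEXPLAINED = "outcome appeared without the hypothesized process"


def diagnose(process_observed: bool, outcome_observed: bool) -> Diagnosis:
    """Say which arc of a conjecture map the evidence calls into question."""
    if process_observed and outcome_observed:
        return Diagnosis.CONSISTENT
    if process_observed:
        # The mediating process occurred but did not yield the outcome.
        return Diagnosis.THEORY_FAILED
    if outcome_observed:
        # The outcome arrived by some route the map does not describe.
        return Diagnosis.OUTCOME_UNEXPLAINED
    # The embodiment never produced the process, so the theoretical
    # conjecture was never actually put to the test.
    return Diagnosis.DESIGN_FAILED
```

Applied to the phonics case, `diagnose(process_observed=False, outcome_observed=False)` points at the embodiment, while `diagnose(process_observed=True, outcome_observed=False)` points at the theory of how decoding develops into reading. A script that records only end-of-year scores supplies the second argument and never the first, which is why it cannot distinguish the two.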

The conjecture-mapping framework also makes Collins’s most prescient line audible at full volume. Collins noted in 1990 that “relations of community to school” become available as a variable only at larger grain sizes. Sandoval’s framework lets us see why this matters methodologically.

Community-school relations operate on the mediating processes — they shape whether the interactions and artifacts a design hopes to generate actually emerge in real classrooms, with real children, in real neighborhoods. A design conjecture that holds in one context may fail in another not because the embodied features changed but because the community did.

Sandoval’s map gives researchers a way to be explicit about that, to ask which conjectures are claimed to hold across which contexts, and to revise the map when contexts reveal what the original conjectures missed.

What Sandoval offers, finally, is what Kelly demanded: a way to make design research’s claims criticizable on the same logical terms as the claims it competes with. The Science of Reading’s argumentative grammar is the grammar of the randomized controlled trial: randomization, control, statistical inference.

Sandoval gives learning science its own grammar, one suited to the kinds of claims it actually makes about classrooms, teachers, children, and the conditions under which learning happens. The two grammars are not in competition. They warrant different kinds of claims. The conflation that opened this essay is, at root, a failure to recognize that there are two grammars at all.

Why Partnerships, Not Pipelines

Laboratory research cannot drive professional practice. It can inform practice, constrain practice, motivate practice, and rule out practices that contradict what we know about how human cognition works. What it cannot do is answer the questions that practice asks.

Those questions — about sequencing, dosage, differentiation, sustaining children’s enjoyment, recovering children’s confidence, building on what particular families and communities bring to particular classrooms — are design questions, and design questions can’t be answered in a test tube.

This is the structural insight that ties Collins, Edelson, and Sandoval together. Collins gave us the unit of analysis and the methodological humility to recognize that classroom-level studies cannot reach all the conditions that determine whether innovations live or die.

Edelson distinguished descriptive claims about how the world works from prescriptive claims about how to build learning environments and insisted these are different kinds of claim with different warrants.

Sandoval made the inferential structure explicit so that researchers and teachers together can diagnose which conjectures are holding, which are failing, and why. None of it is possible in a laboratory. All of it requires sustained collaboration between people who know research and people who know children, classrooms, schools, and communities — in other words, university-school partnerships.

Such partnerships exist. The Strategic Education Research Partnership (SERP), based in Washington, D.C., runs sustained design-based collaborations with school districts across the country and has produced fully tested instructional materials for adolescent literacy, mathematics, and science.

The National Network of Education Research-Practice Partnerships (NNERPP), housed at Rice University’s Kinder Institute for Urban Research, currently connects 79 active research-practice partnerships across the United States and provides a professional learning infrastructure for the work.

The National Center for Research in Policy and Practice (NCRPP), based at the University of Colorado Boulder, develops tools and frameworks for assessing the health, effectiveness, and equity of research-practice partnerships.

These three organizations, among others, are doing the work of translating cognitive science findings, learning theory, and disciplinary expertise into the iterative, classroom-embedded design work that produces warranted pedagogical claims.

The Science of Reading captured state-level legislation because it offered policymakers a pipeline they could legislate, bypassing all of the messiness that schools in the real world deal with. Phonological awareness, grapho-phonic analysis, oral fluency — measurable inputs, measurable outputs, scriptable interventions.

Learning science offers something less photogenic and more demanding: long-term partnerships between researchers and teachers, conjecture maps that get revised when classrooms surprise us, design frameworks tested across multiple grain sizes, and the willingness to find out that what worked in one school’s soil does not work quite the same way in another’s.

It is also the only work that can actually warrant the claim that a particular phonics curriculum, enacted by a particular teacher, with these particular children, in this particular community, will produce readers, and not just test scores.

The conflation that opened this essay is, finally, a confusion about what the question even is. Whether children need phonics is settled: yes, and the cognitive evidence is overwhelming. How to teach phonics in real classrooms, to real children, in ways that produce real readers and sustain real love of language, is an unsettled question with multiple answers, one that design-based research is built to address and that university-school partnerships are built to host.

Until policymakers and school district leadership recognize the difference, scripts will continue to transform teaching into ventriloquism in first grade, NAEP scores will continue to confuse and roil the public, and the field that actually has the methodology to answer the question will continue to be mistaken for the field that does not.

References

Collins, A. (1990). Toward a design science of education (Technical Report No. 1). Center for Technology in Education. (ERIC Document Reproduction Service No. ED 326 179)

Edelson, D. C. (2002). Design research: What we learn when we engage in design. The Journal of the Learning Sciences, 11(1), 105–121. https://doi.org/10.1207/S15327809JLS1101_4

Kelly, A. E. (2004). Design research in education: Yes, but is it methodological? The Journal of the Learning Sciences, 13(1), 115–128.

Sandoval, W. (2014). Conjecture mapping: An approach to systematic educational design research. The Journal of the Learning Sciences, 23(1), 18–36. https://doi.org/10.1080/10508406.2013.778204

Strategic Education Research Partnership Institute. (n.d.). Bridging research, practice, design in education. https://www.serpinstitute.org

National Network of Education Research-Practice Partnerships. (n.d.). NNERPP: Developing, supporting, and connecting research-practice partnerships in education. Kinder Institute for Urban Research, Rice University. https://nnerpp.rice.edu

National Center for Research in Policy and Practice. (n.d.). Research-practice partnerships. University of Colorado Boulder. https://www.ncrpp.org/research-practice-partnerships
