AI and Scaling Compromises: Not a Technological but a Human Issue
“Yeah, that’s great, kids would love it and learn a lot, but how do you do that with 30 kids? How do you bring that to scale?” This question haunts American pedagogy, the ghost of Christmas past, the pragmatic rejoinder to every promising educational innovation. It’s a reasonable question, right? We have millions of students, hundreds of thousands of teachers, massive disparities in resources and expertise. We try to be fair to everyone. This is America. Of course we have to ask about scale.
But scale has unfortunately become an institutional excuse for embracing pedagogical distortions and resisting reforms especially needed in this moment. Scale is the escape valve that keeps the pressure cooker from exploding while the grand chef is asleep at the stove. And now AI is bleeding into every nook and cranny of that excuse, promising to finally solve the scaling problem once and for all by ignoring it. In Brownsville, Texas, and across California, Georgia, and Louisiana, we’re watching AI go to scale in real time.
The Rise and Fall of the New Standards Project
Forgive this brief look back, but looking at a decontextualized present risks a rerun of lost opportunities to improve public schools for all kids. Something about those who forget history being condemned? I see this rerun happening with every blue book sold, every cry to “grade the process, not the product!” Blue books and process are weak responses.
In the 1990s, the New Standards Project represented something that had never been attempted in American education. It was an organized, well-conceived, grassroots effort to build a national authentic assessment system from the ground up, coordinated through the National Center on Education and the Economy and the Learning Research and Development Center at the University of Pittsburgh. The two professional organizations that historically have given voice to high school English and mathematics teachers—the National Council of Teachers of English and the National Council of Teachers of Mathematics—would help write the standards. They would develop curriculum-embedded performance assessments that collected evidence of what students could actually do with their knowledge and expertise. Twenty-two states and six urban school districts joined the partnership. Writing portfolios and mathematics portfolios would be piloted across these states. Middle and high school students would compile evidence of growth over time. Teachers would learn to evaluate complex work using rubrics they helped develop.
The project was massive. Just transporting paper portfolios was a big job. Those of us teachers engaged in it from 1991 through 1996 understood we were building infrastructure that would change everything about how literacy and mathematics were taught and assessed in American schools. We were helping to write literacy standards that honored classroom practice and actual student work. We were talking with other teachers back home, creating rubrics built from ideas contributed across the country, training teachers to implement experimental portfolio designs, building the architecture for authentic assessment at scale. We believed we were creating something that would finally align assessment to the actual complexity of learning and tap into the power of assessment to contribute to self-regulated and motivated learning—at scale.
Then, literally overnight, the New Standards Project was gone. The funding dried up. I still don’t know what happened. The partnership ended in 1996. One morning there was no longer an NSP. Those of us who had been working on it were left with nothing, scattered pieces of a system that never got implemented. I have pieces of the materials I helped create somewhere in my garage, small artifacts of a collapsed infrastructure that was abandoned before it could be built. I kept the parts I was helping to make in case we got called back to finish.
Enter No Child Left Behind
What happened next was totally predictable. No Child Left Behind passed in 2002 with the promise that 100 percent of students in every public school would reach proficiency in reading and mathematics by 2014. Not 95 percent. One hundred percent. Every single student. Including students with severe cognitive disabilities. Including students who arrived speaking no English. Including students in the poorest schools with the fewest resources. All of them proficient in twelve years.
Results, not excuses.
The mathematics of the system guaranteed failure. Schools that already had 70 percent of students proficient could coast for years without improvement and still make Adequate Yearly Progress. But a school that worked tirelessly to climb from 25 percent to 40 percent proficient would be labeled as failing. The higher you climbed, the harder it became to show required gains. A school at 85 percent proficiency faced much steeper challenges reaching 95 percent than a school moving from 40 percent to 55 percent (the first must bring two-thirds of its remaining students to proficiency, the second only a quarter), yet NCLB treated these as equivalent tasks. As critics noted en masse, in words like these: “The increases demanded by No Child Left Behind are way larger than anything we’ve ever seen in the past or that you see in other countries.”
The consequences were brutal and escalating. Schools that failed to make AYP for two consecutive years had to offer students transfers to other schools, paid from their own budgets. After three years, they had to provide tutoring services, again by diverting existing funds. After four years came “corrective action”—replace staff, implement new curriculum, extend the school day. After five years, complete restructuring: reopen as a charter, replace all staff, hand operations to a private management company, or surrender control to the state.
By 2010, the predictions came true. California saw 61 percent of schools fail to make AYP, up from 34 percent in 2006—an increase of nearly 3,000 schools in four years. In Illinois, 51 percent of schools missed targets. In South Carolina, 80 percent failed. The higher the targets climbed, the more schools fell short, exactly as the mathematics predicted.
Schools needed systems that could meet these demands: test millions of students annually and produce scores quickly and cheaply; generate comparable data across schools, districts, and states; demonstrate “evidence-based” effectiveness; and be implemented with “fidelity” regardless of teacher expertise.
The scaling question got brutal: what can we measure cheaply and implement uniformly to prove adequate yearly progress toward an impossible goal? The answer systematically eliminated everything dialogical, everything that required teachers who themselves engage genuinely with texts, everything that took time and professional judgment.
Everything that made literacy actually human.
The Science of Reading’s Victory
Authentic assessment didn’t lose on pedagogical grounds. It lost on scaling grounds. The “science of reading” won because it offered what NCLB demanded: isolatable components, measurable outcomes, ostensibly gold-standard research evidence from controlled studies, packaged curricula intended for uniform implementation, and the false promise of equity through standardization—every child receives the same instruction regardless of context.
Most seductively, it offered a solution to the expertise problem. If literacy is a set of discrete, trainable skills, you don’t need teachers who themselves read dialogically and can cultivate that capacity in students. You need teachers who can follow scripts, deliver phonics lessons with fidelity, administer assessments, and track data.
We optimized for transmissibility because we demanded scalability. We built an entire infrastructure around this logic—an infrastructure that the Common Core State Standards would later inherit but do little to dismantle. The federal government poured billions of dollars into this system. States enacted reading mandates. Publishers produced scripted curricula designed for uniform implementation. Professional development trained teachers in “fidelity of implementation” rather than pedagogical expertise. Assessment systems aligned themselves to testable skills rather than authentic literacy.
Now AI replicates all of it better than human teachers can, because we systematized exactly what machines excel at: pattern-matching, formula application, and monological extraction.
AI Scales Up: The Transmission Infrastructure
The infrastructure U.S. society built to meet NCLB’s demands for accountability and the Common Core’s thirst for formative data created a market hungry for solutions that could scale efficiently. Indeed, public discourse in the wake of recent NAEP test scores suggests that educators have bought into the premise that standardized test scores authentically map the reality of student learning.
AI has rushed in to satisfy this market hunger. The trajectory is clear: AI-powered curriculum systems that automate instruction, assessment, and intervention are scaling rapidly across U.S. education, driven by economic pressures, teacher shortages, and state mandates, while operating largely ahead of evidence about effectiveness or comprehensive guidance about risks.
These systems share fundamental characteristics. They deliver content algorithmically based on recognition of patterns in student responses. They often reduce human interaction in core instruction by positioning teachers as facilitators or monitors rather than as guides who themselves engage with disciplinary knowledge. They collect extensive data from every student keystroke, building profiles that predict performance and prescribe next steps. They emphasize mastery of discrete, testable skills. And they promise to scale instruction efficiently by minimizing requirements for teacher expertise in the actual content being taught.
The appeal to administrators hired to keep the institution running according to specs is obvious. These platforms solve the problems NCLB created. They generate the quantitative data that accountability systems demand. They can be implemented across entire districts with relative uniformity. They promise “personalized” learning without requiring smaller class sizes or more expert teachers. They deliver instruction “with fidelity” because algorithms follow their programming perfectly—well, almost. They address the teacher shortage by reducing the role of teachers in core instruction. And they do all of this at a cost structure that looks attractive compared to investing in sustained teacher development.
But look at what these systems actually do. They listen to students decode words and provide immediate corrective feedback at the phoneme level. They track whether students can identify main ideas in short passages and match them to predetermined correct answers. They deliver pre-programmed “tutoring strategies” based on pattern recognition algorithms. They monitor engagement through continuous data collection about every click, every pause, every answer. They’re the phonics model and the five-paragraph literacy model scaled to perfection—reading as decoding, comprehension as extraction, learning as mastery of component parts, all delivered by AI that can work with infinite students simultaneously at minimal cost.
The adaptive learning revolution represents the logical endpoint of optimizing for scalability over dialogical engagement. Our public school system, under federal pressure to perform for the cameras, built the infrastructure—the demand for measurable outcomes, the emphasis on discrete skills, the need for uniform implementation, the faith in personalized delivery of predetermined content. AI simply perfected what our institutions were already channeled to do, making the transmission more efficient.
The teacher who might have read alongside students, might have been genuinely surprised by a text, might have modeled dialogical engagement, is rapidly being replaced by an algorithm that delivers content with perfect fidelity to its programming and absolutely no capacity for the kind of transformation that makes literacy genuinely human.
AI Scales Up: The Amira Expansion
Several AI companies are becoming one-stop shops for educators on the business side of teaching and learning. While Alpha targets private schools serving tech workers and the wealthy, Amira is scaling up through public education mandates. In December 2024, California’s state panel unanimously approved Amira as the only literacy screener for mandatory K-2 assessment requirements. Georgia rated it the highest-scoring screener for K-3 mandated assessments. Louisiana is expanding access to 100,000 students. New Jersey adopted it for K-3 screening under the state’s new Literacy Framework. Colorado approved it under the READ Act. It’s now used in over 2,000 districts nationwide, reaching more than 2 million children.
Amira is an “AI Reading Tutor” that listens as students read aloud, capturing “pauses, hesitations, fluency, and other factors” while delivering “real-time feedback” and “personalized intervention plans.” The marketing promises are seductive: Amira provides one-on-one tutoring “equal to or greater than high-dosage human tutoring at just 1/70th of the cost.” It “identifies 98% of students at risk for dyslexia.” Students who use it 30 minutes per week show “68% faster reading growth.”
The technology trains on millions of oral reading samples and uses AI to detect “nuanced reading behaviors,” delivering “micro-interventions” at “the moment of struggle.” It guides students through “interactive conversations” and “Socratic dialogue.” It continuously monitors progress, tracks growth, and estimates reading levels. Teachers receive “skill-based insights” and “suggested next steps.” The system is “fully AI proctored,” saving teachers “valuable time” by handling assessment, diagnosis, and intervention automatically.
According to Amira, there is no question about how to teach reading in the early grades. Nor is there any doubt about what ought to be measured:
“The Science of Reading provides a proven framework for teaching students to read. Decades of studies confirm that explicit, systematic phonics instruction, guided oral reading, and structured comprehension strategies significantly improve literacy outcomes. Studies from organizations like the National Reading Panel and the International Dyslexia Association validate the effectiveness of these approaches when implemented with fidelity.”
This isn’t about phonics versus meaning-making. Both matter. I know that. But do the advocates of a narrow phonics-only perspective know that? The issue is how our institutions’ obsession with scalable, measurable outcomes systematically eliminates what can’t be automated—and now AI perfects that elimination.
State education agencies love it because it solves multiple scaling problems simultaneously. It provides cheap, standardized assessment that can be administered to millions of students without extensive teacher training. It generates the data required for accountability systems. It promises to identify at-risk readers early. It offers “evidence-based” interventions aligned to the science of reading. And it addresses the teacher shortage by automating the most time-intensive aspects of literacy instruction.
But look at what Amira actually does. It listens to children decode words aloud and provides immediate corrective feedback at the phoneme level. It tracks fluency rates and comprehension through standardized questions. It delivers pre-programmed “tutoring strategies” based on pattern recognition algorithms. It monitors engagement through continuous data collection. It’s the phonics model scaled to perfection—decoding as the essence of reading, immediate feedback on isolated skills, mastery of component parts, all delivered by AI that can work with infinite students simultaneously at minimal cost.
The Breaking Point
When does scaling up mean selling out? The breaking point arrives when institutional demands for scale force us to eliminate the very thing we claim to be teaching. We say we want students who can engage deeply with complex texts, encounter diverse perspectives, develop original ideas, participate in genuine dialogue across difference. But we’ve built systems—and now automated those systems—to train them to decode mechanically rather than engage dialogically, extract predetermined meanings rather than encounter surprise, apply formulas rather than risk genuine response, match patterns rather than undergo transformation.
Vygotsky warned us to heed our humanity. Higher mental functions emerge through participation in social practices with more capable others. Prolonged engagement in authentic literacy events with adults who themselves read dialogically immerses children in what it looks like when consciousness encounters consciousness across pages. Decades of research on dialogic instruction—from early studies of reading as interactive and transactive to contemporary work on literacy as embedded and enacted social discursive practices—demonstrates that comprehension and critical thinking develop through sustained interaction with capable others who themselves engage dialogically with texts.
What cannot be scripted cannot be standardized or measured cheaply. What cannot be scripted cannot be implemented with fidelity across contexts. What cannot be scripted cannot be teacher-proofed or demonstrated to be effective through controlled studies of discrete outcomes.
Alpha and Amira represent the logical endpoint of scaling up under NCLB and the Common Core while systematically starving the dialogical practices that constitute powerful literacy capacities. Alpha eliminates teachers entirely, replacing them with adaptive algorithms and surveillance systems. Amira automates the teacher’s instructional role, reducing literacy to phonemic accuracy and fluency rates monitored by a machine. Both promise to solve the scaling problem by perfecting what public schools have been flirting with but not quite enrolling in: literacy instruction as purely monological task performance, now efficient enough to be delivered by machines.
Scaling Up or Scaling Back?
The AI crisis forces the question: what were we scaling all along? For decades, we could claim that five-paragraph literacy and phonics-based decoding were “real enough.” Students who decoded fluently and identified thesis statements seemed adequately prepared. The gap between transmitted skills and dialogical consciousness remained largely invisible.
AI makes it visible. When Alpha’s algorithms can deliver mathematics instruction more consistently than human teachers, when Amira can monitor oral reading fluency more precisely than human ears, when both can scale to millions of students at a fraction of the cost, the emperor has no clothes. We didn’t scale up literacy. We scaled up literacy’s mechanical shadow. And now machines cast that shadow better than we do.
The venture capital flooding into AI education isn’t funding technological enhancement of human teaching. It’s funding the replacement of dialogical literacy instruction with scalable monological procedures. Alpha’s billionaire backers and Amira’s state mandates represent the same bet: that we can finally solve education’s scaling problem by eliminating the need for humans who themselves engage in thinking and learning. The teacher shortage becomes irrelevant when you don’t need teachers. The assessment burden disappears when AI can test and monitor continuously. The expertise gap closes when algorithms can deliver “personalized” instruction to every student simultaneously.
Neither Alpha’s surveillance nor Amira’s intelligent tutoring can participate in dialogical events. They cannot have genuine social encounters with texts, cannot be surprised in ways that change perspectives on the world, cannot experience the transformation that readers experience when fictional consciousness becomes temporarily real enough to produce involuntary tears. They cannot model for students what it looks like when consciousness meets consciousness across pages and across time. They can only instill step-by-step protocols, transmit procedures, monitor compliance, measure accuracy, and scale infinitely.
The breaking point isn’t about hyperscaling. It comes when the pursuit of scale eliminates what we’re supposedly scaling up. When literacy instruction at scale means systematically preventing participation in the very social practices that develop literate consciousness, we’ve lost our moorings. Alpha’s nine-year-old, losing weight and saying she’d rather die than continue her AI lessons, reveals what we’ve sacrificed at the altar of scalability. Amira’s two million students, sitting in tiny telemarketer headsets while AI listens to them decode, shows us what we’ve scaled instead.
How can scaling up guard against scaling back? It can’t. Not within the logic that created Alpha and Amira. Not within institutional structures that demand cheap measurement, uniform implementation, quantifiable accountability, and solutions to the teacher shortage that don’t require investing in teachers who themselves engage dialogically.
The scaling question itself is the problem—not because scale doesn’t matter, but because the way we’ve answered it is systematically eliminating what makes literacy human. And now economically powerful humans are using AI to perfect that elimination, scaling it beyond anything we imagined possible, reaching the point of inversion where all of our good intentions boomerang and destroy what we set out to save.
