Computational Ethnography: A Contradiction in Terms?
If we are lucky, we can remember at least one teacher who made a difference. Not necessarily the ones who taught us things (if you are reading this, thank a teacher) but the ones who changed us, who saw something in us before we saw it ourselves. Some of us can name them even if we struggle to articulate what they did. They were present; miraculously, they are still present.
Now imagine a world in which artificial intelligence has learned to monitor this work — sort of. Imagine that the timing of a teacher’s response, the ability to assess who needs what and when, has been studied enough that software can evaluate it and even approximate it.
Not replace the teacher exactly. Sit beside her. Notice things she might miss. Tell her, on a dashboard on her Apple Watch, which group is stuck and which is on fire, which student has gone quiet and which is highly active in problem-solving mode.
This is the bet the National Science Foundation (NSF) is making. In 2020, it funded the National AI Institute for Student-AI Teaming, a consortium of universities led by the University of Colorado Boulder, with twenty million dollars and a five-year mandate to build artificial intelligence that supports collaborative learning in K-12 classrooms.
In 2025, it renewed the institute for another five years as part of a hundred-million-dollar federal investment in AI research. The bet, in other words, has been more than doubled.
The bet relies on AI to do participatory computational ethnography in real time. In the spirit of high-level conjecturing, the Boulder group hopes to teach AI to become a valid and reliable second opinion, much as AI is already being used as a diagnostic assistant in medicine.
Teaching In Situ
A middle school computational thinking classroom, somewhere in the rural West, Ms. Miller’s room. The unit is called “Sensor Immersion.” Students in small groups of two to four, wiring physical sensors — environmental, sound, soil — to microcontrollers, writing code in a block-based editor to read the sensor values, watching numbers scroll on tiny screens.
The teacher moves among the groups, leaning in, asking questions, sometimes calling the class together for a check-in. On a tripod near each group sits an iPad. A Yeti microphone catches the talk. The class lasts forty-five minutes. Two hundred and five videos, roughly twenty-one hours of small-group work, are eventually collected from four classrooms across two districts.
Later, the videos will be transcribed. Whisper and Google’s speech-recognition systems will be run on the audio, with tolerance for high word error rates because children’s speech is hard for machines. Human annotators will clean the transcripts.
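To make "word error rate" concrete: the usual definition counts the substitutions, deletions, and insertions needed to turn the machine transcript into the human reference, then divides by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the institute's evaluation code; the function name and the sample sentences are invented for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the recognizer drops two of seven spoken words.
human = "the sensor is not reading the sound"
machine = "the sensor is reading sound"
print(round(word_error_rate(human, machine), 3))  # 2 deletions / 7 words
```

Note that the rate can exceed 1.0 when the machine transcript contains many extra words, and that dropping a single word such as "not" counts as one error among many even though it reverses what the student said.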
Humans will also code the transcripts against frameworks developed by the institute: the Generalized Collaborative Problem-Solving Model for the verbal layer, which sorts utterances into three facets called constructing shared knowledge, negotiation, and maintaining team function; the Nonverbal Interactions in Collaborative-Learning Environments scheme for gaze and gesture and posture; the Moments of Support Analysis in Collaboration protocol for the teacher’s interventions, which codes each support moment for who initiated it, who received it, and what its focus was.
The codes feed models. The models, eventually, will run in classrooms. The flagship is called CoBi, short for Community Builder. CoBi sits beside small groups, displaying a digital tree on a nearby screen; if the students are collaborating and respecting their peers, blue or orange flowers may begin to bud from the tree.
A more recent system, described in a 2025 paper in the British Journal of Educational Technology, is a multimodal timeline: teachers can query the system for instances of student disengagement, and those moments are automatically marked on the timeline for further exploration. Classrooms become data; data becomes labels; labels train models; models eventually are conscripted to assess classrooms.
Teaching In the Abstract
My title, Computational Ethnography, may evoke uneasiness, discomfort. I take full responsibility for using this coinage. It puts me in mind of my dissertation title: An Ethnography of an Experiment. “Computational” promises scale, replicability, machine-readable categories. “Ethnography” promises almost the opposite — the patient, situated reading of cultural life by an observer who knows she is in a particular place at a particular time with particular people.
Training AI to execute computational ethnography is the goal of the Boulder project. The high-level design conjecture is something like this (my words, not Boulder’s): If AI can be trained to analyze and assess student collaborative behaviors in real time, its output can alert teachers to problems and opportunities for learning among small groups of students.
Let’s return to the classroom. The teacher walks by group three. The students are heads-down over a sensor that isn’t reading correctly. The sound sensor had locked onto the HVAC vent and was reporting the classroom as a sustained 78 decibels of nothing, which had led Jaylen to conclude, with the calm certainty of a twelve-year-old who has just discovered epistemology, that the room was loud, but in a way humans couldn’t hear.
The teacher crouches, asks what they’ve tried, listens, points at one connection, asks another question. Ninety seconds, maybe two minutes. She stands up and moves on.
Through Boulder’s coding system applied by human analysts, that’s one of 542 support moments, or 618 depending on which version of the paper you read. It will be coded as teacher-initiated, small-group recipient, task-focused. It will join the aggregate that lets the researchers report findings like these: though the teacher most frequently initiated support across all activities, students more proactively initiated support in the program and wiring activity, the most complex of the tasks, accounting for 41.5% in program and wiring compared to 23%-33% in the other activities.
Or: support focused on providing directions about assignments was high for all activities, ranging from 58% in model card sorting to 73% in programming and wiring. Collaboration-focused support was rare across all activities, with notable variation: it was highest at 18% in the jigsaw activity and only one-sixth of that, 3%, in discussions of classroom norms.
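A rough sketch of how percentages like these come out of coded records may help. It assumes each support moment is stored with the three codes named above (who initiated it, who received it, what its focus was) plus the activity it occurred in. The class name, field names, category labels, and every record below are invented for illustration; they are not MOSAIC's actual schema or the study's data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportMoment:
    """One coded support moment; all fields are categorical (hypothetical schema)."""
    activity: str
    initiator: str   # e.g. "teacher" or "student"
    recipient: str   # e.g. "small_group"
    focus: str       # e.g. "task" or "collaboration"


def initiator_shares(moments):
    """Return, per activity, the fraction of moments started by each initiator."""
    counts = defaultdict(Counter)
    for moment in moments:
        counts[moment.activity][moment.initiator] += 1
    return {
        activity: {who: n / sum(tally.values()) for who, n in tally.items()}
        for activity, tally in counts.items()
    }


# Invented records, for illustration only.
moments = [
    SupportMoment("program_and_wiring", "teacher", "small_group", "task"),
    SupportMoment("program_and_wiring", "student", "small_group", "task"),
    SupportMoment("program_and_wiring", "student", "individual", "task"),
    SupportMoment("program_and_wiring", "teacher", "small_group", "collaboration"),
    SupportMoment("jigsaw", "teacher", "small_group", "collaboration"),
    SupportMoment("jigsaw", "teacher", "whole_class", "task"),
    SupportMoment("jigsaw", "teacher", "small_group", "task"),
    SupportMoment("jigsaw", "student", "small_group", "task"),
]

shares = initiator_shares(moments)
print(shares["program_and_wiring"]["student"])  # 0.5 in this toy data
print(shares["jigsaw"]["student"])              # 0.25 in this toy data
```

Everything the aggregate can report is a function of those four fields. Anything about a moment that is not one of the fields never reaches the tally.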
These are real findings about real classrooms, derived from a coding scheme applied with acceptable reliability (Hoang, Bush, and Dey 2024). I have no quarrel with MOSAIC, the coding system, as an instrument. It counts what it counts honestly, and the counts reveal something — the relationship between activity design and support patterns, the surprising scarcity of collaboration-focused support even in a curriculum designed to foster collaboration.
My quarrel is with what falls outside the count. Watch the teacher walk by group three again, but this time without the codebook. She slowed down before she reached the group, half a step. She’d noticed something — what, exactly? Some posture, maybe; some hesitation in the group’s overlapping speech that told her this group was different from the next one over. Something subliminal, perhaps.
As she crouched, she did not begin with a question. She watched, briefly. Three seconds. Whatever she saw in those three seconds shaped the question she eventually asked.
The question itself was not generic; it took up something specific one of the students had said two minutes earlier, that the teacher had heard from across the room and held.
The student’s face when the teacher asked the question shows recognition, a moment of “oh, she heard us.” None of this (the slowed step, the three-second watch, the held remark, the recognition) appears in the support-moment count. The count registers that support happened. It cannot register what made the support impact learning.
This is the void. The codebook records the support moment as an event with categorical features. What it does not record is what the educational researcher William Sandoval, in his work on conjecture mapping, calls embodiments: the material, discursive, and cultural practices through which an instructional idea takes form in a particular room (Sandoval 2014).
For Sandoval, embodiments are not the surface beneath which the real learning happens; they are the learning, in its irreducibly situated materiality. The teacher’s slowed step, the three-second watch, the way the question takes up a specific earlier remark — these are not incidental in a support moment. They are the support moment.
Trying to Code the Uncodable
What kind of knowing produces such moments?
Not, I think, a knowledge that could be written down in a textbook and assigned for someone to read. The British sociologist Harry Collins, who has spent forty years studying how expertise actually works, distinguishes three kinds of tacit knowledge. The third and hardest he calls collective tacit knowledge, which exists only as social practice within a community.
Such knowledge cannot be possessed by an individual or replicated by a machine because its existence is constituted by ongoing participation in the community that holds it (Collins 2010). Native fluency in a language is Collins’s example. You don’t know a language by knowing rules; you know it by being a participant in the form of life that uses it.
Pedagogical expertise is collective tacit knowledge. The teacher is a presence in a community of practice in which “assessing a classroom” is a recognized activity, learned from practicing with other teachers in communities of distributed expertise, refined through years of being seen by colleagues, by students, and by the local community.
Her judgment in the three-second watch is the operation of that participation on this scene. A machine trained on annotated video of classrooms is not a participant in that community. It has been trained on records produced by participants. The records are not the practice. They are traces of the practice with the constitutive social dimension stripped out.
Theoretical Constraints on AI as a Classroom Monitor
Three constraints, each from a different literature, all point to the same difficulty:
The voiding of tacit knowledge
The failure to see the whole classroom
The automation paradox
The first is the inability to accommodate tacit knowledge. A community of practice generates records, but the records are not the community, and a system trained on them learns to recognize features of records rather than to participate in practice. This problem can’t be solved by collecting more records. Just as standardized test scores often leave teachers scratching their heads, standardized AI records will never be an adequate basis for making pedagogical judgments of a professional nature.
The second comes from the design-research tradition the iSAT project itself inherits. Ann Brown, one of the founders of the Learning Sciences, confessed in 1992 that her training as an experimental psychologist had given her methods that “did not evolve to capture learning in situ” (Brown 1992).
Brown’s whole career was a long attempt to understand methods that could. The design experiment, as she developed it, was an admission that learning happens in places where the laboratory’s tools can’t reach, and that the right response is to take the classroom seriously as the unit of analysis, not as a simple object to be measured and fitted like a shoe.
The iSAT project inherits Brown’s language — situated learning, collaborative inquiry, deep conceptual understanding — but the apparatus the project is building seems to work in the opposite direction. It is the laboratory’s methods seeping back into the classroom, more finely instrumented than before for sure, but instrumented to produce data that fits the model rather than to reveal the practice.
The third comes from a place no one in education research usually looks. In 1983, the British psychologist Lisanne Bainbridge published a short paper in the journal Automatica called “Ironies of Automation.” Her subject was industrial process control — chemical plants, power stations — and her argument has aged into one of the foundational statements of what’s now called the automation paradox.
Automation, she observed, may expand rather than eliminate problems with the human operator (Bainbridge 1983). The case for automating routine work depends on the human being available to handle the exceptional. But the capacity to handle the exceptional depends on having stayed engaged with the routine.
The autopilot pilots all but the worst minutes of the flight; in those worst minutes, the pilot is suddenly required to fly a plane she has not been actively flying, in conditions worse than ordinary. The radiologist who reviews AI-flagged scans rather than reading every scan from scratch might start to miss what the AI didn’t flag.
The application to classrooms is straightforward. A teacher whose role becomes overrider of dashboard readings — accepting the dashboard’s reading in most cases, intervening only when she disagrees — is no longer the primary observer of her own classroom. Her observational stance shifts. Not in one class period, but across years. The dashboard becomes the default; her judgment becomes the appellate court. The expertise she’s been asked to retain is the expertise her new role atrophies.
Put the three together. Pedagogical judgment is constituted by ongoing participation in a community of practice that cannot be transmitted by reading texts of any kind, not even graphs on a screen prepared by AI. The methods now being used to study this knowing were developed to reach where they cannot quite reach. And the artifacts being built from those methods, if deployed as intended, will risk degrading the very expertise they were built to support.
Back to Ms. Miller
Let’s go back, one more time, to the rural Western classroom. Watch Ms. Miller, who has been teaching computational thinking for nine years, walking from group three to group four. She is doing something that, in the language we now have available, looks like this: deploying years of pattern recognition built from being a teacher among teachers, in this district, with these students, with this curriculum.
Her approach to group four is itself the operation of her expertise. The pause before she speaks, the question she chooses, the way she lets one student answer before another — these are not behaviors she could enumerate. They are her judgment in motion.
Asked to explain why she slowed her step before group three, she might say “I could tell they were stuck.” Pressed on how she could tell, she might shrug; she might well have forgotten. The knowledge is operative without being articulable. It is also reliably operative — that is what makes her a good teacher rather than a beginner. She can be trusted, again and again, to make these judgments.
Now imagine the multimodal timeline running in her classroom. She has an Apple Watch on her wrist, the kind the institute’s promotional materials envision, and as she approaches group four it pings. Group three: disengaged. She had just read group three — same group, same moment — as deeply concentrating. Jaylen’s HVAC theory was wrong but it was a theory; the kids were arguing about it, which was the point.
What happens next?
If the dashboard is a supplement to Ms. Miller’s judgment, it has to defer to her in cases of disagreement, which makes it redundant in cases of agreement and an error in cases of disagreement. If the dashboard is a check on her judgment, then her judgment has been demoted to one input among others in a system whose other inputs are computational.
That demotion is the bet’s actual structure. She is being asked to weight her embodied, tacit, communally-built knowing against the categorical readings of a model trained on stripped-out traces. She is being asked to treat her own expertise as fallible in roughly the way the model’s categorization is fallible — comparable kinds of judgment, comparable kinds of error.
But they are not comparable. The model’s categorization is a probabilistic assignment of features to labels. Ms. Miller’s reading is the operation of collective tacit knowledge on a situated scene. These are different kinds of knowing, and to treat them as commensurable is to make the category error Collins spent his career warning against.
The Bet, Reconsidered
The opening of this essay invoked an analogy: AI as second opinion, much like diagnostic assistants in medicine. The analogy may be more generous to iSAT than it warrants, though I admit to a strong bias toward the centrality of professional tacit knowledge distributed among practicing experts. I am similarly reluctant to argue against development of iSAT for research purposes.
Medical diagnostic AI works — to the extent it works — because the phenomena it reads are also what the radiologist or pathologist reads. A tumor on a scan is a tumor on a scan; the AI’s pattern recognition and the human’s pattern recognition are operating on the same kind of object, with the same kind of evidence. The two readings can be compared because they are readings of the same thing. The AI is genuinely a second opinion because it has access to the first opinion’s data.
The classroom dashboard does not work this way. It is not reading what Ms. Miller is reading. She is reading a community of children she has known for months, in a room whose acoustics and light and social currents she has learned, deploying a kind of knowing that exists only because she has been a participant in the form of life called teaching for nine years.
The dashboard is reading features of audio and video data — vocal energy, gaze direction, posture, utterance categories — extracted from records of that community by a model trained on records of other communities. The dashboard and the teacher are not looking at the same thing. They are looking at different objects, even though they exist in the same room.
The medical analogy fails at exactly the point that matters. The AI second opinion in medicine is competent at the level of the phenomenon. The AI second opinion in classrooms is competent only at the level of the data extracted from the phenomenon, which is not the same competence.
The iSAT project’s researchers are skilled and well-intentioned. Useful tools will come out of this work; the institute’s outputs will find applications where they actually serve teachers and students. I want to be clear about what I am and am not saying.
I am saying that the bet at the project’s center — that computational observation can serve as a primary epistemic resource for teachers, much as diagnostic AI now serves doctors — rests on a category mistake about what kind of knowing teaching is.
The remembered teachers who saw something in us before we saw it ourselves were doing something a dashboard cannot see. So is Ms. Miller, walking from group three to group four, slowing her step. The methods being built to study her work cannot register the thing that makes her work matter, and the artifacts being built from those methods cannot substitute for the expertise they were built to support.
The federal government has doubled down on its bet. There will be more classrooms instrumented, more annotation schemes developed, more artifacts deployed. The bet is being made at scale. But the bet’s core premise deserves naming, so that those who live with its consequences (Ms. Miller, her students, the students’ future remembered teachers, the next generation of teachers learning what it means to teach) know what has been wagered on their behalf.
References
Bainbridge, L. (1983). Ironies of automation. Automatica, 19(6), 775–779. Open PDF: https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf
Brown, A. L. (1992). Design experiments: Theoretical and methodological challenges in creating complex interventions in classroom settings. Journal of the Learning Sciences, 2(2), 141–178.
Carlone, H. B. (2017). Disciplinary identity as analytic construct and design goal: Making learning sciences matter. Journal of the Learning Sciences, 26(3), 525–531.
Cohn, C., et al. (2025). A multimodal approach to support teacher, researcher and AI collaboration in STEM+C learning environments. British Journal of Educational Technology. https://bera-journals.onlinelibrary.wiley.com/doi/full/10.1111/bjet.13518
Collins, H. (2010). Tacit and Explicit Knowledge. University of Chicago Press.
D’Mello, S., Tissenbaum, M., Walker, L., Whitehill, J., et al. (2024). From learning optimization to learner flourishing: Reimagining AI in Education at the Institute for Student-AI Teaming (iSAT). AI Magazine. https://onlinelibrary.wiley.com/doi/10.1002/aaai.12158
Hoang, N., Bush, J. B., & Dey, I. (2024). Support patterns in classrooms implementing a computer science and physical computing curriculum. Proceedings of the 55th ACM Technical Symposium on Computer Science Education (SIGCSE 2024). https://www.colorado.edu/research/ai-institute/media/77
Hoang, N., Bush, J. B., Dey, I., Watts, E., Clevenger, C., & Penuel, W. R. (2024). MOSAIC protocol: Analyzing small group work to gain insights into collaboration support for middle school STEM classrooms. Proceedings of the 17th International Conference on Computer-Supported Collaborative Learning (CSCL 2024). https://par.nsf.gov/servlets/purl/10586879
NSF National AI Institute for Student-AI Teaming (iSAT). https://www.colorado.edu/research/ai-institute/
iSAT renewal announcement (July 2025). https://www.colorado.edu/today/2025/07/29/research-institute-building-ai-literate-workforce-future-receives-major-new-grant-0
CoBi description in CU Boulder Research Report 2022-23. https://www.colorado.edu/research/report/2022-23/pioneering-center-reimagines-role-ai-education
Roschelle, J. (1992). Learning by collaborating: Convergent conceptual change. Journal of the Learning Sciences, 2(3), 235–276.
Sandoval, W. (2014). Conjecture mapping: An approach to systematic educational design research. Journal of the Learning Sciences, 23(1), 18–36. https://doi.org/10.1080/10508406.2013.778204
Sun, C., Shute, V. J., Stewart, A., Yonehiro, J., Duran, N., & D’Mello, S. (2020). Towards a generalized competency model of collaborative problem solving. Computers & Education, 143, 103672. Open PDF: https://myweb.fsu.edu/vshute/pdf/chenCE.pdf
Walkoe, J., & Luna, M. J. (2020). What we are missing in studies of teacher learning. Journal of the Learning Sciences, 29(2), 285–305.
A note on the references. Several works in the list above are not cited directly in the body — Carlone (2017), D’Mello et al. (2024), Roschelle (1992), Sun et al. (2020), and Walkoe & Luna (2020) — but they shaped the thinking behind this essay, and readers who want to test my framing against the underlying work should follow the links rather than take my account on faith.
D’Mello et al. is iSAT’s own umbrella article in AI Magazine, the best place to encounter the project as its researchers describe it. The Sun et al. paper lays out the Generalized Collaborative Problem-Solving Model in technical detail — useful if you want to see how a CPS coding scheme is actually constructed and validated. Roschelle’s 1992 paper on “convergent conceptual change” is foundational to the close-reading-of-interaction tradition I lean on when I speak of embodiments. Walkoe & Luna’s 2020 piece argues, with more discipline than I manage here, that current studies of teacher learning have drifted away from the microgenetic analysis that would reveal how teachers come to know what they know. Carlone’s “Making Learning Sciences Matter” makes the larger argument I leave implicit: that social science is epistemically unlike natural science, and that trying to do the former by the methods of the latter produces something diminished.
This essay is not a takedown of iSAT. The researchers are working on hard problems in good faith, and the institute has produced real and useful work. My argument is an observation about what computational annotation can and cannot reach, offered as a contribution to the conversation iSAT is helping to convene rather than as a verdict on the project. Read the sources. Make up your own mind.
