Turn-taking as a formal area of study emerged primarily through the groundbreaking work of sociologists Harvey Sacks, Emanuel Schegloff, and Gail Jefferson in the 1970s. Their seminal 1974 paper "A Simplest Systematics for the Organization of Turn-Taking for Conversation" established the foundation for Conversation Analysis (CA) as a discipline. Through meticulous observation, they discovered that natural conversation involves remarkably little overlap or silence between speakers, with transitions usually occurring at "transition-relevant places" (TRPs). These exchanges follow rules, patterns, and practices that minimize gaps and overlaps, creating a smooth conversational flow. Appearing universally across languages and cultures, these patterns nevertheless accommodate cultural variations, suggesting an underlying human capacity for coordinated social interaction.
Between the 1970s and the 2000s, a perhaps predictable shift occurred in the field. Tracing the developmental arc of Conversation Analysis, I discovered that while interest in human conversation hasn't disappeared entirely, it no longer commands center stage as the focus of exploration. CA is now often practiced as an instrument for improving AI. Computational approaches dominate the landscape, with algorithms and AI models analyzing human conversational patterns at mind-boggling scale and speed in order to train AI to mimic human conversation. This technological turn has relegated traditional human conversation analysis to the periphery of academic interest, recasting what was once groundbreaking sociological research as an antiquated methodology.
The resurrection of turn-taking analysis in computational contexts represents a fascinating evolution of this field. The shift involves several key transformations. Traditional CA involved painstaking manual transcription and analysis of limited conversation samples. Computational approaches now enable researchers to analyze thousands or millions of conversations simultaneously, process multiple modalities (text, audio, video) in integrated frameworks, work across languages, and identify patterns invisible to human analysts. This exponential increase in analytical capacity has transformed what was once a boutique research methodology into a big data discipline.
The original CA work described turn-taking qualitatively through careful observation. Computational approaches have since formalized these descriptions into mathematical frameworks that capture conversation dynamics with breathtaking precision. These include probabilistic models predicting speaker transitions, machine learning algorithms that anticipate conversational flow, and predictive frameworks identifying optimal moments for conversational entry. This quantification represents a fundamental shift from description to prediction.
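To make "probabilistic models predicting speaker transitions" concrete, here is a minimal sketch of the simplest version I can think of: a first-order Markov model that estimates, from a record of who spoke in what order, the probability of each next speaker given the current one. Everything in it is an illustrative assumption on my part (the function names, the toy speaker labels, and the choice of a Markov chain itself); it is not the method of any particular study, and real systems draw on far richer cues such as prosody, gaze, and timing.

```python
from collections import Counter, defaultdict


def fit_transition_model(speaker_sequence):
    """Estimate P(next speaker | current speaker) from an ordered list of turns.

    Hypothetical helper for illustration: counts each adjacent pair of
    speakers and normalizes the counts into probabilities.
    """
    counts = defaultdict(Counter)
    for current, nxt in zip(speaker_sequence, speaker_sequence[1:]):
        counts[current][nxt] += 1

    model = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        model[current] = {s: n / total for s, n in next_counts.items()}
    return model


def predict_next_speaker(model, current):
    """Return the most probable next speaker, or None for an unseen speaker."""
    options = model.get(current)
    if not options:
        return None
    return max(options, key=options.get)


# Toy three-person conversation (made-up data): each entry is one turn.
turns = ["A", "B", "A", "C", "A", "B", "C", "A", "B"]
model = fit_transition_model(turns)
print(model["A"])
print(predict_next_speaker(model, "A"))
```

In this toy record, speaker A is followed by B three times out of four, so the model predicts B after A. The sketch shows the move from describing turn order to predicting it, and nothing more.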
Perhaps most significantly, computational turn-taking analysis now directly informs the design of interactive technologies embedded in daily life. Voice assistants like Siri and Alexa rely on these models to create more natural interactions. Social robots, meeting analysis software, conversational chatbots, and dialogue systems in gaming all draw from this research tradition, translating sociological insights into practical applications that shape forms of human-computer interaction that are now all but unavoidable.
Today's leading-edge work in computational conversation analysis spans five areas: multimodal turn coordination (understanding how eye gaze, gesture, posture, and facial expressions signal turn intentions along with verbal cues); cultural adaptability (creating AI systems that can adjust to different cultural norms around turn-taking); prosodic modeling (computational understanding of how intonation, pitch, and timing signal turn completion); contextual understanding (systems that recognize when a turn is truly complete versus when a pause is simply for emphasis or thought); and emotional intelligence (models that can recognize and respond to emotional states that affect turn-taking behavior).
This evolution raises profound questions: What is lost when human interaction patterns are computationally modeled and reproduced? Do AI systems that mimic human turn-taking create unrealistic expectations of machine understanding? How should systems handle cultural differences in turn-taking norms without reinforcing stereotypes? Can the "mechanization" of conversation through computational models change how humans interact with each other?
The journey from ethnomethodological observation of human conversation to computational modeling represents not just a technical evolution but a philosophical shift in how we understand the innate human activity of taking turns in conversation. Critically, research has shown that infants demonstrate turn-taking behaviors well before developing language capabilities, suggesting that this social coordination skill may be innate and foundational rather than learned as a linguistic convention, which is how language machines acquire it. This developmental evidence strengthens the argument that turn-taking represents a cognitive and social capability that precedes and perhaps enables language acquisition rather than emerging from it. For artificial intelligence, this insight suggests that human-like AI communication systems might need to develop from biological social coordination capacities rather than purely linguistic ones.
In addressing the challenge of distinguishing human student work from AI-generated content, educators have primarily focused on shifting to in-class writing, public speaking, class presentations, and formal debates, none of which implicates turn-taking, the one aspect that distinguishes humans from AI. While these methods offer valuable alternatives, I propose that fishbowl-style conversation assessments among human students provide a more nuanced and effective approach that directly leverages this fundamental difference between human and artificial intelligence. Let’s get together a discussion group, seat them in the center of the room, gather round, and let them converse for five minutes about Achilles or Abraham or whatever material takes center stage across the disciplines. Then let’s hear the mentor, the teacher, analyze the conversation through a collective think-aloud, with students contributing their spontaneous thoughts.
This distinction between human-human conversation and human-AI interaction shows up in conversational rhythms. In human exchanges, conversational partners engage in a natural, fluid cadence where each participant takes turns, cues transitions, and responds with agency to continue this reciprocal exchange of words. This mutual negotiation of conversational space reflects shared social understanding governed by implicit rules and innate intuition.
In contrast, human-AI interaction follows a rigid, asymmetrical pattern: the human speaks and cues a transition, the AI responds and then freezes, awaiting another human prompt. The AI never self-selects to speak, never interrupts, and never continues without explicit permission. This creates a fundamentally different conversational ecology where the human bears the entire burden of directing the exchange.
Fishbowl conversations, where students engage in spontaneous, improvised discussions while peers observe and critique, capitalize on these distinctions. This format requires students to demonstrate real-time critical thinking and adaptive reasoning; active, empathetic listening; responsive argumentation; collaborative meaning-making; reading of subtle social cues; appropriate interruption and elaboration; and navigation of unexpected conversational turns. With these uniquely human capabilities on display in living color, chronic AI dependence would become immediately apparent without embarrassing anyone.
Beyond revealing AI use, fishbowl discussions develop precisely the communication capabilities students need to thrive in an AI-saturated world. Rather than creating "gotcha" assignments, this approach cultivates skills that complement rather than compete with AI technologies. By implementing low-stakes, inescapable fishbowl conversation events, educators can transform the classroom from a site of technological anxiety into an opportunity for developing distinctly human capacities that will remain valuable regardless of how sophisticated AI becomes.
Thanks for highlighting this. Increasing conversational opportunities for students is well supported as a method to increase language and learning. One of my favorite proponents is Gordon Wells. This lovely chapter is available online: Wells, G. (2000). Dialogic inquiry in education: Building on the legacy of Vygotsky. In C.D. Lee and P. Smagorinsky (Eds.), Vygotskian perspectives on literacy research (pp. 51-85). New York: Cambridge University Press.