In the milliseconds before a conversation partner finishes speaking, an extraordinary calculation unfolds in the human mind. Without conscious deliberation, we predict the exact moment when their turn will end and ours may begin. This transition—rarely marked by awkward silence or chaotic interruption—is perhaps the most overlooked miracle of human communication. Linguists call these junctures Transition-Relevant Places, or TRPs, and they represent one of the most sophisticated yet common achievements of social cognition.
The magic of human conversation lies not in what we say but in when we say it. Recall the last time you visited your doctor: as you described your symptoms, your physician nodded at precise intervals, occasionally interjecting with targeted questions that redirected your narrative without seeming rude. This delicate negotiation occurs through an unconscious system of signals—falling intonation at sentence completion, slight shifts in gaze direction, almost imperceptible pauses, the minute relaxation of gesturing hands—that together constitute the grammar of turn-taking.
"It's amazing how rarely we talk over one another," notes Dr. Elizabeth Stokoe, professor of social interaction at Loughborough University. "Even though there's no traffic light system telling us when to stop and start, most conversations flow without collision." This achievement becomes even more remarkable when we consider the timing involved: research has found that across languages and cultures, the average gap between speakers is around 200 milliseconds—faster than our conscious reaction time. We are, in essence, predicting the end of someone's turn before they've finished speaking.
The doctor's office exemplifies institutional TRPs, where turn-taking follows patterns shaped by professional roles and power dynamics. When a physician asks, "What brings you in today?" they create a deliberate opening for patient narrative. Yet physicians also exercise their institutional authority by redirecting that narrative through selective questioning. Observe carefully, and you'll notice how doctors signal their continued turn-holding through documentation behaviors—glancing at charts or typing notes—which temporarily suspend the expectation of speaker change despite moments of silence. Patients, recognizing these signals, wait patiently rather than interpreting the pause as their cue to speak.
These sophisticated mechanics operate differently across social contexts. At a dinner party, the rules of engagement shift from those in a medical consultation. Conversation becomes more democratic, and speakers employ different strategies to hold or yield the floor. A guest launching into an anecdote might raise their volume slightly at potential interruption points, signaling their desire to continue. Others around the table communicate attentiveness through backchanneling—those small "mm-hmms" and "rights" that acknowledge understanding without claiming a full turn. The enthusiastic dinner guest might employ what linguists call "rush-throughs," accelerating speech pace at potential completion points to hold their turn through syntactic boundaries where others might jump in.
Traffic stops are a form of institutional talk, where the officer and driver have specific roles and goals tied to their interaction. The officer typically initiates and controls much of the conversation, but TRPs allow for structured exchanges in which the driver can respond to questions or provide information. "Do you know why I pulled you over?" asks the officer. "Why, no, sir," responds the driver. "Are my brake lights not working?" "Driver's license and registration, please." Note how the officer simply declines the driver's question, reasserting control of the exchange by treating that TRP as closed.
The dynamic shifts entirely in high-stakes confrontations. In the tense scenario of a liquor store robbery, turn-taking operates under extreme coercion. When an armed robber commands, "Put the money in the bag," they simultaneously create a linguistic TRP (the completion of an imperative) while using embodied dominance displays (brandishing a weapon, maintaining aggressive posture) to suppress normal response options. The clerk recognizes the linguistic opening but understands that only compliance, not verbal response, is permitted. Here, TRPs become instruments of power, with one participant controlling not only when the other may speak but what forms of response are allowed.
These examples reveal how TRPs function as the invisible infrastructure of conversation, shaped by cultural norms, institutional settings, power relationships, and situational demands. They represent coordination points in linguistic turn-construction—a profoundly important system we've all mastered without formal instruction. Children acquire turn-taking skills through games like peek-a-boo and pat-a-cake long before they develop complex language, suggesting that the ability to recognize and respond to TRPs may be more fundamental to human interaction than the content of speech itself.
But what happens when one conversation partner isn't human at all? The emergence of conversational AI has created a parallel system of artificial TRPs that both mimic and diverge from human patterns. When we engage with bots, we encounter a paradigmatically different turn-taking framework—one that lacks the embodied dimension central to human interaction.
The most obvious distinction is the absence of physical presence. Human conversations rely heavily on bodily cues to signal turn boundaries—the lean forward that indicates attention, the raised eyebrow that invites response, the hand gestures that punctuate a thought. Artificial TRPs must function without these embodied signals, relying instead on purely linguistic and formatting features to indicate turn boundaries. This creates a peculiar communicative environment in which certain channels are hyperarticulated while others disappear entirely.
Recent research from Tufts University highlights this fundamental difference. According to JP de Ruiter (2024), professor of psychology and computer science, while AI excels at detecting patterns in content, it struggles to identify appropriate TRPs in conversation. When researchers tested large language models against transcribed conversations, "the AI was not able to detect appropriate TRPs with anywhere near the capability of humans." Part of the reason stems from the AI's training primarily on written rather than conversational language—the same reason, perhaps, that it struggles to rhyme words.
Time operates differently in AI conversation as well, creating a temporal distortion in artificial TRPs. In human interaction, we experience a shared timeframe, witnessing our conversation partner's hesitations, thoughtful pauses, and moments of enthusiasm. We share the same space geographically, temporally, and contextually. With AI systems, this shared space vanishes. The user has little access to the system's "thinking process," and responses appear with an immediacy that bears no relation to their complexity. Once the machine begins "speaking," a thoughtful five-paragraph analysis arrives in roughly the same instant as a simple affirmation, collapsing the natural rhythm that helps structure human turn-taking and stripping the exchange of its temporal context.
This temporal distortion extends to the user's side as well. Unlike human conversations, where extended silence quickly becomes awkward, interactions with AI systems permit indefinite pauses. A user might step away for hours before continuing the exchange, with no social penalty or need to re-establish connection. As far as I can find, the effects of these Transition-Relevant Mega Gaps remain unstudied. This asynchronous quality alters the pressures and patterns that govern traditional TRPs and may fundamentally change the outcome of the interchange.
Power dynamics and interruption rights undergo striking transformations in AI conversation. While human interaction involves constant negotiation of speaking rights—with interruptions sometimes reflecting dominance, sometimes enthusiasm, sometimes urgency—AI systems deliver complete turns regardless of length, time after time. Users cannot productively interrupt an AI mid-response or use backchannel signals to shape the developing turn. This creates a peculiar asymmetry where users, despite being the ostensible directors of the interaction, lack the real-time steering capability characteristic of human conversation.
As The Conversation reports, part of what makes AI chatbots feel so human is our innate tendency to follow conversational rules, even with non-human entities. We talk to our pets and believe they talk back. Humans are "wired to anthropomorphise" their conversation partners and, even in AI interactions, follow deeply ingrained conversational habits from childhood. The success of generative AI "depends in part on the human need to cooperate in conversation, and to be instinctively drawn to interaction."
These differences aren't technical curiosities—they transform how information flows between simulated conversational participants. Human TRPs allow for continuous micro-adjustments, with speakers modifying their messages based on moment-by-moment feedback. A furrowed brow might prompt elaboration; a nod of recognition might permit abbreviation. Artificial TRPs, lacking this feedback channel, create a more rigid communicative structure where turns are discrete rather than continuously negotiated. I've resorted to limiting output length: when I want only the gist of a longer answer, I instruct the model in the prompt to respond in 200 characters or fewer. The resulting conversation, while efficient in some respects, loses the organic adaptability that makes human interaction so responsive.
As AI systems become increasingly integrated into daily life—serving as medical assistants, educational mentors, therapeutic presences, and workplace collaborators—the distinctive nature of artificial TRPs takes on greater significance. Users must navigate a conversational environment that mimics human interaction while operating through different mechanisms—like driving a car as if it were a horse. This navigation requires developing a new set of communicative competencies specific to human-AI exchange.
The implications extend beyond convenience. In medical contexts, for example, physicians rely heavily on non-verbal cues to gauge patient understanding, emotional state, and potential concerns. An AI medical assistant operating without access to these signals might miss critical information that would be evident to a human practitioner. Similarly, teachers must help learners mitigate the absence of the subtle cues—confused expressions, fidgeting, eye contact—that human teachers use to monitor student engagement and comprehension.
These challenges suggest that education in AI interaction should begin with a solid understanding of turn-taking mechanics. Just as children learn the subtle art of conversation through games and social feedback, users of AI systems must develop a new set of skills for effective communication with artificial interlocutors. This education need be neither complex nor lengthy—simple awareness of how artificial TRPs differ from their human counterparts can, I believe, significantly enhance interaction quality.
A basic curriculum might include understanding how to structure multi-part queries, recognizing the system's textual TRP signals, developing strategies for mid-response redirection, and setting appropriate expectations for response timing. These skills help users form accurate mental models of how AI processes and responds to their inputs, leading to more productive and satisfying exchanges.
Research from MIT Media Lab has found that different AI conversation modalities (text vs. voice) and content types (personal vs. non-personal) may have varying impacts on user loneliness and emotional dependence. A longitudinal controlled study with over 300,000 messages found that "higher daily usage—across all modalities and conversation types—correlated with higher loneliness, dependence, and problematic use, and lower socialization." Such dependencies may be mitigated by a deeper understanding of what bots are actually doing. As we develop more sophisticated AI conversation partners, understanding how artificial TRPs shape these interactions becomes increasingly important.
The study of artificial TRPs also illuminates aspects of human interaction that operate below conscious awareness. By observing what changes when embodied cues disappear or when backchanneling becomes impossible, we gain insight into the intricate mechanics of ordinary conversation. The absence of these features in AI interaction helps reveal their essential role in human communication—like the negative space in a painting that defines the central figure. A by-product may be that we become better, more responsive conversationalists.
As we increasingly inhabit a world where human and artificial intelligence engage in regular dialogue, understanding the distinctive patterns of these exchanges becomes not just academically interesting but practically necessary. The TRP—that invisible moment of potential speaker change—serves as a perfect lens through which to examine the differences between human and artificial communication. In the gap between how we talk to each other and how we talk to machines lies the territory of adaptation, compensation, and emerging communicative norms.
The remarkable human ability to know exactly when to speak next—to hit our conversational mark with split-second timing—developed through millennia of face-to-face interaction. As we extend conversation beyond its embodied origins into the disembodied realm of artificial intelligence, we embark on a new chapter in the evolution of human communication. The TRP, once merely the unnoticed infrastructure of ordinary talk, now emerges as a critical site for understanding how humans and AI can effectively communicate in our increasingly hybrid social world.