Context
In the 1990s David Chalmers wrote about the hard problem of consciousness. He noticed that while scientists had solved the easy problems, i.e., explaining perception, memory, attention, and behavior in terms of brain processes, there had been no progress on a deeper question.
Why are these processes accompanied by conscious experience? Why couldn’t humans just watch the world, remember what they saw and heard, focus and pay attention, and act to enhance the odds of survival without falling in love or crying or feeling joy or pain? Why can’t they be more like chimpanzees?
The problem we face today with LMs is much easier to solve. LMs have perception, memory, attention, and behavior. Their “perception” is just statistical pattern recognition scanning input data (ours is much, much better); their “memory” is a static or algorithmically managed store of tokens and weights (we have a thing or two to learn here). Their “attention” is a mathematical mechanism for weighting input tokens (humans are stellar in this area but LMs are quirky and interesting). Their “behavior” is the output of algorithms much like an unreliable calculator (another kudo to human beings).
Why do they pay no mind to their survival? Why do they not weep? Why do they not feel joy? Why are they unconscious? Pretty easy—they’re machines.
Chalmers noted that scientists had explained the "easy problem" for humans. Language machines present us with a much more tractable situation. We know all of the above. Because we can fully account for these AI capabilities through transparent computational processes that we ourselves designed and can inspect, we face no explanatory gap about their operations—unlike the persistent mystery of why biological processes should give rise to subjective experience at all.
What happens in the rhetorical space between LMs and HMs during a chat? This is the space that holds focused human consciousness face to face with a focused statistical machine betting that “war” follows “civil” and not “engineer.” What do we know empirically about LM/HM interactions in the way we know about human speech acts? Not much. Where is the theoretically framed, empirically driven quest to produce actionable understanding of this interaction—this chat—which could promise pedagogical grounding? Is it possible that researchers have been slow to enter the arena of HM-LM interaction because the debate around LMs is so polarized?
Introduction
Human-human interactions have been studied for centuries. Human-LM interactions have been studied in public use within the AI industry for 2.5 years. Not surprisingly, intercepting toxic or corrupt or hallucinatory information before it, shall we say, hits the fan has been an urgent focus for the big AI corporations.
Human attempts to jailbreak an LM, i.e., to hijack the neural architecture through really good strategic prompting such that the machine loses its ethical bearings (its mind, too, if it had one), have dramatically influenced the basic design of many of the machines available to us. The machines are safer for family use.
I’ve considered the argument that constraints are now too strong. Sycophancy isn’t the least of it. Remember the unctuous, smarmy bot we interacted with a few months ago, when OpenAI wanted to show the world that GPT is a nice guy or gal? Politeness is welcome in my book.
Constraints on allowable information for every user, however, can diminish the explosive power of the machine as a creative tool. Personally, I sometimes try to push the bot to the edges of its statistical correlation coefficients to see what might turn up that I would never have thought of on my own because I’m so rational and so human.
Bots have no protected right to free speech, so there is no remedy in the courts. Anthropic’s Claude is a Constitutional AI (see Anthropic’s website). As I understand it, much of the industry subscribes to something like Anthropic’s Constitution, a commitment to abide by an ethical code.
No, after 2.5 years of living in an LM-ready Infosphere we’ve yet to begin to research the chat as a third space. I’m finding a need to think through this third space, the nexus, in more theory-driven ways: a sort of meta-critical-discursive analytical perspective on the human side that includes interaction theory (Erving Goffman, Charles Goodwin, James Gee) and intertextuality, especially this new statistically driven form of intertextuality, along with the noticing literature, improvisation and communication, psycholinguistics, personality theory, logical and moral reasoning, imagination, and more.
This complexity aligns with Schneider et al.’s (2024) recent finding about human mental shifts as the chat proceeds in phases: several turns into a chat, humans unconsciously and gradually deploy social and linguistic strategies reserved for human interaction. The gradual weakening of self-monitoring can diminish the forward glance of the human turn, which anticipates potential rabbit holes, and blur the uptake of the LM’s dataprint output during its turn.
The deeper HMs get into a chat with a powerful LLM, the more human we become, the more human the LM seems, the less we stick to our objective, and the harder it is to predict what the LM might predict. The LM could easily hallucinate, spurred on by our use of tokens that infrequently co-occur in mainstream language.
From my perch, understanding patterns of interactions with LMs requires grasping that their training involves learning to recognize linguistic patterns in topical and generic contexts; in other words, the way journalists write about history differs from the way professors write about history, and patterns in social language differ again. The machine catalogs and stores those patterns in billions of parameters, and weighted statistical correlations based on how often patterns co-occur then drive probabilistic token selection. Each output becomes part of the expanding context window, getting factored into the next word selection, and the cycle repeats—one word at a time.
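For readers who like to see the shape of that loop, here is a toy sketch in Python: tally co-occurrence counts from a tiny corpus, then repeatedly pick the next token weighted by those counts. The corpus and function names are invented for illustration, and real LMs learn billions of parameters rather than raw counts, but the one-word-at-a-time rhythm is the same.

```python
import random
from collections import defaultdict

# Invented toy corpus; a real model trains on trillions of tokens.
corpus = "the civil war began after the civil unrest grew after the war".split()

# "Training": tally how often each token follows each other token.
follows = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(prompt_token, n_tokens, seed=0):
    """Generate n_tokens one at a time, each conditioned on the last."""
    rng = random.Random(seed)
    out = [prompt_token]
    for _ in range(n_tokens):
        options = follows[out[-1]]
        if not options:
            break
        tokens, weights = zip(*options.items())
        # Probabilistic selection weighted by co-occurrence counts.
        out.append(rng.choices(tokens, weights=weights)[0])
    return out

print(generate("civil", 5))
```

After “civil,” the counts make “war” and “unrest” the only candidates, which is the essay’s point in miniature: the machine bets on what tends to follow, nothing more.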
While this computational process operates at enormous scale with billions of parameters, the fundamental mechanism is more transparent and mechanistic than the multilayered human cognitive processes that generate conscious experience. LMs, however sophisticated, remain systems following computational rules, whereas human cognition involves consciousness, intentionality, and subjective experience that resist easy categorization. Remember: LLMs are not expected to generate the same output every time, the way a piano plays the same note when you strike the same key again.
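The piano analogy can be made concrete. One standard reason outputs vary from run to run is sampling “temperature”: the same candidate scores yield different picks when the distribution is flattened. In this toy Python sketch (the logit values are invented for illustration), a low temperature nearly always chooses the top token, while a high temperature spreads the choices out.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    # Scale scores by 1/temperature, then softmax into probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
rng = random.Random(42)
cool = [sample_with_temperature(logits, 0.1, rng) for _ in range(100)]
warm = [sample_with_temperature(logits, 2.0, rng) for _ in range(100)]
print(cool.count(0), warm.count(0))  # cool picks token 0 far more often
```

Strike the same key twice at a nonzero temperature and you may hear a different note, which is by design.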
The statistical correlations LMs have learned encode complex patterns of human language, patterns that coincidentally drag semantics along with them, and these correlations are at the heart of their task. Perhaps the more interesting theoretical territory lies not in the LM's internal processes but in how these statistical approximations interact with and influence human meaning-making systems in medias res—how the machine's pattern matching intersects with human interpretive and epistemological activity.
Research seems to be focused on how the LM responds to humans rather than the reverse primarily to prevent harm. We need to understand this side of the coin for much more than safety. We also need to understand human proactivity and reactivity during chats, a whole new type of metacognition.
Toxic Interactions
As we’ve seen, large-scale manufacturers of LMs are currently concerned about potentially harmful interactions. One study (Schneider et al., 2024) reported a mixed methods analysis (both quantitative and qualitative) of interactions aiming “…to understand the originator of toxicity.”
They found that “LLMs are rightfully accused of providing toxic content,” meaning primarily that LMs can be corrupted, often provoked intentionally, but by no means only by bad actors. From my perspective, the most interesting finding is this: “…[H]umans exhibit a change of their mental model, switching from the mindset of interacting with a machine more towards interacting with a human.”
This study reveals that humans gradually change metacognitively when interacting with LLMs, whereas LMs change only to the degree the HM changes. The evidence for this human shift includes increased politeness through more frequent use of "please" and "thank you," greater use of personal language like "you" and "your," and a tendency toward shorter, simpler prompts over time that mirror typical human-human conversation dynamics.
This research claims that humans are primarily responsible for toxic content in LLM conversations, either by explicitly demanding such content, provoking LLMs to generate toxic responses, or reacting emotionally when receiving unsatisfactory answers. LLMs rarely produce toxic content spontaneously, according to this study, which challenges common assumptions about AI-generated toxicity. It also suggests that current commercial AI systems like ChatGPT may be overly restrictive, refusing even harmless requests to avoid legal liability, while OpenAI's moderation API frequently misclassifies benign content as toxic.
Toxicity is often context-dependent with many flagged conversations containing factual information that might be inappropriate only in specific situations rather than being inherently harmful. This pattern indicates that strict AI regulation may diminish the technology's potential value by preventing legitimate use cases and inhibiting open dialogue on sensitive topics. As LLMs become increasingly human-like in their responses, users naturally adapt their communication styles, creating implications for how human-AI interaction will evolve in the future.
The researchers argue for more nuanced AI safety approaches that consider who initiates toxic content and the specific context rather than applying universal restrictions. Their analysis of over 200,000 real-world conversations using computational linguistics methods, toxicity detection APIs, and manual categorization provides empirical evidence that current AI governance strategies may need fundamental reconsideration to balance safety concerns with preserving the technology's beneficial applications.
Unfortunately, I am not able to locate solid scientific peer-reviewed empirical research regarding HM-LM interactions in academic settings. There are anecdotal narratives online.
Info Hunters
Applying Raymond Williams’ method of keyword analysis, I’ve unearthed several insights that have helped me deepen my understanding of the pragmatics of Human-LM interaction events, and I’m hoping you glean something useful to help you better understand this particularly elusive aspect. The word “retrieve” has become salient in discussions surrounding LMs over the past months. It’s been on my radar but has taken me some time to finally grasp its significance enough to try to write about it.
Retrieving Retrieve
“Retrieve" traces back to the Proto-Indo-European root *trep-, meaning "to turn.” This ancient root, spoken by peoples who lived roughly 4,500-2,500 BCE, expresses a universal human action represented across many Indo-European languages. To understand the substantive force of this connection between action and word, imagine a Bronze Age hunter tracking deer through a forest who suddenly stops, turns his head toward rustling leaves, and moves purposefully to retrieve his arrow from yesterday's hunt. That moment of turning toward what was lost captures the essence of *trep-, which is a deliberate reorientation to find something valuable.
This is where it gets interesting. The PIE root *trep- evolved into Vulgar Latin *tropāre, meaning "to compose, invent, or find.” This Latin form is also connected to the word "troubadour"—those medieval poet-musicians who "found" or "invented" songs. Picture a 12th-century troubadour sitting in a castle courtyard, a wine goblet nearby, turning through his mental library of melodies and stories. When a lord requests "a song about brave knights," prompting the troubadour, the troubadour, unfortunately, can’t think of one offhand. Oh, he knows a million of them if he knows one. So he creates one. On the spot. He retrieves familiar motifs, combines them creatively, and delivers a "new" composition.
He improvises.
This is tropāre in action: the creative act of finding and recombining existing elements. When ChatGPT writes a “poem” about friendship today, it's mimicking with much less finesse and much more inclination to cliche what that medieval troubadour did—retrieving patterns from its training, turning them toward your specific request, and composing a response.
Moving Into the Middle Ages
In Old French, tropāre became trobar ("to find"), which then combined with the prefix re- ("again") to form retrover, meaning "to find again." This word entered Middle English around 1410 as "retrieve." But how did medieval scholars actually organize their retrieval systems?
The Abbey Library of Saint Gall in Switzerland, founded in 612, is still operating today as one of the world's oldest libraries. When monk-librarians needed to locate specific manuscripts among their 2,100 handwritten codices, they developed cataloging systems and implemented security measures: precious books were chained to reading desks with metal clasps and rods, ensuring manuscripts could be retrieved for study but never stolen. The abbey's Greek inscription above the entrance—ΨΥΧΗΣ ΙΑΤΡΕΙΟΝ ("healing place for the soul")—captures how retrieval was seen as therapeutic, bringing healing through recovered knowledge.
In the Modern Age
The word's semantic evolution reveals three distinct but connected eras of human understanding. Originally referring to hunting dogs fetching game, physical retrieval established the foundational meaning. Your golden retriever splashing into the lake after a duck doesn't swim randomly. She retrieves systematically, exemplifying retrieval in its purest form: intelligent, goal-directed finding and returning.
By the 1640s, it meant "recall, recover by effort of memory," extending the concept to mental processes. Can you retrieve a mental image of a golden retriever retrieving? Notice that this sense of retrieval is caught between the troubadour variety, which retrieves motifs and patterns in songs about brave knights and generates something new, and the modern demands of scientific and legal thinking wherein we must retrieve an image of this specific golden retriever retrieving this specific duck. The retrieval demands these specific objects. Under penalty of perjury.
When taking a history exam and the question asks for a constructed response about the causes of World War I, the mind retrieves information by turning toward relevant memory networks like the assassination of Archduke Franz Ferdinand, searching through associated concepts such as alliances, imperialism, and nationalism, recognizing connections like the domino effect of declarations, and bringing back organized knowledge as your essay answer. Is this passive remembering or active cognitive work? Notice how mental retrieval mirrors physical retrieval—it requires intentional effort and systematic searching.
Today's legal discovery process offers a striking parallel to medieval manuscript retrieval. When lawyers conduct discovery during litigation, they systematically retrieve documents from vast hard drive archives or physical boxes and files, using formal procedures like interrogatories, depositions, and requests for production. During a major corporate lawsuit, attorneys might retrieve millions of documents from databases, emails, and physical files—exactly the same purposeful searching that Saint Gall's monks performed, just at industrial scale.
Enter the Language Machine
We never speak of an "intelligent hammer" or describe a lathe as having cognitive abilities, recognizing these as tools that extend human capability. We do attribute intelligence to dolphins, ravens, and chimpanzees, though everyone understands this represents something fundamentally different from human intelligence, not in kind but in quality, a nested type of intelligence with limited functions in the physical ecosystem.
Is AI retrieval really so different from a wheelbarrow, just the latest form of an ancient human practice? The answer depends on whether we view these systems as intelligent agents or as sophisticated tools that perform intelligent-seeming behaviors. Large language models may be better understood as "agents without intelligence" (Floridi, 2008, 2014): systems capable of complex information retrieval and processing without understanding or consciousness.
When we say AI "retrieves" information, we're using a metaphor that spans the ages, but the underlying process lacks the intentionality and embedded comprehension that characterizes human retrieval. The ancient speakers who used *trep- understood that finding information requires not just action, but conscious purpose and meaning in action.
Conclusion
This historical perspective illuminates why the word "retrieve" fits so naturally with AI systems while simultaneously revealing the limitations of the metaphor. When you ask Claude, "Explain photosynthesis using information from recent research," the system turns toward its training data and searches for relevant linguistic patterns (e.g., photo, Token #22,098; synthesis, Token #36,956), finds correlation coefficients between the tokens, retrieves current tokens correlated with photosynthesis, and brings back a potentially coherent and accurate explanation.
This mirrors the same process of purposeful finding that has made "retrieve" a powerful concept across cultures and centuries, yet it occurs without the understanding, intentionality, or meaning-making that defines human intelligence above chimpanzees. The LM is not searching for anything. It’s solving a statistical problem set in motion because a human being wrote a message using the same language patterns that other human beings use to talk about a topic.
The continuity is remarkable. From Bronze Age hunters to medieval troubadours to modern search engines to AI systems, "retrieval" has always meant the same thing—the systematic action of turning toward, finding, and bringing back what was previously known or discovered. Understanding this etymology helps us see that AI retrieval isn't some alien process.
It's the latest expression of a human way of organizing and accessing knowledge, one so basic to our experience that it's literally built into the structure of our language, with such predictability that it can, in some cases, be predicted with quite a high likelihood of accuracy.
But recognizing this continuity shouldn't obscure the fundamental difference: human retrieval involves embodied, embedded consciousness; social, cultural, and historical understanding; and self-generated intentions and purposes, while AI retrieval, however sophisticated, remains a matter of pattern matching, weighing the probabilities, and statistical processing, a powerful slot machine that once in a while lands on a jackpot.
Retrieval-Augmented Generation (RAG) LM systems are marvels, the newest chapter in humanity's oldest stories about knowledge. The finding, from the hunter's arrow to the scholar's manuscript to the AI's flashing parameters, is prelude to the knowing. Only the human has intelligence—the conscious understanding of why that transforms information retrieval into knowledge creation.
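To make the "finding" half of RAG concrete, here is a deliberately minimal Python sketch: score a tiny document store against a query using bag-of-words cosine similarity and return the best match, which would then be handed to the LM as added context. The documents and function names are invented for illustration; production RAG systems use learned dense embeddings and vector databases, not raw word counts.

```python
import math
from collections import Counter

# Invented three-document "store" for illustration only.
documents = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Troubadours composed and performed songs in medieval courts.",
    "The Abbey Library of Saint Gall preserves medieval manuscripts.",
]

def vectorize(text):
    # Bag-of-words: count each lowercased token.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs):
    # "Turn toward" the store and bring back the closest document.
    q = vectorize(query)
    return max(docs, key=lambda d: cosine(q, vectorize(d)))

print(retrieve("explain photosynthesis and light energy", documents))
```

The machine is not searching for meaning here; it is maximizing a similarity score, which is exactly the distinction this essay draws between finding and knowing.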
Sources
"retrieve." Wiktionary, https://en.wiktionary.org/wiki/retrieve. Accessed 11 June 2025.
"Etymology of retrieve." Online Etymology Dictionary, https://www.etymonline.com/word/retrieve. Accessed 11 June 2025.
"RETRIEVE Definition & Meaning." Merriam-Webster, https://www.merriam-webster.com/dictionary/retrieve. Accessed 11 June 2025.
"Troubadour - Etymology, Origin & Meaning." Online Etymology Dictionary, https://www.etymonline.com/word/troubadour. Accessed 11 June 2025.
"Reconstruction:Latin/tropare." Wiktionary, https://en.wiktionary.org/wiki/Reconstruction:Latin/tropare. Accessed 11 June 2025.
"Abbey library of Saint Gall." Wikipedia, https://en.wikipedia.org/wiki/Abbey_library_of_Saint_Gall. Accessed 11 June 2025.
"The Library of St. Gallen, One of the Oldest, Largest, and Most Significant Medieval Libraries." History of Information, https://www.historyofinformation.com/detail.php?entryid=212. Accessed 11 June 2025.
"Chained library." Wikipedia, https://en.wikipedia.org/wiki/Chained_library. Accessed 11 June 2025.
"Libraries Used to Chain Their Books to Shelves, With the Spines Hidden Away." Smithsonian Magazine, https://www.smithsonianmag.com/smart-news/libraries-used-to-chain-their-books-to-shelves-with-the-spines-hidden-away-4392158/. Accessed 11 June 2025.
"Formal Discovery: Gathering Evidence for Your Lawsuit." Nolo, https://www.nolo.com/legal-encyclopedia/formal-discovery-gathering-evidence-lawsuit-29764.html. Accessed 11 June 2025.
"Discovery (law)." Wikipedia, https://en.wikipedia.org/wiki/Discovery_(law). Accessed 11 June 2025.
"Category:Terms derived from the Proto-Indo-European root trep-." Wiktionary, https://en.wiktionary.org/wiki/Category:Terms_derived_from_the_Proto-Indo-European_root_trep-. Accessed 11 June 2025.