Pragmatics: The Dustbin of Linguistics

Mar 17, 2024

The term "art of letters" translates the Ancient Greek "téchnē grammatikḗ," which had a holistic semantic sweep including not just syntax, but also reading and writing. Technical aspects of linguistics, what we now call “grammar,” were part and parcel of literature, rhetoric, and interpretation. Teaching and learning the art of letters meant gaining a deep understanding of and proficiency with language that would enable individuals to express ideas clearly, analyze texts critically, and engage in informed discussions, i.e., become college or career ready.

The Grammar of Dionysios Thrax, regarded as the first full attempt at specifying the art of letters, was first translated into English in 1874, though its ideas were embedded in technical and pedagogical notions throughout the medieval period and into the modern world. Phonics as a method was not invented by the Science of Reading (SoR ©️), which marks its origin sometime in the 1980s with the publication of the Simple View. From the 1874 translation of Dionysios’s Grammar:

Dionysios Thrax provided guidance on oral reading as well. During the period of classical rhetoric, scholars studied the effects of voicings, tones, pauses, etc., with a view toward contributions of orality to the semantics of written text. What we in the field of reading think of as oral fluency in terms of accuracy and reading rate, Dionysios thought of as expressive and essential to meaning:

Over time, as education systems evolved and specialized fields of study emerged, the term "grammar" came to focus on rules and structures of words in sentences. Vowels and consonants, syllables, and to a certain degree prosody were separated from Dionysios’s robust sense of grammar, which was stripped of semantics as well. Indeed, the reading wars beginning in the 20th century produced unnatural concepts like “nonsense words,” which literally do not exist in living language.

The ancient tradition contributed grammatical terms still significant today. As you consider this ancient classification system for words in isolation, keep in mind that instruction in the art of letters would have emphasized the interconnections among vowels, consonants, voicing, nouns, verbs, reading, and writing. It would have been inconceivable to think about a student of the art of letters learning these things piecemeal in isolation. The parts of speech were identified and explained, but as part of a whole fabric of language. Sound familiar? From Dionysios:

It’s clear that the original focus of language study located technical aspects of reading and writing in what is today an esoteric subdiscipline of linguistics we call “pragmatics,” a portion of the field that looks at language-in-use, at situated meaning. Pragmatics is distinguished from the study of language as an object, a system of rules and regulations, a toolkit that provides us with a method of transferring clear and complete messages from one mind to another.

In the United States today almost every state has passed legislation mandating that teaching the art of letters in the early grades must begin with vowels and consonants uncontaminated by meaning and situation. Indeed, the highest standard of achievement in the beginning is mastery of nonsense words, instilling a technical mindset that reading means pronouncing words, nothing less, nothing more.

*****

Readability formulas are fairly recent actors on the world stage, debuting in 1923, according to Pearson and Hiebert (2013). Although we today are more familiar with ubiquitous sentence-length/word-frequency formulas, the first published example of a formula relied on word-level data to predict text difficulty. The topic lay dormant until 1935 when Gray and Leary published a book titled What Makes a Book Readable reporting on a massive study of readability motivated by the fact that, though most people could sound out words, at least 50% of the adult population refused to read books. Interestingly, the motive to figure out how to predict the difficulty of a text was not an effort to make a productive reader-text match in classrooms, but to change the non-reading adult into a reader.

The researchers hypothesized that the problem for adult non-readers in the 1930s derived not from a lack of interest, but a lack of ability to read sophisticated texts written for the small portion of the sophisticated adult population who actually did read. If they could discover what made a text readable for the masses who read inexpertly at best, writers could ‘dumb down the difficulty.’ From the Preface of Gray’s book:

In the context of the New Deal and the ongoing economic crisis in the United States, reading became a potential social solution to the multilayered problems of unemployed adults in need of information related to doing a job. The impetus for understanding the “art of letters” in Ancient Greece shifted from the lofty goal of enabling greater participation in the life of the community among the citizens to the economic goal of providing complex information in a form accessible to the undereducated and unemployed:

Gray and Leary (1935) took pains to emphasize the “small but important” area of readability as a practical matter to help solve an immediate social problem. They remained “equally concerned” with a broad spectrum of problems beyond a narrow focus on “ease” or “difficulty” of words and sentences. However, the exigencies of the times—as exigencies of the times seem to always do—prompted the researchers to avoid relying on subjective matters of interest, taste, and the like. Thus, they decided to focus on structure:

Notably, Gray and Leary’s call for future research into “objective procedures” brought to bear on issues of readability, by which they meant quantitative methods, set the tone and timbre of readability research for most of the century. Pearson and Hiebert (2013) told a story of mixed success for quantitative readability formulas based on objective procedures. They concluded that subjective qualitative procedures for making appropriate reader-text matchups offer us today the most promising path for supporting educators working to get children and books matched to provide access to interesting, informative, and imaginative texts.

*****

Readers of ltRRtl likely have familiarity with readability formulas based on two objective (read: easily countable) criteria: 1) sentence length and 2) word length. The longer the sentence, so goes the reasoning, the greater the demands placed on the reader. The longer the word, the more difficult the word is to decode for lexical access. As near as I can tell, the assumption is that syntactic complexity and phonological/morphological sophistication are the two most robust factors responsible for exporting reality as perceived by the writer by way of text into the mind of the reader.

Pearson and Hiebert (2013) reviewed previous research calling into question the appropriateness of one of those factors. Here’s the problem. Sometimes rewriting long sentences to make them shorter decreases the readability score but makes the text more difficult. From their study:

Pearson and Hiebert quoted these linguists, I believe, to highlight that rewriting longer sentences is not an objective task unless meaning is irrelevant. If I were to rewrite this text before you today to reduce the average sentence length, I wouldn’t have to rewrite every long sentence. I could cherry-pick those sentences that are easiest to rewrite, say, or that I determine idiosyncratically to pose greater challenges. The outcome, however, is sometimes worse than the original. In effect, making a particular sentence shorter may actually make the text more difficult: “Davison and Kantor concluded that ‘… the most successful changes in the text often run directly counter to what readability formulas would suggest, and that the most unsuccessful changes are those motivated by the strictures of the readability formulas.’” (p. 191)

*****

Having bolstered my intuitive understanding of the ways in which AI visual generators use language to produce images, I prompted OpenAI to create an image based on the original Davison and Kantor sentence, the long one. Notice that this sentence begins with a conditional that gets repeated in the adapted sentence; so the opening of the sentence is not responsible for differences in bot readings. The rewritten utterance leaves all of the original intact except for three elements: 1) the preposition “by” serving a conjunctive function is removed and replaced by a period; 2) the pronoun “it” is inserted into the rewritten clause; and 3) the original nominalized form of the verb “to grow” is replaced with the main verb form preceded by the modal “will” in the second, shortened sentence. Here is the image created by the bot in response to the original, longer single sentence:

Original Utterance, OpenAI, Prompted to create a visual image of the literal meaning of the sentence, March 11, 2024

In this image, the part of the original sentence interpreted by the bot as salient comes at the end, that is, “by growing new bark over the burned part.” If we look at this image through a narrative lens, a fire scorched the tree some time ago. Enough time has passed that we can see new bark having healed the wound. We infer that the conditional (“If given a chance before another fire comes”) has been satisfied. A retelling of the narrative might go like this. Once upon a time a tree lived through a forest fire. The fire died down, but the tree had been wounded. The wheel of fortune turned over a considerable time span without a conflagration, and the tree grew new bark to heal the wound. With luck this tree may survive another fire in the future.

What a difference a period, a pronoun, and a modal auxiliary make in the adapted version. Keep in mind the adapted version is intended to simplify the text. Here’s the bot’s image created when prompted to depict the meaning of the simplified text:

Instead of finding “by growing new bark” salient, this time the bot keyed on the first main clause and its conditional. Clearly, the bot was “thinking” about bark because the tree has bark (bark is named in the second short sentence). But front and center is the fire, the scorched tree, its leaves destroyed. The emphasis no longer took into account the long time period after the fire while the tree was spared further fire damage and grew new bark. Instead, the period of fire when the tree faced possible death is depicted. The central point from the writer’s perspective is that trees can heal, yet the bot ignores healing in the second image and focuses on the destruction.

*****

What these images reveal is subtle but powerful. Using sentence length as a brute signifier of complexity ignores the fact that added words within a sentence—making a sentence longer—can indeed increase its readability if by readability we mean the degree to which a text and reader converge in an understanding, that is to say, in an alignment of illocutionary and perlocutionary force. What the writer intends to say matches what the reader reconstructs from the text—that’s readability.
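As a rough illustration of what a length-based formula actually "sees," here is a minimal sketch that computes two countable surface measures for both versions of the tree sentence. Several things here are assumptions: the sentence wording is reconstructed from the fragments quoted in this post and may not match Davison and Kantor's text exactly, the function name `surface_measures` is hypothetical, and the two averages are not any published readability formula, only the kind of raw counts such formulas are built on.

```python
import re


def surface_measures(text):
    """Return (average words per sentence, average letters per word).

    Hypothetical helper for illustration only; not a published formula.
    """
    # Split on sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat any run of letters (and apostrophes) as a word.
    words = re.findall(r"[A-Za-z']+", text)
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return avg_sentence_len, avg_word_len


# Wording reconstructed from the fragments quoted in this post (an assumption).
original = ("If given a chance before another fire comes, the tree will heal "
            "its own wounds by growing new bark over the burned part.")
adapted = ("If given a chance before another fire comes, the tree will heal "
           "its own wounds. It will grow new bark over the burned part.")

for label, text in (("original", original), ("adapted", adapted)):
    sentence_len, word_len = surface_measures(text)
    print(f"{label}: {sentence_len:.1f} words per sentence, "
          f"{word_len:.2f} letters per word")
```

On these counts the adapted version looks much "easier": average sentence length drops from 23 words to 12. Nothing in the arithmetic registers that the word "by," which tied the new bark to the healing, has disappeared.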

From a purely syntactic perspective, single-syllable verbs which convey the writer’s attitude or perspective on the topic or idea surrounding the content are treated as though they were just another word. In the tree sentences discussed above, the modal “will” is treated by the readability experts with little sensitivity to modality. “The tree will heal its own wounds” indicates the future; “it will grow new bark” indicates a second future with an uncertain connection to the tree in clause 1. The perspective of the writer, who explicitly relates the growing of new bark to the healing of the wound by way of the word “by,” is damaged: the two clauses express separate meanings, additive but not inherently related.

The writer of the original sentence expressed meaning in the epistemic domain, that is, the writer knew something about a relationship between trees and fires and took up the perspective of a knower, an agent expressing knowledge about what can happen in the future of a tree. In the rewritten version, the writer expressed meaning in the socio-physical domain, as a reporter stating two separate facts which the reader may or may not link together. The writer doesn’t intend to assert that the tree will grow new bark, but that they know the tree has the capacity to do so. Kim (2017), writing about modality as a feature of English from the perspective of teaching English as a foreign language, helps make this point clear:

Kim, Yong-Beom. 2017. Modal Categories and Dynamic Modality in English. Korean Journal of English Language and Linguistics 17-4, 701-72.

In the sentence “John must be home,” readability is all but impossible. Why? Because we can’t know the writer’s perspective. We need more than language. If we understood that the writer’s perspective is within the socio-physical domain, John is obligated to be at home: John is the active subject or the agent; all eyes are on John. Here is the bot’s vision of John when it is given the context that John knows he must be home because he has been grounded. I explained to the bot that John had been grounded and then commanded it to create a visual for the sentence “John must be home”:

In the epistemic domain, however, the writer expresses an inner perspective on a possible state of affairs and their understanding of what is likely the case based on evidence of some sort. To read the sentence with comprehension, one must understand how the writer intends the word “must” with little help from the text. Meaning is inherent neither in the syntax nor the vocabulary, but in the context. Here’s the bot’s comprehension of the sentence in this alternative modality. I explained to the bot that no one seems to know where John is and asked it to create a visual for the sentence “John must be home”:

*****

William Gray and his colleagues turned to syntax and vocabulary as the most likely candidates to provide an objective, quantifiable method to predict a reader-text match with regard to textual difficulty. It seems clear from their 1935 book that the intent was expediency, not theoretical grounding. Texts usually consist of words in sentences, sentences can be simple or complex, words can be short or long, the country suffered from the Great Depression, half the adult population would not or could not read complex material. The solution seemed clear: Design a method which could be used to write informational books in a simplified manner.

Pragmatics, the part of linguistics dealing with how language is used in social situations, wasn’t fleshed out enough to be of much use in 1935. Studies of language as a fossilized object under a microscope were mainstream. Even today, pragmatics is contentious at the level of theory. Because pragmatics involves analysis of writers’ intentions and readers’ mindsets, it is inherently subjective, hidden behind language though embedded in fluid modality and received almost unconsciously. Challenge levels of texts are captured but crudely by way of analysis of syntax and vocabulary without concern for modality, not to mention reader expertise in terms of knowledge, motivation, metacognition, and the like.

I recall reading the phrase “pragmatics: the dustbin of linguistics” in my travels, but I can’t recall the citation. If I find it, I’ll add it to this post. It contains an unfortunate truth, in my view. When scholars study language and find things that don’t quite fit the model, those bits are tossed in the dustbin of pragmatics. Pearson and Hiebert’s (2013) conclusion that the time has come to set aside readability formulas strikes me as a call to empty out the dustbin and look around in it to find parts that might provide a structured subjectivity. Even simple sentences can be quite complex.

Learning to Read, Reading to Learn
