“There’s no way these things can legitimately pass these tests because they don’t reason that well,” says Gary Marcus, professor emeritus of psychology and neural science at New York University. “So it’s an embarrassment not just for the people whose names were on the paper but for the whole hypey culture that just wants these systems to be smarter than they actually are.” (Tom Bartlett, “A Study Found That AI Could Ace MIT. Three Students Beg to Differ,” Chronicle of Higher Education, July 7, 2023)
I’m not a professor in the Department of Artificial Intelligence at some top-shelf university like Professor Marcus. I am not trained to do research on whether AI can qualify to, say, run for Governor in Texas or Florida or even Arkansas. I can barely manage the artificial drummers on my iPad. This is not to say I can’t comment on such studies of AI and academic learning, though, as a non-peer-reviewer dabbling in bots who happens to hold a PhD in language and literacy. Caveat emptor.
I can say, as I’ve said before, that in my opinion, from what I’ve seen so far, AI is as dumb as a post, a rock, a cloud, and it gets less and less humanly intelligent, more and more artificial, the more experience I have with it. I still find it interesting and useful. It can be deployed to do tasks routinely handled by humans: classifying information, regressing dependent variables on independent variables, and clustering information using meaning-language particles distributed systematically in human linguistic communication. It can classify and regress much better and faster than humans can, but its clustering is risky. I see no equivalency between human academic learning and artificial algorithmic learning. ChatGPT itself consistently warns users to regard its output with healthy skepticism. Caveat emptor squared.
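For readers who want the concrete version of those three machine tasks, here is a minimal sketch in Python with scikit-learn. The toy data, the models, and the parameters are all my own invention for illustration; nothing here comes from the studies discussed below.

```python
# A minimal sketch of the three machine tasks named above: classify,
# regress, cluster. Toy data and scikit-learn defaults, illustration only.
from sklearn.datasets import make_classification, make_regression, make_blobs
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

# Classification: sort labeled examples into categories.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)
print("classification accuracy:", clf.score(X, y))

# Regression: fit dependent variables to independent variables.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
reg = LinearRegression().fit(X, y)
print("regression R^2:", reg.score(X, y))

# Clustering: group unlabeled data. There is no answer key to check
# the groupings against, which is where the risk comes in.
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```

Notice that classification and regression come with an answer key to score against; clustering does not, which is exactly why I call it risky.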
That’s all.
*
The language bot has flash but no magic, glitz but no imagination, access to the entire corpus of Wikipedia but no provenance for any theory or philosophy. It can’t generate an original thought. It is completely dependent on its training. It can cluster the original ideas of humans like nobody’s business, but double-check the clusters. Trust not the subtleties. Assuming that artificial intelligence is human is like assuming that saccharin has calories.
Judging by published subjective accounts (I’m thinking of well-published writers who have looked at what bots have generated about their body of work and found the bots wanting), the language bot can’t be relied on to adequately communicate nuanced original thought. Nuance isn’t its strong suit, I’ve found as well, and I take some comfort in that. I can’t imagine it generating a hypothesis without prompting.
It would have to go like this: “Ok, Bot, it’s 1910, create for me a hypothetical explanation for the theory of relativity.” Here’s the thing: Who would write the prompt? Without Einstein there could be no prompt. The bot feeds on human meaning. Without a human the bot isn’t even a fence post.
I admit it is a remarkable kind of dumb. Even if it could not pass all the midterms and finals and complete all the assignments as an undergraduate at MIT, I’ve never found a fence post that could answer a question like “Were Plato and Aristotle good friends after working together at the Academy in Athens for so many decades?”
You tell me: What is an “intellectual counterpart”? How did the bot happen to predict ‘counterpart’ to fill the blank following ‘intellectual’? I don’t get it. The bot filled the slot with a tantalizing token that appears to carry sense but crumbles when pressed. Still, we get a sense that Plato and Aristotle probably didn’t go out for ambrosia after a hard day in the classroom. Plato seems to have held the upper hand over the centuries; we still call it the ‘Academy,’ not the ‘Lyceum.’
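For what it’s worth, here is roughly what “predicting a token to fill a slot” amounts to, sketched in Python with the Hugging Face transformers library. The small GPT-2 model stands in for ChatGPT’s far larger one, and the prompt is my own; treat this as an illustration of the mechanism, not a reconstruction of what ChatGPT did.

```python
# A sketch of next-token prediction: given a context, the model ranks
# candidate tokens by probability and emits one. GPT-2 stands in here
# for ChatGPT's far larger model; the prompt is my own invention.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Aristotle was Plato's intellectual"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token
probs = torch.softmax(logits, dim=-1)

# Show the five most probable tokens to fill the slot.
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx)):>15}  {p.item():.3f}")
```

The model ranks candidates by probability and emits one; no claim about friendship or philosophy stands behind the choice.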
*
The story of Tom Bartlett’s three students who exposed the MIT study isn’t a simple story about a study retracted before it was reviewed. It’s a story about a very bad study, conducted with a fatally flawed methodology and no theoretical framework, pre-released by a group of academics drawn from both the Academy and the AI industry before the paper could pass under the noses of peer reviewers, who would very likely have sniffed it out as a lemon. Not because it was plagiarized, though there is nuanced concern about use of proprietary information without permission.
Its flaw? Truth be told, it replicates the flaw at the heart of the factory model: the authors confuse teaching and learning with testing. Unfortunately, I can’t view the study itself to say more, because it has gone the way of the dinosaurs.
Regardless of how sophisticated the tests of knowledge and application required by MIT teachers may be, students there are given opportunities to learn well beyond declarative, procedural, and conditional knowledge. When I worked with engineering faculty at Sacramento State on accreditation projects, I learned that MIT is world class in its use of group projects. Just for a lark, I checked my memory with the bot. Sure enough, meaning-language particles from the bot’s training converged with my recollection.
So what if a bot can pass the tests? What has it not learned beyond the reach of the Knight of the Right Answer?
*
One study of ChatGPT from December 2022 reported that the bot scored high enough on two current medical school admissions exams to qualify for a slot during the 2022 admissions cycle in Italy. The differences between this med school applicant study and the MIT undergraduate study are profound. The aspiring MD bot was an applicant, not a graduate like the artificial MIT student. Judging aptitude for learning is an entirely different matter from assessing the cumulative learning of a student over four years of education.
The methodology of the MIT study involved prompting the bot with assignments and examinations semester after semester, feeding the bot information it could use to deduce answers to test questions or to write papers displaying content, and then grading the results with core AI algorithms, the very same algorithms whose probabilistic manner of operating casts suspicion on ascribing certainty to standardized test scores. Teachers are cautioned against making consequential decisions about students based on such scores. Did the bot participate in any group projects?
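As best I can reconstruct the reported setup, the loop looked something like the sketch below. Every name in it is my own hypothetical stand-in, not the authors’ code; the point is structural: the same probabilistic machinery answers the questions and grades the answers, and if solutions leak into prompts, the score measures nothing.

```python
# A hedged sketch of the prompt-and-grade loop as I understand the reported
# methodology. All names are hypothetical stand-ins; no real API is assumed.

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a chat model. A real study would
    call a real model; this canned reply just lets the sketch run."""
    return "Yes"

def grade_with_model(question: str, answer: str, solution: str) -> bool:
    """The suspect step: a probabilistic model judging correctness, with the
    solution sitting right there in the grading prompt."""
    verdict = ask_model(
        f"Question: {question}\nSolution: {solution}\n"
        f"Student answer: {answer}\nIs the answer correct? Yes or no."
    )
    return verdict.strip().lower().startswith("yes")

def run_study(exam: list[dict]) -> float:
    correct = 0
    for item in exam:
        # If item["solution"] ever reaches this prompt, that's leakage.
        answer = ask_model(item["question"])
        # The grader shares the test-taker's probabilistic machinery.
        if grade_with_model(item["question"], answer, item["solution"]):
            correct += 1
    return correct / len(exam)

exam = [
    {"question": "What is 2 + 2?", "solution": "4"},
    {"question": "Prove the halting problem is undecidable.", "solution": "..."},
]
print("reported score:", run_study(exam))  # 1.0 -- a perfect, meaningless score
```

Contrast that loop with the MD study described next, where humans scored the bot exactly as they scored the human applicants.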
The methodology of the MD study involved an authentic cycle of admissions testing, with the bot taking the very same tests as the humans, scored precisely as the humans were scored. The MD bot applicant did a far superior job on one admissions test, the one with 60 multiple-choice questions. The other test was more complex, requiring persuasive writing, and there the bot got considerably lower scores. In short, the MD applicant study compared the bot to real students under the same conditions to determine aptitude for success in medical school. The bot took on the persona of an applicant under controlled conditions, not the persona of an undergraduate ready to receive a degree.
Regarding the MIT study, I saw the hype in the news feeds and tracked down some details. Sure enough, according to the hype, AI can do as well as an MIT undergraduate. I tracked down the med school applicant study for the same reason: it was hyped in the news feeds. The hype formula was the same: a) shiny object [bots think like humans], b) emotional reaction [wow wee kerzowie] or [damn, we’re doomed], and c) stay tuned [the bot is a meme of progress] or [the bot is taking us down a dark road]. That’s what happened after the pre-released MIT study found eyes.
When you click the link bluelined “study,” you find the notice of withdrawal. Please note that Iddo Drori is not being indicted for plagiarism or fabricating data or intentional deception for personal gain. The issue isn’t corruption or deception. The methodology was suspect from the start, and evidently the researcher and his band of associates simply missed the fatal flaw: using a quantitative research technique, flowing from an impoverished theory of teaching and learning, to answer a mixed-methods question.
Here’s the rising action of the narrative told to us by Bartlett, the Chronicle journalist, who positions the story as a “cautionary tale”:
“One of the students, Neil Deshmukh, was skeptical when he read about the paper. Could ChatGPT really navigate the curriculum at MIT — all those midterms and finals — and do so flawlessly? Deshmukh shared a link to the paper on a group chat with other MIT students interested in machine learning. Another student, Raunak Chowdhuri, read the paper and immediately noticed red flags. He suggested that he and Deshmukh write something together about their concerns.”
In the hands of a better writer of fiction than I, the seeds of an episode of NCIS or Law and Order might lie in this soil. It might be fun to feed this scenario to the bot and see what its clustering might produce, hallucinations included. Here comes the turning point, the account number in the Cayman Islands, the video of the golden showers, the moment lives begin to hang in the balance:
“For starters, it didn’t seem as if some of the questions could be solved given the information the authors had fed to ChatGPT. There simply wasn’t enough context to answer them. Other ‘questions’ weren’t questions at all, but rather assignments: How could ChatGPT complete those assignments and by what criteria were they being graded? ‘There is either leakage of the solutions into the prompts at some stage,’ the students wrote, ‘or the questions are not being graded correctly.’”
Part-time vassals of the Knight of the Right Answer, also steeped in the famous Group Project approach, came up with the same essential critique. Undergraduate accomplishment is not well represented by low-level methods of testing, though those methods are useful and important. AI was somehow getting the right answers on the test questions without learning a thing. It wasn’t being taught, it wasn’t being mentored, it wasn’t participating with peers; it was being fed, prompted, trained to answer questions, not taught to learn. The study of AI was, ironically, a factory-model study.
*
Aristotle was a student of Plato at the Academy before he became a teacher with his own philosophy. He grew separate from his mentor but not apart, and the two disagreed fundamentally on proper understandings of the human condition. Plato was into universals, ideal forms, the Platonic bed that exists in the ether, in which no one ever sleeps, revealed imperfectly as a queen-sized mattress on a box spring or whatever. Aristotle was into logic and physical reality, empiricism, a student of the world who was more interested in the bed he slept in than in its ideal form. We all make the bed we must lie in.
Dylan says it right once again: something is happening here, but you don’t know what it is, do you, Mr. Jones? Maybe the bot will soon be capable of explaining itself to us.