Guest Opinion by Kip Hansen — 27 August 2024 — 1200 words
Last week I wrote an article here titled “Illogically Facts — “Fact-Checking” by Innuendo”. One of my complaints about the fake fact-check performed by three staff members at Logically Facts was that it read suspiciously as if it had been written by an AI chat-bot, a suspicion bolstered by the fact that the claim-to-fame of Logically Facts is that it is an AI-based effort.
I made the following statements:
“Logically Facts is a Large Language Model-type AI, supplemented by writers and editors meant to clean up the mess returned by this chat-bot-type AI. Thus, it is entirely incapable of making any value judgements between repeated slander, enforced consensus views, the prevailing biases of scientific fields, and actual facts. Further, any LLM-based AI is incapable of Critical Thinking and drawing logical conclusions.”
“Logically Facts and the rest of the Logically empire, Logically.ai, suffer from all of the major flaws in current versions of various types of AIs, including hallucination, break-down and the AI-version of “you are what you eat”.
The New York Times article that prompted this essay is very well written and exposes one of the many major flaws of modern AI Large Language Models (AI LLMs). AI LLMs are used to produce text responses to chat-bot-type questions, answers for internet “searches”, and on-request images.
It has long been known that LLMs can and do “hallucinate”. The Wiki gives examples here. IBM gives a very good description of this problem – which you should read right now; even the first half-dozen paragraphs will give you a moderate understanding of how these examples could occur:
“Some notable examples of AI hallucination include:
- Google’s Bard chatbot incorrectly claiming that the James Webb Space Telescope had captured the world’s first images of a planet outside our solar system. [NB: I was unable to verify this claim – kh]
- Microsoft’s chat AI, Sydney, admitting to falling in love with users and spying on Bing employees
- Meta pulling its Galactica LLM demo in 2022, after it provided users inaccurate information, sometimes rooted in prejudice.”
So, the fact that AI LLMs can and do return not only incorrect, non-factual information, but entirely “made up” information, images, and even citations to non-existent journal articles, should shatter any illusion you might have as to the appropriate uses of chat-bot and AI search engine responses, even to fairly simple inquiries.
Now we add another layer of reality to the lens through which you should view AI LLM-based responses to the questions you might pose. Remember, AI LLMs are currently being used to write thousands of “news articles” (like the suspect Logically Facts “analysis” of climate denial), journal papers, editorials, and scripts for TV and radio news.
AI LLMs: They are What They Eat
This latest article in the New York Times [repeating the link] does a good job of describing, and warning us about, the dangers of LLMs being trained on their own output.
What is LLM training?
“As they (AI companies) trawl the web for new data to train their next models on — an increasingly challenging task — they’re likely to ingest some of their own A.I.-generated content, creating an unintentional feedback loop in which what was once the output from one A.I. becomes the input for another.”
The Times presents a marvelous example of what happens when an AI LLM is trained on its own output, in this case hand-written digits it should be able to read and reproduce:
One can see that even in the first round of training on self-generated data, the LLM returns incorrect data – the wrong digits: the upper-left 7 becomes a 4, the 3 below that becomes an 8, and so on. As that incorrect data is used to train the LLM further, after 20 iterations of re-training, the digits returned are entirely undependable. After 30 iterations, all of the digits have become homogenized, basically representing nothing at all: no discernible digits, all the same.
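To make the mechanism concrete, here is a minimal sketch of my own, not the Times’ digit experiment: the “model” is nothing more than a Gaussian curve (a mean and a spread) fitted to some numbers, and each new generation is fitted only to samples drawn from the previous generation’s fit. The sample size, the number of generations, and the printout interval are arbitrary choices made purely for illustration.

```python
# A toy "model": a Gaussian (mean and spread) fitted to numbers, then
# re-fitted, generation after generation, only to samples of its own output.
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: the "real" data, a reasonably wide spread of values.
data = rng.normal(loc=0.0, scale=1.0, size=20)

for generation in range(1, 101):
    # "Train" the model: estimate mean and spread from the current data.
    mu, sigma = data.mean(), data.std()
    # The next generation is trained only on this model's own output.
    data = rng.normal(loc=mu, scale=sigma, size=20)
    if generation % 25 == 0:
        print(f"generation {generation:3d}: mean = {mu:+.3f}, spread = {sigma:.3f}")

# Typically the mean drifts away from zero and the spread shrinks toward
# nothing: a toy analogue of the digits blurring into sameness.
```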
The Times article, which was written by Aatish Bhatia, quite cleverly dubs this “Degenerative A.I.”
Think of the implications of this training now that it has already become impossible for humans to easily distinguish between AI-generated output and human-written output. In AI training, it is only words (and pixels, in the case of images) that enter the probability determination that produces the output – the AI answering, over and over, the question: “What is the most likely word to use next?”
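As a rough, deliberately tiny illustration of that “most likely word next” idea, consider the sketch below. The word-probability table is invented by me for illustration only; a real LLM computes such probabilities across a vocabulary of tens of thousands of tokens, from weights learned during training, conditioned on the entire prompt so far.

```python
# Toy illustration of "What is the most likely word to use next?"
# The probability table here is made up; a real LLM derives such
# probabilities over its whole vocabulary from learned weights.
next_word_probs = {
    "climate": {"change": 0.62, "science": 0.21, "policy": 0.10, "denial": 0.07},
    "change":  {"is": 0.40, "has": 0.25, "will": 0.20, "deniers": 0.15},
}

def most_likely_next(word: str) -> str:
    """Greedy decoding: pick the single highest-probability continuation."""
    options = next_word_probs.get(word, {})
    return max(options, key=options.get) if options else "<end>"

generated = ["climate"]
for _ in range(3):
    generated.append(most_likely_next(generated[-1]))
print(" ".join(generated))   # -> climate change is <end>
```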
You really must see the examples of the “Distribution of A.I.-generated data” used in the Times article. As an AI is trained on its own previous output (“Eats itself” – kh), the probability distributions become narrower and narrower and the data less diverse.
I wrote previously that “The problem is immediately apparent: in any sort of controversy, the most “official” and widespread view wins and is declared “true” and contrary views are declared “misinformation” or “disinformation”. Individuals representing the minority view are labelled “deniers” (of whatever) and all slander and libel against them is rated “true” by default.”
With today’s media outlets generally biased in the same direction (towards the left, liberalism, progressivism, and in favor of a single party or viewpoint, slightly different in each nation), AI LLMs are trained on, and thus biased toward, that viewpoint, with the major media outlets pre-judged as “dependable sources of information”. By the same measure, sources with opinions, viewpoints or facts contrary to the prevailing bias are pre-judged to be “undependable sources of information”, mis- or disinformation.
AI LLMs are thus trained on stories mass-produced by AI LLMs, slightly modified by human authors so they read less like machine output, and subsequently published in major media outlets. Having “eaten” their own output repeatedly, AI LLMs give narrower and less diverse answers to questions, answers that are less and less factual.
This leads to a situation in which, as an LLM is trained on its own data, “the model becomes poisoned with its own projection of reality.”
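One more toy sketch of my own, with invented labels and numbers: a small pool of “answers” holding a majority view and a minority view, regenerated over and over by sampling only from the previous generation’s estimated mix of the two.

```python
# A small pool of "answers" with a majority view and a minority view.
# Each generation, the mix is estimated from the current pool, and the
# next pool is drawn only from that estimate -- output feeding input.
import collections
import random

random.seed(1)
answers = ["consensus"] * 15 + ["dissent"] * 10    # generation 0

for generation in range(1, 201):
    counts = collections.Counter(answers)
    labels = list(counts)
    weights = [counts[label] for label in labels]
    answers = random.choices(labels, weights=weights, k=25)
    if generation % 50 == 0:
        print(f"generation {generation:3d}: {dict(collections.Counter(answers))}")

# With a pool this small, sampling noise compounds quickly; in almost every
# run one viewpoint eventually crowds the other out of the data entirely.
```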
Consider the situation we find in the real world of climate science. The IPCC reports are generated by humans from the output of climate scientists and others in trusted peer-reviewed science journals. It is well known that non-conforming papers are almost entirely excluded from journals because they are non-conforming. Some may sneak through, and some may find a pay-to-play journal that will publish these non-conforming papers, but not many see the light of day.
Thus, only consensus climate science enters the “trusted literature” of the entire topic. John P.A. Ioannidis has pointed out that “Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias.”
AI LLMs are therefore trained on trusted sources that are already shaped by publication bias, funding bias, the prevailing biases of their own fields, fear of non-conformity, and group-think. Worse yet, as AI LLMs train themselves on their own output, or on the output of other AI LLMs, the results become less true, less diverse, and less dependable, potentially poisoned by their own false projections of reality.
In my opinion, many sources of information are already seeing the effects of impending AI LLM collapse – the subtle blurring of fact, opinion and outright fiction.
# # # # #
Author’s Comment:
We live in interesting times.
Be careful what information you accept – read and think critically, educate yourself from original principles and basic science.
Thanks for reading.
# # # # #