Mo's Blog

Fact-checking and Inference in LLMs

Today, a friend and I discussed ChatGPT. I remain optimistic about the potential of LLMs, as I first outlined in my text on GPT-2. I believe that LLMs will revolutionize intellectual labor the way machines did during the Industrial Revolution. Why? Because there will be a race to improve the ability to predict the next token based on ever larger context windows, and superior prediction relies on a deep understanding of the preceding text. LLMs might evolve into machines of intelligence, not just word games. However, my friend rightly pointed out that LLMs still tend to hallucinate information frequently.

I agree. Hallucinations are currently an issue when LLMs are used for knowledge-based tasks. However, those who understand LLMs avoid using them that way. I use ChatGPT for tasks where all the necessary information is already in the prompt. For example, I refine emails or ask for suggestions on how to improve a Python function.
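
To make this concrete, here is a minimal sketch of what "all the necessary information in the prompt" looks like in practice. The `ask_llm` function is a placeholder for whatever chat API or local model one happens to use; only the prompt construction matters here.

```python
# Sketch: a self-contained prompt that carries all the context the model needs.
# ask_llm() is a stand-in for whatever chat API or local model you call.

def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a real client call here.
    return "<model reply would appear here>"

draft_email = """Hi Alex,
thanks for the feedback, I will send the revised draft by Friday.
Best, Mo"""

prompt = (
    "Please rewrite the following email so that it sounds more formal, "
    "but keep it short. Return only the rewritten email.\n\n"
    f"{draft_email}"
)

# The model only has to transform text it was given, not recall facts,
# so there is little room for hallucination.
print(ask_llm(prompt))
```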

These hallucinations remind me of toddlers learning to speak. At first, a toddler's brain masters pronunciation, words, and grammar, but it lacks deep understanding. Toddlers know an apple is a fruit, usually red or green, and that they can eat it, but they do not fully grasp its meaning and all its relations to other objects. I observed similar issues in early language models like GPT-1 or GPT-2. Over time, children's understanding grows as they learn more about the world and develop the ability to reflect on their knowledge. They build a memory that helps them trace the origins of information. If they are unsure about something, they ask someone or look it up online or in a library.

I predict that the issue of hallucinations in systems like ChatGPT will be addressed using an analogous approach. By giving the model access to a knowledge base, potentially even a formal one, it can make logical inferences and perform calculations. This mirrors how humans solve many problems: when an intuitive solution isn't immediately apparent, we turn to alternative strategies. Maybe we formalize the problem in mathematics or computer code. Or we go to the library first to read some existing material and, based on that, arrive at an explainable solution. Perhaps LLMs can be trained primarily for language comprehension and for different reasoning strategies, including the use of inference algorithms that provide step-by-step explanations.
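
To illustrate what I mean by an inference algorithm with step-by-step explanations, here is a minimal sketch: a tiny forward-chaining rule engine over a made-up knowledge base, where every derived fact remembers how it was obtained. The facts and rules are invented for illustration.

```python
# Sketch: forward chaining over a tiny, hypothetical knowledge base.
# Every derived fact keeps a human-readable explanation of how it was obtained.

facts = {"Socrates is a human"}
rules = [
    # (premise, conclusion)
    ("Socrates is a human", "Socrates is mortal"),
    ("Socrates is mortal", "Socrates will not live forever"),
]

explanations = {fact: "given" for fact in facts}

changed = True
while changed:
    changed = False
    for premise, conclusion in rules:
        if premise in facts and conclusion not in facts:
            facts.add(conclusion)
            explanations[conclusion] = f"derived from '{premise}'"
            changed = True

# Print each fact together with the step that produced it.
for fact, reason in explanations.items():
    print(f"{fact}  [{reason}]")
```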

I imagine that AI-based chat systems will have adapters to all sorts of existing algorithms, search engines, and databases. This is similar to the human brain, which is more than an inference engine: it also has a working memory, a set of trained skills for approaching challenging problems, and, of course, it uses tools like computers to overcome its limitations (e.g., multiplying two large numbers).
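
One way to picture such adapters is a registry of tools that the chat system dispatches to. Everything in the following sketch, the tool names and the `tool: argument` convention, is invented for illustration rather than an existing interface.

```python
# Sketch: a chat system routing a request to one of several tool adapters.
# Tool names and the "tool: argument" format are made up for illustration.

def calculator(expression: str) -> str:
    # Stand-in for a real math engine; eval is acceptable for this toy example.
    return str(eval(expression, {"__builtins__": {}}))

def database_lookup(key: str) -> str:
    # Stand-in for a search engine or database adapter.
    toy_db = {"capital of France": "Paris"}
    return toy_db.get(key, "unknown")

ADAPTERS = {"calculator": calculator, "database": database_lookup}

def dispatch(model_output: str) -> str:
    # Assume the model answers with "tool: argument" when it wants help.
    tool, _, argument = model_output.partition(":")
    handler = ADAPTERS.get(tool.strip())
    return handler(argument.strip()) if handler else model_output

print(dispatch("calculator: 123456789 * 987654321"))  # the large-number case
print(dispatch("database: capital of France"))
```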

An intermediate step could be for the model to output the algorithms, queries, or API calls that answer factual questions or return computed results. For example, an LLM could, by default, answer with a Wikidata query that retrieves the relevant facts whenever such a query seems like the best answer strategy. Or the LLM predicts that it can answer the question best by formulating a linear program that computes the optimal solution. As a next step, people could develop systems that perform these steps invisibly to the user, turning them into an AI for non-technical people.
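
As a sketch of the Wikidata idea, imagine the model's whole answer is a SPARQL query, which the surrounding system then sends to the public Wikidata endpoint. The query below (the capital of Germany) is a hand-written example of what such an answer might look like, not actual model output.

```python
# Sketch: treat the model's answer as a SPARQL query and execute it against
# the public Wikidata endpoint instead of trusting a free-text reply.
import requests

sparql = """
SELECT ?capitalLabel WHERE {
  wd:Q183 wdt:P36 ?capital .   # Q183 = Germany, P36 = capital
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": sparql, "format": "json"},
    headers={"User-Agent": "llm-fact-checking-sketch/0.1"},
    timeout=30,
)
bindings = response.json()["results"]["bindings"]
print(bindings[0]["capitalLabel"]["value"])  # expected: "Berlin"
```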