I need to start with a caveat. There's a lot of important technical detail, detail that's subtle and takes a long time to parse. I'm not going into that here. This series is aimed at users, so I'm going to take a lot of liberties to get there fast. Many of the heuristics I use aren't technically correct, but hopefully they're helpful. If you're the type of person who loves detail, you're going to hate this series.

When trying to understand what these models are, I've found it important to keep in mind two heuristics:

Heuristic 1: Large Language Models can't do mathematics.

If I ask an LLM, "what is two plus three?", it will give me the correct answer. However, the way it arrives at that answer is distinctly different from how a calculator does. Large Language Models are best thought of as algorithms that predict, statistically, the next most likely word, given a sequence of words. Statistically speaking, given the LLM's training data (which consists of most of the text on the public internet), the most likely word or words to follow "what is two plus three?" are "five," "two plus three equals five," or something like that. That's why you get the answer: because, in the many, many billions of lines of text the algorithm has seen, that exact sum will have been written out countless times. Not because it added the numbers. If you use an example that's unlikely to have appeared in the training data, say multiplying two 4-digit numbers, the algorithm may not give you the correct answer.
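If you'd like to see the flavour of that in code, here's a deliberately silly sketch. The phrase counts below are completely made up, and a real LLM is a neural network scoring tokens rather than a lookup table of whole phrases, but the punchline is the same: the answer comes from what the training text tended to say, not from adding anything.

```python
from collections import Counter

# Hypothetical tallies: how often each continuation followed
# "what is two plus three?" in some imaginary training text.
continuations_seen = Counter({
    "five": 9_000,
    "two plus three equals five": 4_500,
    "six": 120,            # training text contains mistakes too
    "a trick question": 40,
})

def predict_next(counts: Counter) -> str:
    """Return the statistically most likely continuation."""
    best, n = counts.most_common(1)[0]
    print(f"P({best!r}) = {n / sum(counts.values()):.2f}")
    return best

predict_next(continuations_seen)  # prints: P('five') = 0.66
```

Ask this toy something its table has never seen, like a 4-digit multiplication, and the mechanism has nothing sensible to offer. That's the failure mode above, in miniature.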

Heuristic 2: Large Language Models don't understand words.

Without descending too far into the weeds: at its core, a Large Language Model is mostly matrix multiplication. I know I've said they can't do mathematics, but being software running on silicon, they do use a lot of mathematics. Maths needs numbers, not words. The process of turning sentences into numbers that the model can process is something we can discuss later; for now, just note that the first thing the LLM does when you give it some text is turn that text into a vector, a list of numbers. The last thing the LLM does, once it has finished manipulating that vector, is turn it back into text. All the work in between happens in a high-dimensional vector space. That is to say, it's crunching numbers, not processing language.
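For the curious, here's a toy version of that round trip. The six-word vocabulary, the random vectors, and the single matrix are all stand-ins; a real model learns these numbers from data and works in thousands of dimensions across many layers. Notice that between the first step and the last there are no words anywhere, only numbers.

```python
import numpy as np

vocab = ["what", "is", "two", "plus", "three", "five"]
word_to_id = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 4))  # each word -> 4 numbers
weights = rng.normal(size=(4, len(vocab)))     # one layer of matrix multiplication

def run(text: str) -> str:
    # 1. Turn the text into vectors (the only "reading" that happens).
    vectors = embeddings[[word_to_id[w] for w in text.split()]]
    # 2. Crunch numbers in vector space (real models stack many such layers).
    scores = vectors.mean(axis=0) @ weights
    # 3. Turn the result back into a word.
    return vocab[int(np.argmax(scores))]

print(run("what is two plus three"))  # some word; untrained, so probably not "five"
```

Training is just the process of nudging those matrices until the number-crunching reliably lands on sensible words.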

"But Steven," I hear you say, "if this is becoming a maths lecture, I'm out!" Hang on for just a little longer; there is a point, and it's not linear algebra or vector spaces.

The reason I need you to understand these two heuristics is that LLMs hallucinate. In this field, the word "hallucinate" is a technical term: it describes the model stringing words and sentences into a grammatically correct, well-articulated, but factually inaccurate response. Given the two heuristics, this shouldn't surprise you. A system that predicts likely words, without doing arithmetic or understanding language, has no internal test for truth; a plausible answer and a correct one look exactly the same to it. And this, my friends, is the point. LLMs are convincing. They can create beautiful prose. But LLMs don't understand. LLMs hallucinate.
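To see why fluency and accuracy come apart, push a toy predictor like the one from Heuristic 1 into territory its (entirely invented) counts barely cover. Nothing below is real data; the structural point is that there is simply no code path for "I don't know".

```python
from collections import Counter

# Invented tallies for a question the imaginary training text rarely covered.
# Every candidate is well-formed; at most one of them can be right.
seen_after_rare_question = Counter({
    "The treaty was signed in 1842.": 3,
    "The treaty was signed in 1907.": 2,
})

def answer(counts: Counter) -> str:
    # No branch exists for "none of these is true":
    # the top candidate is returned regardless, in perfect grammar.
    return counts.most_common(1)[0][0]

print(answer(seen_after_rare_question))  # confidently prints one of them
```

Fluency is guaranteed by construction; accuracy is not.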

3. What is Two Plus Three? Doing Mathematics While Tripping Balls.

Large Language Models hallucinate.