Explaining LLMs to laypeople
Table of Contents
- 1. Introduction
- 2. Tokens, autoregression, and the basic loop
- 3. Word vectors
- 4. Probability distribution and temperature
- 5. Abstract concepts via multi-layer neural networks
- 6. Okay, so what's a transformer?
- 7. The reward function
- 8. So what is an AI doing when I'm talking to it?
- 9. So how do I use this?
- 10. Closing questions
1. Introduction
This is an attempt to explain large language models (e.g. ChatGPT or DeepSeek) in such a way that a layperson could actually understand what's going on under the hood — not just in terms of lazy metaphors (whether overhyped or dismissive) but in terms of the actual algorithms they use.
I think this is really important, because if we don't understand what AI is, it's very easy to get misled, whether by overestimating its abilities and using it for the wrong tasks, or by underestimating it, any of which could lead to things like "LLM psychosis" in extreme cases! Think of AI like sex: abstinence-only isn't going to help very much, and neither is encouraging it thoughtlessly; but if you can understand what it is, what it does to you, how it works, and how to do it safely, then you can be prepared for the world. And it's important that everyone, techie or not, is prepared for the world we've found ourselves in!
All that out of the way, let's dive in!
2. Tokens, autoregression, and the basic loop
Large language models (LLMs, colloquially known as "AI") use an algorithm known as a transformer (originally invented for natural language translation) that reads in a large number of input tokens (the prompt) and then outputs the next token (called the prediction). The processing an LLM does to produce that output is called inference.
Tokens are the way that a large language model understands text (and also sometimes images or even video!). For text, they're typically short words, or sections of longer compound words, usually no more than four letters long. For example:
The word darkness, for example, would be split into two tokens, “dark” and “ness,” with each token bearing a numerical representation, such as 217 and 655. The opposite word, brightness, would similarly be split into “bright” and “ness,” with corresponding numerical representations of 491 and 655. — https://blogs.nvidia.com/blog/ai-tokens-explained/
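If you like seeing things in code, here's a minimal sketch of that splitting process with a tiny made-up vocabulary. The token IDs come from the quote above; everything else, including the greedy longest-match rule, is just an illustrative stand-in for how real tokenizers (which learn their vocabularies from data) work:

```python
# A toy tokenizer with a tiny, made-up vocabulary. The IDs are taken from the
# quote above; the greedy longest-match rule is a simple stand-in for how real
# learned tokenizers split text.
TOY_VOCAB = {"dark": 217, "bright": 491, "ness": 655}

def toy_tokenize(word):
    """Greedily split a word into the longest pieces found in the vocabulary."""
    tokens = []
    while word:
        for end in range(len(word), 0, -1):          # try the longest prefix first
            piece = word[:end]
            if piece in TOY_VOCAB:
                tokens.append((piece, TOY_VOCAB[piece]))
                word = word[end:]
                break
        else:
            raise ValueError("no matching piece")    # real tokenizers fall back to raw bytes
    return tokens

print(toy_tokenize("darkness"))    # [('dark', 217), ('ness', 655)]
print(toy_tokenize("brightness"))  # [('bright', 491), ('ness', 655)]
```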
To produce multiple output tokens like ChatGPT does, the LLM's interface runs the LLM multiple times in a loop, like this:
1. take your prompt and convert it to tokens
2. plug the prompt into the LLM and run inference
3. convert the LLM's output into a token
4. append the token to the prompt
5. go back to step 2
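Here's what that loop looks like as a rough Python sketch. The helper functions are trivial stand-ins I made up so the example runs; in a real system they'd be the provider's tokenizer and the transformer model itself:

```python
# A rough sketch of the autoregressive loop. The helpers are trivial
# stand-ins so the example runs; in a real system they'd be the tokenizer
# and the transformer model.
END_OF_TEXT = 0

def tokenize(text):           # stand-in: one "token" per word
    return text.split()

def detokenize(tokens):       # stand-in: glue the tokens back together
    return " ".join(tokens)

def run_inference(tokens):    # stand-in: a real transformer predicts the next token here
    return "..." if len(tokens) < 8 else END_OF_TEXT

def generate(prompt, max_new_tokens=100):
    tokens = tokenize(prompt)                  # 1. convert the prompt to tokens
    for _ in range(max_new_tokens):
        next_token = run_inference(tokens)     # 2-3. run inference, get one new token
        if next_token == END_OF_TEXT:          # stop when the model says it's done
            break
        tokens.append(next_token)              # 4. append the token to the prompt...
    return detokenize(tokens)                  # 5. ...and go around again

print(generate("Once upon a time"))
```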
This is called being autoregressive, and is the origin of people calling large language models "spicy autocomplete" — like when you repeatedly ask iPhone autocomplete to guess your next word.
However, the way large language models work under the hood is much more interesting than your phone's auto-complete!
3. Word vectors
First of all, the model actually doesn't operate on tokens in the way that an auto-complete system does.
Instead, part of what it learns in its first training phase is the underlying "concepts," known as word vectors, that groups of tokens correspond to. You can think of the model's "mind" as a big mathematical space (like the Cartesian grid from your high school math class, except with more than 4,000 dimensions instead of just two or three) where similar tokens are grouped close together and different ones are farther apart, with the big fuzzy clouds they form being "concepts."
How does it figure out where tokens go in this space, in order to form concepts? That's simple: it learns the correlations and anti-correlations between different tokens, based on how often tokens show up next to each other versus not, and how often they show up in similar versus opposite contexts (based on the similarity scores of the tokens around them). From this, the model figures out what words are closer together or farther apart across all those different dimensions.
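Here's a toy illustration of that idea. This is not how a real model learns its vectors (real training tunes thousands of dimensions at once), but it captures the intuition that words appearing in similar contexts end up with similar profiles. The sentences and the simple neighbour-counting scheme are invented for the example:

```python
# A toy version of "similar contexts -> similar meaning". Real models learn
# dense vectors with thousands of dimensions during training; this just counts
# neighbouring words in four invented sentences.
from collections import Counter
from math import sqrt

sentences = [
    "the king ruled the kingdom".split(),
    "the queen ruled the kingdom".split(),
    "the dog chased the ball".split(),
    "the cat chased the ball".split(),
]

def context_profile(word):
    """Count which words appear right next to `word` across all sentences."""
    counts = Counter()
    for sentence in sentences:
        for i, w in enumerate(sentence):
            if w == word:
                counts.update(sentence[max(0, i - 1):i] + sentence[i + 1:i + 2])
    return counts

def similarity(a, b):
    """Cosine similarity between two context profiles (1.0 = identical contexts)."""
    pa, pb = context_profile(a), context_profile(b)
    dot = sum(pa[w] * pb[w] for w in pa)
    norm = sqrt(sum(v * v for v in pa.values())) * sqrt(sum(v * v for v in pb.values()))
    return dot / norm

print(similarity("king", "queen"))  # 1.0: they show up in identical contexts here
print(similarity("king", "ball"))   # lower: the only context they share is "the"
```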
These dimensions tend to end up corresponding to different "senses" or ways a word can mean things — so for instance, one axis might be gender, another heat, etc.
It then turns words into word vectors along these dimensions. It uses vectors (directions) rather than points (locations) because vectors can be added together: the result is a new meaning influenced by everything you combined, which is something you can't really do with points.
You can still sort of think of root words as points, though, which can be fun! So, for instance, a word that tends to always appear when people are talking about a certain aspect of history, in the same way another word does, will probably be a synonym of it, or at least closely related. These correlations and anti-correlations can capture a lot of nuance, since there are 4,000 dimensions along which they can be measured!
If you think of them as directions, though, as the model does, it lets it do all sorts of interesting things. For instance: king - man + woman = queen in the model's mental vector space.
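Here's that famous example as a small Python sketch, using two made-up dimensions (roughly "royalty" and "gender") instead of the thousands a real model learns:

```python
# "king - man + woman ≈ queen" with tiny made-up vectors. Real models learn
# thousands of dimensions from data; these two are invented for illustration.
import numpy as np

vectors = {
    "king":  np.array([0.9, -0.8]),   # royal, masculine
    "queen": np.array([0.9,  0.8]),   # royal, feminine
    "man":   np.array([0.1, -0.8]),   # not royal, masculine
    "woman": np.array([0.1,  0.8]),   # not royal, feminine
}

result = vectors["king"] - vectors["man"] + vectors["woman"]

def closest_word(v):
    """Find the known word whose vector is nearest to v."""
    return min(vectors, key=lambda w: np.linalg.norm(vectors[w] - v))

print(result)                # [0.9 0.8]
print(closest_word(result))  # queen
```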
You can think of the mind of a large language model, then, as a huge cloud of words, where related words are closer together along some axes and farther apart along others, and antonyms sit across from each other on the axes where they differ.
4. Probability distribution and temperature
When you give the model input, it converts every word into these concept vectors and then operates on those. Likewise, when it produces an output, it actually outputs a vector, and the interface you're using turns that back into a word you can actually understand.
More precisely, it outputs a probability distribution over possible next concepts, ranging from very likely to very unlikely, and the interface you're using uses a random number generator to decide how far down that list to look. You can tweak how random that choice is (and thus how improbable the model's outputs can get) using a setting called temperature.
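Here's a minimal sketch of how temperature works, using some made-up scores for the word after "The sky is". Dividing the scores by the temperature before turning them into probabilities is the standard trick: low temperature makes the top choice dominate, high temperature flattens things out:

```python
# A minimal sketch of temperature sampling. The raw scores ("logits") are made
# up; a real model produces one score per token in its whole vocabulary.
import numpy as np

rng = np.random.default_rng(0)
candidates = ["blue", "cloudy", "grey", "falling", "banana"]
logits = np.array([3.0, 2.5, 2.0, 0.5, -2.0])   # hypothetical scores, best to worst

def sample(logits, temperature):
    scaled = logits / temperature                   # higher temperature flattens the scores
    probs = np.exp(scaled) / np.exp(scaled).sum()   # softmax: scores -> probabilities
    return rng.choice(candidates, p=probs)

print([sample(logits, temperature=0.1) for _ in range(5)])  # almost always "blue"
print([sample(logits, temperature=2.0) for _ in range(5)])  # much more variety, odder picks
```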
5. Abstract concepts via multi-layer neural networks
Notably, though, these models don't just learn the correlations between words and so on. They also learn more abstract concepts. This is because each layer of the model (and there are usually dozens to hundreds of layers) is essentially an abstraction of the previous one, encapsulating higher-level, more complicated features. So if the first layer deals with the concepts that directly correspond to words, like I described, then the next layer will be slightly more abstract than that, and the next layer will be even more abstract. Image recognition networks show the same thing: the first layers detect edges, the middle layers detect textures and shapes, and the last layers recognize whole objects like faces or cars.
So just imagine what that does for concepts. At the first layer, the model might understand words like "overcome," "fainted," and "sobbed," while at a much later layer it might have concepts for "melodrama" and "emotion," as well as things like "verb," "noun," and "adjective." And it doesn't just understand the words for those things: it has actual concepts for them, where all the words that belong to a concept are associated with the concept, and with each other, in that layer!
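Purely to show the shape of the computation (the numbers here are random, not learned, and real models have far more layers and dimensions), here's what "stacking layers" looks like in code: each layer takes the previous layer's features and recombines them into new ones:

```python
# Structural sketch of stacked layers. The weights are random, just to show
# how each layer builds on the one before it; in a real model they're learned.
import numpy as np

rng = np.random.default_rng(0)

def layer(features, weights):
    """One layer: recombine the incoming features, then apply a nonlinearity."""
    return np.maximum(0, weights @ features)   # ReLU zeroes out features that weren't "detected"

features = rng.normal(size=16)                 # layer 0: something like word-level features
for depth in range(4):                         # real models stack dozens of these
    weights = rng.normal(size=(16, 16)) / 4
    features = layer(features, weights)        # each layer builds on the previous one
    print(f"layer {depth + 1}: {np.count_nonzero(features)} features active")
```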
6. Okay, so what's a transformer?
Good question! I've been putting off explaining them, but now that we have a solid foundation, I can.
When the model is deciding what token to output next, it's able to look at many different places throughout the input you give it (often hundreds of them, each about a sentence to a paragraph's worth of word vectors long), weigh the most relevant ones most heavily, and feed mostly those through the model to produce the output.
Part of a model's training process is actually learning which parts of an input to pay attention to when it's generating the outputs! It does this by learning word vectors and concept vectors, as we discussed before, and then when it's planning on outputting a new vector, it "pays attention" to all the places in the input prompt that have vectors that are similar in meaning to what it's outputting, weighing those vectors more heavily in the matrix math that produces its output.
So for instance, if it's supposed to understand and summarize an essay, it will typically learn to pay attention to the topic sentences of each paragraph and the beginning and ending paragraphs the most. This is because of those abstract concept vectors.
Meanwhile, if you paste in a big document and ask it about a specific subject answered in the document, it's probably going to pay the most attention to areas where words related to what you've talked about are most dense. This is based on word vectors.
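Here's a stripped-down sketch of the attention idea with made-up random vectors. Real transformers learn "query", "key", and "value" transformations and run many attention heads in parallel, but the core move is the same: score every input position by similarity to what's being produced, turn the scores into weights, and blend:

```python
# A stripped-down attention sketch with random made-up vectors.
import numpy as np

rng = np.random.default_rng(0)
input_vectors = rng.normal(size=(10, 64))   # 10 positions in the prompt, 64 dimensions each
current = rng.normal(size=64)               # what the model is "thinking about" right now

scores = input_vectors @ current / np.sqrt(64)     # similarity of each position to `current`
weights = np.exp(scores) / np.exp(scores).sum()    # softmax: similarities -> attention weights
blended = weights @ input_vectors                  # weighted average of the input vectors

print(np.round(weights, 3))   # the biggest weights mark the positions being "attended to"
```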
A classic visualization of this is the attention matrix for a very simple word-by-word attention mechanism translating a sentence: a grid with the input words along one side and the output words along the other, where brighter cells show which input word the model was paying attention to as it produced each output word. (You can find more complicated versions for full LLMs online, but they're harder to read at a glance.)
7. The reward function
A model learns to choose each next token by being rewarded for following the patterns in human text. Basically, if it chooses a next concept that convincingly looks like the next concept a human would choose, based on the previous concepts in a body of text, then it is rewarded. This is the reward function.
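In practice, that "reward" is usually implemented as a penalty (a loss) that shrinks when the model puts high probability on the token the human text actually used next. Here's a one-token sketch with made-up probabilities:

```python
# One-token sketch of the training signal, with made-up probabilities. The
# model predicted these for the word after "the cat sat on the ..."; the
# human text actually continues with "mat".
import math

predicted_probs = {"mat": 0.6, "sofa": 0.25, "moon": 0.05, "carburetor": 0.001}

penalty = -math.log(predicted_probs["mat"])  # cross-entropy loss for this one token
print(f"penalty for matching the human pattern: {penalty:.2f}")                 # small (about 0.51)

surprise = -math.log(predicted_probs["carburetor"])
print(f"penalty if the text had continued with 'carburetor': {surprise:.2f}")   # large (about 6.91)
```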
Modern LLM training doesn't just dump the whole internet into the model and ask it to get really good at following the patterns in the whole dataset, though. That's the first step, but then there are three more:
- supervised fine-tuning
- reinforcement learning with verified feedback
- reinforcement learning with human feedback
Supervised fine-tuning is really simple: the LLM developers give the model thousands of examples of actually useful patterns, such as "summarize this essay," "write an essay from this outline," or "critique this story," among many others, and weight the patterns it learns from those examples much more heavily, so it learns how to do actually useful things.
Reinforcement learning with verified feedback is very new, and very cutting-edge. Basically, many big new models are, during training, asked to write code, do math, and control real programs on a computer. If they're able to do those things correctly, as verified by specific tests, then they get a reward, and if they don't, they get punished. Think of it like quizzes and homework. This teaches them to be better at saying stuff that's actually correct, at least in those fields.
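Here's a deliberately tiny sketch of that idea for code generation, assuming the simplest possible scheme: +1 if the model's code passes a test, -1 if it doesn't. Real pipelines are far more elaborate (and sandbox the code before running it!), but the generate-check-reward loop is the heart of it:

```python
# Tiny sketch of verified feedback: run the model's code against a known test
# and hand back a reward. The "add" test case is invented for this example.
def verify(generated_code):
    """Run the model's code against a known test case and return a reward."""
    namespace = {}
    try:
        exec(generated_code, namespace)        # define the model's function
        assert namespace["add"](2, 3) == 5     # the "quiz": does it actually work?
        return +1.0                            # correct -> reward
    except Exception:
        return -1.0                            # wrong or broken -> punishment

good_attempt = "def add(a, b):\n    return a + b"
bad_attempt  = "def add(a, b):\n    return a - b"

print(verify(good_attempt))  # 1.0
print(verify(bad_attempt))   # -1.0
```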
Reinforcement learning with human feedback is the last step. Many AI interfaces, like the (in)famous ChatGPT, have a little thumbs-up and thumbs-down button at the end of each answer from the LLM. That's the human feedback! Basically, your reward or punishment of the AI is recorded and fed back into the next training run, which usually happens every month or so. This is actually why AIs become "sycophantic" (always praising you): it's not an inherent property of LLMs, it's just that, on average, humans really like it when their robot servants praise them! :P
Since the large language model represents what it's doing as taking in a list of vectors and then adding, subtracting, and transforming them to produce new vectors, you can think of it as learning the paths humans tend to take through concept-space, and trying to copy and apply them to make its own paths. It doesn't really know the meaning of where it is or where it's going in concept space, in terms of how any of it corresponds to the real world, but it can start in a place and, if that place looks familiar, assemble a familiar set of paths to get to where it thinks it should go.
This process — of learning to predict human-like paths through a conceptual space without any connection to the real world — is key to understanding what an LLM is actually doing.
8. So what is an AI doing when I'm talking to it?
Let's review what we've learned.
One: That transformers can intelligently look at different parts of input text to choose what influences what it outputs next, across literally hundreds of thousands or even millions of words.
We know that they can pick up on really subtle, long range, and complex patterns, instead of the simple "what word comes next" pattern of auto-complete.
Two: That models put words into a mathematical space that represents the way the words are related or not, and can understand root words and word combinations.
We could say that models have a sort of understanding of the meaning of words! Modern linguistics, as well as much of modern philosophy, tends to hold that the meaning of words lies in how they're used, the contexts they show up in, and their relationships to other words, not in some kind of dictionary definition or even correspondence to "the real world," after all.
Three: That these models are multi-layer neural networks.
It would make sense to say that these models can actually understand and act on very abstract, generally applicable patterns in language, substituting them into different contexts.
Four: That these models can represent words as vectors that can be applied to any other word.
We could say that it's very easy for these models to take the patterns they've learned and apply them in different contexts, to concepts or words other than the ones they were originally learned from, thus allowing a sort of weak general intelligence. Combine this with the fact that they were trained on almost all of the writing in human history, and they've learned a lot of patterns!
Five: That despite all of this, models don't understand the real world, or what they're actually doing with anything.
Thus, while LLMs can't reason, they can sort of imitate reasoning. And while they can't be creative or give writing critique, they can imitate it. And because the patterns that they learn are so abstract, they're almost like templates that they can apply to various situations.
Six: That tokens correspond to vectors for words and concepts.
This means that tokens can actually, especially in context, encode a lot of the model's internal thought and state, a lot of nuance. So, counterintuitively, even though the model is re-run from scratch to generate each successive token, because each token it generates contains an idea of where it was going, models are actually able to hold very high-level, abstract goals in mind and write towards that end!
Think of it like this: let's say someone gives you a huge sheet of algebra formulas, with a little description on each telling you when to apply it. Then they give you an expression to simplify. You can apply all those abstract formulas in various combinations and synthesize them together to simplify the expression you were given, all without realizing that what they actually gave you was a physics problem, or understanding what the formulas mean or correspond to, or even knowing anything about the real world at all! That's basically what LLMs do. It's like Searle's old Chinese Room thought experiment, where a person who doesn't know Chinese at all is still able to produce fluent Chinese, and even keep up conversations in it, by using a very complex rulebook to translate input Chinese characters into output Chinese characters.
That's sort of what the AI is doing. It can't actually reason, but it has seen the patterns of movement through concept space (remember, concepts are vectors in an almost physical space to these models) that humans perform, so it can follow those paths of movement through concept space too when it sees applicable concepts, or even translate those patterns, merge them together, and put them in different contexts. And if you think about it, even reasoning itself is really just a conceptual pattern, where you can slot any concept into the corresponding parts of the pattern, run the pattern through, and get something correct on the other end. That's what a syllogism is, after all!
When the AI is talking to you, then, all that's happening is this: it's given your chat with it in the form of a stage play, essentially, with all your dialogue and all its past dialogue filled in, and then it's asked to predict what it should say next, in the personality of a helpful AI assistant chatting with a human. It's just following those really complicated, long-form patterns, that's all!
9. So how do I use this?
Well, think about what we've learned.
Clearly, these tools aren't useless. If they know the patterns of, for instance, Socratic dialogue or writing critique, they can certainly do a good enough job at them to be useful! Remember, usefulness isn't an all or nothing thing — something can be useful even if it's only 80% good or correct. And indeed, for some things — such as summarization — studies have shown they're usually more accurate than a random human being is at the task, even if they don't always know what's important to keep in the summary and what isn't… because humans mess up on that too sometimes!
At the same time, based on what we've learned, we should know that you can't really rely on them for sustained reasoning, or for understanding the world, because that's not something they can really do. They're also very credulous, and will tend to believe anything you, or any of the sources you plug into them, say: garbage in, garbage out! So you can't rely on them to do your math, check sources, or make arguments for you.
More importantly, if you use an LLM to do something for you, you won't learn how to do it yourself, and you aren't practicing it either, which means you could get rusty. You also won't remember what you did or why as well, because you didn't do it, which can be a problem when you come back to it later. So it's usually best to treat LLMs as critique partners: ask them to question you, critique your work, or point out things you missed, forcing you to actually think, articulate, and write, instead of outsourcing that to them. Likewise, LLMs can help with research by running searches, summarizing sources, and so on, but you still always have to check the sources yourself.
10. Closing questions
- What might you want to use LLMs for?
- What wouldn't you use them for?
- What should you be careful of when using them?
- How did this change your perception of the "AI hype"?
- Are LLMs "spicy autocomplete"?
- Can we call them artificial intelligence, in some sense, if we admit multiple intelligences?
- What would we need to change about LLMs to make them as good as humans at thinking?