My attempt at explaining for laypeople how LLMs work
Basically, the way that large language models like ChatGPT work is that they read in chunks of text known as tokens. These are typically short words, or pieces of longer words, usually only a few letters long. Large language models use an architecture known as a transformer (originally invented for natural language translation) that reads in a large number of tokens and then outputs the next token. To produce multiple output tokens, like it would in a chat, it just does the same process repeatedly: it pulls in an input prompt, produces an output token, then that token is added to the end of its input prompt and it's asked to come up with the next token after that. This is called being autoregressive, and it's the origin of people calling large language models "spicy autocomplete": like repeatedly asking iPhone autocomplete to guess your next word.
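If you like code, here's a toy sketch of that loop. The "tokenizer" and "model" below are made-up stand-ins (a real tokenizer splits text into sub-word pieces and a real model is a huge neural network), but the feed-the-output-back-in structure is the real idea:

```python
# A minimal sketch of autoregressive generation with toy stand-ins.

def toy_tokenize(text):
    return text.split()                 # real tokenizers split into sub-word pieces, not on spaces

def toy_model(tokens):
    return "meow"                       # a real model would predict the most fitting next token

def generate(prompt, max_new_tokens=5):
    tokens = toy_tokenize(prompt)
    for _ in range(max_new_tokens):
        next_token = toy_model(tokens)  # predict ONE next token from the whole sequence so far
        tokens.append(next_token)       # feed it back into the input and repeat
    return " ".join(tokens)

print(generate("The cat says"))         # -> "The cat says meow meow meow meow meow"
```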
However, the way large language models work under the hood is much more interesting than auto-complete!
First of all, the model doesn't actually operate on tokens the way an autocomplete system does. Instead, part of what it learns in its first training phase is the underlying "concepts," known as word vectors, that groups of tokens correspond to. Basically, the model has a mental space with thousands of dimensions (often 4,000 or more), and it learns the correlations and anti-correlations between different words to generate vectors (you can think of a vector in this case as something like both a position and a direction) for those words in that mathematical space. It does this based on which words tend to appear together versus never appear together, as well as which words tend to appear in similar versus opposite contexts. So, for instance, a word that tends to appear whenever people are talking about a certain aspect of history, in the same way another word does, will probably be a synonym or at least closely related. These correlations and anti-correlations can capture a lot of nuance, since there are thousands of dimensions along which they can be measured! This also leads to some fun side effects. For instance, king - man + woman = queen in the model's mental vector space.
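Here's a toy illustration of that arithmetic, using made-up four-dimensional vectors. Real models learn their vectors from data and use thousands of dimensions, so these numbers are purely for illustration:

```python
import numpy as np

# Made-up "word vectors": each dimension loosely stands for a feature
# (royalty-ness, maleness, femaleness, person-ness).
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.3]),
    "queen": np.array([0.9, 0.1, 0.8, 0.3]),
    "man":   np.array([0.1, 0.8, 0.1, 0.2]),
    "woman": np.array([0.1, 0.1, 0.8, 0.2]),
}

def cosine_similarity(a, b):
    # How closely two vectors point in the same direction (1.0 = identical direction)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max(vectors, key=lambda w: cosine_similarity(vectors[w], target))
print(best)  # -> "queen"
```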
When you give the model input, it actually converts every word into these concept vectors and then operates on those. Likewise, when it produces an output, it outputs a vector, and then the interface you're using turns it into a word you can actually understand. When it does this, it actually outputs a probability distribution of possible next concepts, ranging from very likely to very unlikely, and the interface you're using uses a random number generator to decide how far down the probability list to look. You can tweak how much randomness is allowed (and thus how improbable the model's outputs can get) using a setting called temperature.
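Here's a minimal sketch of that sampling step, with made-up scores standing in for the model's real output:

```python
import numpy as np

# Made-up raw scores ("logits") for 4 candidate next tokens.
logits = np.array([4.0, 3.5, 2.0, 0.5])
temperature = 0.8                        # below 1: safer picks; above 1: wilder picks

# Softmax with temperature turns the scores into a probability distribution.
probs = np.exp(logits / temperature)
probs /= probs.sum()

# Roll the dice to pick the next token according to those probabilities.
rng = np.random.default_rng()
next_token_index = rng.choice(len(logits), p=probs)
print(np.round(probs, 3), next_token_index)
```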
You can think of the mind of a large language model, then, as a huge cloud of words, where related words are closer together on some axes and farther apart on others, and antonyms sit across axes from each other.
Notably, though, these models don't just learn the correlations between words and so on. They also learn more abstract concepts. This is because each layer of the model (and there are usually dozens of layers, up to around a hundred) is essentially an abstraction of the previous one, encapsulating higher-level, more complicated features. So if the first layer deals with the concepts that directly correspond to words like I described, then the next layer will be slightly more abstract than that, and the next layer will be even more abstract. Here's what that looks like for image recognition:
So just imagine what that does for concepts!
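To give a feel for the mechanics, here's a toy sketch of a single vector flowing through a stack of layers. The weights here are random placeholders rather than learned ones, and real transformer layers also contain attention and other machinery, but it shows the basic idea of each layer re-encoding the previous layer's output:

```python
import numpy as np

dim, n_layers = 8, 4
rng = np.random.default_rng(0)

# Random placeholder weight matrices; in a real model these are learned.
layers = [rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(n_layers)]

x = rng.normal(size=dim)              # a token's vector entering the model
for i, W in enumerate(layers):
    x = x + np.maximum(0, W @ x)      # residual connection plus a simple nonlinearity
    print(f"after layer {i + 1}:", np.round(x[:3], 2))  # progressively re-encoded features
```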
Furthermore, when the model is deciding what token to output next, it's actually able to look back at many places throughout the input you give it, weighting some stretches of token vectors (words/concepts) far more heavily than others, and it's mostly those heavily weighted parts that shape the output. Part of its training process is actually learning which parts of an input to pay attention to when it's generating its outputs! So, for instance, if it's supposed to understand and summarize an essay, it will typically learn to pay the most attention to the topic sentences of each paragraph and to the opening and closing paragraphs. Here's an example of what an attention matrix looks like for a very simple word-by-word attention mechanism translating a sentence:
(you can find more complicated attention graphs for full LLMs online, but they're harder to read at a glance)
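If you're curious what's happening numerically, here's a toy sketch of a single attention step with random placeholder numbers. The grid of weights it prints is the same kind of thing as the attention matrix pictured above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, dim = 5, 8

# "Query", "key", and "value" vectors derived from the input tokens;
# random placeholders here, learned projections in a real model.
Q = rng.normal(size=(n_tokens, dim))
K = rng.normal(size=(n_tokens, dim))
V = rng.normal(size=(n_tokens, dim))

scores = Q @ K.T / np.sqrt(dim)                  # how much each token "looks at" each other token
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row is one token's attention pattern
output = weights @ V                             # each output is a weighted mix of the inputs
print(np.round(weights, 2))                      # the attention matrix
```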
Counterintuitively, even though the model is re-run from scratch to generate each token, each token it generates (which is then added to its input for the next step) carries an idea of where it was going, and a lot more than you'd think can be embedded in the tokens it chooses. So models are actually able to hold very high-level, abstract goals in mind and write towards them!
The way a model learns to choose each next token of its output is by being rewarded for following the patterns in human text. Basically, if it chooses a next concept that convincingly looks like the next concept a human would choose, based on the previous concepts in a body of text, then it is rewarded. This is its reward function (or, more precisely, its training objective).
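Numerically, that "reward" during the initial training phase usually looks more like a penalty to shrink. Here's a toy sketch with a made-up prediction: the model is scored on how much probability it put on the token the human text actually contains, and training nudges its weights to do better next time:

```python
import numpy as np

# Made-up predicted probabilities over a tiny 4-token vocabulary.
probs = np.array([0.05, 0.70, 0.15, 0.10])
actual_next_token = 1                      # the token the human-written text actually contains

loss = -np.log(probs[actual_next_token])   # cross-entropy: small when the model was confident and right
print(loss)                                # training adjusts the weights to make this number smaller
```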
As you can guess, this is similar to autocomplete, and it means that models don't really reason or have any grounding in truth or accuracy. However, a few things set them apart. Transformers can intelligently look at different parts of the input text to choose what influences what they output next. They also, in a meaningful sense, understand the meaning of words: modern linguistics, as well as much of modern philosophy, tends to hold that the meaning of a word is in how it's used (the contexts and relationships to other words it shows up in), not in some kind of dictionary definition or even correspondence to "the real world." And they can see abstract concepts, not just individual words. Put together, this means these models can pick up on very high-level, non-local, and abstract patterns in human language, instead of the very simple one-word-after-another patterns that you get from autocomplete. These patterns can also, thanks to the fact that everything is a vector to these models, be transposed into different contexts, applied to different concepts or words than the ones they were originally learned from, thus allowing a sort of weak general intelligence. Combine this with the fact that they were trained on a huge share of the writing humans have ever produced, and they've learned a lot of patterns!
Thus, while LLMs can't reason, they can sort of imitate reasoning. And while they can't truly be creative or give writing critique, they can imitate those things too. And because the patterns they learn are so abstract, they're almost like templates that they can apply to various situations.
Think of it like this: let's say someone gives you a huge sheet of algebra formulas, with a little description on each telling you when to apply it. Then they give you an expression to simplify. You can apply all those abstract formulas in various combinations and synthesize them together to simplify the expression you were given, all without realizing that what they handed you was actually a physics problem, or understanding what the formulas mean or correspond to, or even knowing anything about the real world at all! That's basically what LLMs do. It's like Searle's old Chinese Room thought experiment.
The model can't actually reason, but it has seen the patterns of movement through concept space (remember, concepts are vectors in an almost-physical space to these models) that humans perform, so it can follow those paths of movement through concept space too when it sees applicable concepts, or even translate those patterns, merge them together, and put them in different contexts. And if you think about it, even reasoning itself is really just a conceptual pattern: you can slot any concept into the corresponding parts of the pattern, run the pattern through, and get something correct out the other end. That's what a syllogism is, after all!