This page explains how a large language model works in two phases. First, training reads many text examples and adjusts the model's weights so that it learns patterns. Later, inference uses those learned weights to answer a prompt: the model chooses a likely next token, appends it to the text, and repeats the step.
Teaching bridge from deep learning to LLMs: the previous page showed that a neural network is built from layers, weights, and biases. A large language model uses the same deep-learning idea at a much larger scale: instead of learning from a few inputs such as pixels or study hours, it learns patterns from huge amounts of text, then uses those learned weights to predict the next token in a sentence.
The model starts with many examples of written language.
It practises predicting tokens and adjusts its weights when it is wrong.
The learned patterns are stored as numbers inside the model.
A prompt goes in, then the model predicts one next token at a time.
Text is broken into tokens. A token can be a word, part of a word, or punctuation.
During training, the model practises predicting the next token from earlier tokens.
The useful patterns are saved in weights. This demo shows them as a small table.
During inference, the model uses the prompt and weights to produce tokens one by one.
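The first two steps above can be sketched in a few lines of Python. This is a minimal illustration, not how a production tokenizer works: real LLMs usually split text into subword pieces, while the hypothetical `tokenize` helper below assumes that lowercased words and single punctuation marks are good enough tokens for a demo. The `training_pairs` helper and its `context_size` parameter are also assumptions made for this example.

```python
import re

def tokenize(text):
    # Simplification: each word and each punctuation mark is one token.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def training_pairs(tokens, context_size=1):
    # Each pair is (earlier tokens, the token that actually came next).
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        pairs.append((context, tokens[i]))
    return pairs

tokens = tokenize("The cat sat. The cat slept.")
pairs = training_pairs(tokens)

for context, next_token in pairs:
    print(context, "->", next_token)
```

Each printed pair is one practice question for training: given the context on the left, the model should learn to predict the token on the right.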
Choose a tiny dataset, show what patterns training finds, then run inference from a prompt.
In a real LLM, training adjusts billions of weights. In this teaching demo, we show that idea as a small memory table of token patterns.
| Seen Context | Possible Next Token | Why It Learned That |
|---|---|---|
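
A memory table like this can be approximated by counting which token follows which. The sketch below is a teaching stand-in under stated assumptions: a real LLM does not store counts but adjusts numeric weights gradually, and the `train` function, the whitespace tokenization, and the three-sentence `dataset` are all invented for illustration.

```python
from collections import Counter, defaultdict

def train(sentences):
    # table[context][next_token] = how often next_token followed context
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for context, next_token in zip(tokens, tokens[1:]):
            table[context][next_token] += 1
    return table

# Hypothetical tiny dataset, chosen only for this example.
dataset = ["the cat sat", "the cat slept", "the dog sat"]
table = train(dataset)

# Print one row per learned pattern, in the same three-column shape.
for context, counts in table.items():
    for next_token, n in counts.items():
        print(f"| {context} | {next_token} | seen {n} time(s) after '{context}' |")
```

Counting is used here because it is easy to inspect; it plays the role that weight adjustment plays in a real model.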
The model starts from the prompt, checks which patterns match the recent context, then picks the most likely next token.
This is the core of inference: the trained model assigns a probability to each candidate next token, and this demo picks the candidate with the highest probability.
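The inference loop can be sketched as follows. This is a simplified illustration with assumed names: the hand-written `table` of counts stands in for learned weights, the context is only the single most recent token, and `generate` always takes the highest-probability token, which is one possible selection strategy rather than the only one.

```python
def next_token_probabilities(table, context):
    # Turn raw counts for this context into probabilities that sum to 1.
    counts = table.get(context, {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {token: n / total for token, n in counts.items()}

def generate(table, prompt, max_new_tokens=5):
    tokens = prompt.lower().split()
    if not tokens:
        return ""
    for _ in range(max_new_tokens):
        probs = next_token_probabilities(table, tokens[-1])
        if not probs:
            break  # no pattern matches the recent context
        # Pick the most likely next token and add it to the text.
        tokens.append(max(probs, key=probs.get))
    return " ".join(tokens)

# Hypothetical counts, standing in for what training would have stored.
table = {
    "the": {"cat": 2, "dog": 1},
    "cat": {"sat": 2, "slept": 1},
    "dog": {"sat": 1},
}

probs = next_token_probabilities(table, "the")
text = generate(table, "the")
print(probs)
print(text)
```

Generation stops early here because no stored pattern begins with the last token produced, which mirrors the demo running out of matching patterns.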