In our previous article, we gave an overview of how LLMs are trained. Now, let’s look “under the hood” and explore the core process: the Training Loop. This is an iterative process that allows the model to learn and improve gradually over time.
1. Data Preparation and Tokenization
Before a model can begin training, raw text data (like a book or an article) must be prepared. The text is broken into smaller pieces called tokens: words, parts of words, or individual characters. Each token is then mapped to a numerical ID. This is where vector embeddings come into play, turning each token into a dense array of numbers that captures its semantic meaning.
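As a minimal sketch, this is roughly what tokenization and embedding lookup look like. The vocabulary, the word-level tokenizer, and every embedding value below are invented for illustration; real models use subword tokenizers, vocabularies of tens of thousands of tokens, and embeddings with hundreds or thousands of learned dimensions.

```python
# Toy word-level tokenizer: words -> integer IDs -> dense vectors.
# Vocabulary and embedding values are invented for illustration only.
vocab = {"the": 0, "sun": 1, "is": 2, "shining": 3, "brightly": 4}

# One small embedding vector per token ID (learned during training
# in a real model, hard-coded here).
embeddings = [
    [0.1, -0.3, 0.2],   # "the"
    [0.7, 0.5, -0.1],   # "sun"
    [0.0, 0.2, 0.4],    # "is"
    [0.6, 0.4, -0.2],   # "shining"
    [0.5, 0.6, -0.3],   # "brightly"
]

def tokenize(text):
    """Split text into tokens and convert each token to its ID."""
    return [vocab[word] for word in text.lower().split()]

token_ids = tokenize("the sun is shining")
vectors = [embeddings[i] for i in token_ids]
print(token_ids)  # [0, 1, 2, 3]
```

The model never sees raw text: from this point on, everything operates on those numeric vectors.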
2. Forward Pass
In this stage, the prepared data is fed through the neural network. The model uses its architecture, built on transformers and their attention mechanism, to analyze the input and make a prediction. For example, if the input phrase is “the sun is shining…”, the model tries to predict the next word (e.g., “brightly”).
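The final step of the forward pass can be sketched like this: the network's hidden state for the input is scored against every word in the vocabulary, and a softmax turns those scores into probabilities. The context vector and word vectors below are made-up numbers; in a real model the context vector is produced by many transformer layers.

```python
import math

# Toy "model": scores each vocabulary word against a context vector.
# All numbers are invented; a real model computes the context vector
# with many transformer layers and learned weights.
vocab = ["the", "sun", "is", "shining", "brightly", "loudly"]
context = [0.5, 0.6, -0.3]   # hidden state after reading "the sun is shining..."
word_vectors = [
    [0.1, -0.3, 0.2], [0.7, 0.5, -0.1], [0.0, 0.2, 0.4],
    [0.6, 0.4, -0.2], [0.9, 0.8, -0.4], [-0.5, -0.6, 0.3],
]

def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Forward pass: one score (logit) per word, then a probability distribution.
logits = [sum(c * w for c, w in zip(context, wv)) for wv in word_vectors]
probs = softmax(logits)
prediction = vocab[probs.index(max(probs))]
print(prediction)  # brightly
```

The model's "prediction" is thus not a single word but a probability for every word in the vocabulary; the most likely one is what we usually read off.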
3. Loss Calculation
After making a prediction, the model compares it to the correct answer. The gap between the two is measured by a loss function, which produces a single number: the loss value. The larger the gap, the higher the loss, meaning the model made a more significant error.
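For next-word prediction, the standard choice is cross-entropy loss: the negative log of the probability the model assigned to the correct token. A small sketch, with invented probabilities:

```python
import math

# Cross-entropy loss for one prediction: -log(probability the model
# assigned to the correct next token). Probabilities here are invented.
def cross_entropy(predicted_probs, correct_index):
    return -math.log(predicted_probs[correct_index])

probs = [0.05, 0.10, 0.70, 0.10, 0.05]  # model's distribution over 5 tokens

# Confident and correct: low loss.
print(round(cross_entropy(probs, 2), 3))  # 0.357

# The correct token was one the model thought unlikely: high loss.
print(round(cross_entropy(probs, 0), 3))  # 2.996
```

Note how the same distribution yields a small loss when the model was right and a large one when it was wrong: the loss punishes confident mistakes most.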
4. Backpropagation
This is the most critical stage of training. The loss value is sent back through the network. Backpropagation computes, via the chain rule, how much each weight (the parameters that encode the model’s “knowledge”) contributed to the error, and gradient descent then uses those gradients to adjust the weights so that the next prediction is more accurate. This process runs through the entire neural network, from the output layer back to the input layer, gradually reducing the error.
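The core idea can be shown on a deliberately tiny "model" with a single weight, where the gradient can be written by hand. All numbers are invented; backpropagation generalizes this same chain-rule calculation to billions of weights.

```python
# Minimal gradient-descent sketch on a one-weight "model":
# prediction = w * x, loss = (prediction - target)^2.
# The gradient d(loss)/dw = 2 * (w*x - target) * x says which way
# to nudge w; backpropagation computes such gradients for every
# weight in a deep network via the chain rule.
x, target = 2.0, 6.0       # we want w*x == 6, so the ideal w is 3
w = 0.0                    # start from an uninformed weight
learning_rate = 0.1

for step in range(50):
    prediction = w * x
    gradient = 2 * (prediction - target) * x
    w -= learning_rate * gradient   # step downhill on the loss surface

print(round(w, 4))  # 3.0
```

Each iteration moves the weight a small step in the direction that lowers the loss; after enough steps it settles near the value that makes the prediction correct.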
This cycle—from prediction to correction—repeats millions or even billions of times until the loss is minimized. This is how the model slowly but surely learns and becomes capable of generating coherent, meaningful text.
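The four stages above can be stitched into one toy end-to-end loop. The "model" here is just one trainable logit per vocabulary word, and the data is a single example, both invented for illustration, but the cycle (forward pass, loss, gradient update, repeat) is the same one a real LLM runs at scale.

```python
import math

# End-to-end toy training loop: forward pass, loss calculation,
# gradient update, repeated until the loss shrinks.
vocab = ["brightly", "loudly", "quietly"]
correct = 0                       # "brightly" is the true next token
logits = [0.0, 0.0, 0.0]          # untrained: every word equally likely
learning_rate = 0.5

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

for step in range(100):
    probs = softmax(logits)                  # 1-2. forward pass
    loss = -math.log(probs[correct])         # 3. cross-entropy loss
    # 4. Gradient of cross-entropy w.r.t. each logit is a standard
    # result: prob - 1 for the correct word, prob for the others.
    for i in range(len(logits)):
        grad = probs[i] - (1.0 if i == correct else 0.0)
        logits[i] -= learning_rate * grad    # gradient-descent update

print(vocab[probs.index(max(probs))])  # brightly
print(loss < 0.2)                      # True: far below the initial ln(3)
```

The loss starts at ln(3) ≈ 1.1 (a uniform guess over three words) and falls as the correct word's logit is pushed up, mirroring in miniature how an LLM's predictions sharpen over training.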