An LLM is a type of AI model that excels at understanding and generating human language. It’s trained on massive amounts of text data, allowing it to understand nuances, context, and patterns in language.
Most LLMs are based on the Transformer architecture, a deep learning architecture that uses an “attention” mechanism to process text.
There are 3 types of transformers: encoder-only models, decoder-only models, and sequence-to-sequence (encoder-decoder) models.
LLMs are typically decoder-based models with billions of parameters.
The underlying principle of an LLM is simple: its objective is to predict the next token, given a sequence of previous tokens.
A “token” is the unit of information an LLM works with. You can think of a “token” roughly as a “word”, but for efficiency reasons LLMs don’t use whole words; they work with smaller sub-word units.
For instance, consider how the tokens “interest” and “ing” can be combined to form “interesting”, or “ed” can be appended to form “interested”.
You can play with tokens here:
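You can also see tokenization in action with a minimal code sketch. This assumes the Hugging Face `transformers` library and the GPT-2 tokenizer, which are just illustrative choices:

```python
# A minimal sketch, assuming the Hugging Face `transformers` library and the
# GPT-2 tokenizer (both illustrative choices, not prescribed above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

for text in ["interesting", "interested", "The capital of France is"]:
    print(text, "->", tokenizer.tokenize(text))
# The exact splits depend on the tokenizer's vocabulary; the point is that
# words are broken into reusable sub-word pieces rather than stored whole.
```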
Each LLM has some specific tokens that are used to open and close the structured components of its generation. For example, to indicate the start or end of a sentence, message, or response. The most important of those is the end-of-sequence (EOS) token.
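These special tokens vary from model to model. As a small sketch (again assuming the Hugging Face `transformers` library), you can inspect them directly:

```python
# A small sketch (Hugging Face `transformers` assumed) showing that special
# tokens, including EOS, differ between models.
from transformers import AutoTokenizer

for name in ["gpt2", "bert-base-uncased"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, "| EOS token:", tok.eos_token, "| special tokens:", tok.all_special_tokens)
# GPT-2 uses "<|endoftext|>" as its EOS token; BERT has no EOS token and
# relies on [CLS]/[SEP] instead.
```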
LLMs are said to be autoregressive, meaning that the output from one pass becomes the input for the next one. This loop continues until the model generates the EOS token.
Technically, here is a brief overview of how it works: the input text is tokenized, the model assigns a probability score to every token in its vocabulary as the candidate for the next token, a decoding strategy picks one (the simplest being to take the most likely token), and the chosen token is appended to the input before the next pass.
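Here is a hedged sketch of that loop in code, assuming PyTorch and the Hugging Face `transformers` library, with “gpt2” as a stand-in model and greedy decoding (always taking the most likely token) as the simplest possible strategy:

```python
# A sketch of the autoregressive loop (PyTorch + Hugging Face `transformers`
# assumed; "gpt2" and greedy decoding are illustrative choices).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                              # generate at most 20 new tokens
        logits = model(input_ids).logits             # a score for every token in the vocabulary
        next_id = logits[:, -1, :].argmax(dim=-1)    # greedy: take the most likely next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)  # feed it back in
        if next_id.item() == tokenizer.eos_token_id: # stop once the model emits EOS
            break

print(tokenizer.decode(input_ids[0]))
```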
Regarding the Transformer architecture, the key aspect is attention. When predicting the next word, not every word in a sentence is equally important. For instance, in the sentence “The capital of France is”, words like “France” and “capital” carry the most meaning.
The process of identifying the most relevant words to predict the next token has proven to be incredibly effective. And there have been significant advancements in scaling neural networks and making attention mechanisms work for longer and longer sequences.
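To give a feel for what attention computes, here is a toy sketch of scaled dot-product attention in plain NumPy; the 4-token “sentence” and the random projection matrices are purely illustrative assumptions, not trained weights:

```python
# A toy sketch of scaled dot-product attention (NumPy only; random embeddings
# and projections stand in for trained weights).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 8                                      # toy embedding dimension
tokens = ["The", "capital", "of", "France"]
X = rng.normal(size=(len(tokens), d))      # stand-in token embeddings

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values

scores = Q @ K.T / np.sqrt(d)              # how strongly each token attends to the others
weights = softmax(scores)                  # each row sums to 1
context = weights @ V                      # attention-weighted mix of the value vectors

print(np.round(weights, 2))                # e.g. the last row: what "France" attends to
```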
When working with LLMs, you might have noticed the term “context length”, which refers to the maximum number of tokens the LLM can process, i.e., the maximum attention span it has.
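As a quick sketch (Hugging Face `transformers` assumed again), you can compare a prompt’s token count against the context length reported by a tokenizer:

```python
# A quick sketch (Hugging Face `transformers` assumed) comparing a prompt's
# token count to the model's context length.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = "The capital of France is"
n_tokens = len(tokenizer(prompt).input_ids)
print(f"{n_tokens} tokens used out of a context length of {tokenizer.model_max_length}")
```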
Considering that the only job of an LLM is to predict the next token by looking at every input token and choosing which ones are “important”, the wording of your input sequence matters a great deal.
And yes, the input sequence you provide to the LLM is called a prompt.
Because they only predict the next token, traditional LLMs are unreliable for math problems or exact calculations: they generate text that looks plausible rather than actually computing an answer.
LLMs are trained on large datasets of text, where they learn to predict the next word in a sequence through a self-supervised or masked language modeling objective.
From this unsupervised learning, the model learns the structure of language and underlying patterns in text, allowing it to generalize to unseen data.
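To make that objective concrete, here is a toy sketch of next-token prediction training, assuming PyTorch; the tiny vocabulary, the trivially simple model, and the random “corpus” are all illustrative assumptions (the toy model ignores context entirely, but the shifted-label cross-entropy objective is the same idea real LLMs are trained with):

```python
# A toy sketch of the self-supervised next-token objective (PyTorch assumed;
# the vocabulary size, model, and random "corpus" are illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))

# A fake "corpus": one sequence of token ids. Inputs are tokens 0..n-2 and the
# labels are the same sequence shifted by one position (tokens 1..n-1).
tokens = torch.randint(0, vocab_size, (1, 16))
inputs, labels = tokens[:, :-1], tokens[:, 1:]

logits = model(inputs)                                  # (batch, seq_len - 1, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), labels.reshape(-1))
loss.backward()                                         # gradients for one training step
print(loss.item())
```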
After this initial pre-training, LLMs can then be fine-tuned with a supervised learning objective to perform specific tasks. For instance, some models are trained for conversational structures, while others focus on code generation.
LLMs are the key component of AI Agents, providing the foundation for understanding and generating human language.
They can interpret user instructions, maintain context in conversations, define a plan and decide which tools to use - The AI Agent’s brain!
That’s it! On AI Agents, I’ve now covered 2 topics: Agents themselves, and the LLMs that power them.
I hope you enjoyed this note! See you in the next one! 🤗