An LLM is a type of AI model that excels at understanding and generating human language. It’s trained on massive amounts of text data, allowing it to understand nuances, context, and patterns in language.
Most LLMs are based on the Transformer architecture, a deep learning architecture that uses an “attention” mechanism to process text.
There are 3 types of transformers: encoder-only models, decoder-only models, and sequence-to-sequence (encoder-decoder) models.
LLMs are typically decoder-based models with billions of parameters.
The underlying principle of an LLM is simple: its objective is to predict the next token, given a sequence of previous tokens.
A “token” is the unit of information an LLM works with. You can think of a “token” roughly as a “word”, but for efficiency reasons LLMs don’t use whole words; they work with smaller sub-word units.
For instance, consider how the tokens “interest” and “ing” can be combined to form “interesting”, or “ed” can be appended to form “interested”.
You can play with tokens here:
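You can also see tokenization in action with a minimal code sketch. This assumes the Hugging Face `transformers` library and the GPT-2 tokenizer, which are just illustrative choices:

```python
# A minimal sketch, assuming the Hugging Face `transformers` library and the
# GPT-2 tokenizer (both illustrative choices, not prescribed above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

for text in ["interesting", "interested", "The capital of France is"]:
    print(text, "->", tokenizer.tokenize(text))
# The exact splits depend on the tokenizer's vocabulary; the point is that
# words are broken into reusable sub-word pieces rather than stored whole.
```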
Each LLM has some specific tokens that are used to open and close the structured components of its generation. For example, to indicate the start or end of a sentence, message, or response. The most important of those is the end-of-sequence (EOS) token.
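These special tokens vary from model to model. As a small sketch (again assuming the Hugging Face `transformers` library), you can inspect them directly:

```python
# A small sketch (Hugging Face `transformers` assumed) showing that special
# tokens, including EOS, differ between models.
from transformers import AutoTokenizer

for name in ["gpt2", "bert-base-uncased"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, "| EOS token:", tok.eos_token, "| special tokens:", tok.all_special_tokens)
# GPT-2 uses "<|endoftext|>" as its EOS token; BERT has no EOS token and
# relies on [CLS]/[SEP] instead.
```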
LLMs are said to be autoregressive, meaning that the output from one pass becomes the input for the next one. This loop continues until the model generates the EOS token.
Technically, here is a brief overview of how it works: the input text is tokenized, the model assigns a probability score to every token in its vocabulary as the candidate for the next token, a decoding strategy picks one (the simplest being to take the most likely token), and the chosen token is appended to the input before the next pass.
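Here is a hedged sketch of that loop in code, assuming PyTorch and the Hugging Face `transformers` library, with “gpt2” as a stand-in model and greedy decoding (always taking the most likely token) as the simplest possible strategy:

```python
# A sketch of the autoregressive loop (PyTorch + Hugging Face `transformers`
# assumed; "gpt2" and greedy decoding are illustrative choices).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                              # generate at most 20 new tokens
        logits = model(input_ids).logits             # a score for every token in the vocabulary
        next_id = logits[:, -1, :].argmax(dim=-1)    # greedy: take the most likely next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)  # feed it back in
        if next_id.item() == tokenizer.eos_token_id: # stop once the model emits EOS
            break

print(tokenizer.decode(input_ids[0]))
```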
Regarding the Transformer architecture, the key aspect is attention. When predicting the next word, not every word in a sentence is equally important. For instance, in the sentence “The capital of France is”, words like “France” and “capital” carry the most meaning.
The process of identifying the most relevant words to predict the next token has proven to be incredibly effective. And there have been significant advancements in scaling neural networks and making attention mechanisms work for longer and longer sequences.
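To give a feel for what attention computes, here is a toy sketch of scaled dot-product attention in plain NumPy; the 4-token “sentence” and the random projection matrices are purely illustrative assumptions, not trained weights:

```python
# A toy sketch of scaled dot-product attention (NumPy only; random embeddings
# and projections stand in for trained weights).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 8                                      # toy embedding dimension
tokens = ["The", "capital", "of", "France"]
X = rng.normal(size=(len(tokens), d))      # stand-in token embeddings

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values

scores = Q @ K.T / np.sqrt(d)              # how strongly each token attends to the others
weights = softmax(scores)                  # each row sums to 1
context = weights @ V                      # attention-weighted mix of the value vectors

print(np.round(weights, 2))                # e.g. the last row: what "France" attends to
```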
When working with LLMs, you might have noticed the term “context length”, which refers to the maximum number of tokens the LLM can process, i.e., the maximum attention span it has.
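As a quick sketch (Hugging Face `transformers` assumed again), you can compare a prompt’s token count against the context length reported by a tokenizer:

```python
# A quick sketch (Hugging Face `transformers` assumed) comparing a prompt's
# token count to the model's context length.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = "The capital of France is"
n_tokens = len(tokenizer(prompt).input_ids)
print(f"{n_tokens} tokens used out of a context length of {tokenizer.model_max_length}")
```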
Considering that the only job of an LLM is to predict the next token by looking at every input token and choosing which ones are “important”, the wording of your input sequence matters a great deal.
And yes, the input sequence you provide to the LLM is called a prompt.
Because they only predict the next token, traditional LLMs are unreliable for math problems or exact calculations: they generate text that looks plausible rather than actually computing an answer.
LLMs are trained on large datasets of text, where they learn to predict the next word in a sequence through a self-supervised or masked language modeling objective.
From this unsupervised learning, the model learns the structure of language and underlying patterns in text, allowing it to generalize to unseen data.
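To make that objective concrete, here is a toy sketch of next-token prediction training, assuming PyTorch; the tiny vocabulary, the trivially simple model, and the random “corpus” are all illustrative assumptions (the toy model ignores context entirely, but the shifted-label cross-entropy objective is the same idea real LLMs are trained with):

```python
# A toy sketch of the self-supervised next-token objective (PyTorch assumed;
# the vocabulary size, model, and random "corpus" are illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))

# A fake "corpus": one sequence of token ids. Inputs are tokens 0..n-2 and the
# labels are the same sequence shifted by one position (tokens 1..n-1).
tokens = torch.randint(0, vocab_size, (1, 16))
inputs, labels = tokens[:, :-1], tokens[:, 1:]

logits = model(inputs)                                  # (batch, seq_len - 1, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), labels.reshape(-1))
loss.backward()                                         # gradients for one training step
print(loss.item())
```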
After this initial pre-training, LLMs can then be fine-tuned with a supervised learning objective to perform specific tasks. For instance, some models are trained for conversational structures, while others focus on code generation.
LLMs are the key component of AI Agents, providing the foundation for understanding and generating human language.
They can interpret user instructions, maintain context in conversations, define a plan and decide which tools to use - The AI Agent’s brain!
That’s it! On AI Agents, I’ve now covered 2 topics: Agents themselves, and the LLMs that power them.
I hope you enjoyed this note! See you in the next one! 🤗