An Agent is an AI model capable of reasoning, planning, and interacting with its environment.
We call it an Agent because it has agency, that is, the ability to interact with its environment.
To make it easier to understand, let’s take an example from real life. Imagine you have an agent in your house to do things for you.
One day, you want a cup of coffee with your breakfast, so you ask your agent: “Good morning, please make me a cup of coffee.” The agent understands natural language, so it can quickly grasp your request.
Before taking any action, the agent reasons and plans to figure out the steps and tools it needs to fulfill the request.
After figuring out the steps, the agent must act. To execute the plan, it can use any of the tools it knows about.
In this case, the agent knows about the coffee machine, so the agent can activate it to brew the coffee.
Finally, the agent brings the coffee to the table.
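The coffee scenario can be sketched as a tiny plan-then-act loop. This is only an illustration: the tool names (`go_to_kitchen`, `brew_coffee`, `bring_to_table`) are hypothetical, and the `plan` function is hard-coded where a real agent would ask an AI model to reason about the request.

```python
from typing import Callable

# Hypothetical tools for the coffee scenario; the names are illustrative only.
def go_to_kitchen() -> str:
    return "in the kitchen"

def brew_coffee() -> str:
    return "coffee brewed"

def bring_to_table() -> str:
    return "coffee on the table"

# The list of tools the agent knows about.
TOOLS: dict[str, Callable[[], str]] = {
    "go_to_kitchen": go_to_kitchen,
    "brew_coffee": brew_coffee,
    "bring_to_table": bring_to_table,
}

def plan(request: str) -> list[str]:
    """Stand-in for reasoning and planning (hard-coded; an assumption for this sketch)."""
    if "coffee" in request.lower():
        return ["go_to_kitchen", "brew_coffee", "bring_to_table"]
    return []

def act(steps: list[str]) -> list[str]:
    """Execute each planned step with the matching tool."""
    return [TOOLS[step]() for step in steps]

steps = plan("Good morning, please make me a cup of coffee")
results = act(steps)
print(results)
```

Keeping planning and acting as separate functions mirrors the two phases described above: the agent first decides what to do, then uses its tools to do it.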
An AI Agent is a system that leverages an AI model to interact with its environment in order to achieve a user-defined objective. It combines reasoning, planning, and the execution of actions (often via external tools) to fulfill tasks.
You can think of an agent as having two main components:
The brain (AI Model): This is where all the thinking happens. The AI model handles reasoning and planning. It decides which actions to take based on the user’s objective.
The body (Capabilities and Tools): This part represents everything the Agent is equipped to do. The scope of possible actions depends on what the agent has been equipped/trained with.
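The brain/body split can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the `Agent` class, `toy_brain`, and the `brew_coffee` tool are hypothetical names, and the "brain" here is a plain function standing in for an AI model.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    # The "brain": any callable that maps an objective to the name of a tool.
    brain: Callable[[str], str]
    # The "body": the tools the agent has been equipped with.
    tools: dict[str, Callable[[], str]] = field(default_factory=dict)

    def run(self, objective: str) -> str:
        choice = self.brain(objective)
        if choice not in self.tools:
            # The scope of possible actions is limited to the equipped tools.
            return f"no tool available for: {choice}"
        return self.tools[choice]()

def toy_brain(objective: str) -> str:
    """Stand-in for an AI model (an assumption for this sketch)."""
    return "brew_coffee" if "coffee" in objective.lower() else "unknown"

agent = Agent(brain=toy_brain, tools={"brew_coffee": lambda: "coffee brewed"})
print(agent.run("Please make me a cup of coffee"))
print(agent.run("Please wash the car"))
```

The second call shows the point made above: however capable the brain is, the agent can only act through the tools it has been given.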
The most common AI model used for agents is an LLM (Large Language Model), which takes text as input and outputs text.
Other model types, such as VLMs (Vision Language Models), can be used for agents that need visual capabilities.
LLMs are text-based models, so they can only generate text outputs.
However, we can use tools (I will add notes about this later) to make the agent more capable; for example, ChatGPT can also generate images.
Beyond text generation, what an Agent can do depends heavily on its Tools: an Agent can perform any task we implement via Tools to complete Actions.
For example, if I write an Agent to act as my personal assistant and send emails (e.g., I ask it to “send an email to my direct reports to prepare for our 1:1 meetings”), I can give it some code to send emails. This becomes a new Tool the Agent can use whenever it needs to send an email. We can write something like this:
def send_email(to: str, body: str) -> None:
    # Code to send an email goes here
    ...
Then, whenever it needs to, the LLM generates the code that runs the tool, and the desired task gets fulfilled.
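One way this hand-off can work is for the model to emit text describing the call, which the surrounding program parses and executes. The sketch below assumes a JSON format with `tool` and `arguments` keys; that format, the `fake_llm` stand-in, the `OUTBOX` list, and the example address are all assumptions for illustration, not the protocol of any particular framework.

```python
import json

# Records emails instead of sending them, so the sketch has no side effects.
OUTBOX: list[dict[str, str]] = []

def send_email(to: str, body: str) -> str:
    """Stub tool: queues the email in OUTBOX rather than sending it."""
    OUTBOX.append({"to": to, "body": body})
    return f"email queued for {to}"

# Tools the agent is allowed to call, keyed by name.
TOOLS = {"send_email": send_email}

def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM: returns text, here a JSON-encoded tool call."""
    return json.dumps({
        "tool": "send_email",
        "arguments": {
            "to": "reports@example.com",
            "body": "Please prepare for our 1:1 meetings.",
        },
    })

def run_tool_call(llm_output: str) -> str:
    """Parse the model's text output and dispatch to the named tool."""
    call = json.loads(llm_output)
    tool = TOOLS[call["tool"]]
    return tool(**call["arguments"])

result = run_tool_call(fake_llm("send an email to my direct reports"))
print(result)
```

Note that the model itself only produces text; it is the program around it that looks up the tool and runs it.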
The design of the Tools is very important and has a big impact on the quality of the Agent. Some tasks will require very specific tools while others may be solved with a general-purpose tool like “web_search”.
Actions are not the same as Tools. A single Action can involve several Tools.
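The difference can be illustrated with one Action that chains two Tools. Everything here is hypothetical: the `lookup_direct_reports` tool, its hard-coded directory, and the `notify_direct_reports` action are made up for the sketch, and `send_email` is a stub that sends nothing.

```python
def lookup_direct_reports(manager: str) -> list[str]:
    """Hypothetical tool: returns hard-coded addresses for illustration."""
    directory = {"me": ["ana@example.com", "bo@example.com"]}
    return directory.get(manager, [])

def send_email(to: str, body: str) -> str:
    """Stub tool: reports a send without actually sending anything."""
    return f"sent to {to}"

def notify_direct_reports(manager: str, body: str) -> list[str]:
    """One Action built from two Tools: look up recipients, then email each."""
    return [send_email(to, body) for to in lookup_direct_reports(manager)]

receipts = notify_direct_reports("me", "Please prepare for our 1:1 meetings.")
print(receipts)
```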
In the next post, we will learn about LLMs that power AI Agents.