AutoGPT Tutorial: Creating an Agent-Powered Research Assistant with AutoGPT Forge
Let's begin by crafting your AI-powered research assistant
Welcome to the world of AutoGPT Forge, where we'll embark on a journey to create your very own Research Assistant powered by AutoGPT. I'm Sanchay Thalnerkar, your guide for this exciting tutorial.
🎉 Welcome Aboard!
Today, we're diving deep into the world of AutoGPT. By the end of our journey, you'll have your very own Research Assistant powered by AutoGPT. Feel free to check out the in-depth guide on AutoGPT to get fully onboarded. Buckle up, and let's get started!
🌟 What's the Hype About Forge?
Imagine having a toolkit that simplifies the process of building an AI agent. That's what Forge is all about! It's a comprehensive template designed for AutoGPT, ensuring you have a smooth sail while crafting your agent. From setting up to running tests, Forge has got you covered.
🛠️ Getting Started: Step-by-Step Setup of AutoGPT
Before our grand performance, we need to ensure the stage is set. Here's how:
1. Check the System Requirements
- OS: Linux (Debian-based), Mac, or Windows Subsystem for Linux (WSL). Windows users, here's your guide to WSL.
2. Clone the Magic Box: The Forge Repository
First, fork the AutoGPT repository. Got it? Great! Now, let's bring it to your local machine:
git clone https://github.com/YOUR_USERNAME/AutoGPT.git
cd AutoGPT
3. Setting up the Forge Environment
With the repository in place, let's set things up:
./run setup
Follow the on-screen prompts, and you'll be good to go!
4. 🤖 Crafting Your Agent
It's time to breathe life into your agent. Give it a unique name, something catchy like EinsteinBot or CurieAssistant.
./run agent create YOUR_AGENT_NAME
5. 🏟️ Stepping into the Arena
The Arena is where your agent meets others of its kind. It's a competitive space, aiming to find the crème de la crème of agents. To join the ranks:
./run arena enter YOUR_AGENT_NAME
6. 🚀 Launching Your Agent
With everything in place, let's set your agent in motion:
./run agent start YOUR_AGENT_NAME
Visit http://localhost:8000/, log in, and voilà! Your agent is ready to assist.
📂 Navigating to Your AI Agent's Lair
Now that your agent is alive and kicking, it's time to delve deeper into its world. Each agent has its own dedicated space within the Forge. To navigate to your agent's directory:
cd autogpts/YOUR_AGENT_NAME
Replace YOUR_AGENT_NAME with the unique name you gave your agent earlier. This directory is where all the magic happens. It's the heart of your agent, containing all its configurations, logic, and resources. Feel free to explore and customize!
📦 Adding Essential Dependencies
While in your agent's directory, it's time to equip it with some essential tools. We'll be using poetry to manage our dependencies. If you're new to poetry, think of it as a trusty sidekick that ensures your agent has everything it needs.
Run the following commands to add the required packages:
poetry add langchain bs4 python-dotenv
These packages will empower your agent with capabilities like language processing, web scraping, and environment variable management.
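If you'd like to double-check that everything installed correctly, a quick sanity check is to open a REPL with poetry run python and confirm the new packages import cleanly (a minimal sketch, nothing agent-specific):
import langchain
import bs4
import dotenv
print("langchain, bs4 and python-dotenv are ready to use")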
🚀 Relaunching Your Agent
With the new tools in place, it's time to set your agent back in motion. To run your agent again, use the following command:
./run agent start YOUR_AGENT_NAME
(Note: Replace YOUR_AGENT_NAME with the actual agent name that you created earlier.)
🛑 Halting Your Agent
Before we proceed further, let's ensure your agent is at rest. To stop your agent, use the following command:
./run agent stop
This command ensures your agent is no longer active, allowing you to safely make modifications.
🌲 Navigating the Directory Tree
To better understand where the agent.py file resides, let's visualize the directory structure:
autogpts/
│
└───YOUR_AGENT_NAME/
    │
    └───forge/
        │   agent.py
        │   ...
Navigate to the agent.py file located inside your agent's directory:
cd autogpts/YOUR_AGENT_NAME/forge/
This agent.py file is the brain of your agent. It's where you define its behavior, logic, and interactions. Feel free to explore, tweak, and customize to your heart's content!
📦 1. Setting the Stage with Imports
Every great script starts with the right tools. Here's what we're arming ourselves with:
import json
import pprint
from forge.sdk import (
    Agent,
    AgentDB,
    Step,
    StepRequestBody,
    Workspace,
    ForgeLogger,
    Task,
    TaskRequestBody,
    PromptEngine,
    chat_completion_request,
)
import os
from dotenv import load_dotenv
from langchain import PromptTemplate
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.chat_models import ChatOpenAI
from langchain.prompts import MessagesPlaceholder
from langchain.memory import ConversationSummaryBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type
from bs4 import BeautifulSoup
import requests
from langchain.schema import SystemMessage
LOG = ForgeLogger(__name__)
...
🗝️ Unlocking the Secrets: Setting Environment Variables
Before our research assistant can start its work, we need to give it some keys. Not literal keys, but keys to access certain services and tools. These keys are sensitive; think of them as passwords. We don't want to leave them lying around in our code for anyone to see. Instead, we store them safely in a locked drawer, known as environment variables.
load_dotenv('.env')
browserless_api_key = os.getenv('BROWSERLESS_API_KEY')
serper_api_key = os.getenv('SERP_API_KEY')
open_ai_api = os.getenv('OPENAI_API_KEY')
Breaking it Down:
- load_dotenv('.env'): This is like unlocking our drawer. We're telling our code, "Hey, I've stored some secrets in this .env file. Please load them up so we can use them."
- Fetching the Keys:
  - os.getenv('BROWSERLESS_API_KEY'): Here, we're reaching into our drawer and pulling out the key labeled 'BROWSERLESS_API_KEY'. This key might be for a service that lets our agent browse the web without opening an actual browser.
  - os.getenv('SERP_API_KEY'): Similarly, this is another key, perhaps for a service that lets our agent search the web efficiently.
  - os.getenv('OPENAI_API_KEY'): This is the master key. It allows our agent to access the powerful OpenAI models and use them for research.
By setting up these environment variables, we ensure that our agent has all the tools it needs while keeping our sensitive keys safe and secure.
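For reference, a .env file sitting in your agent's directory might look something like this (the values below are placeholders; never commit real keys to your repository):
BROWSERLESS_API_KEY=your-browserless-key
SERP_API_KEY=your-serper-key
OPENAI_API_KEY=your-openai-key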
🤖 2. Meet the Star: The ForgeAgent Class
In the vast universe of code, every class has its role. Some are mere supporting characters, while others take center stage. Among them, the ForgeAgent class shines the brightest in our script. Let's get to know this star performer a bit better.
🎭 The Role
The ForgeAgent class is the embodiment of our agent's intelligence and behavior. Think of it as the brain behind all the operations, decisions, and interactions your agent will have. We will dive into this at the end, after we have created all the tools and assembled all the pieces.
class ForgeAgent(Agent):
...
By extending the Agent class, ForgeAgent inherits all the foundational capabilities of an agent. But, as with any great protagonist, it's not just about the inherited traits but the unique ones they bring to the table.
🧬 Inheritance and Enhancement
In object-oriented programming, inheritance allows a class to adopt properties and behaviors from another class. In our case, the ForgeAgent inherits from the Agent class. This means that our ForgeAgent starts with a set of predefined abilities.
However, what makes ForgeAgent special is not just what it inherits, but how it enhances and expands upon those inherited capabilities. By defining new methods or overriding existing ones, the ForgeAgent can exhibit behaviors tailored to our specific needs.
🎨 Customization and Flexibility
The ellipsis (...) inside our class definition is a placeholder, hinting at the vast potential for customization. Here, you can define methods that dictate how the agent searches for information, interacts with users, processes data, and so much more.
For instance, you might want your agent to scrape specific websites, understand user queries in a particular way, or even have its own unique personality quirks. All these customizations can be coded within the ForgeAgent class.
🌟 The Spotlight
In the grand performance of our agent's operations, the ForgeAgent class will often be in the spotlight. Whether it's fetching data, making decisions, or interacting in real-time, this class will be at the heart of it all. As you delve deeper into the code, always remember the pivotal role this class plays in shaping the agent's journey.
Setting up the Tasks
2.1 🎬 Action! Creating Tasks
Alright, let's talk about getting things done. You know how we all have our to-do lists? Well, our agent has one too. And every time it needs to tackle something new, it creates a task. That's where the create_task method comes in.
Imagine telling your assistant, "Hey, can you fetch me today's weather?" That's a task. And our agent, being the go-getter it is, uses create_task to note it down and get cracking.
2.2 🧠 The Brain: Executing Steps
Now, tasks are great, but they're just the big picture. The real action happens step-by-step. Think of it like following a recipe. You don't just "make a cake." You mix the ingredients, preheat the oven, pour the batter, and so on. Each of these is a step.
The execute_step method? That's our agent's way of going through its recipe or, in our case, the task. It's where our agent thinks, "Okay, what's next?" and then does it.
🔧 3. The Toolbox: Helper Functions
Ever tried fixing something without the right tools? Frustrating, right? Our agent feels the same way. That's why it has a bunch of helper functions. These are like the little gadgets in a toolkit, each designed for a specific job.
Say our agent needs to fetch data from a website. Instead of writing the same code every time, it just calls a helper function, like fetch_data_from_website(). It's all about working smarter, not harder.
🧰 Setting Up the Toolkit
Before we dive into the heart of our agent, the ForgeAgent class, we need to equip it with some essential tools. These tools will be a set of functions that our agent will rely on to conduct its research.
Let's define these functions outside the main class, essentially laying them out as standalone utilities. Think of them as the Swiss Army knife our agent reaches for when it needs to get things done.
Now, let's dive into each tool:
3.1 🔍 The Detective: search
Every research project begins with a question, a curiosity. And to answer that, we often turn to the vast ocean of information: the internet. The search function is our agent's way of sifting through this ocean to find relevant nuggets of information.
def search(query):
    url = "https://google.serper.dev/search"
    payload = json.dumps({
        "q": query
    })
    headers = {
        'X-API-KEY': serper_api_key,
        'Content-Type': 'application/json'
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    return response.text
Imagine you're writing a paper and need statistics on a specific topic or references from scholarly articles. Instead of spending hours manually searching, our agent uses this function to quickly pull up relevant results. It's like having a librarian who instantly knows where to find every book or article you need.
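To get a feel for what comes back, here's a minimal sketch of calling search directly, assuming SERP_API_KEY is set in your .env; Serper typically returns its web results under an "organic" key:
results = json.loads(search("latest breakthroughs in solid-state batteries"))
for item in results.get("organic", [])[:3]:
    # Print a few titles and links from the raw search results
    print(item.get("title"), "->", item.get("link"))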
3.2 🌐 The Explorer: scrape_website
Research isn't just about finding sources; it's about extracting valuable information from them. The scrape_website function allows our agent to delve into specific websites and retrieve the content we need.
def scrape_website(url):
    # The agent accesses the given URL and extracts the necessary data.
    headers = {
        'Cache-Control': 'no-cache',
        'Content-Type': 'application/json',
    }
    # Define the data to be sent in the request
    data = {
        "url": url
    }
    # Convert Python object to JSON string
    data_json = json.dumps(data)
    # Send the POST request
    post_url = f"https://chrome.browserless.io/content?token={browserless_api_key}"
    response = requests.post(post_url, headers=headers, data=data_json)
    # Check the response status code
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        text = soup.get_text()
        print("Scraped content:", text)
        # Long pages are condensed with the summary helper defined below
        if len(text) > 10000:
            return summary(text)
        return text
    else:
        # Return a readable error message so the caller still gets something useful
        print(f"HTTP request failed with status code {response.status_code}")
        return f"HTTP request failed with status code {response.status_code}"
Think of it as sending our agent on a field trip to a digital archive. It goes in, looks around, and brings back the exact piece of information or data you need for your research. No more manual copy-pasting or sifting through irrelevant content.
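As a quick illustration (assuming BROWSERLESS_API_KEY is set in your .env), you could point the function at any public page and preview what it brings back:
page_text = scrape_website("https://en.wikipedia.org/wiki/Large_language_model")
# Preview the first 500 characters of the scraped (or summarized) content
print(page_text[:500])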
3.3 📝 The Scribe: summary
Research often involves going through vast amounts of information. But not everything you find is going to be relevant. That's where the summary function comes in. It distills lengthy content into concise, digestible summaries.
def summary(content):
    # The agent processes the content and generates a concise summary.
    llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k-0613")
    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n"], chunk_size=10000, chunk_overlap=500)
    docs = text_splitter.create_documents([content])
    map_prompt = """
    Write a concise summary of the following text:
    "{text}"
    SUMMARY:
    """
    map_prompt_template = PromptTemplate(
        template=map_prompt, input_variables=["text"])
    summary_chain = load_summarize_chain(
        llm=llm,
        chain_type='map_reduce',
        map_prompt=map_prompt_template,
        combine_prompt=map_prompt_template,
        verbose=True
    )
    output = summary_chain.run(input_documents=docs)
    return output
Imagine you've found a 50-page report, but you only need the key points. Instead of reading through the entire document, our agent uses this function to provide a brief overview, highlighting the main findings or points. It's like having a personal assistant who can quickly review and summarize documents for you, ensuring you focus only on what's essential.
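For example, once you have a long block of text from scrape_website or elsewhere, condensing it takes a single call (a sketch, assuming your OPENAI_API_KEY is loaded):
long_report = "..."  # any lengthy text you want condensed, e.g. the output of scrape_website
print(summary(long_report))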
🛠️ 4. Gearing Up: Initialization and Setup
Before the action unfolds, our agent needs its environment set up just right. Make sure you add this after your imports!
load_dotenv('.env')
...
🧰 Building Our Perfect Toolkit
Imagine you have a beautifully crafted wooden box. Inside this box, you want to keep all the tools that you frequently use, each in its own dedicated slot, labeled and described. This way, whenever you need a tool, you know exactly where to find it and how to use it.
That's precisely what we're doing with our agent's toolkit.
tools = [
    Tool(
        name="Search",
        func=search,
        description="useful for when you need to answer questions about current events, data. You should ask targeted questions"
    ),
    Tool(
        name="ScrapeWebsite",
        func=scrape_website,
        description="Scrape content from a website"
    ),
]
Here's a breakdown:
The Box: tools
The tools list is our wooden box. It's where we'll store each tool our agent can use.
The Tools: Tool(...)
Each Tool object represents a specific tool in our box:
- name: This is the label on the tool's slot. It's a short, descriptive name that tells us (and our agent) what the tool does. For instance, "Search" tells us that this tool can search the web.
- func: This points to the actual function (or the tool itself) that gets the job done. For the "Search" tool, it points to the search function we defined earlier.
- description: Think of this as a mini user manual written on a sticky note and attached to the tool. It gives a brief overview of what the tool does and offers some tips on how to use it. For example, the description for "Search" suggests asking targeted questions for better results.
By organizing our tools in this manner, we ensure that our agent has a well-structured, easily accessible toolkit. It can quickly pick the right tool for the job, ensuring efficiency and accuracy in its research tasks.
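And the box isn't limited to two slots. Purely as an illustration, if you later wanted the agent to call the summary helper directly rather than only inside scrape_website, you could append another Tool to the same list:
tools.append(
    Tool(
        name="Summarize",
        func=summary,
        description="Condense a long piece of text into a short, factual summary"
    )
)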
📜 Setting the Ground Rules with SystemMessage
Imagine you're hiring a new researcher for your team. On their first day, you'd probably sit them down and explain their role, right? You'd tell them about the standards of your organization, the importance of factual accuracy, and the guidelines they should follow.
That's exactly what we're doing with the SystemMessage.
system_message = SystemMessage(
    content="""You are a world class researcher, who can do detailed research on any topic and produce facts based results;
    you do not make things up, you will try as hard as possible to gather facts & data to back up the research
    ...
    (include other rules and guidelines here)
    """
)
The SystemMessage is like our orientation speech for the agent. It sets the tone and expectations. By defining this message, we're reminding our agent of its primary role: to be a top-notch researcher that values facts and data above all.
The Agent's Playbook: agent_kwargs
Now, let's talk about agent_kwargs. Think of this as the agent's playbook or instruction manual. It contains specific settings and parameters that guide the agent's behavior.
agent_kwargs = {
    "extra_prompt_messages": [MessagesPlaceholder(variable_name="memory")],
    "system_message": system_message,
}
Here's a quick breakdown:
- extra_prompt_messages: This is like giving our agent a notepad where it can jot down important notes. The MessagesPlaceholder(variable_name="memory") is a placeholder where the agent can store and recall information, helping it remember past interactions or data.
- system_message: Here, we're linking our orientation speech (the system_message we defined earlier) to the agent. This ensures that, before the agent starts its tasks, it's reminded of its role and responsibilities.
In essence, with agent_kwargs, we're equipping our agent with a clear set of instructions and guidelines, ensuring it operates efficiently and in line with our expectations.
🚀 Booting Up Our Research Assistant
Setting up an AI agent, especially one as sophisticated as a research assistant, is a bit like assembling a high-performance car. Each part has its role, and when put together correctly, you get a machine that purrs with efficiency. Let's break down this assembly process.
1. 🧠 The Brain: ChatOpenAI
llm = ChatOpenAI(temperature=0, model='gpt-3.5-turbo-16k-0613')
Here, we're initializing the core of our agent, its brain if you will. We're using the ChatOpenAI class to set up a language model. The parameters we've chosen, like temperature=0, ensure that our agent gives consistent, deterministic responses. The model 'gpt-3.5-turbo-16k-0613' is a powerful GPT-3.5 variant with a 16k-token context window, ensuring our agent has top-notch cognitive abilities.
2. 📔 The Notepad: ConversationSummaryBufferMemory
memory = ConversationSummaryBufferMemory(
    memory_key="memory", return_messages=True, llm=llm, max_token_limit=1000)
Research involves recalling past information, and for that, our agent needs a memory. The ConversationSummaryBufferMemory acts as our agent's notepad, allowing it to remember past interactions. The parameters here define how this memory works. For instance, max_token_limit=1000 ensures our agent doesn't get overwhelmed with too much information.
3. 🤖 Assembling the Agent: initialize_agent
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    agent_kwargs=agent_kwargs,
    memory=memory,
)
Now, it's time to bring everything together. The initialize_agent function is like the assembly line where we integrate all the parts we've set up.
- tools is our toolkit, which we discussed earlier.
- llm is the brain of our agent.
- agent=AgentType.OPENAI_FUNCTIONS specifies the type of agent we're creating.
- verbose=True is like turning on the debug mode, allowing us to see detailed logs of the agent's operations.
- agent_kwargs contains specific settings and guidelines for our agent.
- memory is, of course, our agent's notepad.
Once this function runs, we have our fully assembled research assistant, ready to delve into the world of information and assist us in our research endeavors.
🚦 The Green Light: customstep
Imagine you've built a fancy new robot. It's shiny, has all these cool features, and is ready to go. But how do you get it to start? You need a button, a trigger, something that says, "Alright, let's get to work!" That's what customstep is for our research assistant.
def customstep(query):
    result = agent({"input": query})
    return result['output']
Breaking it Down:
- Function Name - customstep: It's a custom function we've created to interact with our agent. Think of it as the "Start" button on our robot.
- Parameter - query: This is what we want to ask or instruct our agent to do. Maybe we want to know about the history of the Eiffel Tower or the molecular structure of caffeine. Whatever our research question is, we pass it as the query.
- Inside the Function:
  - agent({"input": query}): Here's where the magic happens. We're telling our agent, "Hey, here's a question for you," and we wait for it to process and come up with an answer.
  - result['output']: Once our agent has done its research and found an answer, it stores it in the result. We then extract the answer (or output) and return it.
So, every time we have a new research question, we simply call the customstep function, feed in our question, and let our agent handle the rest. It's like having a personal researcher on speed dial, always ready to dive into any topic we're curious about.
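To see it in action on its own (a minimal sketch, run with your API keys loaded), you could ask the assembled agent a one-off question:
answer = customstep("Give me a short, factual overview of the history of the Eiffel Tower.")
print(answer)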
🤖 Introducing Our Star: The ForgeAgent Class
The ForgeAgent class is the heart of our research assistant. It's like the control room where all the major decisions are made and actions are executed. Let's dive into its structure and understand its functions.
class ForgeAgent(Agent):
Here, our ForgeAgent is inheriting from the base Agent class. Think of it as a specialized type of agent, tailored for our research needs.
1. 🎬 Initialization: __init__
def __init__(self, database: AgentDB, workspace: Workspace):
    super().__init__(database, workspace)
This is the agent's birthplace. When our ForgeAgent is created, this method sets it up, connecting it to its database and workspace. It's like setting up a new employee's desk with all the necessary tools and resources.
2. 📝 Task Creation: create_task
async def create_task(self, task_request: TaskRequestBody) -> Task:
    task = await super().create_task(task_request)
    LOG.info(
        f'📦 Task created: {task.task_id} input: {task.input[:40]}{"..." if len(task.input) > 40 else ""}'
    )
    return task
Every research project begins with a task. This method is where our agent starts its work, creating a new task based on our request. It then logs a message to keep us informed about the task's creation.
3. 🧠 Thought Process: execute_step
async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
    self.workspace.write(task_id=task_id, path="output.txt", data=b"Research Agent is thinking...")
    step = await self.db.create_step(
        task_id=task_id, input=step_request, is_last=True
    )
    step_input = 'None'
    if step.input:
        step_input = step.input[:19]
    message = f' 🔄 Step executed: {step.step_id} input: {step_input}'
    if step.is_last:
        message = (
            f' ✅ Final Step completed: {step.step_id} input: {step_input}'
        )
    LOG.info(message)
    artifact = await self.db.create_artifact(
        task_id=task_id,
        step_id=step.step_id,
        file_name='output.txt',
        relative_path='',
        agent_created=True,
    )
    LOG.info(f'Received input for task {task_id}: {step_request.input}')
    step.output = customstep(step_request.input)
    return step
This is where the real magic happens. Once our agent has a task, it uses this method to think, process, and act. It's a series of steps:
- Writing to Workspace: Our agent jots down some initial thoughts or data.
- Database Interaction: It interacts with its database, creating steps and logging its actions.
- Logging: Throughout the process, our agent keeps us informed by logging its actions and decisions.
- Executing the Task: Finally, it calls the customstep function to execute the main research task based on our input.
In essence, the execute_step method is our agent's thought process, where it takes our request, processes it, and produces a result.
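One small refinement you might consider (a sketch, not part of the tutorial's required code): since execute_step already registers an output.txt artifact, you could write the final answer back into that file so the artifact holds the result rather than just the placeholder text, reusing the same workspace.write call from earlier in the method:
step.output = customstep(step_request.input)
# Persist the final answer into the artifact file as well (optional tweak)
self.workspace.write(task_id=task_id, path="output.txt", data=step.output.encode())
return step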
🚀 Launching Your Research Assistant
Alright, we've set everything up, and our research assistant is eager to get started. Here's how you bring it to life:
1. 🖥️ Starting the Agent
In your terminal, navigate to the directory where your agent resides and run the agent again. If you recall, the command is:
./run agent start YOUR_AGENT_NAME
Replace YOUR_AGENT_NAME with the unique name you gave your agent earlier.
2. 🌐 Accessing the Interface
Once the agent is up and running, open your favorite web browser and navigate to:
http://localhost:8000
This is the user interface of your agent. It's like the dashboard of a car, where you can interact with and control your agent.
3. 🔐 Logging In
Before you can start, you'll need to log in. You'll see options to log in using Google or GitHub. Choose your preferred method and authenticate.
4. 💬 Making a Query
Now, the fun part! You'll see a chatbox interface. This is where you communicate with your research assistant. Type in a research query or any question you have in mind. Maybe you want to know about the history of the pyramids or the molecular structure of water. Whatever it is, type it in and hit send.
5. 🎩✨ Witness the Magic
Sit back and watch as your research assistant dives into the vast ocean of information, processes it, and presents you with a well-researched answer. It's like having a personal researcher at your fingertips!
Congratulations! You've successfully set up and launched your very own AI-powered research assistant. Explore, experiment, and enjoy the magic of AI research. Happy researching! 🎉📚🔍