AutoGPT Tutorial: Creating an Agent Powered Research Assistant with Auto-GPT-Forge

Wednesday, September 27, 2023 by sanchayt743
Let's begin by crafting your AI-powered research assistant

Welcome to the world of AutoGPT Forge, where we'll embark on a journey to create your very own Research Assistant powered by AutoGPT. I'm Sanchay Thalnerkar, your guide for this exciting tutorial.

🎉 Welcome Aboard!

Today, we're diving deep into the world of AutoGPT. By the end of our journey, you'll have your very own Research Assistant powered by AutoGPT. Feel free to check out our in-depth guide on AutoGPT to get fully onboarded. Buckle up, and let's get started!

🌟 What's the Hype About Forge?

Imagine having a toolkit that simplifies the process of building an AI agent. That's what Forge is all about! It's a comprehensive template designed for AutoGPT, ensuring smooth sailing while you craft your agent. From setting up to running tests, Forge has got you covered.

🛠️ Getting Started: Step-by-Step Setup of AutoGPT

Before our grand performance, we need to ensure the stage is set. Here's how:

1. Learn System Requirements

2. Clone the Magic Box: The Forge Repository

First, fork the AutoGPT repository. Got it? Great! Now, let's bring it to your local machine:

git clone https://github.com/YOUR_USERNAME/AutoGPT.git
cd AutoGPT

3. Setting up the Forge Environment

With the repository in place, let's set things up:

./run setup

Follow the on-screen prompts, and you'll be good to go!

4. 🤖 Crafting Your Agent

It's time to breathe life into your agent. Give it a unique name, something catchy like EinsteinBot or CurieAssistant.

./run agent create YOUR_AGENT_NAME

5. 🏟️ Stepping into the Arena

The Arena is where your agent meets others of its kind. It's a competitive space, aiming to find the crème de la crème of agents. To join the ranks:

./run arena enter YOUR_AGENT_NAME

6. 🚀 Launching Your Agent

With everything in place, let's set your agent in motion:

./run agent start YOUR_AGENT_NAME

Visit http://localhost:8000/, log in, and voilà! Your agent is ready to assist.


📂 Navigating to Your AI Agent's Lair

Now that your agent is alive and kicking, it's time to delve deeper into its world. Each agent has its own dedicated space within the Forge. To navigate to your agent's directory:

cd autogpts/YOUR_AGENT_NAME

Replace YOUR_AGENT_NAME with the unique name you gave your agent earlier. This directory is where all the magic happens. It's the heart of your agent, containing all its configurations, logic, and resources. Feel free to explore and customize!


📦 Adding Essential Dependencies

While in your agent's directory, it's time to equip it with some essential tools. We'll be using poetry to manage our dependencies. If you're new to poetry, think of it as a trusty sidekick that ensures your agent has everything it needs.

Run the following commands to add the required packages:

poetry add langchain bs4 python-dotenv

These packages will empower your agent with capabilities like language processing, web scraping, and environment variable management.

🚀 Relaunching Your Agent

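With the new tools in place, it's time to set your agent back in motion. The run script lives in the root of the AutoGPT repository, so head back there first (cd ../.. from your agent's directory), then use the following command: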

./run agent start YOUR_AGENT_NAME

(Note: Replace YOUR_AGENT_NAME with the actual agent name that you created earlier.)


🛑 Halting Your Agent

Before we proceed further, let's ensure your agent is at rest. To stop your agent, use the following command:

./run agent stop

This command ensures your agent is no longer active, allowing you to safely make modifications.

🌲 Navigating the Directory Tree

To better understand where the agent.py file resides, let's visualize the directory structure:

autogpts/
│
└───YOUR_AGENT_NAME/
    │
    └───forge/
        │   agent.py
        │   ...

Navigate to the agent.py file located inside your agent's directory:

cd autogpts/YOUR_AGENT_NAME/forge/

This agent.py file is the brain of your agent. It's where you define its behavior, logic, and interactions. Feel free to explore, tweak, and customize to your heart's content!


📦 1. Setting the Stage with Imports

Every great script starts with the right tools. Here's what we're arming ourselves with:

# Standard library
import json
import os
import pprint
from typing import Type

# Third-party helpers
import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from pydantic import BaseModel, Field

# Forge SDK
from forge.sdk import (
    Agent,
    AgentDB,
    Step,
    StepRequestBody,
    Workspace,
    ForgeLogger,
    Task,
    TaskRequestBody,
    PromptEngine,
    chat_completion_request,
)

# LangChain
from langchain import PromptTemplate
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.chat_models import ChatOpenAI
from langchain.prompts import MessagesPlaceholder
from langchain.memory import ConversationSummaryBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.tools import BaseTool
from langchain.schema import SystemMessage

LOG = ForgeLogger(__name__)
...

🗝️ Unlocking the Secrets: Setting Environment Variables

Before our research assistant can start its work, we need to give it some keys. Not literal keys, but keys to access certain services and tools. These keys are sensitive; think of them as passwords. We don't want to leave them lying around in our code for anyone to see. Instead, we store them safely in a locked drawer, known as environment variables.

load_dotenv('.env')
browserless_api_key = os.getenv('BROWSERLESS_API_KEY')
serper_api_key = os.getenv('SERP_API_KEY')
open_ai_api = os.getenv('OPENAI_API_KEY')

Breaking it Down:

  1. load_dotenv('.env'): This is like unlocking our drawer. We're telling our code, "Hey, I've stored some secrets in this .env file. Please load them up so we can use them."

  2. Fetching the Keys:

    • os.getenv('BROWSERLESS_API_KEY'): Here, we're reaching into our drawer and pulling out the key labeled 'BROWSERLESS_API_KEY'. This key is for Browserless, the service our scrape_website function calls to load web pages without opening an actual browser.
    • os.getenv('SERP_API_KEY'): Similarly, this is the key for Serper, the search service our search function calls at google.serper.dev.
    • os.getenv('OPENAI_API_KEY'): This is the master key. It allows our agent to access the powerful OpenAI models and use them for research.

By setting up these environment variables, we ensure that our agent has all the tools it needs while keeping our sensitive keys safe and secure.
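A missing key usually only shows up later as a confusing HTTP error, so it can help to check for all three up front. The sketch below is one way to do that with the standard library only; the helper name missing_keys and the REQUIRED_KEYS tuple are illustrative and not part of Forge or python-dotenv.

```python
import os

# The three variable names used in this tutorial.
REQUIRED_KEYS = ("BROWSERLESS_API_KEY", "SERP_API_KEY", "OPENAI_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names in `required` that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # os.environ behaves like a dict, so it can be passed in directly.
    missing = missing_keys(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All keys are set.")
```

Running a check like this right after load_dotenv('.env') tells you which entry to add to your .env file before the agent makes its first request.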


🤖 2. Meet the Star: The ForgeAgent Class

In the vast universe of code, every class has its role. Some are mere supporting characters, while others take center stage. Among them, the ForgeAgent class shines the brightest in our script. Let's get to know this star performer a bit better.

🎭 The Role

The ForgeAgent class is the embodiment of our agent's intelligence and behavior. Think of it as the brain behind all the operations, decisions, and interactions your agent will have. We will dive into this at the end, after we have created all the tools and assembled all the pieces.

class ForgeAgent(Agent):
    ...

By extending the Agent class, ForgeAgent inherits all the foundational capabilities of an agent. But, as with any great protagonist, it's not just about the inherited traits but the unique ones they bring to the table.

🧬 Inheritance and Enhancement

In object-oriented programming, inheritance allows a class to adopt properties and behaviors from another class. In our case, the ForgeAgent inherits from the Agent class. This means that our ForgeAgent starts with a set of predefined abilities.

However, what makes ForgeAgent special is not just what it inherits, but how it enhances and expands upon those inherited capabilities. By defining new methods or overriding existing ones, the ForgeAgent can exhibit behaviors tailored to our specific needs.

🎨 Customization and Flexibility

The ellipsis (...) inside our class definition is a placeholder, hinting at the vast potential for customization. Here, you can define methods that dictate how the agent searches for information, interacts with users, processes data, and so much more.

For instance, you might want your agent to scrape specific websites, understand user queries in a particular way, or even have its own unique personality quirks. All these customizations can be coded within the ForgeAgent class.

🌟 The Spotlight

In the grand performance of our agent's operations, the ForgeAgent class will often be in the spotlight. Whether it's fetching data, making decisions, or interacting in real-time, this class will be at the heart of it all. As you delve deeper into the code, always remember the pivotal role this class plays in shaping the agent's journey.


Setting up the Tasks

2.1 🎬 Action! Creating Tasks

Alright, let's talk about getting things done. You know how we all have our to-do lists? Well, our agent has one too. And every time it needs to tackle something new, it creates a task. That's where the create_task method comes in.

Imagine telling your assistant, "Hey, can you fetch me today's weather?" That's a task. And our agent, being the go-getter it is, uses create_task to note it down and get cracking.

2.2 🧠 The Brain: Executing Steps

Now, tasks are great, but they're just the big picture. The real action happens step-by-step. Think of it like following a recipe. You don't just "make a cake." You mix the ingredients, preheat the oven, pour the batter, and so on. Each of these is a step.

The execute_step method? That's our agent's way of going through its recipe or, in our case, the task. It's where our agent thinks, "Okay, what's next?" and then does it.

🔧 3. The Toolbox: Helper Functions

Ever tried fixing something without the right tools? Frustrating, right? Our agent feels the same way. That's why it has a bunch of helper functions. These are like the little gadgets in a toolkit, each designed for a specific job.

Say our agent needs to fetch data from a website. Instead of writing the same code every time, it just calls a helper function, like fetch_data_from_website(). It's all about working smarter, not harder.


🧰 Setting Up the Toolkit

Before we dive into the heart of our agent, the ForgeAgent class, we need to equip it with some essential tools. These tools will be a set of functions that our agent will rely on to conduct its research.

Let's define these functions outside the main class, essentially laying them out as standalone utilities. Think of them as the Swiss Army knife our agent reaches for when it needs to get things done.

Now, let's dive into each tool:

Every research project begins with a question, a curiosity. And to answer that, we often turn to the vast ocean of information: the internet. The search function is our agent's way of sifting through this ocean to find relevant nuggets of information.

def search(query):
    url = "https://google.serper.dev/search"
    payload = json.dumps({
        "q": query
    })
    headers = {
        'X-API-KEY': serper_api_key,
        'Content-Type': 'application/json'
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    return response.text

Imagine you're writing a paper and need statistics on a specific topic or references from scholarly articles. Instead of spending hours manually searching, our agent uses this function to quickly pull up relevant results. It's like having a librarian who instantly knows where to find every book or article you need.
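The search function returns the raw response text, which the agent reads as-is. If you want to inspect results yourself, a small parser helps. The sketch below assumes the response is JSON with an "organic" list whose items carry "title" and "link" keys; check a real response from your account before relying on that shape. The helper name top_results and the sample data are made up for illustration.

```python
import json


def top_results(response_text, limit=3):
    """Pull (title, link) pairs out of a JSON search response.

    Assumption: results live under an "organic" key, each with
    "title" and "link". Missing keys fall back to empty values.
    """
    data = json.loads(response_text)
    results = []
    for item in data.get("organic", [])[:limit]:
        results.append((item.get("title", ""), item.get("link", "")))
    return results


# Hand-written sample standing in for a real response.
sample = json.dumps({"organic": [
    {"title": "Eiffel Tower", "link": "https://example.com/eiffel"},
    {"title": "Paris", "link": "https://example.com/paris"},
]})

print(top_results(sample, limit=1))
```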

3.2 🌐 The Explorer: scrape_website

Research isn't just about finding sources; it's about extracting valuable information from them. The scrape_website function allows our agent to delve into specific websites and retrieve the content we need.

def scrape_website(url, objective="Extract the key facts from this page"):
    # The agent accesses the given URL and extracts the page text.
    # `objective` tells the summarizer what to focus on. The default value is an
    # assumption added here so the function still works when a Tool passes only a URL.
    headers = {
        'Cache-Control': 'no-cache',
        'Content-Type': 'application/json',
    }

    # Define the data to be sent in the request and convert it to a JSON string
    data_json = json.dumps({"url": url})

    # Send the POST request to Browserless
    post_url = f"https://chrome.browserless.io/content?token={browserless_api_key}"
    response = requests.post(post_url, headers=headers, data=data_json)

    # Check the response status code
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        text = soup.get_text()
        print("CONTENT:", text)

        # Long pages are summarized so they fit in the model's context window
        if len(text) > 10000:
            output = summary(objective, text)
            return output
        else:
            return text
    else:
        # Return a message instead of None so the agent knows the request failed
        message = f"HTTP request failed with status code {response.status_code}"
        print(message)
        return message

Think of it as sending our agent on a field trip to a digital archive. It goes in, looks around, and brings back the exact piece of information or data you need for your research. No more manual copy-pasting or sifting through irrelevant content.
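The one decision worth isolating in scrape_website is the length check: short pages are returned untouched, long ones are handed to a summarizer. Here is a minimal sketch of that rule on its own, with the summarizer passed in as an argument so it can be tried without any API keys. The names shorten_if_needed and fake_summarize are hypothetical stand-ins.

```python
def shorten_if_needed(text, summarize, limit=10000):
    """Return `text` unchanged when it is short, otherwise `summarize(text)`."""
    if len(text) > limit:
        return summarize(text)
    return text


def fake_summarize(text):
    # Stand-in for a real summarizer: keep only the first 20 characters.
    return text[:20] + "..."


short_page = "A short page."
long_page = "word " * 5000  # 25,000 characters, well over the limit

print(shorten_if_needed(short_page, fake_summarize))
print(shorten_if_needed(long_page, fake_summarize))
```

Passing the summarizer in as a parameter is a design choice for the sketch only; it makes the threshold logic easy to test in isolation.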

3.3 📝 The Scribe: summary

Research often involves going through vast amounts of information. But not everything you find is going to be relevant. That's where the summary function comes in. It distills lengthy content into concise, digestible summaries.

def summary(objective, content):
    # The agent processes the content and generates a concise summary.
    # `objective` describes what the summary should focus on; it fills the
    # {objective} slot in the prompt below.
    llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k-0613")

    # Split the content into overlapping chunks the model can handle
    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n"], chunk_size=10000, chunk_overlap=500)
    docs = text_splitter.create_documents([content])
    map_prompt = """
    Write a summary of the following text for {objective}:
    "{text}"
    SUMMARY:
    """
    map_prompt_template = PromptTemplate(
        template=map_prompt, input_variables=["text", "objective"])

    # Summarize each chunk (map), then summarize the summaries (reduce)
    summary_chain = load_summarize_chain(
        llm=llm,
        chain_type='map_reduce',
        map_prompt=map_prompt_template,
        combine_prompt=map_prompt_template,
        verbose=True
    )

    output = summary_chain.run(input_documents=docs, objective=objective)

    return output

Imagine you've found a 50-page report, but you only need the key points. Instead of reading through the entire document, our agent uses this function to provide a brief overview, highlighting the main findings or points. It's like having a personal assistant who can quickly review and summarize documents for you, ensuring you focus only on what's essential.
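The map_reduce chain type follows a simple idea: summarize each chunk separately, then summarize the combined summaries. The sketch below shows that idea with plain Python and a stand-in summarizer. It is a simplification, not LangChain's implementation: it cuts at fixed character positions, whereas RecursiveCharacterTextSplitter prefers to break at the separators you give it. The function names are illustrative.

```python
def split_with_overlap(text, chunk_size, overlap):
    """Cut `text` into pieces of at most `chunk_size` characters.

    Each piece starts `chunk_size - overlap` characters after the
    previous one, so neighbours share `overlap` characters.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this piece already reaches the end of the text
    return chunks


def map_reduce(text, summarize, chunk_size=10000, overlap=500):
    """Summarize every chunk (map), then summarize the joined results (reduce)."""
    partials = [summarize(chunk) for chunk in split_with_overlap(text, chunk_size, overlap)]
    return summarize("\n".join(partials))


print(split_with_overlap("abcdefghij", chunk_size=4, overlap=1))
```

The overlap exists so that a sentence cut in half at a chunk boundary still appears whole in one of the two neighbouring chunks.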


🛠️ 4. Gearing Up: Initialization and Setup

Before the action unfolds, our agent needs its environment set up just right. Make sure you add this after your imports!

load_dotenv('.env')
...

🧰 Building Our Perfect Toolkit

Imagine you have a beautifully crafted wooden box. Inside this box, you want to keep all the tools that you frequently use, each in its own dedicated slot, labeled and described. This way, whenever you need a tool, you know exactly where to find it and how to use it.

That's precisely what we're doing with our agent's toolkit.

tools = [
    Tool(
        name="Search",
        func=search,
        description="useful for when you need to answer questions about current events, data. You should ask targeted questions"
    ),
    Tool(
        name="ScrapeWebsite",
        func=scrape_website,
        description="Scrape content from a website"
    ),
]

Here's a breakdown:

The Box: tools

The tools list is our wooden box. It's where we'll store each tool our agent can use.

The Tools: Tool(...)

Each Tool object represents a specific tool in our box:

  1. name: This is the label on the tool's slot. It's a short, descriptive name that tells us (and our agent) what the tool does. For instance, "Search" tells us that this tool can search the web.

  2. func: This points to the actual function (or the tool itself) that gets the job done. For the "Search" tool, it points to the search function we defined earlier.

  3. description: Think of this as a mini user manual written on a sticky note and attached to the tool. It gives a brief overview of what the tool does and offers some tips on how to use it. For example, the description for "Search" suggests asking targeted questions for better results.

By organizing our tools in this manner, we ensure that our agent has a well-structured, easily accessible toolkit. It can quickly pick the right tool for the job, ensuring efficiency and accuracy in its research tasks.
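To make the name, func, and description roles concrete, here is a small stand-alone sketch of a toolbox and a lookup by name. SimpleTool and run_tool are hypothetical helpers written for this illustration; they are not LangChain's Tool class, and the real agent picks tools through the language model rather than a direct lookup.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    name: str                     # label the agent uses to refer to the tool
    func: Callable[[str], str]    # the function that does the work
    description: str              # hint about when the tool is useful


def run_tool(toolbox, name, tool_input):
    """Find the tool called `name` and run it on `tool_input`."""
    for tool in toolbox:
        if tool.name == name:
            return tool.func(tool_input)
    raise KeyError(f"No tool named {name!r}")


def fake_search(query):
    # Stand-in for the real search function, so no API key is needed.
    return f"results for {query}"


toolbox = [SimpleTool("Search", fake_search, "Look things up on the web")]
print(run_tool(toolbox, "Search", "caffeine"))
```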


📜 Setting the Ground Rules with SystemMessage

Imagine you're hiring a new researcher for your team. On their first day, you'd probably sit them down and explain their role, right? You'd tell them about the standards of your organization, the importance of factual accuracy, and the guidelines they should follow.

That's exactly what we're doing with the SystemMessage.

system_message = SystemMessage(
    content="""You are a world class researcher, who can do detailed research on any topic and produce facts based results; 
            you do not make things up, you will try as hard as possible to gather facts & data to back up the research
            ...
            (include other rules and guidelines here)
            """
)

The SystemMessage is like our orientation speech for the agent. It sets the tone and expectations. By defining this message, we're reminding our agent of its primary role: to be a top-notch researcher that values facts and data above all.

The Agent's Playbook: agent_kwargs

Now, let's talk about agent_kwargs. Think of this as the agent's playbook or instruction manual. It contains specific settings and parameters that guide the agent's behavior.

agent_kwargs = {
    "extra_prompt_messages": [MessagesPlaceholder(variable_name="memory")],
    "system_message": system_message,
}

Here's a quick breakdown:

  1. extra_prompt_messages: This is like giving our agent a notepad where it can jot down important notes. MessagesPlaceholder(variable_name="memory") reserves a slot in the prompt that is filled with the conversation memory on every call, helping the agent remember past interactions or data. The name "memory" has to match the memory_key we give the memory object later.

  2. system_message: Here, we're linking our orientation speech (the system_message we defined earlier) to the agent. This ensures that, before the agent starts its tasks, it's reminded of its role and responsibilities.

In essence, with agent_kwargs, we're equipping our agent with a clear set of instructions and guidelines, ensuring it operates efficiently and in line with our expectations.


🚀 Booting Up Our Research Assistant

Setting up an AI agent, especially one as sophisticated as a research assistant, is a bit like assembling a high-performance car. Each part has its role, and when put together correctly, you get a machine that purrs with efficiency. Let's break down this assembly process.

1. 🧠 The Brain: ChatOpenAI

llm = ChatOpenAI(temperature=0, model='gpt-3.5-turbo-16k-0613')

Here, we're initializing the core of our agent, its brain if you will. We're using the ChatOpenAI class to set up a language model. Setting temperature=0 makes the model favor its most likely tokens, so responses are far more consistent from run to run (though not strictly guaranteed to be identical). The model 'gpt-3.5-turbo-16k-0613' is a GPT-3.5 Turbo snapshot with a 16k-token context window, which gives our agent room to work with long scraped pages.

2. 📔 The Notepad: ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    memory_key="memory", return_messages=True, llm=llm, max_token_limit=1000)

Research involves recalling past information, and for that, our agent needs a memory. The ConversationSummaryBufferMemory acts as our agent's notepad, allowing it to remember past interactions. The parameters here define how this memory works. For instance, max_token_limit=1000 caps the recent conversation kept word-for-word at roughly 1,000 tokens; older exchanges are condensed into a running summary, so our agent doesn't get overwhelmed with too much information.
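The idea behind a summary buffer can be sketched in a few lines: keep the newest messages as they are, and fold anything that pushes the history over budget into a summary. The sketch below is an illustration of that idea, not LangChain's code. It approximates tokens by counting whitespace-separated words, and trim_history and join_summary are hypothetical names.

```python
def trim_history(messages, summary, max_tokens, summarize):
    """Keep the newest messages within `max_tokens`; fold older ones into `summary`.

    Token counts are approximated by whitespace-separated words.
    `summarize(old_summary, dropped_messages)` returns the new summary.
    """
    kept = list(messages)
    dropped = []
    # Remove the oldest messages until the rest fit in the budget.
    while kept and sum(len(m.split()) for m in kept) > max_tokens:
        dropped.append(kept.pop(0))
    if dropped:
        summary = summarize(summary, dropped)
    return kept, summary


def join_summary(old_summary, dropped):
    # Stand-in for an LLM call: just concatenate the dropped messages.
    return (old_summary + " " + " | ".join(dropped)).strip()


history = ["one two three", "four five", "six"]
kept, summary = trim_history(history, "", max_tokens=3, summarize=join_summary)
print(kept)
print(summary)
```

In the real memory class the summarizing step is done by the llm you pass in, which is why the memory object needs its own reference to the model.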

3. 🤖 Assembling the Agent: initialize_agent

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    agent_kwargs=agent_kwargs,
    memory=memory,
)

Now, it's time to bring everything together. The initialize_agent function is like the assembly line where we integrate all the parts we've set up.

  • tools is our toolkit, which we discussed earlier.
  • llm is the brain of our agent.
  • agent=AgentType.OPENAI_FUNCTIONS specifies the type of agent we're creating.
  • verbose=True is like turning on the debug mode, allowing us to see detailed logs of the agent's operations.
  • agent_kwargs contains specific settings and guidelines for our agent.
  • memory is, of course, our agent's notepad.

Once this function runs, we have our fully assembled research assistant, ready to delve into the world of information and assist us in our research endeavors.


🚦 The Green Light: customstep

Imagine you've built a fancy new robot. It's shiny, has all these cool features, and is ready to go. But how do you get it to start? You need a button, a trigger, something that says, "Alright, let's get to work!" That's what customstep is for our research assistant.

def customstep(query):
    result = agent({"input": query})
    return result['output']

Breaking it Down:

  1. Function Name - customstep: It's a custom function we've created to interact with our agent. Think of it as the "Start" button on our robot.

  2. Parameter - query: This is what we want to ask or instruct our agent to do. Maybe we want to know about the history of the Eiffel Tower or the molecular structure of caffeine. Whatever our research question is, we pass it as the query.

  3. Inside the Function:

    • agent({"input": query}): Here's where the magic happens. We're telling our agent, "Hey, here's a question for you," and we wait for it to process and come up with an answer.
    • result['output']: Once our agent has done its research and found an answer, it stores it in the result. We then extract the answer (or output) and return it.

So, every time we have a new research question, we simply call the customstep function, feed in our question, and let our agent handle the rest. It's like having a personal researcher on speed dial, always ready to dive into any topic we're curious about.
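As written, customstep passes any failure from the model or the tools straight up to the caller. One hedged variation is to wrap the call so empty questions and exceptions turn into readable messages. The sketch below uses a fake agent so it runs without LangChain; safe_step, fake_agent, and the message wording are assumptions made for illustration.

```python
def safe_step(agent_callable, query):
    """Call the agent with `query` and always return a string."""
    if not query or not query.strip():
        return "Please provide a research question."
    try:
        result = agent_callable({"input": query})
    except Exception as exc:
        # Surface the failure as text instead of crashing the step.
        return f"The agent could not complete the request: {exc}"
    return result.get("output", "")


def fake_agent(payload):
    # Stand-in that mimics the {"input": ...} -> {"output": ...} shape.
    return {"output": f"Researched: {payload['input']}"}


print(safe_step(fake_agent, "molecular structure of caffeine"))
```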


🤖 Introducing Our Star: The ForgeAgent Class

The ForgeAgent class is the heart of our research assistant. It's like the control room where all the major decisions are made and actions are executed. Let's dive into its structure and understand its functions.

class ForgeAgent(Agent):

Here, our ForgeAgent is inheriting from the base Agent class. Think of it as a specialized type of agent, tailored for our research needs.

1. 🎬 Initialization: __init__

def __init__(self, database: AgentDB, workspace: Workspace):
    super().__init__(database, workspace)

This is the agent's birthplace. When our ForgeAgent is created, this method sets it up, connecting it to its database and workspace. It's like setting up a new employee's desk with all the necessary tools and resources.

2. 📝 Task Creation: create_task

async def create_task(self, task_request: TaskRequestBody) -> Task:
    # Let the base Agent class create and store the task
    task = await super().create_task(task_request)
    # Log the task id and the first 40 characters of its input
    LOG.info(
        f'📦 Task created: {task.task_id} input: {task.input[:40]}{"..." if len(task.input) > 40 else ""}'
    )
    return task

Every research project begins with a task. This method is where our agent starts its work, creating a new task based on our request. It then logs a message to keep us informed about the task's creation.
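The log line in create_task shortens long inputs so the log stays readable: it keeps the first 40 characters and adds "..." only when something was cut. Pulled out into a tiny helper, the same rule looks like this. The name preview is hypothetical and not part of the Forge SDK.

```python
def preview(text, limit=40):
    """Return the first `limit` characters, with "..." appended if text was cut."""
    return text[:limit] + ("..." if len(text) > limit else "")


print(preview("What is the history of the Eiffel Tower and who designed it?"))
print(preview("Short question"))
```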

3. 🧠 Thought Process: execute_step

async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
    # Leave a placeholder in the workspace while the agent works
    self.workspace.write(task_id=task_id, path="output.txt", data=b"Research Agent is thinking...")
    # Record this step in the database; it is marked as the last step
    step = await self.db.create_step(
        task_id=task_id, input=step_request, is_last=True
    )
    step_input = 'None'
    if step.input:
        step_input = step.input[:19]
    message = f'\t🔄 Step executed: {step.step_id} input: {step_input}'
    if step.is_last:
        message = (
            f'\t✅ Final Step completed: {step.step_id} input: {step_input}'
        )

    LOG.info(message)
    # Register output.txt as an artifact of this step
    artifact = await self.db.create_artifact(
        task_id=task_id,
        step_id=step.step_id,
        file_name='output.txt',
        relative_path='',
        agent_created=True,
    )
    LOG.info(f'Received input for task {task_id}: {step_request.input}')
    # Hand the user's input to the research agent and store its answer
    step.output = customstep(step_request.input)
    return step

This is where the real magic happens. Once our agent has a task, it uses this method to think, process, and act. It's a series of steps:

  • Writing to Workspace: Our agent jots down some initial thoughts or data.
  • Database Interaction: It interacts with its database, creating steps and logging its actions.
  • Logging: Throughout the process, our agent keeps us informed by logging its actions and decisions.
  • Executing the Task: Finally, it calls the customstep function to execute the main research task based on our input.

In essence, the execute_step method is our agent's thought process, where it takes our request, processes it, and produces a result.
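One thing to be aware of: execute_step is an async method, but customstep is an ordinary blocking function, so the server cannot handle other requests while the research runs. Whether that matters depends on how you use the agent. If it does, one option is to run the blocking call in a worker thread with asyncio.to_thread (Python 3.9+). The sketch below shows the pattern with a stand-in function; slow_research and run_step are hypothetical names.

```python
import asyncio
import time


def slow_research(query):
    # Stand-in for a blocking call such as customstep(query).
    time.sleep(0.1)
    return f"answer to {query}"


async def run_step(query):
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(slow_research, query)


print(asyncio.run(run_step("pyramids")))
```

Inside execute_step, the equivalent change would be to await the threaded call where customstep is invoked; treat that as an optional tweak rather than part of the original tutorial.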


🚀 Launching Your Research Assistant

Alright, we've set everything up, and our research assistant is eager to get started. Here's how you bring it to life:

1. 🖥️ Starting the Agent

In your terminal, navigate to the root of the AutoGPT repository (where the run script lives) and start the agent again. If you recall, the command is:

./run agent start YOUR_AGENT_NAME

Replace YOUR_AGENT_NAME with the unique name you gave your agent earlier.

2. 🌐 Accessing the Interface

Once the agent is up and running, open your favorite web browser and navigate to:

http://localhost:8000

This is the user interface of your agent. It's like the dashboard of a car, where you can interact with and control your agent.

3. 🔐 Logging In

Before you can start, you'll need to log in. You'll see options to log in using Google or GitHub. Choose your preferred method and authenticate.

4. 💬 Making a Query

Now, the fun part! You'll see a chatbox interface. This is where you communicate with your research assistant. Type in a research query or any question you have in mind. Maybe you want to know about the history of the pyramids or the molecular structure of water. Whatever it is, type it in and hit send.

5. 🎩✨ Witness the Magic

Sit back and watch as your research assistant dives into the vast ocean of information, processes it, and presents you with a well-researched answer. It's like having a personal researcher at your fingertips!


Congratulations! You've successfully set up and launched your very own AI-powered research assistant. Explore, experiment, and enjoy the magic of AI research. Happy researching! 🎉📚🔍
