ElevenLabs Tutorial: Adding a Witty Narrator into Minecraft via a Simple Mod

Wednesday, July 26, 2023 by septian_adi_nugraha408
ElevenLabs Tutorial: Adding a Witty Narrator into Minecraft via a Simple Mod

The emergence of artificial intelligence has become more pervasive and adaptable in almost every way imaginable. What's even more surprising is the surge in the capabilities of generative AI, demonstrated by use cases such as copy generations, coding assistant, image generations and other creative endeavors that formerly are thought to be exclusive to human minds. In some ways, this is such a thrilling revelation, with promises of so much possibilities of human-AI creative collaborations.

With creativity, next comes fun. After all, arts and creative works serve mostly as form of entertainments in some ways. When it comes to the fun side of AI, possibilites are indeed limitless! Imagine collaborating with a novelist to generate narration for a video game. Tasks that were formerly only possible for an indie game company to pull off can now be attempted by a single software developer which usually possess only passable or even mediocre flair for writing. That's exactly what we attempt to do in this tutorial.

We will utilize ElevenLabs' API and Anthropic's Claude model to generate narrations for Minecraft game. Each time the character steps on a tile, a new narration will be generated. The narration itself is styled after and inspired by an outstanding video game called "A Stanley Parable", where the titular protagonist of the game walks around an office building while every step he takes gets narrated by a witty and snarky narrator. Without further ado, let's dive in to the exciting world of game modding with AI tools!

Introduction to ElevenLabs

ElevenLabs is a software company that specializes in developing speech synthesis technology, powered by artificial intelligence. Imagine text-to-speech technology, but with a sound and flow that's much more natural. In addition to speech synthesis, ElevenLabs also offers a tool called VoiceLab, which allows users to create their own voices. Users can design entirely new synthetic voices or clone their own voices. In this tutorial, we will use ElevenLabs' API to synthesize speech that will narrate the actions of our Minecraft character.

Introduction to Anthropic's Claude

Claude is a highly advanced AI model developed by Anthropic, an AI research organization that strives to make AI safer and beneficial for humanity. This results in an AI model that can "understand" and generate responses that are more human-like. Therefore, Claude is an excellent tool for various applications, especially those that require creativity and a human touch. That's why we will use Anthropic's Claude model to generate unique and witty narrations in our Minecraft game.

Introduction to Minecraft

Minecraft is a sandbox video game that revolves around building, gathering resources, and exploring virtual blocky worlds. It has become one of the most popular video games in the world, boasting millions of players worldwide who can play both locally and online. One of the most prominent aspects of Minecraft is its openness. Players have the freedom to reach their objectives in their own way, tell their in-game stories as they wish, and even modify the game itself.

Introduction to Game Modification (Modding)

Game modification, or 'modding', involves changing or adding to a game's code and assets to alter the gameplay experience. Mods can affect gameplay, graphics, and other aspects of the game. When it comes to Minecraft, modding can involve everything from creating custom blocks and items to designing new worlds and inventing new mechanics. Modding is not only a fun way to enhance the experience of playing video games and extend the game's shelf life, but it's also a great way to learn about programming and game design.

Prerequisites

  • Basic knowledge of Python and web frameworks based on it, such as Flask
  • Access to ElevenLabs' API
  • Access to Anthropic's API

Outline

  1. Initializing the Modding Project
  2. Building the Narration Generation Service
  3. Integrating the Service to Minecraft
  4. Testing the Narration Mod

Discussion

In this tutorial, we will work with two projects that make up the narration mod for Minecraft. The first one is the Raspberry Jam Mod, which is a port of the popular mcpipy library. This library was originally written for the Raspberry Pi to run Python scripts in Minecraft Pi Edition. With the Raspberry Jam Mod, we can run Python scripts in Minecraft Java Edition.

The second project is a Flask app. This app will receive in-game events, such as tile positions, via HTTP requests from the Raspberry Jam Mod script. The development of a separate service is necessary because the Raspberry Jam Mod scripts don't allow the use of external libraries, such as anthropic or elevenlabs. Moreover, it's currently not possible to insert voice narration into the game using the script. This is why we need to build and run a separate service for the narration generation purpose.

Initializing the Modding Project

As discussed briefly in the opening of this section, we will begin the whole modding project by first setting up our Minecraft installations. In order to start modding, we should make sure that our Minecraft installations is readily moddable in some ways that we can run Python scripts on it. One of the way to accomplish that is by installing various tools on top of our Minecraft installation. Namely, Forge and Raspberry Jam Mod.

First, make sure that already have Minecraft Java Edition installed in your operating system. After that, let's open our Minecraft Launcher.

the user interface of minecraft launcher

Let's click the Installations tab. This menu should list the various installations of Minecraft in our machine.

we create new installation with version 1.12.2

Let's create a new installation and choose "Release 1.12.2" from the dropdown. Save the profile and give it a memorable name, or just "1.12.2", and try to start a new world. Make sure everything worked properly. If everything works nicely, great! We're off to a good start already!

Remember our tools from earlier? Next we will install Forge. Download Forge installer for version 1.12.2 and install it. The process should be straightforward enough that we can just pick default settings and leave it to install. Next, if we run Windows as our operating system, we can install RaspberryJamMod with the windows installer which we can download here.

From the same github link, we can download more resources that we need to start our modding project. But first, make sure that we already have a directory in our Minecraft app data directory called mods. Let's head to the directory by using Windows' Run (Win key + R) and type %appdata%\.minecraft (Take care of the backslash! it's a Windows thing). Once we have the directory opened, craate the mods directory.

we go to Minecraft app data directory using Windows' Run

Once we have the mods directory, download the mods.zip file from the RaspberryJamMod's Github and copy the file inside the zip file to the mods directory me wade earlier.

we copy the downloaded mods to the mods directory

Next, we download another zip file which is called python-scripts.zip. Inside the zip file, there should be a directory called mcpipy, that's the directory of our Python scripts! let's copy the directory into our Minecraft app data directory.

Let's get into the coding now! Open the mcpipy directory inside of our .minecraft using our favorite code editors/IDEs. After that, we should see various Python scripts which we can run in-game. For instance, there's a script called whereami to return the position of the player character to the console, and helloworld which will return "Hello world!" and spawning a diamond ore block under the player.

the various scripts inside mcpipy directory, such as helloworld.py which return 'Hello World'! and spawn a diamond ore block, and whereami.py which simply returns the position of player character.

Let's open our Minecraft Launcher again, it should have Forge version to choose now. Let's choose that version and run the game.

we choose the forge version and run the game.

Once we have the game run, create a new world, preferably in "Creative" mode so that we don't get accidentally killed when we play around with the scripts. Next, let's run the script by typing /py helloworld in the game and press enter.

the `/py helloworld command returns "Hello world!" to the console and spawns a diamond ore block under the player character's feet

Great! not long after that, the game should output "Hello world!" at the in-game chat, and when we look down, there should be a diamond ore block under our characters' feet! If we can run the command without error and returning the desired response, congrats! we just started our modding journey in Minecraft using Python scripts.

Building the Narration Generation Service

Next, let's build our narration generation service using Flask. Create a new directory in our coding projects directory called mc-narrator and enter the directory

mkdir mc-narrator
cd mc-narrator

Next, let's install all the required dependencies after we create the virtual environment.

python -m venv env

# On Linux and macOS
source env/bin/activate

# On Windows
.\env\Scripts\activate

# We install Flask, Anthropic for Claude model, ElevenLabs for the audio generation, and python-dotenv to read the environment fariable
pip install flask anthropic elevenlabs "pydantic==1.*" python-dotenv

# We freeze the dependencies into a file called requirements.txt
pip freeze > requirements.txt

When the installation process is completed, let's make our app.py file. This is the file which serves as Flask app.

import os
from flask import Flask, request
import anthropic
from elevenlabs import generate, set_api_key, play

app = Flask(__name__)

@app.route('/narrate', methods=['POST'])
def generate_narration():
    data = request.get_json()
    prompt = (f"{anthropic.HUMAN_PROMPT} You will narrate my actions on Minecraft like the narrator from 'A Stanley Parable'."
              f"{anthropic.AI_PROMPT} Understood."
              f"{anthropic.HUMAN_PROMPT} Great! this is the event: {data['event']}; the narration:")

    c = anthropic.Anthropic(api_key=os.getenv("CLAUDE_KEY"))
    resp = c.completions.create(
        prompt=f"{prompt} {anthropic.AI_PROMPT}",
        stop_sequences=[anthropic.HUMAN_PROMPT],
        model="claude-v1.3-100k",
        max_tokens_to_sample=900,
    )

    print(resp.completion)

    set_api_key(os.getenv("XI_API_KEY"))
    audio = generate(
                    text=resp.completion,
                    voice="Bella",
                    model='eleven_monolingual_v1'
                )
    play(audio)
    return resp.completion


@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run(debug=True)

Here, we only defined one endpoint, /narrate which will generate narration and play the audio generated from the narration. In the handler of the /narrate endpoint, we first take the JSON payload sent by the client, which contains "event" field. Then we use Claude model to generate narrations of our character actions "like the narrator from 'A Stanley Parable'". next, we generate the audio using ElevenLabs' API. We use "Bella" voice, which is a pre-made voice provided by ElevenLabs and their monolingual model, as we will exclusively narrate the events in English.

This endpoint will use services from Anthropic and ElevenLabs for generating the narration text and the audio, respectively. Therefore, we need to pass API keys for both services which, being sensitive information, we store at a separate .env file.

CLAUDE_KEY=sk-ant-api03-xxxxxxxxxxxxxxxxxxxxxxxx
XI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Great! now, let's test run our narration generation service. Run the app with this command

flask run

Open up our REST API testing tools. We can use tools such as Postman, Insomnia, or just cURL. In this tutorial, I use Insomnia. Let's send the payload which is a JSON with a single field called "event" which we fill with "I step on a cobblestone".

we test running the narration service, the service responded with the witty and over the top narration, A Stanley Parable-style.

Fantastic! our narrator service responded with the text which in my case, is as follow:

What an incredible step forward into the unknown! The cobblestone path ahead winds endlessly into parts unseen. An adventure surely awaits, for better or for worse! Onward to glory, or doom! *dramatic musical cue* The choice is yours, dear player. How shall your odyssey unfold? The breathless anticipation is palpable! Make your move, adventurer, destiny calls! *ominous echoing voice* Cobblestone... Cobblestone... Cobblestone...

What a witty and over-the-top narration, just like the one in "A Stanley Parable" video game! Not long after, the service should also run the audio narration with the soothing but equally snarky voice of "Bella". Isn't that nice? suddenly our adventures in Minecraft get so much more interesting and livelier!

Integrating the Service to Minecraft

By now, we must have some ideas where this is going. That's right! in this subsection, we will add a new Python script in the mcpipy directory, which we can run directly in-game from Minecraft. Let's open our mcpipy directory in our code editors, and create a new file inside the directory called narrate.py. Let's write the code below.

# narrate.py
import time
from mine import *
from sys import argv
import http.client
import json

mc = Minecraft()
# Define the server and the endpoint
server = "localhost:5000"
endpoint = "/narrate"

mc.postToChat("activating narration...")
while True:
    time.sleep(1)
    # Create a connection
    conn = http.client.HTTPConnection(server)

    pos = mc.player.getTilePos()
    mc.postToChat("Position: ")
    mc.postToChat(pos)
    block = mc.getBlock(pos.x, pos.y-1, pos.z)
    print(block)
    block_types = {
    0: "AIR",
    1: "STONE",
    2: "GRASS",
    3: "DIRT",
    4: "COBBLESTONE",
    5: "WOOD_PLANKS",
    6: "SAPLING",
    7: "BEDROCK",
    8: "WATER_FLOWING",
    9: "WATER_STATIONARY",
    10: "LAVA_FLOWING",
    11: "LAVA_STATIONARY",
    12: "SAND",
    13: "GRAVEL",
    14: "GOLD_ORE",
    15: "IRON_ORE",
    16: "COAL_ORE",
    17: "WOOD",
    18: "LEAVES",
    20: "GLASS",
    21: "LAPIS_LAZULI_ORE",
    22: "LAPIS_LAZULI_BLOCK",
    24: "SANDSTONE",
    26: "BED",
    30: "COBWEB",
    31: "GRASS_TALL",
    35: "WOOL",
    37: "FLOWER_YELLOW",
    38: "FLOWER_CYAN",
    39: "MUSHROOM_BROWN",
    40: "MUSHROOM_RED",
    41: "GOLD_BLOCK",
    42: "IRON_BLOCK",
    43: "STONE_SLAB_DOUBLE",
    44: "STONE_SLAB",
    45: "BRICK_BLOCK",
    46: "TNT",
    47: "BOOKSHELF",
    48: "MOSS_STONE",
    49: "OBSIDIAN",
    50: "TORCH",
    51: "FIRE",
    53: "STAIRS_WOOD",
    54: "CHEST",
    56: "DIAMOND_ORE",
    57: "DIAMOND_BLOCK",
    58: "CRAFTING_TABLE",
    60: "FARMLAND",
    61: "FURNACE_INACTIVE",
    62: "FURNACE_ACTIVE",
    64: "DOOR_WOOD",
    65: "LADDER",
    67: "STAIRS_COBBLESTONE",
    71: "DOOR_IRON",
    73: "REDSTONE_ORE",
    78: "SNOW",
    79: "ICE",
    80: "SNOW_BLOCK",
    81: "CACTUS",
    82: "CLAY",
    83: "SUGAR_CANE",
    85: "FENCE",
    89: "GLOWSTONE_BLOCK",
    95: "BEDROCK_INVISIBLE",
    98: "STONE_BRICK",
    102: "GLASS_PANE",
    103: "MELON",
    107: "FENCE_GATE",
    246: "GLOWING_OBSIDIAN",
    247: "NETHER_REACTOR_CORE",
    }

    mc.postToChat(block)
    payload = {
        "event": "I step on a {}".format(block_types.get(block))
    }
    payload_json = json.dumps(payload)
    headers = {
        'Content-Type': 'application/json'
    }
    conn.request("POST", endpoint, body=payload_json, headers=headers)
    res = conn.getresponse()
    data = res.read()
    conn.close()

    mc.postToChat(data.decode('utf-8'))

Pheew, now that's a handful of code to write! To be fair, this is to be expected. As mcpipy scripts are mostly self-contained, they only support standard Python libraries. Therefore, we can't use convenient third-party libraries such as requests to make the HTTP requests to our narration generator service.

In this code, we start with a while loop. This ensures that the mod will stay running after the initial script trigger. After that, we will set a second delay before the next narration generation. This will make time for our player characters to move around before the next narration is generated. Next, we use mcpipy's various functions such as getTilePos() to get the tile which the player is standing on, getBlock() which will retrieve the information about block at given position, and postToChat() which will send the value into Minecraft's console output. We defined a dictionary that contains the names of available block types in version 1.12.2 of Minecraft along with the ID. We will retrieve the name of the block type by using the block ID from getBlock() function.

Finally, we send the event to our narrator service after we compose the event, which mainly says "I step on a <block-type>". This will send sentences such as "I step on a GRASS" when the player is walking on grass, and "I step on a STONE" when the player step into stony area. With the integration step done, the final step is to test the integration.

Testing the Narration Mod

Let's step into our Minecraft world again. Once we get into the world, type /py narrate.

we input `/py narrate` command into our Minecraft console

After we input the command, let's push Enter. The opening message of the mod should show up.

the opening message of our narrate script shows up along with the position of the character as well as the block ID

As we stepped on a grass in this image, the narration should describe the situation. Let's wait for a moment, the audio narration should play not long after the block type is identified.

the narration text should show up after the audio narration finished playing

Voila! After we listened to the soothing narration from "Bella", we should see the text that narrated our "step on grass" event from earlier. This time, the narrator said:

&quot;Ah, I see you have spawned into a new world. The open fields of grass sprawl out before you, filled with possibility... and grass. So much grass. A veritable sea of the green stuff. Best get to chopping it down, I suppose. Can&#39;t make an omelette without breaking a few blades of grass.&quot;

Try wandering a little bit further! The narration should then adapt to the different kinds of blocks that the characters step on, whether it is grass, stone, sand, dirt, or even a tree stump!

Conclusion

In this tutorial, we get our hands on modding a popular game, Minecraft. However, since this might as well be our first time modding a video game, we keep it simple by using mcpipy, which allows us to run Python scripts inside Minecraft. We incorporated voice narration in the style of "A Stanly Parable", another popular video game which features witty and snarky narrator which follows every steps that the players take.

However, the voice narration itself is special because it's generated by two AI services. We generate the narration text using Anthropic's Claude, resulting in witty and believable narration. After we have the narration text, we generate the audio narration using ElevenLabs' API. The collaboration of two different AI services results in an unique experience which has never been possible to pull off before, especially by a single developer!

Finally, I thank you for joining me in our exciting explorations inside the world of game modding and incorporating AI tools to enhance our experience. By reading and following the steps in this tutorial, I hope I can help you getting into game modding as well as sparking some ideas on how to utilize AI tools in video game development. See you on the next tutorials!