ElevenLabs Tutorial: Create stories with Voice AI from ElevenLabs
ElevenLabs is a voice technology research company, developing the most compelling AI speech software for publishers and creators.
ChatGPT is an AI-based chatbot developed by OpenAI. It is powered by GPT-3.5, which stands for "Generative Pre-trained Transformer 3.5": an advanced language model trained on a massive amount of text data from the internet and other sources. Check out ChatGPT apps to get inspired by what you can use it for.
React is a JavaScript library for building user interfaces.
Material-UI is a comprehensive collection of prebuilt components that are ready for use in production right out of the box.
FastAPI is a modern, fast (high-performance) web framework for building APIs with Python.
What are we going to build?
In this tutorial, we will build a React app that generates brand-new stories and adds a voiceover so you can listen to them. Sit back, relax, enjoy the tutorial, and don't forget to make a cup of coffee ☕️.
Learning outcomes
- Getting familiar with ElevenLabs.
- Getting familiar with OpenAI's GPT-3.5-turbo (LLM).
- Creating a React app from scratch.
- Getting familiar with Material UI.
Prerequisites
Go to Visual Studio Code and download the version compatible with your operating system, or use any other code editor such as IntelliJ IDEA, PyCharm, etc.
To use ElevenLabs, we need an API key. Go to ElevenLabs and create an account. It's free! 🎉 In the upper right corner, click on your profile picture > Profile. Next, click on the eye icon and copy/save your API key.
To use OpenAI's GPT-3.5-turbo, we need an API key. Go to OpenAI and create an account. It's free! 🎉 In the upper right corner, click on your profile picture > View API keys. Next, click on Create new secret key and copy/save your API key.
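Tip: rather than hardcoding the keys into the source code later on, you can keep them in environment variables. Here is a minimal sketch (the variable names ELEVEN_API_KEY and OPENAI_API_KEY are just a convention I chose, not something the libraries require):
import os

# Read the keys from the environment; returns an empty string if a key is not set
ELEVEN_API_KEY = os.environ.get("ELEVEN_API_KEY", "")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")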
Nothing more! Just a cup of coffee ☕️ and a laptop 💻.
Getting started
Create a new project
First things first, open Visual Studio Code and create a new folder named elevenlabs-tutorial:
mkdir elevenlabs-tutorial
cd elevenlabs-tutorial
Backend
Create a folder for the backend
Let's create a new folder for the backend. Open your terminal and run the following commands:
mkdir backend
cd backend
Create a new Python file
Now, we need to create a new Python file. Open your terminal and run the following command:
touch api.py
Create a virtual environment and activate it
Next, we need to create a Python virtual environment and activate it. Open your terminal and run the following commands:
python3 -m venv venv
# on macOS and Linux:
source venv/bin/activate
# on Windows:
venv\Scripts\activate
Install all dependencies
Now, we need to install all dependencies. Open your terminal and run the following commands:
pip install fastapi
pip install uvicorn
pip install elevenlabs
pip install openai
Import all dependencies
Next, we need to import all dependencies. Go to api.py and add the following code:
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
import uvicorn
from elevenlabs import generate, set_api_key
import openai
Initialize FastAPI and add CORS middleware. Learn more about CORS middleware.
app = FastAPI()

origins = ["http://localhost:3000"]  # put your frontend URL here (no trailing slash)

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_methods=["*"],
    allow_headers=["*"],
)
Add global variables.
# Where the backend writes the mp3 files, relative to the backend/ folder.
# We save into the frontend's public/ folder so the React dev server can serve them.
AUDIOS_PATH = "../frontend/public/audios/"
# The URL path the frontend uses to load the saved audio.
AUDIO_PATH = "/audios/"
Implement the API endpoint for voice generation.
@app.get("/voice/{query}")
async def voice_over(query: str):
set_api_key("your-api-key") # put your API key here
audio_path = f'{AUDIOS_PATH}{query[:4]}.mp3'
file_path = f'{AUDIO_PATH}{query[:4]}.mp3'
audio = generate(
text=query,
voice='Bella', # premade voice
model="eleven_monolingual_v1"
)
try:
with open(audio_path, 'wb') as f:
f.write(audio)
return file_path
except Exception as e:
print(e)
return ""
Implement the API endpoint for story generation.
@app.get("/chat/chatgpt/{query}")
def chat_chatgpt(query: str):
openai.api_key = "your-api-key" # put your API key here
try:
response = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=[
{"role": "user", "content": query}
]
)
return response['choices'][0]['message']['content']
except Exception as e:
print(e)
return ""
Run the backend
uvicorn api:app --reload
Now, open your browser and go to http://localhost:8000/docs. You should see the following:
Try playing with the API to check whether everything is implemented correctly. For example, click on an endpoint's dropdown > Try it out:
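If you prefer the terminal, you can also test the story endpoint with a small Python script (a minimal sketch; it assumes the backend from above is running on port 8000):
import urllib.parse
import urllib.request

# URL-encode the prompt so spaces and punctuation survive in the URL path
prompt = "Generate a short story about cats and kittens."
url = "http://127.0.0.1:8000/chat/chatgpt/" + urllib.parse.quote(prompt, safe="")

with urllib.request.urlopen(url) as response:
    # The endpoint returns the generated story as a JSON-encoded string
    print(response.read().decode("utf-8"))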
Frontend
Create a new React app
Now, we need to create a new React app. Go back to the project root (elevenlabs-tutorial), then open your terminal and run the following commands:
npx create-react-app frontend
cd frontend
Install all dependencies
Now, we need to install all dependencies. Open your terminal and run the following commands:
npm install @mui/material @emotion/react @emotion/styled @mui/joy @mui/icons-material
npm install use-sound
Implement the UI
Go to src/App.js and replace the code with the following:
import React, { useState } from 'react';
import Textarea from '@mui/joy/Textarea';
import Button from '@mui/joy/Button';
import Box from '@mui/joy/Box';
import { Send, HeadphonesOutlined } from '@mui/icons-material';
import useSound from 'use-sound';
import Typography from '@mui/material/Typography';

function App() {
  const [loading, setLoading] = useState(false);
  const [story, setStory] = useState('');
  const [query, setQuery] = useState('');
  const [audio, setAudio] = useState('');
  const [play] = useSound(audio);

  const handleQueryChange = (e) => {
    setQuery(e.target.value);
  }

  const generateStory = () => {
    setLoading(true);
    console.log('story about: ', query);
    // encode the query so spaces and newlines survive in the URL path
    fetch(`http://127.0.0.1:8000/chat/chatgpt/${encodeURIComponent(query)}`, {
      method: 'GET',
      headers: {
        'Accept': 'application/json'
      }
    })
      .then(response => {
        if (response.ok) {
          return response.json();
        } else {
          throw new Error('Request failed');
        }
      })
      .then(data => {
        console.log('story: ', data);
        if (data) {
          setStory(data);
        }
      })
      .catch(err => {
        console.log(err);
      })
      // stop the loading spinner once the request has settled
      .finally(() => setLoading(false));
  }

  const generateAudio = () => {
    setLoading(true);
    console.log('audio about: ', story);
    fetch(`http://127.0.0.1:8000/voice/${encodeURIComponent(story)}`, {
      method: 'GET',
      headers: {
        'Accept': 'application/json'
      }
    })
      .then(response => {
        if (response.ok) {
          return response.json();
        } else {
          throw new Error('Request failed');
        }
      })
      .then(data => {
        console.log('audio path: ', data);
        if (data) {
          setAudio(data);
        }
      })
      .catch(err => {
        console.log(err);
      })
      .finally(() => setLoading(false));
  }

  const handleSubmit = (e) => {
    e.preventDefault();
    generateStory();
  }

  return (
    <Box sx={{ marginTop: '32px', marginBottom: '32px', display: 'flex', flexWrap: 'wrap', flexDirection: 'column', alignItems: 'center', justifyContent: 'center', textAlign: 'center', minHeight: '100vh' }}>
      <Typography variant="h5" component="h5">
        ElevenLabs Tutorial: Create stories with Voice AI from ElevenLabs
      </Typography>
      <Box sx={{ marginTop: '32px', width: '600px' }}>
        <form onSubmit={handleSubmit}>
          <Textarea
            sx={{ width: '100%' }}
            onChange={handleQueryChange}
            minRows={2}
            maxRows={4}
            placeholder="Type anything…" />
          <Button
            disabled={loading || query === ''}
            type='submit'
            sx={{ marginTop: '16px' }}
            loading={loading}>
            <Send />
          </Button>
        </form>
      </Box>
      {story && (
        <Box sx={{ marginTop: '32px', width: '600px' }}>
          <Textarea
            sx={{ width: '100%' }}
            value={story} />
          <Button
            loading={loading}
            sx={{ marginTop: '16px' }}
            onClick={audio ? play : generateAudio}>
            <HeadphonesOutlined />
          </Button>
        </Box>
      )}
    </Box>
  );
}

export default App;
Let's go through the code above.
First, we import all the necessary components from @mui/joy, @mui/material, and @mui/icons-material. Then, we import useSound from use-sound to play the generated audio. Next, we define the App component. Inside the App component, we define the states to store the story, query, and audio.
Next, we implemented four functions: handleQueryChange, generateStory, generateAudio, and handleSubmit.
handleQueryChange will be called when the user types in the text area. It updates the query state with the value from the text area.
const handleQueryChange = (e) => {
  setQuery(e.target.value);
}
handleSubmit will be called when the user clicks on the Send icon. It calls the generateStory function, which sends a GET request to the FastAPI chat/chatgpt endpoint to generate a story based on the entered query and then updates the story state with the generated story.
const handleSubmit = (e) => {
  e.preventDefault();
  generateStory();
}

const generateStory = () => {
  setLoading(true);
  console.log('story about: ', query);
  // encode the query so spaces and newlines survive in the URL path
  fetch(`http://127.0.0.1:8000/chat/chatgpt/${encodeURIComponent(query)}`, {
    method: 'GET',
    headers: {
      'Accept': 'application/json'
    }
  })
    .then(response => {
      if (response.ok) {
        return response.json();
      } else {
        throw new Error('Request failed');
      }
    })
    .then(data => {
      console.log('story: ', data); // output the story in the console
      if (data) {
        setStory(data);
      }
    })
    .catch(err => {
      console.log(err);
    })
    // stop the loading spinner once the request has settled
    .finally(() => setLoading(false));
}
generateAudio will be called when the user clicks on the HeadphonesOutlined icon. It sends a GET request to the FastAPI voice endpoint to generate the audio and then updates the audio state with the path of the generated file. Note that the first click generates the audio; once the audio state is set, clicking the button again plays it (onClick={audio ? play : generateAudio}).
const generateAudio = () => {
  setLoading(true);
  console.log('audio about: ', story);
  fetch(`http://127.0.0.1:8000/voice/${encodeURIComponent(story)}`, {
    method: 'GET',
    headers: {
      'Accept': 'application/json'
    }
  })
    .then(response => {
      if (response.ok) {
        return response.json();
      } else {
        throw new Error('Request failed');
      }
    })
    .then(data => {
      console.log('audio path: ', data);
      if (data) {
        setAudio(data);
      }
    })
    .catch(err => {
      console.log(err);
    })
    .finally(() => setLoading(false));
}
The return statement renders the UI. Here you can see components like Box, Typography, Textarea, Button, Send, and HeadphonesOutlined; all of them are built-in components from Material-UI. Learn more about Material-UI components here.
return (
  <Box sx={{ marginTop: '32px', marginBottom: '32px', display: 'flex', flexWrap: 'wrap', flexDirection: 'column', alignItems: 'center', justifyContent: 'center', textAlign: 'center', minHeight: '100vh' }}>
    <Typography variant="h5" component="h5">
      ElevenLabs Tutorial: Create stories with Voice AI from ElevenLabs
    </Typography>
    <Box sx={{ marginTop: '32px', width: '600px' }}>
      <form onSubmit={handleSubmit}>
        <Textarea
          sx={{ width: '100%' }}
          onChange={handleQueryChange}
          minRows={2}
          maxRows={4}
          placeholder="Type anything…" />
        <Button
          disabled={loading || query === ''}
          type='submit'
          sx={{ marginTop: '16px' }}
          loading={loading}>
          <Send />
        </Button>
      </form>
    </Box>
    {story && (
      <Box sx={{ marginTop: '32px', width: '600px' }}>
        <Textarea
          sx={{ width: '100%' }}
          value={story} />
        <Button
          loading={loading}
          sx={{ marginTop: '16px' }}
          onClick={audio ? play : generateAudio}>
          <HeadphonesOutlined />
        </Button>
      </Box>
    )}
  </Box>
);
Create a new folder audios inside the frontend's public folder; Create React App serves everything in public/ at the site root, so the backend's saved files will be reachable at /audios/…. We will save all the generated voiceovers in this folder. From the frontend folder, run:
mkdir public/audios
Run the app
Let's run the app and see how it works.
npm start
Open your browser and go to http://localhost:3000. You will see the app running.
Let's try to generate a story, for example: Generate a short story about cats and kittens.
Cool! We got a story. Let's go through it.
Perfect! Let's listen to the audio of this story by clicking on the HeadphonesOutlined icon.
Conclusion
I hope this tutorial provided clear and detailed guidance, accompanied by a few screenshots, to ensure a seamless setup process. By the end of this tutorial, you should have a working app that can generate stories and voiceovers. This is amazing! Today, we learned a lot of cool technologies and tools.
Thank you for following along with this tutorial.