Creating an AI-Powered Cooking Assistant with LLaMA 3.2 Vision

Wednesday, October 16, 2024 by TommyA

Hello! It’s Tommy again, and today I’m thrilled to walk you through building a practical AI-powered cooking assistant. In this tutorial, we’ll use LLaMA 3.2 Vision, a multimodal model from Meta, to analyze images of ingredients and suggest recipes in real time. With Groq Cloud for fast inference and Streamlit for the interactive user interface, you’ll have a functional app by the end of this tutorial.

So, whether you’re a beginner in AI or just curious about how machine learning can enhance our kitchen experiences, this tutorial will give you a hands-on understanding of these tools while keeping things simple and practical.

Let’s get cooking! 🍳

Setting Up Your Conda Environment

Before diving into the code, let’s prepare your environment using Conda, a popular package and environment manager for Python. We’ll set up a dedicated environment to keep everything organized.

Steps to Set Up Conda Environment:

  1. Install Conda: If you don’t have Conda installed, download and install it from the official Anaconda or Miniconda download page.

  2. Create a New Conda Environment: Once Conda is installed, open your terminal or command prompt and run:

    conda create --name cooking-assistant python=3.11
    

    This creates a new environment called cooking-assistant with Python 3.11.

  3. Activate the Environment:

    conda activate cooking-assistant
    
  4. Install Required Packages: Now, let’s install the necessary Python packages:

    pip install groq streamlit python-dotenv
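
If you want to confirm the install worked before moving on, here is a small sketch you could run inside the activated environment. The function name missing_modules is just something I made up for this check. Note that python-dotenv is imported as dotenv, so the import names differ slightly from the pip names.

```python
import importlib.util

# Import names, not pip names: python-dotenv is imported as "dotenv".
REQUIRED_MODULES = ["groq", "streamlit", "dotenv"]


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If anything is listed as missing, check that the cooking-assistant environment is active and run the pip install command again.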
    

Creating the Main Application File

Next, create a file named main.py where we’ll write the core logic of our AI-powered cooking assistant. This file will handle uploading images, sending them to the Groq Cloud API for analysis, and displaying the results in a simple interface built with Streamlit.

Open main.py in your text editor or IDE and add the code from the sections below. We’ll go step by step to understand each part of it.

Initializing the Groq Client for Image Analysis

We’ll begin by setting up the Groq client. This client will allow us to interact with the LLaMA 3.2 Vision model, which will analyze the images of ingredients uploaded by users.

Code:

from groq import Groq
import base64
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize Groq client with API key
client = Groq(api_key=os.getenv("GROQ_API_KEY"))

Explanation:

  • dotenv: We use python-dotenv to keep the API key out of the source code. The .env file should contain the Groq API key under the name GROQ_API_KEY, and it should not be committed to version control.
  • Groq Client: The Groq client is initialized with the API key, which allows us to interact with the LLaMA 3.2 Vision model.
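
One thing to watch: os.getenv returns None when the variable is missing, and the resulting error can be confusing. Below is a minimal sketch of a check that fails early with a readable message. The helper name require_api_key and the wording of the error are my own, not part of the Groq SDK or python-dotenv.

```python
import os


def require_api_key(name="GROQ_API_KEY", env=None):
    """Return the named key, or raise a readable error if it is missing or blank."""
    # Hypothetical helper: `env` defaults to the process environment and is
    # only a parameter so the function is easy to try out with a plain dict.
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add a line like {name}=<your key> to your .env file."
        )
    return value
```

You could then call it after load_dotenv(), for example client = Groq(api_key=require_api_key()).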

Analyzing Ingredients Using LLaMA 3.2 Vision

Once the Groq client is set up, we need a function to send the image data to the LLaMA 3.2 Vision model for analysis.

Code:

def analyze_ingredient(image_bytes):
    # Convert the image to base64
    base64_image = base64.b64encode(image_bytes).decode('utf-8')

    # Send request to Groq API to identify ingredients
    response = client.chat.completions.create(
        model="llama-3.2-11b-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Identify the ingredients in this image. 'Only the ingredients' comma separated and nothing else."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    }
                ]
            }
        ],
        stream=False,
        temperature=1,
        max_tokens=1024,
        top_p=1,
        stop=None,
    )

    return response.choices[0].message.content

Explanation:

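  • Base64 Encoding: We encode the image as base64 and embed it in a data URL, which lets us send the image inline in the JSON request. Note that the URL labels every image as image/jpeg, even though the uploader also accepts PNG files.
  • Groq API Call: We send the image to the 11B LLaMA 3.2 Vision model and ask it to return only a comma-separated list of ingredients.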
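
Because the data URL in analyze_ingredient always says image/jpeg, a PNG upload ends up with a mismatched label. Here is a minimal sketch of one way to choose the MIME type from the file’s leading bytes. The helper name to_data_url is hypothetical (it is not part of the Groq SDK), and it only knows the PNG and JPEG signatures, falling back to image/jpeg for anything else.

```python
import base64

# Leading "magic" bytes for the two formats the uploader accepts.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def to_data_url(image_bytes, default_mime="image/jpeg"):
    """Encode image bytes as a base64 data URL with a matching MIME type."""
    mime = default_mime
    for signature, candidate in _SIGNATURES.items():
        if image_bytes.startswith(signature):
            mime = candidate
            break
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

Inside analyze_ingredient, the "url" value could then be to_data_url(image_bytes) instead of the hard-coded f-string.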

Suggesting Recipes Based on Identified Ingredients

Once the ingredients are identified, we can ask LLaMA 3.2 to suggest recipes using those ingredients. This is done using another Groq API call.

Code:

def suggest_recipe(ingredients):
    # Send identified ingredients to the LLaMA 3.2 text model to get recipe suggestions
    response = client.chat.completions.create(
        model="llama-3.2-11b-text-preview",
        messages=[
            {"role": "user", "content": f"Suggest a recipe using these ingredients: {ingredients}"}
        ]
    )
    return response.choices[0].message.content

Explanation:

  • Recipe Suggestion: The identified ingredients are sent to the LLaMA 3.2 text model to generate recipe suggestions.
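
The prompt in suggest_recipe is a single user message, so the shape of the answer is left up to the model. As a rough sketch, you could add a system message that asks for a consistent layout. The function name build_recipe_messages and the wording of the system prompt are my own assumptions. The system and user roles are the standard chat roles the chat completions endpoint accepts.

```python
def build_recipe_messages(ingredients):
    """Build the chat messages for the recipe request."""
    # Hypothetical system prompt: adjust the wording to taste.
    system = (
        "You are a helpful cooking assistant. Reply with a recipe title, "
        "an ingredient list with quantities, and numbered steps."
    )
    user = f"Suggest a recipe using these ingredients: {ingredients}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

To try it, pass messages=build_recipe_messages(ingredients) to client.chat.completions.create in place of the inline list.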

Building the Streamlit Interface

Now that the core functionality is in place, let’s build the Streamlit interface, where we can upload images and get ingredient identification and recipe suggestions.

Code:

import streamlit as st

# Streamlit UI
st.title("AI-Powered Cooking Assistant")

uploaded_files = st.file_uploader("Upload ingredient images", accept_multiple_files=True, type=["png", "jpg", "jpeg"])

if uploaded_files:
    ingredients_list = []

    for uploaded_file in uploaded_files:
        # Analyze each uploaded image
        image_bytes = uploaded_file.getvalue()  # full contents, regardless of read position
        ingredients = analyze_ingredient(image_bytes)
        ingredients_list.append(ingredients)
        st.write(f"Identified Ingredients in {uploaded_file.name}: {ingredients}")

    # Suggest recipes based on the identified ingredients
    if ingredients_list:
        all_ingredients = ", ".join(ingredients_list)
        st.write(f"All identified ingredients: {all_ingredients}")
        recipe = suggest_recipe(all_ingredients)
        st.write("Suggested Recipe:")
        st.write(recipe)

Explanation:

  • File Uploader: We can upload one or more images in PNG or JPEG format.
  • Image Processing: For each uploaded image, we analyze it and display the identified ingredients.
  • Recipe Suggestion: Once all ingredients are identified, the LLaMA 3.2 text model suggests a recipe.
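
When several photos show the same item, joining the raw strings with ", ".join can repeat ingredients in the recipe prompt. Here is a minimal sketch of a merge step that splits each result on commas, trims whitespace and trailing periods, and drops case-insensitive duplicates. The name merge_ingredients is hypothetical, and the sketch assumes the model actually returned a comma-separated list, as the prompt asks.

```python
def merge_ingredients(per_image_results):
    """Combine comma-separated results into one de-duplicated list, keeping first-seen order."""
    seen = set()
    merged = []
    for result in per_image_results:
        for item in result.split(","):
            name = item.strip().strip(".")
            key = name.lower()
            if name and key not in seen:
                seen.add(key)
                merged.append(name)
    return merged
```

In the interface code, all_ingredients could then be ", ".join(merge_ingredients(ingredients_list)).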

Running the Application

Now that everything is set up, you can run your app. In the terminal, make sure you’re in the same directory as main.py, and run the following command:

streamlit run main.py

Once the app is running, you can upload images, view identified ingredients, and see recipe suggestions in real time!

[Screenshot: Cooking app running locally]

What’s Next?

  • Explore other models: Try experimenting with different LLaMA 3.2 models available on Groq Cloud.
  • Enhance functionality: Add features like saving favorite recipes or improving the ingredient identification process.
  • Deploy your app: Consider deploying your app to a cloud platform like Heroku or Streamlit Community Cloud to share with others.

Conclusion

In this tutorial, we created an AI-powered cooking assistant that uses the LLaMA 3.2 Vision model through Groq Cloud to analyze images of ingredients and suggest recipes. We also built a user-friendly interface using Streamlit, allowing users to upload images and interact with the AI in real time.

Now that you’ve seen how to integrate vision models with a web interface, you can further enhance the assistant by adding more features or improving its accuracy. This project showcases how AI can be seamlessly integrated into everyday applications, providing useful, real-world solutions.

Happy coding and cooking! 🎉