Making AI Smarter and Smaller: A Practical Guide to Efficient Model Training

Thursday, August 15, 2024 by sanchayt743

Hi, I’m Sanchay Thalnerkar, an AI Engineer. I’ve been exploring ways to make AI more efficient, and I’m excited to share an interesting approach I’ve been working on. In the world of artificial intelligence, bigger models often steal the spotlight, but what if you could get similar results without the hefty price tag and massive computing power? This guide walks you through a clever approach: using a large AI model to create top-notch training data, then using that data to train a smaller, more manageable model.

My Method: Efficient AI in Three Steps

First, we leverage a large model like Meta-Llama-3.1-405B, made accessible by AI/ML API, to generate a dataset of marketing scenarios. AI/ML API’s platform allows us to tap into the vast capabilities of this powerful model, creating the perfect study guide for our smaller model. This data is then formatted using the “alpaca prompt” structure, making it easy for a smaller model to learn effectively. Finally, we use a tool called Unsloth to efficiently train our smaller model, starting with Meta-Llama-3.1-8B, on this data.

The outcome? A model that’s smaller, faster, and capable of producing high-quality outputs for specific marketing tasks, comparable to what you’d expect from a much larger model. For instance, when prompted with “Create a marketing campaign to promote a chocolate bar for Cadbury, targeting adults and boomers,” the results can be surprisingly good.

This method offers several benefits. It allows for creating AI models specialized in specific tasks, making it accessible even to small companies or individual developers without the need for expensive hardware or massive budgets. By focusing on generating diverse, high-quality training data and carefully fine-tuning your smaller model, you can create powerful and efficient AI tools tailored to your needs.

Step 0: Setting Up the Environment

Before we begin, let's set up our development environment:

Install Python: If you haven't already, download and install Python from https://www.python.org/downloads/windows/
Create a virtual environment:
- Open Command Prompt
- Navigate to your project directory
- Run the following commands:

python -m venv myenv
myenv\Scripts\activate

Install required packages: Run the following commands in your activated virtual environment:

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes ratelimit
pip install torch transformers datasets rich openai

Additional dependencies: Based on the code in the tutorial, you'll also need to install:

pip install ipython

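Start by importing the libraries used throughout the tutorial:

import json
import re
import time
from collections import Counter
from typing import Dict, List

from ratelimit import limits, sleep_and_retry
from rich.console import Console
from rich.panel import Panel

# Shared console used for status and error messages in later steps
console = Console()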

Step 1: Setting Up the AI/ML API Client and Handling API Calls

Before we dive into creating the data generation function, it's crucial to first set up the AI/ML API client. This API offers a suite of powerful AI functionalities, including text completion, image inference, and more. Let’s walk through the necessary steps to get everything configured and ready for use.

1.1: Create an Account and Obtain an API Key

To start using the AI/ML API, you'll need to create an account and generate an API key. Follow these steps:

Create an Account: Visit the AI/ML API website and sign up for an account.
Generate an API Key: After logging in, navigate to your account dashboard and generate your API key.

You'll need to use this API key to authenticate your requests and access the various AI models available through the API.

1.2: Initialize the AI/ML API Client

Once you have your API key, you can set up the client in your environment. This client will be used to interact with the AI/ML API for making various AI-related requests. Here’s how you can do it:

from openai import OpenAI

# Initialize the AI/ML API client with your API key and base URL
client = OpenAI(
    api_key="your_api_key_here",
    base_url="https://api.aimlapi.com",
)

Replace "your_api_key_here" with the API key you generated earlier. This client will be the primary interface for sending requests to the AI/ML API.
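
Hard-coding the key is fine for a quick test, but it is easy to leak when you share a notebook or commit the file. A minimal sketch of an alternative, assuming you export the key in an environment variable first. The variable name AIML_API_KEY and the helper load_api_key are hypothetical, not part of the AI/ML API or the openai package:

```python
import os


def load_api_key(env_var: str = "AIML_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

You would then pass api_key=load_api_key() when constructing the client instead of a string literal.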

1.3: Implementing Rate-Limited API Calls

To handle the API interactions more effectively, especially under rate limits or other transient issues, we define a function called rate_limited_api_call. This function ensures that our requests are resilient to potential issues like rate limiting by the API:

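# Assumed limit: at most 10 calls per 60 seconds. Adjust to your plan's quota.
CALLS_PER_MINUTE = 10

@sleep_and_retry
@limits(calls=CALLS_PER_MINUTE, period=60)
def rate_limited_api_call(messages):
    response = client.chat.completions.create(
        # The large "teacher" model; check the provider's model list for the exact ID
        model="meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
        messages=messages,
        temperature=0.7,
    )
    return response

The rate_limited_api_call function sends a request to the AI/ML API using the specified model and messages. The limits decorator caps how many calls are made per period, and sleep_and_retry pauses until the next call is allowed instead of raising an error. The temperature parameter controls how random, and therefore how varied, the AI's output is.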
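
To make the idea behind the limits and sleep_and_retry decorators imported earlier concrete, here is a minimal standard-library sketch of a sliding-window limiter. It is not how the ratelimit package is implemented. The class name SlidingWindowLimiter and its clock and sleep parameters are hypothetical and exist only so the behaviour can be checked without real waiting:

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `calls` calls within any window of `period` seconds."""

    def __init__(self, calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.calls = calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.stamps = deque()  # timestamps of recent calls

    def _drop_expired(self, now: float) -> None:
        # Forget calls that have left the current window
        while self.stamps and now - self.stamps[0] >= self.period:
            self.stamps.popleft()

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._drop_expired(now)
        if len(self.stamps) >= self.calls:
            # Sleep until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.stamps[0]))
            now = self.clock()
            self._drop_expired(now)
        self.stamps.append(now)
```

Calling limiter.wait() before each request gives the same effect as the decorators: the caller is slowed down rather than rejected.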

1.4: Handling Errors and Retries

To further enhance the reliability of our API calls, we define a function called get_model_response. It retries the API call up to max_retries times, waiting five seconds between attempts, before giving up:

# Note: the timeout parameter is accepted but not currently used
def get_model_response(messages: List[Dict[str, str]], max_retries: int = 3, timeout: int = 60) -> str:
    for attempt in range(max_retries):
        try:
            response = rate_limited_api_call(messages)
            # Return plain text so callers can split and parse it
            return response.choices[0].message.content.strip()
        except Exception as e:
            console.print(f"[bold red] Error in the model response (Attempt {attempt + 1} / {max_retries}): {e} [/bold red]")
            if attempt < max_retries - 1:
                time.sleep(5)
    raise Exception("Failed to get the response after repeated errors")

This function attempts to retrieve a response from the model and handles any errors gracefully. If all attempts fail, it raises an exception, ensuring that our system is robust and can handle unexpected issues.
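
The retry loop above waits a fixed five seconds between attempts. A common variation is exponential backoff, where the wait doubles after each failure. The following is a sketch of that variation, not the tutorial's own code. The helper name retry_with_backoff and its parameters are hypothetical:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], max_retries: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as error:
            last_error = error
            # No point in waiting after the final attempt
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Failed after {max_retries} attempts") from last_error
```

With the defaults, the waits are 1 second and then 2 seconds. The sleep parameter can be swapped for a stub when you want to test the logic without waiting.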

Step 2: Creating Data Generation Function

Let's walk through the entire process of how the data generation function works, step by step.

First, we define a function called generate_multiple_marketing_samples. This function’s job is to create several marketing scenarios that we can later use to train a smaller, more efficient AI model. Here's how it starts:

def generate_multiple_marketing_samples(num_samples: int = 5) -> List[str]:
    system_message = f"""You are an elite marketing AI, tasked with generating {num_samples} exceptional, diverse, and innovative marketing scenarios. Each scenario must be unique and push the boundaries of creative marketing. Your goal is to create a dataset that will revolutionize AI-assisted marketing when used for fine-tuning.

    For each sample, provide:

    1. Instruction: A specific, challenging marketing task that requires creativity and expertise.
    2. Input: Detailed context, including company info, target audience, constraints, and goals.
    3. Response: An outstanding, creative solution that addresses the instruction and input.

    Guidelines:
    - Cover a wide range of industries: tech, fashion, food, finance, entertainment, healthcare, education, travel, etc.
    - Include various marketing channels: social media, email, content marketing, influencer partnerships, guerrilla marketing, AR/VR campaigns, etc.
    - Address different marketing objectives: brand awareness, lead generation, customer retention, product launch, crisis management, etc.
    - Incorporate current trends: sustainability, AI, personalization, interactive content, social causes, etc.
    - Consider various audience segments: Gen Z, millennials, seniors, B2B executives, niche hobbyists, global markets, etc.
    - Include challenging scenarios: limited budgets, tight deadlines, highly competitive markets, rebranding efforts, etc.
    - Explore innovative formats: interactive stories, user-generated content campaigns, multi-platform narratives, gamification, etc.

    Format each sample as follows:
    ### Instruction:
    [Concise, specific marketing task]

    ### Input:
    [Detailed context and requirements]

    ### Response:
    [Creative, comprehensive marketing solution]

    Separate each sample with ---

    Remember: Quality, creativity, and diversity are crucial. Each sample should be a masterclass in innovative marketing that would challenge and inspire even seasoned professionals."""

    user_message = f"Generate {num_samples} cutting-edge, diverse marketing scenarios that showcase the future of AI-assisted marketing creativity. Use the specified format, separating each sample with ---."

    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]

    response = get_model_response(messages)
    return [sample.strip() for sample in response.split("---") if sample.strip()]

In this first part, we create two messages. The system_message sets the stage, telling the AI that it’s supposed to act like a top-tier marketing expert. The user_message gives specific instructions: it tells the AI how many scenarios to generate (based on the num_samples we input) and how to format each scenario. The format includes three parts: an instruction, some background information, and a response, which will be the solution to the marketing task.

Example Content Generated

Below are some examples of the marketing content generated. The outputs include Facebook ads, sales pages, and Twitter threads tailored to specific audiences and objectives.

Example 1: Facebook Ad for a Fitness Program

Instruction:
Create a Facebook ad for a new fitness program targeting busy professionals, with the objective of generating leads.

Response:
Hook: "Get Fit, Not Frustrated: Unlock Your Dream Body in Just 15 Minutes a Day!"
Narrative: "As a busy professional, you know how hard it is to find time for the gym. But what if you could achieve your fitness goals without sacrificing your sanity? Our program is designed to help you get results in just 15 minutes a day, no matter your schedule."
Climax: "Join our community of like-minded individuals and get instant access to our proven workout routines, nutrition plans, and expert coaching."
Resolution: "Sign up now and take the first step towards a healthier, happier you!"

Persuasion Matrix:
Primary Technique: Problem-Agitate-Solve (PAS)
Psychological Triggers: Scarcity, Social Proof, Authority
Linguistic Patterns: Embedded commands, metaphors
Objection Handling: "I don't have time" (addressed in the narrative)

Optimization Insights:
A/B Test Suggestion: Test a variant with a different hook, focusing on the benefits of increased energy.
Key Performance Indicators: Conversion rate, cost per lead, click-through rate
Omnichannel Synergy: Share user-generated content on Instagram to create social proof.

Example 2: Sales Page for an E-book on Entrepreneurship

Instruction:
Write a sales page for a new e-book on entrepreneurship, targeting aspiring business owners, with the objective of generating sales.

Response:
Hook: "Unlock the Secrets to Building a 6-Figure Business from Scratch"
Narrative: "Are you tired of living paycheck to paycheck? Do you dream of building a business that generates wealth and freedom? Our e-book reveals the exact strategies and mindset shifts you need to succeed as an entrepreneur."
Climax: "Get instant access to our comprehensive guide, packed with actionable advice and real-life examples from successful entrepreneurs."
Resolution: "Buy now and start building the business of your dreams!"

Persuasion Matrix:
Primary Technique: Before-After-Bridge (BAB)
Psychological Triggers: Greed, Curiosity, Authority
Linguistic Patterns: Storytelling, rhetorical questions
Objection Handling: "I don't know where to start" (addressed in the narrative)

Optimization Insights:
A/B Test Suggestion: Test a variant with a different narrative, focusing on the benefits of lifestyle freedom.
Key Performance Indicators: Conversion rate, revenue, average order value
Omnichannel Synergy: Share a free chapter on LinkedIn to generate buzz.

Example 3: Twitter Thread for a Sustainable Fashion Brand

Instruction:
Create a Twitter thread for a new sustainable fashion brand, targeting environmentally conscious consumers, with the objective of increasing brand awareness.

Response:
1/6 "The fashion industry is one of the largest polluters in the world. But what if you could look great and do good at the same time? Introducing our new sustainable fashion brand, dedicated to reducing waste and promoting eco-friendly practices."
2/6 "Our mission is to make sustainable fashion accessible to everyone, without sacrificing style or quality. From organic cotton to recycled materials, we're committed to reducing our environmental footprint."
3/6 "But it's not just about us – it's about the impact we can have together. By choosing sustainable fashion, you're supporting a movement towards a more conscious and responsible industry."
4/6 "We're not just talking the talk – we're walking the walk. Our supply chain is transparent, and we're committed to continuous improvement and innovation."
5/6 "So what are you waiting for? Join the sustainable fashion revolution and shop our latest collection now!"
6/6 "Together, we can make a difference. Share this thread with a friend and help spread the word about the power of sustainable fashion!"

Persuasion Matrix:
Primary Technique: Social Proof
Psychological Triggers: Guilt, Social Identity, Scarcity
Linguistic Patterns: Repetition, metaphors
Objection Handling: "I don't know if I can afford it" (addressed in the thread)

Optimization Insights:
A/B Test Suggestion: Test a variant with a different call-to-action, focusing on encouraging engagement.
Key Performance Indicators: Engagement rate, follower growth, website traffic
Omnichannel Synergy: Share user-generated content on Instagram to create social proof.

And so on, up to 50 pieces of marketing content.

Next, these messages are combined into a list called messages, which is then sent to the AI model to generate the content:

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
]

response = get_model_response(messages)

Here, the messages list is like a conversation with the AI. The first message sets the context, and the second one asks for the specific task. We then use the get_model_response function (defined in Step 1.4) to send these messages to the AI model and get back the generated scenarios.

After we receive the AI’s response, we need to clean it up and prepare it for use:

return [sample.strip() for sample in response.split("---") if sample.strip()]

The response from the AI is usually a long string with each scenario separated by ---. This part of the function splits the response into individual scenarios, trims any extra spaces, and removes any empty entries. The result is a list of clean, well-structured marketing scenarios.
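
To see the splitting step in action, here is a small sketch that runs it on a hand-written string in the same format. The sample text is invented for illustration, and the helper names split_samples and parse_sample are hypothetical. parse_sample shows one way to turn each scenario into a dictionary of its three parts:

```python
from typing import Dict, List, Optional


def split_samples(raw: str) -> List[str]:
    """Split a model response into individual scenarios on the --- separator."""
    return [part.strip() for part in raw.split("---") if part.strip()]


def parse_sample(sample: str) -> Optional[Dict[str, str]]:
    """Break one scenario into its three parts, or return None if malformed."""
    parts = sample.split("###")
    if len(parts) != 4:  # empty prefix plus the three headed sections
        return None
    return {
        "instruction": parts[1].replace("Instruction:", "", 1).strip(),
        "input": parts[2].replace("Input:", "", 1).strip(),
        "response": parts[3].replace("Response:", "", 1).strip(),
    }


# Invented example text in the format requested by the system message
raw = """### Instruction:
Write a tagline.

### Input:
A bakery in Lisbon.

### Response:
Fresh every morning.
---
### Instruction:
Draft an email subject.

### Input:
A spring sale.

### Response:
Spring savings start now.
---
"""

samples = split_samples(raw)
parsed = [parse_sample(s) for s in samples]
```

One caveat of this simple approach: a response that itself contains --- or ### (for example, markdown headings) will be split in the wrong place, so such samples end up rejected as malformed.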

Output Example:

Let's see what this function outputs. Suppose we ask for a few marketing scenarios:

Instruction: Create a Facebook ad for a new fitness program targeting busy professionals.
- Input: The program is designed for people with limited time but who still want to achieve their fitness goals.
- Response: A detailed ad with a hook, narrative, and call to action, designed to attract leads from busy professionals.
Instruction: Write a sales page for an e-book on entrepreneurship.
- Input: The e-book is targeted at aspiring business owners.
- Response: A compelling sales pitch that outlines the benefits of the e-book and persuades the reader to purchase.

These scenarios are formatted in a way that makes them directly usable as training data for a smaller AI model.

Why This Method Works

This function is simple yet powerful. It allows us to harness the capabilities of a large AI model to generate high-quality, diverse training data. This data is then perfectly formatted to train a smaller model that can perform specific marketing tasks. By controlling the number of samples and the format, we ensure that the generated data is both relevant and easy to use, making the overall process more efficient and effective.

Step 3: Quality Control

After generating our samples, it's crucial to ensure that they meet a certain standard of quality. This is where our quality control function comes into play. The goal here is to filter out any samples that might not be good enough for training our AI model. Let's break down how this function works.

Here's the code for our quality control function:

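def quality_check(sample: dict) -> bool:
    min_length = 50       # minimum number of characters per field
    max_repetition = 0.3  # maximum share of repeated words

    for key in ["instruction", "input", "response"]:
        text = sample[key]
        if len(text) < min_length:
            return False

        words = re.findall(r"\w+", text.lower())
        unique_words = set(words)
        # Reject text with no words at all (avoids dividing by zero)
        # or with too low a share of unique words
        if not words or len(unique_words) / len(words) < (1 - max_repetition):
            return False

    return True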

How the Function Works

This function performs two main checks: a length check and a repetition check.

Length Check: The function begins by setting a minimum length (min_length) that each part of the sample (instruction, input, and response) must meet. The idea here is straightforward: if any part of the sample is too short (less than 50 characters), it likely doesn't contain enough information to be useful. If this is the case, the function immediately returns False, meaning the sample fails the quality check.
Repetition Check: The next check is a bit more involved. It ensures that the text isn't too repetitive. First, the function finds all the words in the text and converts them to lowercase so that "Brand" and "brand" count as the same word. It then calculates the ratio of unique words to the total number of words. If unique words make up less than 70% of all words (that is, more than 30% of the words are repeats), the function returns False. This check is important because excessive repetition can indicate low-quality or shallow content, which might not be helpful for training.

Why This Is Important

By passing each sample through these two checks, we make sure that only high-quality data is used to train our model. The length check ensures that the samples are detailed enough, while the repetition check ensures that the content is varied and rich in vocabulary. This process helps maintain the integrity of our dataset, ensuring that our AI model learns from diverse and informative examples.

If a sample passes both checks, the function returns True, indicating that the sample is good enough to be included in our training data. This step is essential because high-quality training data leads to a more reliable and effective AI model.
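
A short worked example may help here. The sketch below runs the two checks on three invented samples: one that should pass, one with a field that is too short, and one with a highly repetitive response. The function is repeated so the example runs on its own, including a guard for text that contains no words:

```python
import re


def quality_check(sample: dict) -> bool:
    min_length = 50
    max_repetition = 0.3
    for key in ["instruction", "input", "response"]:
        text = sample[key]
        if len(text) < min_length:
            return False
        words = re.findall(r"\w+", text.lower())
        if not words or len(set(words)) / len(words) < (1 - max_repetition):
            return False
    return True


# Invented samples for illustration
good = {
    "instruction": "Create a launch plan for a reusable water bottle brand.",
    "input": "The company sells steel bottles online and targets hikers across northern Europe.",
    "response": "Partner with trail guides, publish route maps featuring refill stations, "
                "and reward customers who share photos.",
}
too_short = dict(good, instruction="Write an ad.")   # fails the length check
repetitive = dict(good, response="buy now " * 10)     # 2 unique words out of 20

results = [quality_check(s) for s in (good, too_short, repetitive)]
```

Keep in mind that longer texts naturally reuse common words such as "the" and "and", so a fixed 70% threshold can reject long but legitimate responses. If many samples are skipped, it may be worth loosening max_repetition.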

Visualizing the Process

This quality control step is a safeguard, ensuring that the data fed into our model is robust and well-rounded. It’s like a filter that only lets through the best examples, setting the foundation for a model that can perform at a high level. Plus, the function is flexible enough to be updated with additional checks if needed, allowing us to refine our quality standards as we go.

Step 4: Ensuring Diversity

To build a well-rounded and effective AI model, it's essential that our training data covers a broad range of marketing scenarios. This is where our diversity tracking function comes into play. Its purpose is to monitor how well our dataset represents different industries, marketing channels, and objectives, ensuring that our model is trained on a diverse set of examples.

Here’s the code for our diversity tracking function:

industry_keywords = ["tech", "fashion", "food", "finance", "entertainment", "healthcare", "education", "travel"]
channel_keywords = ["social media", "email", "content marketing", "influencer", "guerrilla", "AR", "VR"]
objective_keywords = ["brand awareness", "lead generation", "customer retention", "product launch", "crisis management"]

def contains_keyword(text: str, keyword: str) -> bool:
    # Match at a word start so "tech" still finds "technology". Short acronyms
    # such as "AR" must match a whole word, or they would hit "art" and "area".
    pattern = rf"\b{re.escape(keyword)}" + (r"\b" if len(keyword) <= 2 else "")
    return re.search(pattern, text, re.IGNORECASE) is not None

def track_diversity(dataset: List[dict]) -> None:
    industries = Counter()
    channels = Counter()
    objectives = Counter()

    for sample in dataset:
        text = " ".join([sample["instruction"], sample["input"], sample["response"]])
        # Each keyword is counted at most once per sample
        industries.update(k for k in industry_keywords if contains_keyword(text, k))
        channels.update(k for k in channel_keywords if contains_keyword(text, k))
        objectives.update(k for k in objective_keywords if contains_keyword(text, k))

    print("Dataset Diversity:")
    print(f"Industries: {dict(industries)}")
    print(f"Channels: {dict(channels)}")
    print(f"Objectives: {dict(objectives)}")

How the Function Works

This function is designed to give us a clear picture of how diverse our dataset is, by checking if we're covering enough ground across different industries, marketing channels, and objectives.

Keyword Lists: We start by defining three lists of keywords. These keywords represent important categories:
- Industries: Examples include "tech," "fashion," and "food."
- Channels: This list covers various marketing methods like "social media," "email," and "content marketing."
- Objectives: These are the goals of the marketing campaigns, such as "brand awareness," "lead generation," and "customer retention."
Counting Occurrences: We then create a Counter for each category. A Counter is like a tally system; here it counts how many samples mention each keyword. The function loops through each sample in the dataset, combining the instruction, input, and response into one long string, and matches keywords against it case-insensitively.
Tracking Keywords: For each sample, the function checks if any of the keywords from our lists appear in the text. If a keyword is found, it is added to the respective Counter once for that sample, no matter how often it occurs in the text. This process helps us see how often each industry, channel, or objective is represented in our dataset.
Reporting Results: Finally, the function prints out a summary of how well each category is represented. This output shows us the distribution of industries, channels, and objectives in our dataset. It’s a quick way to spot any gaps—if a certain industry or marketing method is underrepresented, we can go back and generate more samples to balance things out.

Why This Matters

Ensuring diversity in our dataset is crucial because it leads to a more versatile and capable AI model. If the training data only focuses on a few industries or marketing channels, the model might struggle with scenarios outside of those areas. By tracking and maintaining diversity, we make sure that our model can handle a wide variety of tasks, making it much more useful in real-world applications.

For example, if we notice that "tech" and "fashion" are well-represented but "healthcare" is not, we can specifically generate more healthcare-related marketing scenarios. This approach ensures that our AI model is not biased towards any particular industry and is capable of performing well across the board.
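
The gap-spotting step described above can also be automated. Here is a minimal sketch, assuming you have the per-category counts in a Counter. The helper find_gaps, the min_count threshold, and the counts shown are all hypothetical and purely illustrative:

```python
from collections import Counter
from typing import Iterable, List


def find_gaps(counts: Counter, keywords: Iterable[str], min_count: int = 1) -> List[str]:
    """Return the keywords that appear in fewer than min_count samples."""
    # A Counter returns 0 for keys it has never seen
    return [keyword for keyword in keywords if counts[keyword] < min_count]


# Made-up counts, not results from the tutorial's dataset
industries = Counter({"tech": 40, "fashion": 25, "healthcare": 2})
industry_keywords = ["tech", "fashion", "food", "healthcare"]

gaps = find_gaps(industries, industry_keywords, min_count=5)
```

The returned list can then be used to steer the next round of generation, for instance by naming the missing industries in the user message.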

Visualizing the Process

By actively tracking and ensuring diversity in our training data, we're setting up our AI model for success in a variety of marketing contexts. This step is key to building a robust, flexible model that can adapt to different challenges, ultimately making it a valuable tool for marketing professionals across different sectors.

Step 5: Fine-Tuning Dataset Creation

In this step, we aim to create a dataset specifically designed for fine-tuning a language model to generate marketing and social media content. The create_finetuning_dataset function manages this process, generating and compiling a set of high-quality samples.

The function works by setting a target of 1,000 samples and making multiple API calls to fetch batches of marketing scenarios. It continues until the target is reached or the maximum number of API calls is made. Each sample is broken down into its key components—instruction, input, and response—and stored in a dictionary.

Before adding a sample to the dataset, it undergoes a quality check. If it passes, it's included; if not, it's skipped. Progress is saved after each API call to avoid data loss.

The entire dataset creation took about 25-30 minutes, mainly due to API rate limits.

Here’s the code:

def create_finetuning_dataset(target_samples: int, output_file: str):
    console.print(
        Panel(
            f"Creating fine-tuning dataset for [bold]Marketing & Social Media Content[/bold]",
            expand=False,
        )
    )

    dataset = []
    samples_per_call = 50  # Reduced to respect token limits
    calls_made = 0
    max_calls = 200  # Increased to allow for more total samples

    with console.status("[bold green]Generating samples...") as status:
        while len(dataset) < target_samples and calls_made < max_calls:
            calls_made += 1
            status.update(
                f"API call {calls_made} (Dataset size: {len(dataset)}/{target_samples})"
            )

            samples = generate_multiple_marketing_samples(samples_per_call)

            for sample in samples:
                if len(dataset) >= target_samples:
                    break

                parts = sample.split("###")
                if len(parts) == 4:
                    instruction = parts[1].replace("Instruction:", "").strip()
                    input_text = parts[2].replace("Input:", "").strip()
                    response = parts[3].replace("Response:", "").strip()

                    sample_dict = {
                        "instruction": instruction,
                        "input": input_text,
                        "response": response,
                    }

                    if quality_check(sample_dict):
                        dataset.append(sample_dict)
                    else:
                        console.print("[yellow]Skipped low-quality sample")
                else:
                    console.print(
                        f"[yellow]Skipped malformed sample: {sample[:100]}..."
                    )

            # Save progress after each API call
            with open(output_file, "w") as f:
                json.dump(dataset, f, indent=2)

    console.print(
        f"[bold green]Dataset creation complete! Total samples: {len(dataset)}"
    )
    track_diversity(dataset)

create_finetuning_dataset(
    target_samples=1000,
    output_file="marketing_social_media_dataset_v1.json",
)

The result is a dataset of 1,000 well-crafted marketing scenarios, each formatted with clear instructions, relevant input, and detailed responses. This approach ensures that the dataset is rich and diverse, providing a solid foundation for fine-tuning the language model.

This dataset, while time-consuming to create, is essential for developing a model that can effectively generate high-quality marketing and social media content.
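
Because a run can take a while, it can be useful to resume from the saved file instead of starting from an empty list after an interruption. The sketch below shows one way to do that. It is not part of the original function, and the helper names load_progress and save_progress are hypothetical:

```python
import json
import os
from typing import Dict, List


def load_progress(path: str) -> List[Dict[str, str]]:
    """Return previously saved samples, or an empty list if none exist."""
    if not os.path.exists(path):
        return []
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_progress(path: str, dataset: List[Dict[str, str]]) -> None:
    """Write to a temporary file first, then swap it into place.

    This avoids leaving a half-written JSON file behind if the
    process is interrupted in the middle of a save.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(dataset, f, indent=2)
    os.replace(tmp_path, path)
```

To use it, you would initialize dataset with load_progress(output_file) rather than an empty list, and call save_progress after each API call.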

Step 6: Model Preparation and Quantization

With the dataset ready, the next crucial step is to prepare the language model for fine-tuning. This involves using the Unsloth library to load a pre-trained model while applying certain optimizations that make the process more efficient. One of the key techniques used here is 4-bit quantization, which significantly reduces memory usage, usually at the cost of a small loss in numerical precision.

The function FastLanguageModel.from_pretrained is central to this process. It loads the pre-trained model from a selection of pre-quantized models that are designed to download quickly and avoid out-of-memory errors, which is particularly important when working with large models. This method also includes settings like maximum sequence length and automatic support for RoPE scaling, which helps the model handle longer sequences without performance issues.

Here’s how the code looks:

from unsloth import FastLanguageModel  
import torch

max_seq_length = 800  # Adjust based on your needs
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be set to False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

This code prepares the model for the fine-tuning process. Setting load_in_4bit to True enables 4-bit quantization, which stores the model's weights in a compressed 4-bit format. That cuts the memory needed for the weights to roughly a quarter of what 16-bit precision requires, which is particularly useful when working on hardware with limited memory capacity.

The dtype parameter is also adjustable, depending on the specific GPU you’re using. For instance, if you’re using a Tesla T4 or V100 GPU, you might choose Float16, while newer Ampere GPUs would use Bfloat16. If you leave it as None, the code will automatically detect and set the most appropriate data type for your hardware.

This setup ensures that the model is both loaded and optimized for the fine-tuning task with the dataset you’ve created. By managing resources efficiently and enhancing performance, this preparation step is critical for successfully fine-tuning the model to generate high-quality marketing and social media content.
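
To get a feel for why 4-bit loading matters, here is a back-of-the-envelope estimate of the memory needed just for the weights. It is a rough sketch: the parameter count is rounded to 8 billion as an assumption, and the helper weight_memory_gb is hypothetical. Real usage is higher because of quantization constants, layers kept in higher precision, activations, and optimizer state:

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params_8b = 8.0e9  # rounded parameter count, assumed for this estimate

fp16_gb = weight_memory_gb(params_8b, 16)  # 16-bit weights
int4_gb = weight_memory_gb(params_8b, 4)   # 4-bit weights

print(f"16-bit: ~{fp16_gb:.0f} GB, 4-bit: ~{int4_gb:.0f} GB")
```

Under these assumptions the weights shrink from about 16 GB to about 4 GB, which is the difference between needing a high-end GPU and fitting on a single 16 GB card such as a Tesla T4 with room left for training.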

Step 7: Applying LoRA Adapters to the Model

In this step, we attach LoRA (Low-Rank Adaptation) adapters to the base model. LoRA freezes the original weights and trains only small low-rank matrices added alongside them, so just a small fraction of the parameters is updated. This reduces both computational demands and memory usage, which is especially useful when working with large models or limited resources.

The process involves configuring the adapters with a few settings: the rank (r), lora_alpha, and lora_dropout. The rank sets the size of the low-rank matrices, and therefore how many trainable parameters are added. lora_alpha is a scaling factor that controls how strongly the adapter updates affect each layer's output. We set lora_dropout to zero, meaning no dropout is applied to the adapters; dropout is an optional regularizer, and Unsloth's example notebooks note that zero is the setting it is optimized for.

Additionally, we enable gradient checkpointing, which lowers memory usage by recomputing some activations during the backward pass instead of storing them. This is particularly helpful when fine-tuning on long sequences without running out of GPU memory.

Here’s the code that applies these settings:

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", 
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

This setup attaches adapters to the attention projections (q_proj, k_proj, v_proj, o_proj) and the feed-forward projections (gate_proj, up_proj, down_proj) of each transformer layer. Training only these adapters lets the model be tuned efficiently for generating marketing and social media content. Setting use_gradient_checkpointing to "unsloth" selects Unsloth's own checkpointing implementation, which keeps memory usage down on long sequences and makes the fine-tuning process both effective and resource-efficient.
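
To make "a small portion of its parameters" concrete, the sketch below counts the trainable parameters that LoRA adds with r = 16 on the seven target modules. For a layer that maps d_in inputs to d_out outputs, LoRA adds two matrices with r * d_in and d_out * r entries. The layer dimensions are assumptions taken from the published Llama 3.1 8B configuration (hidden size 4096, feed-forward size 14336, key/value projection size 1024, 32 layers), and the helper lora_params is hypothetical:

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank matrices A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


# Assumed dimensions for Llama 3.1 8B
hidden, intermediate, kv_dim, num_layers, r = 4096, 14336, 1024, 32, 16

# (input size, output size) of each targeted projection
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
total = per_layer * num_layers

print(f"Trainable LoRA parameters: {total:,}")
```

Under these assumptions the adapters add roughly 42 million trainable parameters, about half a percent of the 8 billion in the base model.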

Step 8: Formatting Dataset for Training

In this step, we prepare the dataset for training by formatting it into a structure that the model can easily process. This involves creating a function that applies a specific prompt template to each example in the dataset, ensuring that the data is structured in a consistent and clear way for the model to understand.

We start by defining a prompt template, called alpaca_prompt, which organizes each example into three parts: the instruction, the input, and the expected response. The template ensures that each example provides clear guidance on the task at hand and the context required to complete it. Additionally, we append an end-of-sequence token (EOS_TOKEN) to each entry, signaling the model when the generated text should end.

Here’s the code that sets up the template and processes the dataset:

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:  
{}

### Input:  
{}

### Response:  
{}"""

EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["response"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

This function, formatting_prompts_func, takes a batch of examples from the dataset and applies the alpaca_prompt format to each one. It combines the instruction, input, and response into a single text block and then adds the EOS_TOKEN to mark the end of the sequence. The function returns a dictionary with a single "text" key holding the list of formatted texts, which becomes a new column once the function is applied to the dataset.
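To see what a single training example looks like after formatting, here is a small self-contained sketch that runs the same logic on a toy batch. The template is shortened, the EOS string is a placeholder (the real value comes from tokenizer.eos_token), and the sample record is made up for illustration.

```python
# Shortened stand-in for the alpaca_prompt template defined above.
alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|end_of_text|>"  # placeholder; the real value comes from tokenizer.eos_token


def formatting_prompts_func(examples):
    texts = []
    for instruction, context, response in zip(
        examples["instruction"], examples["input"], examples["response"]
    ):
        texts.append(alpaca_prompt.format(instruction, context, response) + EOS_TOKEN)
    return {"text": texts}


# A batch is a dict of columns, which is what a batched map passes to the function.
toy_batch = {
    "instruction": ["Write a tagline for a coffee shop"],
    "input": ["Audience: students"],
    "response": ["Fuel your finals."],
}

formatted = formatting_prompts_func(toy_batch)
print(formatted["text"][0])
```

Printing one formatted example like this is a quick way to confirm that the three fields land in the right slots and that the EOS token is appended.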

To apply this formatting to the entire dataset, we load the dataset and use the map function to process it with our formatting function:

dataset = load_dataset('json', data_files='/content/advanced_marketing_dataset_v5.json', split='train')
dataset = dataset.map(formatting_prompts_func, batched=True)

This code loads the dataset from a JSON file and applies the formatting function to each example in the dataset. The result is a dataset that is fully prepared for training, with each entry formatted consistently according to the template. This preparation is crucial for ensuring that the model can learn effectively from the data, as it provides clear and structured examples of the tasks it will be trained to perform.
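Because the formatting function reads the "instruction", "input", and "response" fields, a record missing any of them would fail during the map step. A minimal sketch of a pre-check is shown here. It assumes the file holds a JSON array of objects (the real file may be laid out differently), and both the helper name and the sample records are hypothetical.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "response")


def find_incomplete_records(records, required=REQUIRED_KEYS):
    """Return (index, missing_keys) for every record lacking a required field."""
    problems = []
    for index, record in enumerate(records):
        missing = [key for key in required if key not in record]
        if missing:
            problems.append((index, missing))
    return problems


# Hypothetical sample standing in for the contents of the dataset file.
sample_json = """[
  {"instruction": "Write a tweet", "input": "Brand: Acme", "response": "Hello!"},
  {"instruction": "Write an ad", "response": "Buy now"}
]"""

records = json.loads(sample_json)
print(find_incomplete_records(records))  # prints [(1, ['input'])]
```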

Step 9: Training the Model

In this step, we train the model using the SFTTrainer from the Hugging Face TRL library. This involves setting up the training run with parameters that guide the model's learning process, including the batch size, gradient accumulation steps, learning rate, and other optimization settings.

We start by configuring the SFTTrainer with the model, tokenizer, and the formatted dataset we prepared earlier. The training arguments are carefully chosen to balance the efficiency and performance of the training process. For instance, we use a small batch size with gradient accumulation to make better use of memory, while the learning rate is set to ensure steady progress in model optimization. The training process is designed to update the model parameters based on the dataset, gradually improving the model's ability to generate accurate and contextually appropriate responses.

Here’s the code that sets up and executes the training:

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer_stats = trainer.train()

This code sets up the training process where the model parameters are adjusted to better fit the data. After the setup, the trainer.train() method starts the training process, optimizing the model based on the dataset we’ve prepared.
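The numbers in TrainingArguments interact, so it can help to work through them. The sketch below computes the effective batch size and traces a linear warmup-and-decay learning-rate schedule for the values used above. It assumes a single GPU, and linear_lr is an illustrative re-implementation of the schedule shape rather than the trainer's own code.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60
warmup_steps = 5
learning_rate = 2e-4

# Sequences contributing to each optimizer update (single GPU assumed).
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
sequences_processed = effective_batch_size * max_steps


def linear_lr(step, base_lr=learning_rate, warmup=warmup_steps, total=max_steps):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print("effective batch size:", effective_batch_size)
print("sequences processed during training:", sequences_processed)
for step in (0, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

With these settings each optimizer update averages gradients over 8 sequences, and the 60-step run processes 480 sequences in total. That makes it a short demonstration run, not a full pass over a large dataset.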

Step 10: Generating and Parsing Output

After the model has been trained, the next step is to generate text based on a given prompt and then parse this output into a structured format. We use the trained model to generate text by providing it with a specific instruction. The output is then decoded and parsed into a dictionary format to make it easier to analyze and understand.

Here’s how we generate the output and parse it:

import json

from IPython.display import display, Markdown
from transformers import TextStreamer

FastLanguageModel.for_inference(model)
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Best marketing post for sneaker company",
            "",
            "",
        )
    ], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
output = model.generate(**inputs, streamer = text_streamer)

# Decode the output
output_text = tokenizer.decode(output[0], skip_special_tokens=True)

# Parse output to dictionary
def parse_output_to_dict(output_text):
    result = {}
    current_section = None
    lines = output_text.split('\n')
    for line in lines:
        line = line.strip()
        if line.startswith('###'):
            # "### Response:" becomes the section key "response"
            current_section = line.strip('# :').lower().replace(' ', '_')
            result[current_section] = {}
        elif current_section is None:
            # Skip text that appears before the first "###" header
            continue
        elif ':' in line:
            key, value = line.split(':', 1)
            key = key.strip().lower().replace(' ', '_')
            result[current_section][key] = value.strip()
        elif line:
            # Lines without a "key: value" shape are collected as free text
            result[current_section].setdefault('content', []).append(line)

    return result

parsed_output = parse_output_to_dict(output_text)

# Display parsed output as JSON
display(Markdown("## Parsed JSON Output\n\n```json\n" + json.dumps(parsed_output, indent=2) + "\n```"))

This code first sets up the model for inference, then generates a response to the instruction "Best marketing post for sneaker company." The generated output is decoded from the tokenized format back into readable text. To make the output more manageable, we parse it into a dictionary, organizing the information into structured sections. Finally, the parsed output is displayed as a formatted JSON object, providing a clear and organized view of the generated content.

This process of generating and parsing ensures that the output from the model is not only usable but also easy to interpret and integrate into further applications.
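One detail worth keeping in mind is that the decoded text contains the prompt as well as the model's answer, since generate returns the full sequence. If only the answer is needed, a small helper can cut everything before the response marker. The sketch below is illustrative: extract_response is a hypothetical helper and the decoded string is a made-up example, not real model output.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(generated_text, marker=RESPONSE_MARKER):
    """Return the text after the last response marker, or the whole text if absent."""
    _, found, tail = generated_text.rpartition(marker)
    return tail.strip() if found else generated_text.strip()


# Hypothetical decoded output: the prompt followed by the model's answer.
decoded = (
    "### Instruction:\nBest marketing post for sneaker company\n\n"
    "### Input:\n\n"
    "### Response:\nHook: Step into comfort.\nResolution: Shop the new drop today."
)

print(extract_response(decoded))
```

The extracted text can then be passed to parse_output_to_dict so that the resulting dictionary holds only the generated content.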

Step 11: Saving and Reloading the Model

In this final step, we focus on saving the fine-tuned model and tokenizer so that they can be used later without needing to retrain the model from scratch. This is an essential part of the workflow, as it allows you to preserve the work done and quickly reload the model whenever needed for future tasks.

We start by saving the fine-tuned LoRA adapters and the tokenizer to a local directory named "lora_model". Because the model was wrapped with get_peft_model, save_pretrained stores the adapter weights and their configuration rather than a full copy of the 8B base model. Once saved, we reload the model and tokenizer from this directory to ensure everything was stored correctly and can be retrieved without issues.

Here’s the code to save and reload the model:

# Save the fine-tuned adapters and the tokenizer
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

# Reload them from the saved directory
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)

After reloading the model, we set it up for inference to generate text based on new prompts. This allows us to test the model’s ability to handle different instructions and inputs, verifying that it produces coherent and contextually appropriate responses.

Here’s how you can generate text with the reloaded model:

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Create a marketing campaign to promote the chocolate bar",
            "Company: Cadbury, target audience: adults/boomers",
            "",
        )
    ], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

In this example, we provide the model with a prompt to create a marketing campaign for a chocolate bar targeting adults and boomers. The model processes this input and generates a relevant marketing campaign response. This process demonstrates that the model can effectively use the knowledge it has gained through fine-tuning to produce high-quality, targeted content.
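When testing several prompts, it can be convenient to wrap the template in a small helper that always leaves the response slot empty. The sketch below is one way to do that. build_inference_prompt is a hypothetical name, and ALPACA_PROMPT is a copy of the template for self-containment; in practice, reuse the alpaca_prompt variable from Step 8 so the inference prompt matches the training format exactly.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


def build_inference_prompt(instruction, context=""):
    """Fill the template, leaving the response slot empty for the model to complete."""
    return ALPACA_PROMPT.format(instruction, context, "")


prompts = [
    build_inference_prompt(
        "Create a marketing campaign to promote the chocolate bar",
        "Company: Cadbury, target audience: adults/boomers",
    ),
    build_inference_prompt("Best marketing post for sneaker company"),
]

for prompt in prompts:
    print(prompt)
    print("-" * 40)
```

The resulting list can be passed to the tokenizer in place of the inline alpaca_prompt.format(...) call.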

By saving and reloading the model, you ensure that all the work invested in fine-tuning is not lost and can be easily reused. This completes the training and testing of the model, confirming that it is ready to generate text based on specific instructions and that the results are formatted and saved correctly for any future use.

Comparison between 405B and 8B for the Same Prompt

When comparing the outputs of the original 405B model with those from the fine-tuned 8B model, the differences are easy to see. The fine-tuned model produces shorter, more consistently structured responses. Below is a comparison of the 405B model, the base 8B model, and the fine-tuned 8B model on the same prompts, followed by an analysis of the fine-tuned model's strengths.

Example Prompts and Outputs

Instruction	Original 405B Model Output	Base 8B Model Output (Not Fine-Tuned)	Fine-Tuned 8B Model Output
Create a Facebook ad for a fitness program targeting women aged 25-45, aiming to drive sign-ups for a free trial.	"Here's a sample Facebook ad for a fitness program targeting women aged 25-45: Headline: Transform Your Body, Unlock Your Strength Text: Get ready to feel empowered, confident, and unstoppable! Call-to-Action (CTA) Button: Sign Up for Free Trial Targeting Options: Demographics: Women aged 25-45, Interests: Fitness, health, wellness, weight loss, nutrition, exercise, women's empowerment Additional Tips: Use social proof, emphasize the benefits, and use a clear and direct CTA."	Assistant: Here's a potential Facebook ad for a fitness program targeting women aged 25-45: Headline: Transform Your Body in Just 30 Days! Image: A high-energy image of a fit and happy woman in her 30s, with a toned body and a bright smile. She's standing in front of a beautiful outdoor setting, such as a beach or a mountain trail. Body: Are you tired of feeling stuck in a rut? Do you want to feel confident and empowered in your own skin? Our 30-Day Fitness Challenge is here to help! This comprehensive program is designed specifically for women like you, who want to transform their body and improve their overall health. With our challenge, you'll get: * 30 days of expert-led workouts, tailored to your fitness level * A personalized nutrition plan to help you fuel your body * Access to a supportive community of like-minded women * Free trial access to our premium fitness app, with over 1,000 workouts to choose from Sign up now and get ready to: * Lose weight and tone your body * Increase your energy and confidence * Improve your mental health and well-being Try it risk-free for 30 days! Call-to-Action (CTA) and Button: Try Now and Get a FREE 30-Day Trial! Sign up in 1 minute and start your transformation today! 
Target Audience: * Women aged 25-45 * Interests: fitness, health, wellness, running, yoga, Pilates, strength training * Behaviors: people who have shown interest in fitness and wellness products or services * Location: target specific cities or regions to reach a local audience Ad Placement: * Facebook and Instagram feed ads * Facebook and Instagram stories ads * Video ads with a 30-second clip of a fit and happy woman in action Budget Allocation: * 70% of budget to Facebook and Instagram feed ads * 20% of budget to Facebook and Instagram stories ads * 10% of budget to video ads Bidding Strategy: * Cost per click (CPC) bidding * Target cost per acquisition (CPA) of $10 This ad aims to drive sign-ups for a free trial of the fitness program, with a focus on transformation and empowerment. The ad targets women aged 25-45 who are interested in fitness and wellness, and offers a risk-free trial to encourage sign-ups.	Response: Hook: Get Fit and Feel Amazing! Narrative: Are you tired of feeling sluggish and unmotivated? Our fitness program is designed to help you achieve your goals and boost your energy levels. Climax: Join our free trial and experience the transformative power of our program for yourself! Resolution: Sign up now and start your journey to a healthier, happier you!
Create a Twitter thread for a fashion brand, showcasing their new sustainable clothing line.	"Here's a Twitter thread for a fashion brand showcasing their new sustainable clothing line: 1/8 BIG NEWS! We're thrilled to introduce our new sustainable clothing line, 'EcoCycle'! 4/8 Meet the designers behind EcoCycle! 8/8 Thanks for joining us on this journey! At [Brand Name], we're committed to creating fashion that's not only beautiful, but also sustainable."	Assistant: Here's a Twitter thread for a fashion brand showcasing their new sustainable clothing line: 1/7 Exciting news! We're proud to introduce our new sustainable clothing line, designed to make a positive impact on the planet without sacrificing style! Our mission is to make fashion a force for good. Stay tuned for a glimpse into our new collection #SustainableFashion #EcoFriendly 2/7 Our new line is made from eco-friendly materials, such as organic cotton, recycled polyester, and Tencel. We've also reduced our water usage by 40% through innovative manufacturing processes. Every step we take is a step towards a more sustainable future #GreenFashion 3/7 Meet the hero of our collection: our signature 'Eco-Shift' dress. Made from 100% organic cotton, this stunning piece is not only stylish but also kind to the planet. Plus, it's designed to last – because we believe fashion should be timeless, not disposable #SustainableClothing 4/7 We're committed to reducing waste in our supply chain. That's why we've partnered with a local textile recycling program to upcycle old fabrics into new materials. It's a win-win for the planet and our customers! #CircularFashion 5/7 But sustainability isn't just about materials – it's also about people. Our new line is designed to be accessible and affordable, so everyone can join the sustainable fashion movement. No one should have to choose between style and conscience #FashionForAll 6/7 We're not just talking about the clothes – we're talking about the process. 
Our new line is made in facilities that prioritize fair labor practices and safe working conditions. We believe fashion should be a force for good, not harm #FashionWithConscience 7/7 So, what are you waiting for? Join the sustainable fashion revolution with us! Our new collection is now available online and in-store. Shop with a clear conscience and make a difference with every purchase #SustainableFashion #EcoFriendlyFashion	Response: Hook: "Introducing our new sustainable clothing line: Made with love, not landfill." Narrative: "At [Brand Name], we believe fashion should be a force for good. That's why we've created a new line of clothing made from 100% recycled materials..." Climax: "Our new line features a range of stylish pieces..." Resolution: "Shop now and make a statement with [Brand Name]..."
Design an email campaign for an e-commerce store promoting a new product line, aiming to increase sales by 20%.	"Here's a comprehensive email campaign design to promote a new product line and increase sales by 20%: Email Campaign Structure: 1. Teaser Email (sent 1 week before launch) 2. Launch Email (sent on launch day) 3. Special Offer Email (sent 3 days after launch) 4. Social Proof Email (sent 1 week after launch) 5. Reminder Email (sent 2 weeks after launch) 6. Win-Back Email (sent 3 weeks after launch)"	Assistant: Here's a comprehensive email campaign design to promote a new product line and increase sales by 20%: Campaign Name: "Introducing Our Latest Collection: Unlock 20% More Style" Target Audience: Subscribers who have shown interest in the brand's previous collections and are likely to be interested in the new product line. Email List Segmentation: 1. New Arrivals List: Send a dedicated email to subscribers who have purchased from the brand before, showcasing the new product line. 2. Abandoned Cart List: Send a reminder email to customers who have left items in their cart, highlighting the new products and offering a discount. 3. Loyal Customer List: Send a special email to loyal customers who have spent over $500 in the past year, offering exclusive discounts and early access to new products. Email 1: Introduction Email (sent 7 days before launch) * Subject Line: "Get Ready for Our Newest Collection" * Content: + Brief introduction to the new product line, highlighting the inspiration and design behind it. + High-quality images and videos showcasing the new products. + Call-to-action (CTA) button: "Learn More" or "Be the First to Know" * Goal: Build excitement and anticipation for the new product line. Email 2: Preview Email (sent 4 days before launch) * Subject Line: "Sneak Peek: Our Newest Collection" * Content: + Exclusive preview of 2-3 key products from the new line. 
+ Behind-the-scenes content, such as design sketches or product development stories. + CTA button: "Sneak Peek Exclusive" or "Get Early Access" * Goal: Create a sense of urgency and encourage subscribers to engage with the brand. Email 3: Launch Email (sent on launch day) * Subject Line: "Our Newest Collection is Here!" * Content: + Full showcase of the new product line, including all available products. + Special introductory offer, such as a discount or free shipping on orders over $50. + CTA button: "Shop Now" or "Explore Our Collection" * Goal: Drive sales and encourage subscribers to make a purchase. Email 4: Abandoned Cart Email (sent 3 days after launch) * Subject Line: "Don't Miss Out: Complete Your Order" * Content: + Reminder of the items left in the cart. + Exclusive offer, such as a discount or free shipping...	Response: Content Anatomy: A. Hook: "Introducing Our Newest Collection: Modern Meets Minimalism" B. Narrative: "We're excited to introduce our latest collection, featuring sleek designs and high-quality materials..." C. Climax: "Get ready to elevate your style..." D. Resolution: "Shop Now and discover your new favorite pieces!"
Create a Facebook ad for a fitness program targeting busy professionals, aiming to drive sign-ups for a free trial.	"Here's a sample Facebook ad for a fitness program targeting busy professionals: Headline: Get Fit, Not Overwhelmed: Free Trial for Busy Professionals Text: Are you tired of sacrificing your health and wellness for the demands of your career? Call-to-Action (CTA) Button: Sign Up for Free Trial Targeting: Demographics: Adults 25-50 years old, Interests: Fitness, health, wellness, career development, entrepreneurship, productivity"	Assistant: Here's a potential Facebook ad for a fitness program targeting busy professionals: Headline: Get Fit in 30 Minutes a Day (No Gym Required!) Image: A high-energy image of a busy professional (e.g. a young executive or entrepreneur) exercising at home, with a laptop and phone in the background to represent their demanding schedule. Body: Are you a busy professional who struggles to find time to exercise? Do you wish you had more energy and confidence to tackle your day? Introducing [Program Name], the fitness program designed specifically for people like you. Our program is tailored to fit your busy schedule, with quick and effective workouts that can be done from the comfort of your own home (or office!). Benefits: * Boost your energy and productivity * Improve your focus and concentration * Get fit in just 30 minutes a day * No gym membership required * Personalized coaching and support Special Offer: Sign up for a FREE 7-day trial and get: * Access to our comprehensive workout library * Personalized coaching and support from our expert trainers * A FREE consultation with one of our fitness experts to create a customized workout plan Call-to-Action (CTA): Try it now and start feeling the benefits of a fit and healthy lifestyle! 
Targeting: * Job titles: Executive, Manager, Entrepreneur, Small Business Owner * Industries: Finance, Technology, Healthcare, Education * Interests: Fitness, Health, Wellness, Self-Improvement, Productivity * Behaviors: Busy professionals, people who have exercised in the past but haven't in a while, people who are interested in health and wellness Ad Budget: $500 to start, with a daily budget of $50 to target busy professionals in the desired locations. Ad Duration: Ongoing, with regular monitoring and optimization to ensure the best possible results. Tracking and Optimization: * We will track the ad's performance using Facebook's built-in metrics, including reach, clicks, conversions, and cost per conversion. * We will regularly monitor the ad's performance and make adjustments as needed to optimize the targeting, ad creative, and ad budget. This is just a starting point, and we can adjust the targeting, ad creative, and ad budget based on the results and feedback from the free trial sign-ups.	Response: Content Anatomy: A. Hook: "Get Fit, Feel Amazing - in Just 20 Minutes a Day!" B. Narrative: "As a busy professional, finding time for the gym can be tough..." C. Climax: "Try our program for FREE..." D. Resolution: "Sign up now..."
Write a sales email for a B2B software solution targeting marketing managers, with the primary objective of scheduling a demo.	"Here is a sample sales email for a B2B software solution targeting marketing managers: Subject: Boost Marketing Efficiency with [Software Name] Body: Dear [Marketing Manager's Name], Call-to-Action: Click this link to schedule a demo... Best regards, [Your Name]"	Assistant: Here's a sample sales email: Subject: Unlock the Full Potential of Your Marketing Efforts with [Software Name] Dear [Marketing Manager's Name], As a marketing manager, you know the importance of staying ahead of the competition and making data-driven decisions to drive business growth. But, let's face it - managing multiple marketing channels, campaigns, and teams can be overwhelming, leading to wasted time, resources, and opportunities. At [Company Name], we've developed [Software Name], a cutting-edge B2B software solution designed to streamline your marketing operations, enhance collaboration, and provide actionable insights. Our platform has helped numerous marketing teams like yours: * Easily manage and optimize multi-channel campaigns * Automate repetitive tasks and workflows * Analyze and visualize data to inform strategic decisions * Enhance collaboration and communication across teams I'd love to show you how [Software Name] can help your team achieve its marketing goals. A personalized demo would be an excellent opportunity for us to discuss your specific challenges and showcase the benefits of our solution. Schedule a Demo Today During the demo, we'll cover: * How to simplify campaign management and optimization * How to automate and streamline workflows * How to unlock actionable insights and improve decision-making * How to enhance collaboration and communication across teams Would you like to schedule a 30-minute demo at your convenience? 
Please reply to this email or click the link below to select a time that suits you best: [Calendly or ScheduleOnce link] Looking forward to showing you the power of [Software Name] and exploring how it can help your team succeed. Best regards, [Your Name] [Your Title] [Company Name] [Contact Information] This email aims to: * Address the marketing manager's pain points and challenges * Highlight the key benefits and features of the software * Encourage the recipient to schedule a demo to see the solution in action * Personalize the email by addressing the recipient by name and providing a clear call-to-action.	Response: Hook: "Unlock Your Marketing Potential with Our Software Solution" Narrative: "Are you tired of relying on manual processes..." Climax: "Schedule a demo with our expert team..." Resolution: "Book your demo today..."

Analysis of Fine-Tuned Model's Strengths

Looking at the table, the fine-tuned 8B model responds very differently from both the original 405B model and the base 8B model. Each of its outputs follows the same compact Hook, Narrative, Climax, Resolution structure instead of a long, open-ended answer. Here’s why that matters in practice:

Focused and On-Point: The fine-tuned model delivers exactly what you need without unnecessary details. This makes it highly practical, as you get content that’s ready to use right away.
Clear and Concise: The fine-tuned model favors clear and simple communication. Whether it’s crafting an ad, a social media post, or an email, the outputs in the examples above are short and direct, which is crucial for capturing attention quickly.
Tailored to the Task: The fine-tuned model provides responses that are specifically suited to the task at hand. It doesn’t give generic outputs but rather offers content that fits the exact situation, making it incredibly useful for targeted marketing efforts.
Time-Saving: The efficiency of the fine-tuned model means you spend less time editing or adjusting the content. It’s nearly ready to go as soon as it’s generated, which is a huge advantage when you’re managing multiple tasks.
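One way to put numbers behind observations like these is to measure output length and check for the expected structure. The sketch below is an illustrative helper and not part of the original workflow. summarize_output is a hypothetical name, and the sample text is the fine-tuned output for the first prompt in the table.

```python
SECTIONS = ("Hook", "Narrative", "Climax", "Resolution")


def summarize_output(text, sections=SECTIONS):
    """Return the word count and which structural labels appear in the text."""
    return {
        "words": len(text.split()),
        "sections_present": [name for name in sections if f"{name}:" in text],
    }


# Fine-tuned output for the first prompt, copied from the table above.
fine_tuned = (
    "Hook: Get Fit and Feel Amazing! "
    "Narrative: Are you tired of feeling sluggish and unmotivated? Our fitness "
    "program is designed to help you achieve your goals and boost your energy levels. "
    "Climax: Join our free trial and experience the transformative power of our "
    "program for yourself! "
    "Resolution: Sign up now and start your journey to a healthier, happier you!"
)

print(summarize_output(fine_tuned))
```

Running the same helper over the 405B and base 8B outputs would give a simple side-by-side view of length and structure for each prompt.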

Real-Life Applications

For Marketers: This model acts like a reliable assistant who understands what kind of messaging will resonate with your audience. It helps create content that’s not just effective but also precisely targeted.
For Content Creators: The fine-tuned model is a time-saver, providing responses that are nearly ready to be used, streamlining the content creation process.
For Busy Professionals: The model’s ability to cut through the noise and deliver exactly what you need makes it an invaluable tool, allowing you to focus on more critical tasks.

Conclusion

In conclusion, the fine-tuned 8B model proves to be a practical tool for anyone who needs content that’s focused, effective, and ready to use. It trims the excess and delivers clear, structured results that save time and effort. Whether you’re crafting a marketing campaign, writing an email, or generating social media content, it can speed up your daily workflow while keeping the quality of your outputs high.

The entire process of fine-tuning and generating content using the 8B model was achieved at a cost of approximately $3-5, making it an affordable and efficient solution for high-quality content creation.
