Anthropic tutorial: How to build your own Judicial AI Assistant

Thursday, May 25, 2023 by RafiqB

What is Claude?

Claude is a Large Language Model, created by Anthropic. Claude can help you as a chatbot, summarization tool, code-writing assistant, and more! Recently Anthropic announced that Claude is increasing its context size to 100k tokens, which is around 75 000 words! This is a large capacity that will allow many people to speed up their work with large documents and books. Previously, just reading such long texts could take about 5h. Now the model will be able to read, summarize, analyze the text, and answer questions in a few minutes! And also Anthropic’s Claud is focused mostly on safety. Also, users claim that interaction with their LLM gives more human feeling. Maybe a new leader arises, and we will all be using Anthropic Apps in the near future?

Maybe, but let’s first test what we came for and check it out!

How to use it?

To use Claude you must apply for early access!

Today I'll be using the Anthropic Python SDK to make it easier for us to work with the models. You can also use the API or TypeScript/JavaScript SDK.

Legal Tech - AI for Law

In the complex domain of legal affairs, the ability to accurately analyze and interpret legal documents can make a significant difference. However, the intricate language and length of these documents often make the process cumbersome and time-consuming. With this simple example we can attempt to explore how Anthropic's Claude can help disseminate these large texts, by quickly going through these lengthy texts in a matter of seconds and extracting pertinent information, comprehensive insights including the potential impact, sentiment, repercussions, and possible pitfalls or caveats of certain legal passages, like those found in contracts.

What is interesting for us to explore here, is not the capabilities we are familiar with, such as summarization, prediction, etc, but to explore Claude's constitution as a Constitutional AI, and handling of large, language-complex prompts.

What are we building?

In short, we will build a very simple API, leveraging claude-v1-100k model to extract information from these large prompts.

Files

Ideally, we would have a legal database we can query, or a more robust search interface to further automate the process, however, for the brevity of this tutorial I would like to use local files in our working directory. For starters, I will be using files with tokens in the range of [40000 and 80000] but feel free to test the limits! Remember that the model is Claude-v1-100k. We'll see how Claude handles them. The files will be in .pdf format so to deal with them we will use PDF Reader.

Dependencies

Let's start by creating a new directory and virtual environment.

mkdir claude_tutorial
cd claude_tutorial

python3 -m venv venv

# Linux/MacOS
source venv/bin/activate

# Windows
venv\Scripts\activate.bat

For the purpose of this tutorial, we will use PyPDF2 and Anthropic SDK. Let's install them!

pip install PyPDF2 pycryptodome # PyPDF2 and ycryptodome are used to read PDF files
pip install anthropic # Anthropic SDK

Additionally, we can also run this in a FastApi server, so let's add these dependencies

pip install fastapi uvicorn # framework for creating APIs and a server to serve those APIs respectively

Now is the time to scaffold our API

First import the necessary libraries.

import os

from PyPDF2 import PdfReader

import anthropic

from fastapi import FastAPI, Response

Also, I have with me an API Key that I obtained from my early access.

API_KEY = "sk-ant-..."

anthropic_client = anthropic.Client(API_KEY)

app = FastAPI()

Usage

Let's first define our functions to read our pdf file and leverage Claude to analyze the document, we will also define an output structure for easy extraction of information from the response.

First, let's create a function that will analyze the legal case in the given PDF file. We will provide the path to the file, then read it, check the length of the text and if it is okay, then send it to the API for analysis!

async def mine_case(path: str, input_prompt: str) -> str:

    reader = PdfReader(path)
    text = "\n".join([page.extract_text() for page in reader.pages])

    no_tokens = anthropic.count_tokens(text)
    print(f"Number of tokens in text: {no_tokens}")

    if no_tokens > 100000:
        raise ValueError(f"Text is too long {no_tokens}.")

    prompt = prompt = f"""
    {anthropic.HUMAN_PROMPT}: here's a case file extract in <case> tags <case>{text}<case>
    {anthropic.HUMAN_PROMPT}:understand then present the key pieces such as case ID, date, Plaintiff, Appellent, what is the case type, jurisdiction, a short summary, sentiment and its impact on business, and adverse findings, and outcome and put them in separate xml tags.
        \n\n{anthropic.AI_PROMPT}:\n\ncase:"""

    res = anthropic_client.completion(prompt=prompt, model="claude-v1.3-100k", max_tokens_to_sample=1000)
    return res["completion"]

Notice that we have a prompt and used XML tags to structure our prompt and response, feel free to customize the prompt!

Note that the stop token is \n\nHuman

Now we can use it to extract information form our cases! But first, let's make a quick endpoint to invoke this function.


# Add our endpoint that invoke our function we created earlier and set it to return response as xml

@app.get("/case", response_class=Response, responses={200: {"content": {"application/xml": {}}}})
async def get_case():

    # let's define our prompt and pass it to our function
    
    return Response(content= await mine_case('test.pdf'), media_type="application/xml")

Now let's run our server, and navigate to our localhost to test the api via Swagger!


uvicorn main:app --reload

Result

<date> 2021</date>
<case>Michele Yates v. Pinellas Hematology & Oncology, P.A. , 8:16-cv-00799- WFJ -CPT</case>
<Plaintiff>Michele Yates,</Plaintiff>
<Appellent>Pinellas Hematology & Oncology, P.A.</Appellent>
<casetype>Qui Tam Action</casetype>
<jurisdiction>United States Court of Appeals For the Eleventh Circuit</jurisdiction>
<shortsummary>The jury in this qui tam case found that Pinellas Hematology & Oncology violated the False Claims Act, 31 U.S.C. § 3729 et seq., on 214 occasions, and that the United States had sustained $755.54 in damages.The district court trebled the damages and imposed statutory minimum penalties of $1,177,000.</shortsummary>
<sentiment>Adverse</sentiment>
<impact>The adverse findings and penalties imposed can negatively impact the business and reputation of the company.</impact>
<adversefindings>The jury found Pinellas Hematology & Oncology violated the False Claims Act by defrauding the federal government through the submission of 214 false claims resulting in damages of $755.54.</adversefindings>
<outcome>The district court imposed treble damages of $2266.62 and statutory penalties of $1,177,000.</outcome>

We can stop here, however as a bonus, let's add an endpoint to disseminate research papers and give us a TL;DR version of key findings, this might provide more insight into how to influence the prompt to assume a role.

We can simply define a new prompt and a new endpoint, and get to testing!


    # Example prompt

    prompt = f"""{anthropic.HUMAN_PROMPT}: here's a research article extract in <article> tags <article>{text}<article>
    {anthropic.HUMAN_PROMPT}: As an expert researcher, peer reviewer and principal investigator, read and understand the article then present the key findings and supporting arguments in bullet points.
        \n\n{anthropic.AI_PROMPT}:\n\nsummary of effective dissemination of this research:"""

We can keep going, exploring other use cases, if you would like some homework before the Hackathon, build a 'healthy and safe' News Digest using an RSS Feed from a questionable media outlet, find loopholes in tricky contracts languages, create child-friendly stories from music lyrics....

These should get you familiar with some of the tips from Anthropic to better interact with Claude!

Conclusion

As you can see, we have insight into the key information of the case (with more than 100 pages) in seconds. This means that Anthropic's Claude is able to handle large texts. What we can expore further is to use a prompt to summurize the development of the case in the court, and the most important arguments, and many more insights!

And if you want to build your own Anthropic application, you will soon have a unique opportunity to skip the waitlist. If you are a member of the lablab.ai’s community, and you signed up to Anthropic Hackathon before 23.05, check our step by step guide on how to get Anthropic Claude API before everyone else.

And if you didn't get to Anthropic this time, follow closely for upcoming Artficial Intelligence Hackathons - we are cooking something for our amazing community soon!

Get Hacking! - @newnative