Chat with your own, local data
From cloud-based LLMs (ChatGPT, Bard, etc.) to a local LLM, a.k.a. PrivateGPT
The beginning
ChatGPT from OpenAI kicked off the golden era of LLMs last year. It took the world by storm, becoming one of the fastest applications ever to reach 100 million users, and sent other big tech companies scrambling to get their own offerings out to the public; the primary example is Bard from Google.
Microsoft, ChatGPT and Google
Microsoft, a major player in the technology industry, has provided OpenAI with massive computing power and has reportedly bought a 49% stake in the company, with OpenAI paying back those costs over a period of 10 years. Integrating ChatGPT with Bing was supposed to be very beneficial for Microsoft, which has aspired for many years to compete with Google Search (and with Chrome). Interestingly, even after the integration of ChatGPT capabilities into Bing, as of today (05/19), Bing's market share has increased only to 2.79%, while Google still enjoys 92.63%. Time will tell if Microsoft can get users to adopt its Bing platform more widely.
Many have chosen to stick with the native ChatGPT interface, either using it for free with restrictions or paying $20 for the Plus version. Recently, OpenAI gave Plus users the long-sought-after plug-in capabilities. Yesterday, OpenAI released the ChatGPT iOS app on the App Store, where it quickly garnered the title of a "must-have" iPhone app. The app is sleek and cool, especially with the haptic feedback, which you can turn off in the settings.
LLM evolution, chatbots, privacy concerns and open-source community support
While supporting the iPhone app, Apple itself has banned its employees from using ChatGPT, according to the latest news reports. This follows the same path as Samsung, which banned the tool after its employees leaked confidential data through it. ChatGPT has raised many concerns with respect to ethics, bias, security, etc. Of all the concerns around AI, security will be the primary one for enterprise adoption. Enterprises do see immense value in utilizing LLMs both internally and in external-facing applications that their customers use.
Let's look at the experiences we have all had at some point or another. Enterprises have long been implementing their own chatbots, primarily focused on helping their customers, with varying degrees of success. One example is Erica from Bank of America, a text-based chat interface that helps customers navigate the app, with deep dives into specific functionality. These chatbots help customers through a pre-defined set of utterances or text inputs tied to specific functionality. With ChatGPT in their sights, enterprises can expand these existing services to have a greater impact on customers. However, as previously mentioned, many organizations across industries are most concerned about the security of the internal and customer data that ChatGPT would use. This has led them to look for ways to provide similar functionality on edge devices and through on-premise installations of ChatGPT-like offerings.
The open-source community has always been at the forefront of innovation that helps the general public. In just the last few weeks, open-source support from Nomic (GPT4All), Hugging Face (models, embeddings, vector databases, etc.), and many others has produced offerings that developers can download and run on their local PCs, even on CPUs without GPU support, albeit with considerable latency for now. Meta, by releasing the LLaMA (Large Language Model Meta AI) family of models, with sizes ranging from 7B to 65B parameters, has opened up the race to target LLMs at local devices like PCs, and the improvements that follow will target edge and handheld devices. In addition, the introduction of the LangChain framework has helped expedite the creation of LLM-based applications. LangChain provides standard interfaces for interacting with LLMs along with a collection of pre-built chains for common tasks such as question answering and summarization.
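As a quick illustration of those pre-built chains, here is a minimal sketch, assuming the LangChain 0.0.x API and a GPT4All-J model already downloaded locally (the model path and the sample text are placeholders):
# Minimal sketch: a pre-built LangChain summarization chain over a local model
from langchain.llms import GPT4All
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
# Point LangChain at a local GPT4All-J model file: no API key, no cloud calls
llm = GPT4All(model="models/ggml-gpt4all-j-v1.3-groovy.bin")
# load_summarize_chain is one of LangChain's pre-built chains
chain = load_summarize_chain(llm, chain_type="stuff")
print(chain.run([Document(page_content="LangChain is a framework for building LLM applications ...")]))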
Equipped with this background, let's now look at one of the implementations that made headlines last week: PrivateGPT. It is based on the GPT4All Python bindings (pyGPT4All) along with llama-cpp support. The following steps outline how to make it work on a Mac. The initial release had issues with ingestion taking too long; I had to quit using the system because a 1 MB file took more than an hour to be chunked and indexed. Many tools are put together to make this happen; you can read the README.md file in the GitHub repo to learn about all the dependencies required to run it. However, the latest update uses Hugging Face embeddings instead of Llama embeddings, which gave roughly a 500x speedup in creating the index files. This latest release also extends support to many document formats. I have yet to see .csv files work properly on my system; I have been able to verify PDF and text files so far.
Below is a sample video of the implementation, followed by a step-by-step guide to working with PrivateGPT.
Pre-conditioning the system
(This step is highly recommended unless you have a system that is already set up)
1. Uninstall the existing Conda installation
Open the Terminal and run the following command to remove the existing Conda:
conda install anaconda-clean
anaconda-clean --yes
This will remove the Conda installation and its related files.
2. Install Miniforge for arm64
Miniforge is a community-led Conda installer that supports the arm64 architecture. Download the installer for arm64 (Apple Silicon) from the following link:
https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
Open the Terminal, navigate to the directory where the installer was downloaded, and run the following command to install Miniforge:
chmod +x Miniforge3-MacOSX-arm64.sh
./Miniforge3-MacOSX-arm64.sh
Follow the instructions to complete the installation. Restart your Terminal after the installation is complete.
3. Create a new Conda environment for privateGPT
Run the following command to create a new Conda environment with Python:
conda create -n privateGPT python=3.10
Activate the environment using:
conda activate privateGPT
NOTE: You will see that your terminal now shows (privateGPT) on the left-hand side. See the image below.
This means you’re in the environment you just created. Make sure you always have it activated when you are doing the following steps.
4. Install CMake
Next, make sure CMake is installed. You can install CMake using the following command:
conda install -c conda-forge cmake
Install PrivateGPT
1. Clone the GitHub repo
Open a new terminal in VSCode. In the directory you wish to run the installation from, clone the privateGPT repo from https://github.com/imartinez/privateGPT.git using the following command:
git clone https://github.com/imartinez/privateGPT.git
2. Environment setup
To set up your environment to run the code, first install all requirements in the folder you cloned from the original git repo. These instructions are copied from the README file in that repo, with the modifications I made to get this working:
pip install -r requirements.txt
Then, download the LLM model and place it in a directory of your choice:
LLM: defaults to ggml-gpt4all-j-v1.3-groovy.bin. If you prefer a different GPT4All-J compatible model, just download it and reference it in your .env file. Note: this version works with LLMs that are compatible with GPT4All-J.
Rename example.env to .env and edit the variables as shown below.
MODEL_TYPE: Set to GPT4All
PERSIST_DIRECTORY: db
MODEL_PATH: models/ggml-gpt4all-j-v1.3-groovy.bin
MODEL_N_CTX: 1000
EMBEDDINGS_MODEL_NAME: all-MiniLM-L6-v2
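For reference, the resulting .env file would look something like this (paths assume the defaults above):
MODEL_TYPE=GPT4All
PERSIST_DIRECTORY=db
MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
MODEL_N_CTX=1000
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2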
Note: because of the way langchain loads the SentenceTransformers embeddings, the first time you run the script it will require an internet connection to download the embeddings model itself.
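If you want to pre-fetch that model while you still have connectivity, a small sketch like this should cache it locally (assuming the sentence-transformers package was installed via requirements.txt):
# Downloads and caches all-MiniLM-L6-v2 so later runs can stay offline
from sentence_transformers import SentenceTransformer
SentenceTransformer("all-MiniLM-L6-v2")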
3. Activate your privateGPT environment in VSCode
Open a new terminal in VSCode and run the following:
conda activate privateGPT
Now whenever you want to run your GPT4All apps be sure to activate this environment before starting.
4. Ingest the files
To start chatting with your data, you need to ingest the files by placing them in the "source_documents" folder. The cloned repo includes the "state of the union" text file as a sample; you can place any PDF file there as well.
To do: place a set of files that are disparate in topic and try chatting about those topics.
python3 ingest.py
5. Ask questions to your documents, locally!
In order to ask a question, run a command like:
python privateGPT.py
And wait for the script to require your input.
> Enter a query:
Type your question and hit Enter. You'll need to wait 20-30 seconds (depending on your machine) while the LLM consumes the prompt and prepares the answer. Once done, it will print the answer and the four sources it used as context from your documents. You can then ask another question without re-running the script; just wait for the prompt again.
Note: You could turn off your internet connection, and the script inference would still work. No data gets out of your local environment.
Type exit to finish the script.
How does it work?
By selecting the right local models and leveraging the power of LangChain, you can run the entire pipeline locally, without any data leaving your environment, and with reasonable performance.
ingest.py uses LangChain tools to parse the documents and create embeddings locally using HuggingFaceEmbeddings (SentenceTransformers). It then stores the result in a local vector database using the Chroma vector store.
privateGPT.py uses a local LLM, based on GPT4All-J or LlamaCpp, to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs. The GPT4All-J wrapper was introduced in LangChain 0.0.162.
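To make that flow concrete, here is a condensed sketch of what the two scripts do under the hood, assuming the LangChain 0.0.16x API; the file name, chunk sizes, and sample query are illustrative, and the real scripts handle many more document types:
# Sketch of the ingest + query pipeline (not the actual repo code)
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA
# ingest.py, in essence: load, chunk, embed, persist
docs = TextLoader("source_documents/state_of_the_union.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(chunks, embeddings, persist_directory="db")
db.persist()
# privateGPT.py, in essence: retrieve similar chunks and answer locally
llm = GPT4All(model="models/ggml-gpt4all-j-v1.3-groovy.bin", n_ctx=1000, backend="gptj")
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff",
                                 retriever=db.as_retriever(search_kwargs={"k": 4}),
                                 return_source_documents=True)
result = qa({"query": "What did the president say about the economy?"})
print(result["result"])  # the answer, plus result["source_documents"] for context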
Please post your comments and suggestions. Thank you for reading!