
How to build an AI Chatbot in a secure environment with your data

Unleash the power of large language models by building a chatbot application in a secure environment with your data. This blog article is a beginner's guide to building a simple question-answering chatbot app with custom data using LangChain and Llama-2.


Introduction

Our first article in this series, Unleashing the Power of ChatGPT: Opportunities with Large Language Models (scalesology.com), was a brief introduction to getting started with ChatGPT and discussed the different ways organizations can successfully leverage applications powered by large language models (LLMs).


This article is a deep dive into the LLM multiverse (trust me, it is a multiverse!) with a sample implementation of a simple question-answering chatbot driven by Llama 2, the latest open-source large language model released by Meta, available free for research and commercial use (Llama 2 research paper).


In addition to the Llama-2 language model, we will build this chatbot using Python paired with the following tools:

1. llama-cpp-python: llama-cpp-python is a Python binding for llama.cpp. It supports inference for many large language models, which can be accessed on Hugging Face.

2. LangChain: LangChain is a framework for developing applications powered by large language models.

3. FAISS Vector Database: FAISS is a library for efficient similarity search and clustering of dense vectors. It will serve as our vector store in this article.


Retrieval Augmented Generation

Question: How do we combine private data with a large language model? Answer: Retrieval Augmented Generation (RAG)!


Large language models (also known as foundation models) are usually trained offline, making the model agnostic to any data that is created after the model was trained. Additionally, foundation models are trained on very general domain corpora, making them less effective for domain-specific tasks. You can use RAG to retrieve data from outside a foundation model and enhance your prompts by adding the relevant retrieved data in context.


Image of AI Chatbot Architecture
Image Source: https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html

In this article, the FAISS vector store will serve as an external data source paired with the Llama-2 LLM.


Question: What is a vector store? Answer: A vector store is a database that stores vector embeddings in collections. Vector embeddings are high-dimensional numerical representations of unstructured data such as images, audio, video, and text. Vector stores are useful for storing and analyzing large amounts of data. It’s like feeding your LLM with memories.


With a RAG architecture in an application, the external data used to augment your prompts can come from multiple data sources, such as databases, APIs, or personal documents stored in various file formats. The data from these sources is transformed and stored in a vector database to be used by the LLM when answering queries. The first step is to convert your documents and any user queries into a compatible format so that a relevancy search can be performed. To make the formats compatible, a document collection (or knowledge library) and user-submitted queries are converted to numerical representations using embedding models. Embedding is the process by which text is given a numerical representation in a vector space. The text embeddings generated from the documents are then stored in the vector database.
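To make embedding concrete, here is a minimal sketch using the sentence-transformers library (the all-MiniLM-L6-v2 model is an illustrative choice; it maps each text to a 384-dimensional vector):

```python
from sentence_transformers import SentenceTransformer

# Load a small, general-purpose embedding model (illustrative choice)
model = SentenceTransformer("all-MiniLM-L6-v2")

# Turn a piece of text into its numerical vector representation
vector = model.encode("The Mahabharata is an ancient Indian epic.")
print(vector.shape)  # (384,) -- one dense vector per input text
```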

Let’s Build a Chatbot with Custom Data!


1. Setup your Python environment

This article is written with the assumption that the reader has basic familiarity with installing packages and setting up virtual environments in Python.


Now, create a requirements.txt file as follows for your virtual environment:


langchain==0.0.276
sentence-transformers==2.2.2
llama_cpp_python==0.2.6
pypdf==3.15.4
faiss-cpu==1.7.4


Installing these libraries in your environment should suffice, as all other necessary dependencies will be downloaded automatically. Note: this setup uses the CPU only; GPU utilization requires additional configuration.


2. Start by importing all the necessary Python libraries. Each code snippet below represents a new cell in a Jupyter Notebook; make sure your notebook is pointing to the virtual environment we set up earlier:

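A cell along these lines covers the imports used in the rest of this walkthrough (module paths are for langchain 0.0.276; adjust if your cells differ):

```python
import os

from langchain.chains import LLMChain
from langchain.document_loaders import PyPDFLoader, TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
```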

3. To build this chatbot we will be using a 4-bit quantized version of the Llama-2 LLM available through Hugging Face. Download and save the model file locally to a ‘models’ subdirectory:

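As an illustration, assume the 7B chat variant in GGUF format (for example, llama-2-7b-chat.Q4_K_M.gguf from the TheBloke/Llama-2-7B-Chat-GGUF repository on Hugging Face). A sketch of loading it with LangChain’s LlamaCpp wrapper, with illustrative parameter values, plus the embedding model we will use for the vector store:

```python
# Path to the locally saved 4-bit quantized model (file name is illustrative)
model_path = "models/llama-2-7b-chat.Q4_K_M.gguf"

# Wrap the local model with LangChain's llama.cpp integration
llm = LlamaCpp(
    model_path=model_path,
    n_ctx=2048,       # context window size
    max_tokens=512,   # cap on tokens generated per response
    temperature=0.1,  # keep answers close to the retrieved context
    verbose=False,
)

# Embedding model used to vectorize both documents and queries
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```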

4. Next, create a function to split the documents into chunks and store them in a list using the RecursiveCharacterTextSplitter module along with the split_documents method from LangChain:

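A sketch of such a function; the chunk size and overlap are illustrative values worth tuning for your documents:

```python
def split_docs(documents, chunk_size=1000, chunk_overlap=100):
    """Split loaded documents into a list of overlapping text chunks."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    chunks = splitter.split_documents(documents)
    # Keep just the raw text of each chunk for FAISS.from_texts below
    return [chunk.page_content for chunk in chunks]
```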

5. Define a function to initialize a FAISS database instance from the chunks created in the previous function. The from_texts class method from LangChain embeds the raw document chunks using the ‘embeddings’ variable defined earlier in the code and returns a vector store object:

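A minimal version of that function might look like this:

```python
def create_vectordb(chunks, embeddings):
    """Embed the text chunks and index them in a FAISS vector store."""
    return FAISS.from_texts(chunks, embeddings)
```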

6. Load our locally stored documents into a file list:

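Assuming the documents sit in a local data/ folder (a hypothetical layout), the file list can be built like this:

```python
data_dir = "data"  # hypothetical folder holding the source documents

# Collect the PDF and plain-text files in the data directory
file_list = [
    os.path.join(data_dir, name)
    for name in sorted(os.listdir(data_dir))
    if name.endswith((".pdf", ".txt"))
]
```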

7. Next, we generate a single vector store ‘vectordb0’, which is the collation of the indices of all the documents in the file list, and save this vector database locally. As an example, I have loaded my locally stored vector database with a copy of the English translation of the Indian epic Mahabharata downloaded from Project Gutenberg:

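A sketch of that step, using FAISS’s merge_from method to collate the per-document indices into one store before saving it (the faiss_index directory name is illustrative):

```python
vectordb0 = None

for path in file_list:
    # Pick a loader based on the file type
    loader = PyPDFLoader(path) if path.endswith(".pdf") else TextLoader(path, encoding="utf-8")
    chunks = split_docs(loader.load())

    # Index this document, then fold its index into the combined store
    db = create_vectordb(chunks, embeddings)
    if vectordb0 is None:
        vectordb0 = db
    else:
        vectordb0.merge_from(db)

# Persist the combined index to disk
vectordb0.save_local("faiss_index")
```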

8. Now load the locally saved vector database, initialize the retriever to fetch context, and create a prompt template that will be supplied to the large language model chain later:

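A sketch of this cell; the prompt wording and the number of retrieved chunks (k) are illustrative:

```python
# Reload the persisted index and expose it as a retriever
vectordb = FAISS.load_local("faiss_index", embeddings)
retriever = vectordb.as_retriever(search_kwargs={"k": 2})

# Prompt template with slots for the retrieved context and the user question
template = """Use the following context to answer the question at the end.
If you don't know the answer, just say that you don't know.

Context: {context}

Question: {question}
Answer:"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"],
)
```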

9. Define a question for the LLM chain to answer. Here we ask about one of the personalities from the Mahabharata:

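For example, asking about Arjuna (any question grounded in your own documents works the same way):

```python
question = "Who is Arjuna?"
```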

10. Run a query on the vector database using the retriever, create the context for the prompt template, and supply the prompt to the LLM chain to generate a response from the Llama-2 model:

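A sketch of that final step:

```python
# Fetch the chunks most relevant to the question
relevant_docs = retriever.get_relevant_documents(question)
context = "\n\n".join(doc.page_content for doc in relevant_docs)

# Feed the prompt (context + question) to the Llama-2 model
llm_chain = LLMChain(llm=llm, prompt=prompt)
response = llm_chain.run(context=context, question=question)
print(response)
```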

11. The response generated above by the LLM is correct, but let’s try another question:

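Reusing the chain only requires a new question; a small helper (the name ask is ours) keeps the retrieve-then-generate pattern tidy, and the question shown is illustrative:

```python
def ask(question):
    """Retrieve context for a question and generate an answer."""
    relevant_docs = retriever.get_relevant_documents(question)
    context = "\n\n".join(doc.page_content for doc in relevant_docs)
    return llm_chain.run(context=context, question=question)

print(ask("Who are the Pandavas?"))
```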

12. Now this answer is spot on! Let’s try one more, and let’s make it a challenge this time:

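The challenge might look like this (illustrative wording):

```python
print(ask("Summarize the story of the Mahabharata."))
```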

This time around, the chatbot stops midway while generating a summary of the Mahabharata.


Conclusion

As you can see above, the implementation of the chatbot was successful (almost!); however, more tuning is needed to minimize partial answers like the one above and hallucinations (i.e., incorrect or fictional answers). There are several ways to improve the quality of responses, but that’s a topic for another blog!


Feel free to use the sample notebook in the GitHub repository linked below to get started. Ready to implement a chatbot for your organization using your own data? Contact Scalesology today and let’s talk. Let’s ensure you scale your business with the right data insights and technology.


GitHub Repository:


References

2. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - https://arxiv.org/abs/2005.11401

3. Project Gutenberg - https://www.gutenberg.org/


