In this post, I talk about how I built an AI chatbot with Next.js that updates its model in the cloud behind the scenes using LangChain and serves the latest responses using a Pinecone vector index. HarperDB helped me persist OpenAI responses in a NoSQL database to cut down on OpenAI costs, and enforce rate limiting right at the edge with Next.js Middleware on Vercel.
- Next.js (Front-end and Back-end)
- LangChain (framework for developing applications powered by language models)
- Pinecone (for persisting trained indexes on Cloud)
- HarperDB (Caching OpenAI Responses & Rate Limiting)
- Tailwind CSS (Styling)
- Vercel (Deployment)
- A HarperDB account (for setting up NoSQL database)
- An OpenAI account (for OpenAI API Key)
- A Pinecone account (for persisting/saving trained indexes)
- A Vercel account (for deploying your website)
Setting up the project
To set up, just clone the app repo and follow this tutorial to learn everything that's in it. To clone the project, run:
Once you have cloned the repo, create a .env file in the project root. You'll add the values we obtain from the steps below.
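For reference, a .env sketch with the variables used throughout this tutorial; all values are placeholders filled in during the setup steps below, and OPENAI_API_KEY is an assumed name for the OpenAI key (check the repo's code for the exact variable it reads):

```bash
# Placeholders — filled in during the setup steps below
HARPER_DB_URL="https://chatbot-xyz.harperdbcloud.com"  # Instance URL from Harper Studio
HARPER_AUTH_TOKEN="Basic ..."                          # Instance API Auth Header
PINECONE_INDEX="chatbot"
PINECONE_ENVIRONMENT="gcp-starter"
PINECONE_API_KEY="..."
OPENAI_API_KEY="..."   # assumed variable name for the OpenAI API key
```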
Setting up HarperDB
Let’s start by creating our database instance. Sign in to Harper Studio and click on Create New HarperDB Cloud Instance
Fill in the database instance information. Here, we've added chatbot as the instance name, along with a username and password.
Go with the default instance setup for RAM and Storage Size, and choose the Instance Region closest to your serverless functions region in Vercel.
After some time, you'll see the instance (here, chatbot) ready to hold databases and their tables. The dashboard will look something like this:
Let's start by creating a database (here, cache_and_ratelimit) inside which we'll spin up our storage table. Make sure to click the check icon to successfully create the database.
Next, create a table (here, all) with a hash attribute (here, hash), which will be the named primary key of the table. Make sure to click the check icon to successfully create the table.
- Open lib/harper.js and update the database and table values per the names given above
- Click on config at the top right corner in the dashboard, and:
- Copy the Instance URL and save it as HARPER_DB_URL in your .env file
- Copy the Instance API Auth Header and save it as HARPER_AUTH_TOKEN in your .env file
Awesome, you’re good to go. This is how the data looks for a record of rate limiting and a cached response.
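For illustration, the two record shapes could look something like the sketch below. The hash primary key and the isChat attribute come from this tutorial; the remaining field names (uses, lastUsed, question, answer) are assumptions for the sake of the example:

```json
[
  { "hash": "203.0.113.7", "uses": 3, "lastUsed": 1700000000000 },
  {
    "hash": "…",
    "question": "What is LangChain?",
    "answer": "LangChain is a framework for…",
    "isChat": true
  }
]
```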
Setting up Pinecone
Let’s start by creating our index instance. Sign in to Pinecone and during onboarding select Chatbot Application as the use case.
Let’s proceed with Creating an Index by clicking on Create Index:
Once done, give it a name (here, chatbot) and update the PINECONE_INDEX variable in .env file. Also, copy the environment name (here, gcp-starter) and update the PINECONE_ENVIRONMENT variable in .env file.
The final step is to head to API Keys in the Pinecone dashboard, copy the value and update the PINECONE_API_KEY variable in .env file.
Nice, the whole setup is ready. Let’s dive into the code!
Configuring NoSQL CRUD Helpers for HarperDB with Vercel Edge and Middleware Compatibility
To interact with the HarperDB database, we’ll use NoSQL HarperDB REST APIs called over fetch. This approach will help us opt out of any specific runtime requirements, and keep things simple and ready to deploy to Vercel Edge and Middleware.
In the code below, we’ve defined the CRUD helpers, namely insert, update, deleteRecords and searchByValue for respective actions.
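A minimal sketch of what such helpers in lib/harper.js could look like, calling HarperDB's NoSQL operations API over plain fetch. The database and table names match the ones created above; the exact payload shapes follow HarperDB's operations API, though note that older HarperDB versions used `schema`/`hash_values` where newer ones use `database`/`ids`:

```javascript
// Assumed env vars from the setup steps above
const DB_URL = process.env.HARPER_DB_URL;
const AUTH_TOKEN = process.env.HARPER_AUTH_TOKEN;

// Every HarperDB NoSQL operation is a POST whose JSON body names the
// action in an `operation` field
function buildOperation(operation, params) {
  return JSON.stringify({
    operation,
    database: "cache_and_ratelimit",
    table: "all",
    ...params,
  });
}

async function callHarper(body) {
  const res = await fetch(DB_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // The Instance API Auth Header copied from Harper Studio
      Authorization: AUTH_TOKEN,
    },
    body,
  });
  return res.json();
}

const insert = (records) => callHarper(buildOperation("insert", { records }));
const update = (records) => callHarper(buildOperation("update", { records }));
const deleteRecords = (ids) => callHarper(buildOperation("delete", { ids }));
const searchByValue = (attribute, value) =>
  callHarper(
    buildOperation("search_by_value", {
      search_attribute: attribute,
      search_value: value,
      get_attributes: ["*"],
    })
  );
```

Because this is nothing but fetch, the same helpers work unchanged in Route Handlers, Edge Functions, and Middleware.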
Rate Limiting Requests with HarperDB and Next.js Middleware
To ensure reliability and keep spam to a minimum, we've implemented rate limiting with HarperDB in Next.js Middleware. We read the x-forwarded-for header on the request, which contains the user's IP address, and use it as the unique key to rate limit users.
If the rate limit is exceeded, we return a Rate Limit Exceeded response directly from the middleware, saving the cost of running the edge function for the chat API at all.
The logical flow of the rateLimit function is as follows:
- It searches the HarperDB table for a record matching the IP address value
- If no record is found, the user is not rate limited, and a record with the number of uses set to 1 is inserted into the HarperDB table
If a record is found:
- If the difference between the last use time and the current time exceeds the permitted time window, the record is reset in the HarperDB table with the number of uses set to 1
- Else, if the number of uses in the record is less than the maximum allowed, the uses count is incremented in the HarperDB table along with the latest timestamp
- Else, the request is rate limited!
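The decision logic above can be sketched as a pure function. The window size, maximum uses, and record field names (uses, lastUsed) are assumptions for the sake of the example; the real middleware would read and write these records via the HarperDB helpers:

```javascript
const MAX_USES = 10;          // assumed: allowed requests per window
const WINDOW_MS = 60 * 1000;  // assumed: one-minute window

// record: { hash: ip, uses, lastUsed } or null when no record exists yet
function rateLimitDecision(record, now = Date.now()) {
  if (!record) {
    // First request from this IP: insert a fresh record with uses = 1
    return { allowed: true, action: "insert", record: { uses: 1, lastUsed: now } };
  }
  if (now - record.lastUsed > WINDOW_MS) {
    // The permitted window has elapsed: reset the counter
    return { allowed: true, action: "update", record: { uses: 1, lastUsed: now } };
  }
  if (record.uses < MAX_USES) {
    // Still under the limit: increment uses with the latest timestamp
    return { allowed: true, action: "update", record: { uses: record.uses + 1, lastUsed: now } };
  }
  // Over the limit inside the window: reject the request
  return { allowed: false, action: "none" };
}
```

The middleware would then insert or update the record accordingly, or short-circuit with the Rate Limit Exceeded response when `allowed` is false.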
Retrieving the Persisted Vector Index from Pinecone and Caching Personalized Responses from OpenAI with HarperDB
In this section, we explore how the vector store is retrieved from Pinecone, and OpenAI API is used to serve responses while caching them with HarperDB.
Retrieval of Vector Store from Pinecone
To load the vector store from Pinecone, each chat API request creates a new instance of the Pinecone database class and derives a Vector Store class instance from our existing Pinecone index (here, chatbot).
Lazily Streaming Responses from OpenAI API
To make sure that we’re not calling OpenAI APIs for the same set of questions repeatedly, we maintain the flow for obtaining responses as follows:
- If a record for the question asked of the chatbot is found by searching on the id value in HarperDB, we return the answer key's value from the stored record
- If no existing record for the question is found, we use Vercel Streaming to send each chunk of the OpenAI API response as soon as it arrives; once the response is completely sent, we insert a record to cache it in our HarperDB table. Notice that we set the isChat attribute so that we can clean these records up after the model is updated in the model-training POST request.
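The cache-first flow above can be sketched as follows. The questionHash key derivation, the record shape, and the openAIStream helper are illustrative stand-ins, not the project's actual implementation (which may key records differently); searchByValue and insert are the HarperDB helpers:

```javascript
import { createHash } from "node:crypto"; // on the Edge runtime, Web Crypto would be used instead

// Derive a deterministic key for a question (assumption: normalize, then SHA-256)
function questionHash(question) {
  return createHash("sha256").update(question.trim().toLowerCase()).digest("hex");
}

async function answer(question, { searchByValue, insert, openAIStream }) {
  const hash = questionHash(question);
  const [cached] = await searchByValue("hash", hash);
  if (cached) return cached.answer; // cache hit: skip the OpenAI call entirely

  let full = "";
  // Cache miss: stream chunks to the user as they arrive, and once the
  // stream completes, persist the full answer for next time
  return openAIStream(question, {
    onToken: (token) => (full += token),
    onDone: () => insert([{ hash, question, answer: full, isChat: true }]),
  });
}
```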
Training Content with LangChain and Persisting Vector Index in Pinecone for retrieval during ChatBot Conversations
With Pinecone, we're able to save the latest indexed vector store in the cloud. This allows us to send the user responses based on the latest, most relevant knowledge of the model. Let's dive into how one can train the model on a set of URLs passed in the POST request to /api/model.
In the code (for app/api/model/route.js), we’re ensuring:
- The function runs on Vercel Edge, made possible with export const runtime = 'edge'
- The response is always dynamic, made possible with export const dynamic = 'force-dynamic'
- It waits for the train function to finish, invoked with the URLs list that came in with the request. The train function (in lib/train.js in the project) takes care of fetching each URL's content, breaking it into LangChain-compatible documents, and updating the Pinecone index with the generated documents.
- As soon as training is done, it clears out the cached conversation responses and queries in HarperDB by searching for all records whose isChat key is true and deleting them by their primary key (here, hash). This approach lets us cache the new responses that will be generated from the model's updated knowledge.
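The cache-invalidation step can be sketched like this, with searchByValue and deleteRecords being the HarperDB helpers from lib/harper.js (the record shape is the assumed one from earlier):

```javascript
// Extract the primary keys of cached chat records
const chatHashes = (records) =>
  records.filter((record) => record.isChat).map((record) => record.hash);

// Find every cached chat record and delete it by primary key, so fresh
// answers get cached against the retrained model
async function clearChatCache({ searchByValue, deleteRecords }) {
  const records = await searchByValue("isChat", true);
  const hashes = chatHashes(records);
  if (hashes.length > 0) await deleteRecords(hashes);
  return hashes.length;
}
```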
By now, you've learnt how to cache responses from the OpenAI API and rate limit users using HarperDB. You've also learnt how to train the model on the latest knowledge and save the updated vector store with Pinecone.
Deploy to Vercel
The repository is ready to deploy to Vercel. Follow the steps below to deploy seamlessly with Vercel 👇🏻
- Create a GitHub Repository with the app code
- Create a New Project in Vercel Dashboard
- Link the created GitHub Repository as your new project
- Scroll down and update the Environment Variables from the .env locally
- Deploy! 🚀
HarperDB NoSQL Operations: https://docs.harperdb.io/docs/developers/operations-api/nosql-operations
Pinecone Vector Index: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/pinecone