August 1, 2023

Multi Region Deployment & Geo-Load Balancing w/ HarperDB: Part III

In part 1 and part 2, we set up two instances of HarperDB in two different AWS regions and linked them to Cloudflare for load balancing. We exposed the port HarperDB uses to serve REST API requests and used Cloudflare's random steering policy to route requests. In this final part of the series, we will see how to set up geo load balancing and demonstrate it via a HarperDB Custom Functions application.

Setting Up Custom Functions

In order to enable custom functions, we need to restart our Docker containers with the environment variable `CUSTOM_FUNCTIONS=true`. 
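If you have been following along with the Docker setup from the earlier parts, the restart might look like the following sketch. The container name (`harperdb`), the mounted data directory, and the volume path inside the container are assumptions here; substitute whatever you used in Parts I and II. Port 9926 is HarperDB's default custom functions port:

```
# Stop and remove the old container (the data stays on the host volume)
docker stop harperdb && docker rm harperdb

# Start again with custom functions enabled and port 9926 exposed
docker run -d \
  -e CUSTOM_FUNCTIONS=true \
  -v $(pwd)/harperdb:/opt/harperdb/hdb \
  -p 9925:9925 -p 9926:9926 \
  --name harperdb \
  harperdb/harperdb
```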

Then we need to add our custom function logic. Navigate to the custom functions folder under where your data is stored (the `harperdb` directory, if you have been following along). We will create a custom function package called `cloudflare`:

```
cd harperdb/custom_functions
mkdir cloudflare
```

For this simple demo, we will create an endpoint that queries all records from our `dev.dog` table and returns the results ordered by name. To also demonstrate geo load balancing, we'll enrich the response with the region that served it. We could add an IP address or some other host information, but since we're running this on Docker, naive approaches would just return a private IP address, which won't be useful. Instead, we'll enrich the results with a custom `region` field whose value comes from an environment variable.

Since custom functions are based on Fastify, let's use the @fastify/env library:

```
npm i @fastify/env
```

Next, create `index.js` under a `routes` directory:

```
mkdir routes
touch routes/index.js
```

Then paste the following code:

```
import fastifyEnv from '@fastify/env'
import { fileURLToPath } from 'url'
import path from 'path'

// Schema for the environment variables we expect to be set
const schema = {
  type: 'object',
  required: ['AWS_REGION'],
  properties: {
    AWS_REGION: {
      type: 'string'
    },
  }
}

// Resolve this file's directory so we can find the .env file next to it
const __filename = fileURLToPath(import.meta.url)
const __dirname = path.dirname(__filename)

const options = {
  schema,
  dotenv: {
    path: `${__dirname}/.env`,
    debug: true
  },
  data: process.env
}

export default async (server, { hdbCore, logger }) => {
  // Register the env plugin so AWS_REGION is loaded and validated
  await server.register(fastifyEnv, options);

  server.route({
    url: '/',
    method: 'GET',
    handler: async () => {
      const body = {
        operation: 'sql',
        sql: 'SELECT * FROM dev.dog ORDER BY dog_name',
      };
      const results = await hdbCore.requestWithoutAuthentication({ body });
      const response = {
        region: process.env.AWS_REGION,
        results
      }
      return response
    },
  });
};
```

This code registers the Fastify env plugin, reads the `AWS_REGION` variable from the `.env` file, and returns that value alongside the results of the SQL query whenever the custom function's base route is hit.

Add the `.env` file with the following contents:

```
AWS_REGION=us-east-1
```

Replace `AWS_REGION` with the appropriate region for each HarperDB instance.
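Before putting Cloudflare in front, it can be worth hitting each instance directly to confirm the function works. This sketch assumes custom functions are served on the default port 9926; the host placeholders stand in for each VM's public address:

```
# Query the us-east-1 instance directly (replace the host with your own)
curl http://<us-east-1-host>:9926/cloudflare

# And the us-west-1 instance
curl http://<us-west-1-host>:9926/cloudflare
```

Each response should echo back the `region` value from that instance's `.env` file.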

Navigate to HarperDB Studio and verify that custom functions are active.

Configuring Cloudflare

To configure geo load balancing, you must first be on the Enterprise plan on Cloudflare. If you just want to test out geo-steering behavior, you can instead opt for the Proximity steering policy by adding GPS coordinates to your origin pools.

Refer to the Cloudflare docs for configuring either option.
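For proximity steering, the coordinates live on the origin pools. As a sketch, you could set them through the Cloudflare API; the account ID, pool ID, and API token below are placeholders, and the coordinates are rough values for N. Virginia:

```
# Attach GPS coordinates to the us-east-1 origin pool
curl -X PATCH \
  "https://api.cloudflare.com/client/v4/accounts/<account_id>/load_balancers/pools/<pool_id>" \
  -H "Authorization: Bearer <api_token>" \
  -H "Content-Type: application/json" \
  --data '{"latitude": 38.9, "longitude": -77.4}'
```

Repeat for the us-west-1 pool with coordinates near that region.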

The rest of the steps are identical to the setup steps from Part II of this series.

Testing Geo/Proximity Load Balancing

Now that Cloudflare and HarperDB are configured, we can test our custom functions endpoint. You have two options for testing geo load balancing:

  1. If you have a VPN, you can choose different regions to route requests from
  2. Alternatively, you can provision small VMs in different AWS regions to test from

To test the us-east-1 region (N. Virginia), I first set my VPN location to New York.

Then let’s curl our custom functions endpoint:

```
curl http://harperdb..com:9926/cloudflare
```

We get back:

```
{
  "region": "us-east-1",
  "results": [
    {
      "age": 2,
      "dog_name": "Charlie"
    },
    {
      "age": 4,
      "dog_name": "Coco"
    },
    {
      "age": 7,
      "dog_name": "Penny"
    }
  ]
}
```

Note that the region is `us-east-1`, as expected, and the results show our records ordered by `dog_name` (NOTE: I omitted other fields, like the id and time fields, for clarity).

Now let’s switch our region to the West Coast. I set my VPN to Los Angeles. Sending the same curl command now returns:

```
{
  "region": "us-west-1",
  "results": [
    {
      "age": 2,
      "dog_name": "Charlie"
    },
    {
      "age": 4,
      "dog_name": "Coco"
    },
    {
      "age": 3,
      "dog_name": "Max"
    },
    {
      "age": 7,
      "dog_name": "Penny"
    }
  ]
}
```

We can verify that Cloudflare routed our traffic to our second instance of HarperDB, running in the us-west-1 region, with the 4 records from before.

Wrapping Up

In this three-part series, we saw how to set up HarperDB's replication feature. In part one, we inserted records directly via the exposed ports on the VMs. Then in part two, we put those ports behind a load balancer and used the random steering policy to insert records bidirectionally. Finally, in this article, we added a custom function with a geo steering policy to show how to route requests based on the client's location.

Even though we used a simple topology in this example, you can get more creative with the topology to fit your use case. For redundancy, for example, you might run multiple instances of HarperDB in each region: geo load balancing routes traffic between regions, and random load balancing selects an instance within each region.