July 28, 2023

Multi-Region Deployment of HarperDB & Cloudflare via Replication: Part II


In the previous article, we set up two instances of HarperDB in two different AWS regions and saw how to enable replication (formerly known as clustering). We established reciprocal publish/subscribe settings on both databases so that any write to one of them would asynchronously replicate to the other. In Part II, we will add Cloudflare load balancing in front of our HarperDB instances to route requests to our databases at random.

Prerequisites

To enable Cloudflare Load Balancing, you'll need the following:

  • A Cloudflare account
  • A domain
  • A Load Balancing subscription ($5.00/month)

Setting Up

After creating a Cloudflare account and setting up DNS records for the domain you own, navigate to the Traffic → Load Balancing tab to enable the subscription.

The flat base rate of $5/month covers the first two origin servers, which is all we need for our two HarperDB instances.

Next, we need to create our load balancer, which consists of three steps:

  1. Adding an Origin Pool
  2. Attaching a health monitor
  3. Configuring traffic steering

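We start by defining the hostname and choosing whether to proxy traffic through Cloudflare (the orange cloud toggle on the right). In my case, I am using `harperdb.<my-domain>.com`. Keep in mind that Cloudflare's proxy only forwards HTTP(S) traffic on a fixed set of ports and 9925 is not one of them, so to call HarperDB directly on port 9925, as we do later in this article, leave the load balancer in DNS-only mode.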

Next, we create our origin pools, where we add the IP addresses of the EC2 instances running HarperDB. Grab the IP addresses from Part I and paste them into the corresponding pools.

Do the same for the other region.
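If you would rather script this step, Cloudflare's v4 API accepts pool definitions as JSON (`POST /accounts/{account_id}/load_balancers/pools`). The sketch below only builds the request bodies, one single-origin pool per region. The pool names, IP addresses, and monitor ID are hypothetical placeholders, so check the field names against Cloudflare's Load Balancing API reference before relying on them.

```python
import json
from typing import Optional

# Hypothetical pool names and public IPs: substitute your own EC2 instances.
REGIONS = {
    "harperdb-us-east": "203.0.113.10",
    "harperdb-us-west": "198.51.100.20",
}


def build_pool_payload(pool_name: str, origin_ip: str,
                       monitor_id: Optional[str] = None) -> dict:
    """Build a body for POST /accounts/{account_id}/load_balancers/pools.

    Each pool holds a single origin: the HarperDB instance in that region.
    """
    payload = {
        "name": pool_name,
        "origins": [
            {"name": f"{pool_name}-origin", "address": origin_ip, "enabled": True},
        ],
    }
    if monitor_id is not None:
        # Optionally attach an existing health monitor by its ID.
        payload["monitor"] = monitor_id
    return payload


if __name__ == "__main__":
    for name, ip in REGIONS.items():
        print(json.dumps(build_pool_payload(name, ip), indent=2))
```

Keeping one origin per pool mirrors the UI setup above: each region is its own pool, so steering decisions are made per region.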

Next, we can attach health monitors. Since HarperDB doesn't expose a health check or metrics endpoint, we can keep it simple with a TCP check on our exposed port (9925).
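A TCP monitor only checks that something accepts a connection on the port; it does not confirm that HarperDB can answer queries. Here is a minimal sketch of the same check using Python's standard library, handy for confirming that port 9925 is reachable before Cloudflare starts probing it. The function name is our own, not part of any Cloudflare or HarperDB API.

```python
import socket


def tcp_port_open(host: str, port: int = 9925, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Like a TCP health monitor, this opens a connection and closes it
    immediately, without sending any application-level request.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

For example, `tcp_port_open("203.0.113.10")` (a placeholder address) should return `True` once the instance's security group allows inbound traffic on 9925.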

Finally, we can configure traffic steering. Cloudflare offers several options, including:

  • Failover: try the primary pool, then fail over to the secondary
  • Dynamic: route traffic to the fastest pool based on latency measured by health checks
  • Geo: route to specific pools based on the Cloudflare region serving the request
  • Proximity: route requests to the physically closest pool, determined by EDNS Client Subnet GeoIP
  • Random or least outstanding requests
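To make the difference between failover and random steering concrete, here is a toy model of how a policy picks a pool from health-check results. It is a simplification for illustration, not Cloudflare's implementation: it treats all pools equally and returns `None` when nothing is healthy (where Cloudflare would use its fallback pool). The pool names are hypothetical.

```python
import random
from typing import Dict, List, Optional

# Ordered pool list; the first entry is treated as the primary.
POOLS = ["harperdb-us-east", "harperdb-us-west"]


def healthy_pools(pools: List[str], health: Dict[str, bool]) -> List[str]:
    """Keep only pools whose monitor currently reports them as up."""
    return [p for p in pools if health.get(p, False)]


def failover(pools: List[str], health: Dict[str, bool]) -> Optional[str]:
    """Failover: the first healthy pool in priority order wins."""
    candidates = healthy_pools(pools, health)
    return candidates[0] if candidates else None


def random_steering(pools: List[str], health: Dict[str, bool],
                    rng: random.Random) -> Optional[str]:
    """Random: pick uniformly among the healthy pools."""
    candidates = healthy_pools(pools, health)
    return rng.choice(candidates) if candidates else None


if __name__ == "__main__":
    rng = random.Random(42)
    both_up = {p: True for p in POOLS}
    counts = {p: 0 for p in POOLS}
    for _ in range(1000):
        counts[random_steering(POOLS, both_up, rng)] += 1
    print(counts)  # roughly an even split between the two pools
```

With both pools healthy, failover always sends traffic to the primary, while random steering spreads it across both, which is why random is the more interesting choice for exercising replication.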

For this simple demo, you can choose Random. We will demonstrate geo load balancing in Part III with custom functions. 

Then click Save to create the load balancer.
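The same load balancer can also be created through Cloudflare's v4 API (`POST /zones/{zone_id}/load_balancers`). The sketch below builds that request with `steering_policy` set to `"random"` and proxying turned off, since Cloudflare does not proxy port 9925. The zone ID, pool IDs, API token, and hostname are hypothetical placeholders, and nothing is sent until you call `send()` yourself.

```python
import json
import urllib.request
from typing import List

CF_API = "https://api.cloudflare.com/client/v4"


def build_lb_request(zone_id: str, api_token: str, hostname: str,
                     pool_ids: List[str]) -> urllib.request.Request:
    """Build (but do not send) a request that creates a load balancer."""
    body = {
        "name": hostname,               # e.g. harperdb.<my-domain>.com
        "default_pools": pool_ids,      # pool IDs traffic may be steered to
        "fallback_pool": pool_ids[-1],  # used if every pool is unhealthy
        "steering_policy": "random",
        "proxied": False,               # DNS-only, so clients reach port 9925 directly
    }
    return urllib.request.Request(
        f"{CF_API}/zones/{zone_id}/load_balancers",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and return Cloudflare's JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Separating request construction from sending makes it easy to inspect the payload before touching your account.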

Testing Load Balancing

We are now ready to test our load balancing setup. Let's run a simple test: post a record to our endpoint and check whether it is created in both databases:

curl --location 'http://harperdb.<my-domain>.com:9925' \
--header 'Authorization: Basic SERCX0FETUlOOnBhc3N3b3Jk' \
--header 'Content-Type: application/json' \
--data '{
  "operation": "insert",
  "schema": "dev",
  "table": "dog",
  "records": [
    {
      "dog_name": "Charlie",
      "age": 2
    }
  ]
}'
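The `Authorization` header above is plain HTTP Basic auth: the base64 encoding of `username:password`, which here decodes to `HDB_ADMIN:password`. For reference, here is a sketch of the same insert in Python using only the standard library. `HDB_URL` keeps the article's `harperdb.<my-domain>.com` placeholder, the helper names are our own, and the request is only sent when you call `send_insert()`.

```python
import base64
import json
import urllib.request

# Placeholder hostname from this article; replace <my-domain> with your domain.
HDB_URL = "http://harperdb.<my-domain>.com:9925"


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_insert_request(url: str, username: str, password: str, schema: str,
                         table: str, records: list) -> urllib.request.Request:
    """Build a HarperDB insert operation request mirroring the curl command."""
    body = {"operation": "insert", "schema": schema, "table": table, "records": records}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": basic_auth_header(username, password),
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_insert(req: urllib.request.Request) -> dict:
    """Send the request and return HarperDB's JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `send_insert(build_insert_request(HDB_URL, "HDB_ADMIN", "password", "dev", "dog", [{"dog_name": "Charlie", "age": 2}]))` after filling in your domain is equivalent to the curl command.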

You can then check HarperDB Studio to confirm the record exists in both instances.

Note that harperdb-2 has an extra record for Max that we purposely did not replicate, to test the publish/subscribe mechanism.
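Because the load balancer picks an instance at random, a request sent through `harperdb.<my-domain>.com` does not tell you which node answered. To confirm replication from a script instead of Studio, you can query each instance directly by IP with HarperDB's `search_by_value` operation and compare the results. The node names and IP addresses below are hypothetical, and the comparison helper is kept separate from the network call so it can be checked offline.

```python
import base64
import json
import urllib.request
from typing import Dict, List

# Hypothetical public endpoints of the two HarperDB instances from Part I.
NODES = {
    "harperdb-1": "http://203.0.113.10:9925",
    "harperdb-2": "http://198.51.100.20:9925",
}
AUTH = "Basic " + base64.b64encode(b"HDB_ADMIN:password").decode("ascii")


def search_dog(url: str, dog_name: str) -> List[dict]:
    """Run a search_by_value operation on dev.dog against one instance."""
    body = {
        "operation": "search_by_value",
        "schema": "dev",
        "table": "dog",
        "search_attribute": "dog_name",
        "search_value": dog_name,
        "get_attributes": ["*"],
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": AUTH, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def missing_on(results: Dict[str, List[dict]]) -> List[str]:
    """Return the sorted names of nodes whose search came back empty."""
    return sorted(node for node, rows in results.items() if not rows)
```

Usage: `missing_on({node: search_dog(url, "Charlie") for node, url in NODES.items()})` should return an empty list once replication has caught up.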

Wrapping Up

In this series so far, we saw how easy it is to create a multi-region deployment of HarperDB with asynchronous replication. Because HarperDB has first-class support for replication via clustering, this is significantly easier to set up than with traditional SQL databases, where you would either need to consume the write-ahead log (WAL) yourself or rely on a managed offering from a provider.

In our simple demo, we set up two HarperDB instances in two regions, but you can easily extend this to a more complex setup that suits your needs. You also don't have to use the HarperDB Studio UI: you can create clustering topologies programmatically through the REST API endpoint. Finally, replication also works within a single region if you simply need a backup.
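As a sketch of the programmatic route, HarperDB 4.x exposes clustering operations such as `add_node` on the same endpoint used for the insert. The payload below subscribes to another node's `dev.dog` table with both publish and subscribe enabled, i.e. two-way replication. The node name is hypothetical, and the operation's field names may differ between HarperDB versions, so check the clustering documentation for your release before using it.

```python
import json
from typing import List


def add_node_payload(node_name: str, schema: str, tables: List[str],
                     publish: bool = True, subscribe: bool = True) -> dict:
    """Build an add_node operation body with one subscription per table.

    publish=True sends this node's changes to node_name;
    subscribe=True pulls node_name's changes into this node.
    """
    return {
        "operation": "add_node",
        "node_name": node_name,
        "subscriptions": [
            {"schema": schema, "table": t, "publish": publish, "subscribe": subscribe}
            for t in tables
        ],
    }


if __name__ == "__main__":
    # "harperdb-2" is a hypothetical clustering node name; use the one you set in Part I.
    print(json.dumps(add_node_payload("harperdb-2", "dev", ["dog"]), indent=2))
```

You would POST this body to port 9925 on the other instance with the same Basic auth header used in the insert test.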

In the next and final part of this series, we will set up a custom function to demonstrate geolocation-based load balancing.