February 7, 2023

HarperDB & Anthos: A Powerful Combination for Enterprise Data Management


Intro

In this article, we will explore how to set up HarperDB clusters with Anthos, and then how to integrate HarperDB on AWS, GCP, and Azure with Terraform. (If you prefer to jump straight to the code, see the repo here.)

HarperDB is a data and application platform designed to be scalable and lightweight. It is a NoSQL database with a unique, flat design that enables it to manage enormous volumes of data with minimum setup and upkeep. HarperDB is also meant to be easy to use, with a RESTful API that allows a variety of programming languages to connect with the database with relative ease.

What is multi-cloud and why?

A multi-cloud architecture is a strategy for utilising several cloud computing services from different providers, as opposed to depending on just one. This allows enterprises to take advantage of the unique strengths and capabilities of each provider, while also ensuring redundancy and business continuity in the event of an outage at one provider.

Multi-cloud enables several benefits, including:

  • Flexibility: Enables enterprises to utilise the finest services from many cloud providers for a variety of use cases, hence enhancing their adaptability. This can assist with optimising expenses, security, and performance.
  • Cost Optimisation: Businesses may take advantage of a variety of pricing structures and services best suited to their workloads.
  • Compliance and Security: Businesses are able to comply with a variety of rules that may apply to their data. They can also limit the possibility of data breaches and other security issues.
  • Business Continuity: Enables businesses to achieve high availability and disaster recovery by distributing workloads across various providers.
  • Innovation: Enables enterprises to experiment with new services and technologies from a variety of providers, hence fostering innovation and digital transformation.

Multi-cloud is a strong solution for businesses seeking to optimize their cloud infrastructure and guarantee that their data is safe, accessible, and compliant.

HarperDB and Anthos

Anthos is a hybrid and multi-cloud platform from Google Cloud that allows users to modernize their existing applications or build new ones using the same open-source Kubernetes technology on-premises, in various clouds, and at the edge. This enables users to deploy and manage their apps across several environments without being restricted to a single cloud provider. By combining HarperDB with Anthos, you can easily deploy and maintain your database across several clouds and on-premises systems, while retaining access to HarperDB's unique capabilities. This can be particularly valuable for enterprises that want a highly scalable and easily maintained database, as well as the ability to deploy their applications across multiple environments.

With HarperDB and Anthos, it is simple to deploy your database to any environment that supports Kubernetes, including on-premises, Google Cloud, AWS, and Azure. Additionally, HarperDB's flat design makes it well-suited for usage with Kubernetes and containers due to its minimum setup and maintenance requirements. This enables you to install and operate your database in a containerized environment without worrying about the underlying infrastructure.

Overall, HarperDB plus Anthos is a powerful combination that enables enterprises to manage and deploy their databases across multiple environments with ease, while still making use of the unique capabilities of HarperDB. Whether you are modernizing current applications or developing new ones, HarperDB and Anthos can help you maximize the value of your data.

Example Use Case: Retail

As an example of using HarperDB and Anthos together, consider a retail company that needs a highly scalable and easily manageable database to store its customer data and inventory information in real time. The company has a large number of brick-and-mortar stores as well as an online store, and it wants to manage the data in one central location.

To achieve this, the company decides to use HarperDB as its central database management system. HarperDB's distributed architecture makes it well-suited for handling large amounts of data globally and its RESTful API makes it easy to interact with the database from a variety of programming languages.

Next, the company decides to use Anthos to deploy and manage HarperDB across its different environments. This includes deploying the database on-premises (or at the edge) in their brick-and-mortar stores, as well as in the Google Cloud for their online store. Using Anthos, the company can easily manage HarperDB in a containerized environment and take advantage of the unique benefits of each environment.

With HarperDB and Anthos, the retail company is able to easily manage and deploy its customer data and inventory information across all of its different environments in real-time. This allows them to have a single source of truth for their customer data and inventory information, making it easier for them to make data-driven decisions. Additionally, the company is able to take advantage of the scalability and ease of use of HarperDB, while still being able to manage its database in a containerized environment.

In summary, HarperDB and Anthos together are a powerful combination that can help organizations easily manage and deploy their databases across multiple environments, while avoiding the skyrocketing costs that can come with other solutions.

Let's set it up

Setting up HarperDB with Anthos involves several steps and configurations. Here is a detailed explanation of the process:

  1. Create a Kubernetes cluster on Anthos: You can use either GKE On-Prem or GKE on Google Cloud, depending on whether you need to run the cluster on-premises or in the cloud. On GKE, the command gcloud container clusters create creates a cluster; you specify the name and location of the cluster, the number of nodes, and the version of Kubernetes.
  2. Install kubectl and Helm: Helm is a package manager for Kubernetes that allows you to easily install and manage applications on a Kubernetes cluster. You can use the command gcloud components install kubectl to install kubectl, and curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash to install the Helm client on your workstation.
  3. Deploy HarperDB: Once you have your Kubernetes cluster set up, you can use the HarperDB Helm chart to deploy HarperDB to the cluster. You can use the command helm install harperdb ./harperdb to deploy HarperDB, where harperdb is the name of the release and ./harperdb is the path to the chart. For instructions on building the Helm chart, see https://faun.pub/running-harperdb-in-kubernetes-in-one-command-8c87e2788eb6
  4. Configure and verify the HarperDB pod: Once HarperDB is deployed, configure the HarperDB pod with the necessary connection information and restart it. After the pod has restarted, you can use the command kubectl get pods to check that it is running. You can also use the command kubectl logs to check the logs of the pod and look for any errors.
  5. Scale the HarperDB deployment: HarperDB can be easily scaled up or down as needed. You can use Kubernetes replicas to scale the HarperDB deployment. With replicas, you can specify how many copies of the HarperDB pod you want to run in your cluster. You can use the command kubectl scale deployment to scale the deployment, specifying the name of the deployment and the number of replicas.
  6. Monitor and manage the HarperDB deployment: Once HarperDB is running on your cluster, you can use Kubernetes tools such as kubectl and Prometheus to monitor and manage your HarperDB deployment. These tools allow you to check the health of your HarperDB pods, see the logs, and troubleshoot any issues that may occur.
  7. Connect to the HarperDB instance: The ingress endpoint is the connection point for the HarperDB instance, and the RESTful API lets you perform CRUD operations against it. You can use the command kubectl get ingress to get the ingress endpoint.
  8. Interact with the database: With the ingress endpoint in hand, you use the HarperDB RESTful API to work with the data. The general steps are:

    a) Get the ingress endpoint: You can use the command kubectl get ingress to get the ingress endpoint for your HarperDB deployment. The ingress endpoint is a URL that provides access to the HarperDB instance.

    b) Use the RESTful API: Once you have the ingress endpoint, you can use the REST API to interact with your HarperDB instance. This allows you to perform CRUD operations, such as creating, reading, updating, and deleting data in the database. You can use a tool such as Postman or cURL to send HTTP requests to the ingress endpoint and interact with the HarperDB instance.

    c) Authenticate: Depending on the security configuration of your HarperDB instance, you may need to authenticate to access the RESTful API. You can use the username and password that you have set in the configuration process.

    d) Test the connection: Once you have connected to the HarperDB instance, you can test the connection by sending a simple request, such as a describe_all operation that lists the schemas and tables in the database. A successful response confirms that you are able to connect to the HarperDB instance and interact with the data.

    e) Start storing, retrieving, and modifying data: After you have confirmed the connection, you can start storing, retrieving, and modifying data in the HarperDB instance using the REST API.
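
The steps above can be sketched as a single shell script. This is a minimal sketch rather than a definitive procedure: the cluster name, region, replica count, endpoint URL, and credentials are placeholder assumptions, and it assumes that the chart in ./harperdb creates a Deployment named harperdb. By default the script only prints the commands (DRY_RUN=1); set DRY_RUN=0 to execute them. The final request uses the describe_all operation of the HarperDB operations API, sent as a POST with basic authentication.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder values; override them through the environment.
CLUSTER="${CLUSTER:-harperdb-cluster}"
REGION="${REGION:-us-central1}"
NAMESPACE="${NAMESPACE:-harperdb}"
REPLICAS="${REPLICAS:-3}"
HDB_URL="${HDB_URL:-https://harperdb.example.com}"  # hypothetical ingress endpoint
HDB_USER="${HDB_USER:-admin}"                       # hypothetical credentials
HDB_PASS="${HDB_PASS:-change-me}"
DRY_RUN="${DRY_RUN:-1}"                             # 1 = print only, 0 = execute

# Print a command and, unless in dry-run mode, execute it.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

deploy_harperdb() {
  # Step 1: create the cluster and fetch its credentials for kubectl.
  run gcloud container clusters create "$CLUSTER" --region "$REGION" --num-nodes 1
  run gcloud container clusters get-credentials "$CLUSTER" --region "$REGION"
  # Step 3: install the locally built chart.
  run helm install harperdb ./harperdb --namespace "$NAMESPACE" --create-namespace
  # Step 4: check pod status and logs.
  run kubectl get pods --namespace "$NAMESPACE"
  run kubectl logs deployment/harperdb --namespace "$NAMESPACE"
  # Step 5: scale the deployment.
  run kubectl scale deployment harperdb --replicas="$REPLICAS" --namespace "$NAMESPACE"
  # Step 7: find the ingress endpoint.
  run kubectl get ingress --namespace "$NAMESPACE"
}

test_connection() {
  # Step 8: send a describe_all operation to confirm the API is reachable.
  run curl -s -X POST "$HDB_URL" -u "$HDB_USER:$HDB_PASS" \
    -H "Content-Type: application/json" \
    -d '{"operation": "describe_all"}'
}

deploy_harperdb
test_connection
```

Keeping the dry-run default makes it possible to review the exact commands before anything is created in a cloud account.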

Now, let's explore the multi-cloud setup with Terraform.

Terraform lets users define infrastructure as code across cloud providers. It is well suited to multi-cloud management and enables users to develop reusable modules for common cloud provider components. Terraform users can use variables, conditionals, and loops to modularize infrastructure code and make it dynamic. Terraform's state management tracks the state of the infrastructure, which helps with rollbacks and disaster recovery.

Terraform automates provisioning, ensures consistency across cloud providers, and improves infrastructure management in multi-cloud systems.

Setup

To set up a multi-cloud Anthos HarperDB cluster using Terraform, you would need to write Terraform configuration files that define the resources you want to create, and use the provider-specific modules to provision the resources on each cloud provider. Additionally, you would need to set up the necessary networking and security configuration to allow the cluster nodes to communicate across the different clouds.

Prerequisites

Before we begin, there are a few prerequisites you will need to have in place:

  • You will need to have accounts set up with GCP, AWS, and Azure and have the necessary credentials to access them.
  • You will need to have the Terraform CLI installed on your machine.
  • You will need to have basic knowledge of containerization and container orchestration.
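
Before continuing, it can help to confirm that the command-line tools used in this walkthrough are installed. The following is a minimal sketch; the tool list is an assumption (in particular, the aws and az CLIs are assumed as the way to authenticate against AWS and Azure), so adjust it to your own setup.

```shell
#!/usr/bin/env bash

# Print the names of the given tools that are not found on PATH, one per line.
missing_tools() {
  local tool missing=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    printf '%s\n' "${missing[@]}"
  fi
}

# Assumed tool list for this walkthrough.
REQUIRED_TOOLS=(terraform gcloud aws az kubectl helm)

missing="$(missing_tools "${REQUIRED_TOOLS[@]}")"
if [ -n "$missing" ]; then
  echo "Missing tools:"
  echo "$missing"
else
  echo "All required tools found."
fi
```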

Step 1: Setting up the Terraform Configuration

The first step in spinning up a multi-cloud HarperDB cluster is to set up the Terraform configuration. We will create a directory called harperdb-cluster and within that, we will create a file called main.tf. This file will contain the main Terraform configuration that defines the resources we want to create.

First, we will define the providers for GCP, AWS, and Azure.

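terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.2"
    }
  }
}

provider "google" {
  project = var.gcp_project
  region  = var.gcp_region
}
provider "aws" {
  region = var.aws_region
}
# The azurerm provider requires a features block, even when it is empty.
provider "azurerm" {
  features {}
}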
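
The provider blocks reference input variables such as var.gcp_project, var.gcp_region, and var.aws_region. One way to supply them, sketched below, is through environment variables: Terraform reads any environment variable named TF_VAR_<name> as the value of the input variable <name>. The values shown are placeholders, and each variable still needs a matching variable declaration in the Terraform configuration.

```shell
#!/usr/bin/env bash

# Placeholder values; replace them with your own project and regions.
export TF_VAR_gcp_project="my-gcp-project"
export TF_VAR_gcp_region="us-central1"
export TF_VAR_aws_region="us-east-1"

# List the variables Terraform will pick up from the environment.
env | grep '^TF_VAR_' | sort
```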


Next, we will define the HarperDB cluster nodes.

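#GCP
resource "google_container_cluster" "harperdb_gcp" {
  name     = "harperdb-gcp"
  location = var.gcp_region

  # Nodes are managed by the separate node pool below, so drop the default pool.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "harperdb_gcp" {
  name       = "harperdb-gcp-node-pool"
  cluster    = google_container_cluster.harperdb_gcp.name
  location   = var.gcp_region
  node_count = var.gke_node_count
}

# Assumes the helm provider is configured for this cluster and that a chart
# repository has been added locally under the alias "harperdb".
resource "helm_release" "harperdb_gcp" {
  name             = "harperdb-gcp"
  chart            = "harperdb/harperdb"
  namespace        = "harperdb"
  create_namespace = true
  depends_on       = [google_container_node_pool.harperdb_gcp]

  set {
    name  = "cluster.enabled"
    value = "true"
  }
}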

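#AWS
resource "aws_eks_cluster" "harperdb" {
  name     = "harperdb"
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    security_group_ids = [aws_security_group.harperdb.id]
    subnet_ids         = data.aws_subnets.harperdb.ids
  }

  # The role needs the EKS cluster policy before the cluster is created.
  depends_on = [aws_iam_role_policy_attachment.eks_cluster]
}

resource "aws_iam_role" "eks_cluster" {
  name = "eks_cluster_harperdb"

  # Allow the EKS service to assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "eks.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster" {
  role       = aws_iam_role.eks_cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

resource "aws_security_group" "harperdb" {
  name        = "harperdb"
  description = "Controls access to the HarperDB cluster"
  vpc_id      = aws_vpc.harperdb.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# The aws_subnet_ids data source is deprecated in favour of aws_subnets.
# EKS needs subnets in at least two availability zones; they are not defined in this sample.
data "aws_subnets" "harperdb" {
  filter {
    name   = "vpc-id"
    values = [aws_vpc.harperdb.id]
  }
}

resource "aws_vpc" "harperdb" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "harperdb"
  }
}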

#Azure
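resource "azurerm_kubernetes_cluster" "harperdb_cluster_azure" {
  name                = "harperdb-cluster-azure"
  location            = var.azure_region
  resource_group_name = var.azure_resource_group
  dns_prefix          = "harperdb-cluster"
  kubernetes_version  = "1.25"

  # Recent versions of the azurerm provider use default_node_pool in place of agent_pool_profile.
  default_node_pool {
    name       = "harperdb"
    node_count = var.azure_node_count
    vm_size    = "Standard_DS2_v2"
  }

  # The cluster needs an identity; a system-assigned managed identity is the simplest option.
  identity {
    type = "SystemAssigned"
  }
}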

When using Terraform to spin up a multi-cloud HarperDB cluster across GCP, AWS, and Azure using Anthos, you can create and use the HarperDB Helm chart to deploy and manage the HarperDB cluster on each of the cloud providers.

For GCP:

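You can use Terraform to create a GKE cluster, then use kubectl and helm to install the HarperDB Helm chart:

resource "google_container_cluster" "harperdb_cluster" {
  name               = "harperdb-cluster"
  location           = var.gcp_region
  initial_node_count = var.gke_node_count
}

resource "null_resource" "install_harperdb_gcp" {
  depends_on = [google_container_cluster.harperdb_cluster]

  provisioner "local-exec" {
    # Fetch cluster credentials first so that kubectl and helm target this cluster.
    command = "gcloud container clusters get-credentials ${google_container_cluster.harperdb_cluster.name} --region ${var.gcp_region} --project ${var.gcp_project} && kubectl create namespace harperdb && helm install my-harperdb harperdb/harperdb --namespace harperdb"
  }
}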


For AWS:

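You can use Terraform to create an EKS cluster, then use kubectl and helm to install the HarperDB Helm chart:

resource "aws_eks_cluster" "harperdb_cluster" {
  name     = "harperdb-cluster"
  # IAM role defined earlier in main.tf.
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    # var.aws_subnet_ids is an assumed variable listing subnets in at least two availability zones.
    subnet_ids = var.aws_subnet_ids
  }
}

resource "null_resource" "install_harperdb_aws" {
  depends_on = [aws_eks_cluster.harperdb_cluster]

  provisioner "local-exec" {
    # Update the local kubeconfig first so that kubectl and helm target this cluster.
    command = "aws eks update-kubeconfig --name ${aws_eks_cluster.harperdb_cluster.name} --region ${var.aws_region} && kubectl create namespace harperdb && helm install my-harperdb harperdb/harperdb --namespace harperdb"
  }
}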


For Azure:

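You can use Terraform to create an AKS cluster, then use kubectl and helm to install the HarperDB Helm chart:

resource "azurerm_kubernetes_cluster" "harperdb_cluster" {
  name                = "harperdb-cluster"
  location            = var.azure_region
  resource_group_name = var.azure_resource_group
  dns_prefix          = "harperdb-cluster"

  default_node_pool {
    name       = "harperdb"
    node_count = var.azure_node_count
    vm_size    = "Standard_DS2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}

resource "null_resource" "install_harperdb_azure" {
  depends_on = [azurerm_kubernetes_cluster.harperdb_cluster]

  provisioner "local-exec" {
    # Fetch cluster credentials first so that kubectl and helm target this cluster.
    command = "az aks get-credentials --name ${azurerm_kubernetes_cluster.harperdb_cluster.name} --resource-group ${var.azure_resource_group} && kubectl create namespace harperdb && helm install my-harperdb harperdb/harperdb --namespace harperdb"
  }
}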


Step 2: Setting up the Anthos Configuration

Before you can use Anthos to spin up a multi-cloud HarperDB cluster, you will need to set up the necessary configuration for Anthos. This will include creating a GKE cluster in GCP, which will act as the management cluster. Then, you'll need to install the Anthos Config Management component and configure it to manage the other clusters in AWS and Azure.

You can use Terraform to create the GKE cluster and install the Anthos Config Management component with the following configuration:

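resource "google_container_cluster" "management_cluster" {
  name               = "management-cluster"
  location           = var.gcp_region
  initial_node_count = var.gcp_node_count
}

# Register the cluster with the fleet (GKE Hub) so that fleet features can be enabled on it.
resource "google_gke_hub_membership" "management" {
  membership_id = "management-cluster"
  endpoint {
    gke_cluster {
      resource_link = "//container.googleapis.com/${google_container_cluster.management_cluster.id}"
    }
  }
}

# Enable the Config Management feature for the fleet. Per-cluster settings, such as the
# Config Sync repository, are applied with google_gke_hub_feature_membership (not shown).
resource "google_gke_hub_feature" "config_management" {
  name     = "configmanagement"
  location = "global"
}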


Step 3: Setting up the Networking

Once the management cluster is set up, you can connect the other clusters in AWS and Azure to it. We will need to set up the necessary networking configuration to allow the HarperDB cluster nodes to communicate across the different clouds.

One option to set up communication between the clusters is to use the Kubernetes Cluster Federation project (KubeFed). Cluster Federation allows you to spread your workloads across different clusters and different cloud providers by creating a single control plane for multiple clusters. By using it, you can create a single logical view of your entire infrastructure, regardless of where the clusters are running. To set this up, you will need to deploy a federation control plane, and configure each cluster to join the federation.

Another option is to use a Kubernetes service mesh like Istio. A service mesh is a configurable infrastructure layer for a microservices application that makes communication between service instances flexible, reliable, and fast. With Istio, you can set up communication between different clusters running in different cloud providers by configuring a service mesh in each cluster and connecting the meshes together.

A third option is to use a service discovery solution like Consul. Service discovery is the process of figuring out how to connect to a service. Consul is a tool that allows services to register themselves and discover other services running in different clusters and cloud providers. By configuring Consul in each cluster, you can set up communication between the clusters by allowing services to discover and connect to each other.

It's important to note that setting up communication across different clusters running in different cloud providers is a complex task and it's strongly recommended to use a managed service like Anthos.

It's important to set up a secure connection between the different cloud providers. One way to do this is by using a Virtual Private Network (VPN) to create a secure connection between the networks.

You can use Google Cloud VPN to create a VPN connection between GCP and AWS. This can be done by creating a VPN gateway on GCP and a VPN customer gateway on AWS. Once the gateways are set up, you can create a VPN tunnel between them.

You can use Terraform to create the VPN gateways and the VPN tunnel with the following configuration:

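# GCP side: a Classic VPN gateway and a tunnel to the AWS VPN endpoint.
# The static IP address, forwarding rules, and routes that Classic VPN also needs are omitted.
# var.gcp_network is an assumed variable naming the GCP VPC network.
resource "google_compute_vpn_gateway" "harperdb_gcp" {
  name    = "harperdb-gcp"
  network = var.gcp_network
}

resource "google_compute_vpn_tunnel" "harperdb_gcp_aws" {
  name                    = "harperdb-gcp-aws"
  ike_version             = 2
  peer_ip                 = var.aws_vpn_ip
  shared_secret           = var.gcp_aws_shared_secret
  local_traffic_selector  = ["0.0.0.0/0"]
  remote_traffic_selector = ["0.0.0.0/0"]
  target_vpn_gateway      = google_compute_vpn_gateway.harperdb_gcp.id
}

# AWS side: the customer gateway represents the GCP VPN endpoint.
resource "aws_customer_gateway" "harperdb_gcp" {
  bgp_asn    = 65000
  ip_address = var.gcp_vpn_ip
  type       = "ipsec.1"
}

resource "aws_vpn_gateway" "harperdb_aws" {
  vpc_id = aws_vpc.harperdb.id
}

resource "aws_vpn_connection" "harperdb_gcp_aws" {
  type                  = "ipsec.1"
  static_routes_only    = true
  customer_gateway_id   = aws_customer_gateway.harperdb_gcp.id
  vpn_gateway_id        = aws_vpn_gateway.harperdb_aws.id
  tunnel1_preshared_key = var.gcp_aws_shared_secret
}

# Azure side: each remote network is modelled as a local network gateway.
# azurerm_virtual_network_gateway.harperdb_azure is assumed to be defined elsewhere,
# and var.gcp_vpc_cidr is an assumed variable holding the GCP network range.
resource "azurerm_local_network_gateway" "gcp" {
  name                = "harperdb-gcp"
  location            = var.azure_region
  resource_group_name = var.azure_resource_group
  gateway_address     = var.gcp_vpn_ip
  address_space       = [var.gcp_vpc_cidr]
}

resource "azurerm_virtual_network_gateway_connection" "harperdb_azure_gcp" {
  name                       = "harperdb-azure-gcp"
  location                   = var.azure_region
  resource_group_name        = var.azure_resource_group
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.harperdb_azure.id
  local_network_gateway_id   = azurerm_local_network_gateway.gcp.id
  shared_key                 = var.azure_gcp_shared_key
}

resource "azurerm_local_network_gateway" "aws" {
  name                = "harperdb-aws"
  location            = var.azure_region
  resource_group_name = var.azure_resource_group
  gateway_address     = var.aws_vpn_ip
  address_space       = [aws_vpc.harperdb.cidr_block]
}

resource "azurerm_virtual_network_gateway_connection" "harperdb_azure_aws" {
  name                       = "harperdb-azure-aws"
  location                   = var.azure_region
  resource_group_name        = var.azure_resource_group
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.harperdb_azure.id
  local_network_gateway_id   = azurerm_local_network_gateway.aws.id
  shared_key                 = var.azure_aws_shared_key
}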

This sample Terraform code creates VPN connections between GCP and AWS, Azure and GCP, and Azure and AWS using the google_compute_vpn_tunnel, aws_vpn_connection, and azurerm_virtual_network_gateway_connection resources. The VPN connections are established between the VPN gateways of each cloud provider, with the peer IP addresses, shared secrets, and traffic selectors configured on each side.
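
The shared secrets referenced in this configuration (var.gcp_aws_shared_secret, var.azure_gcp_shared_key, and var.azure_aws_shared_key) should be random and kept out of version control. The following is a minimal sketch of one way to generate them and hand them to Terraform through TF_VAR_ environment variables. The key length and the generate_psk helper are assumptions for illustration; the keys are restricted to alphanumeric characters and start with a letter because some providers limit the characters allowed in a pre-shared key.

```shell
#!/usr/bin/env bash

# Generate a 32-character pre-shared key: a fixed leading letter followed by
# 31 random alphanumeric characters read from /dev/urandom.
generate_psk() {
  local body
  body="$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 31)"
  printf 'k%s' "$body"
}

# One key per VPN connection, exposed to Terraform as input variables.
export TF_VAR_gcp_aws_shared_secret="$(generate_psk)"
export TF_VAR_azure_gcp_shared_key="$(generate_psk)"
export TF_VAR_azure_aws_shared_key="$(generate_psk)"
```

Both ends of a tunnel must use the same key, which is why each connection has a single variable that is referenced from both providers' resources.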

You can also use other VPN solutions such as OpenVPN. In that case, you will need a Terraform provider or module that supports OpenVPN, and you will need to configure the VPN connection with it.

Step 4: Connecting GKE, EKS, and AKS Clusters

For GKE, you will need to configure the kubeconfig on your local machine to connect to the management cluster. You can use a Terraform local-exec provisioner to achieve this:

resource "null_resource" "configure_kubeconfig" {
  provisioner "local-exec" {
    # The cluster is regional (its location is var.gcp_region), so pass --region rather than --zone.
    command = "gcloud container clusters get-credentials ${google_container_cluster.management_cluster.name} --region ${google_container_cluster.management_cluster.location} --project ${var.gcp_project}"
  }
}

For EKS, you will need a kubeconfig context for the management cluster and one for the EKS cluster. You can then use the kubefedctl command (the KubeFed command-line tool) to join the EKS cluster to the management cluster.

resource "null_resource" "join_eks_cluster" {
  provisioner "local-exec" {
    command = "kubefedctl join eks-cluster --host-cluster-context=management-cluster-context --cluster-context=eks-context --v=5"
  }
}


For AKS, you will need a kubeconfig context for the management cluster and one for the AKS cluster. You can then use the kubefedctl command to join the AKS cluster to the management cluster.

resource "null_resource" "join_aks_cluster" {
  provisioner "local-exec" {
    command = "kubefedctl join aks-cluster --host-cluster-context=management-cluster-context --cluster-context=aks-context --v=5"
  }
}
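
The join commands above assume that kubeconfig contexts named management-cluster-context, eks-context, and aks-context already exist. The following is a minimal sketch of one way to create them with each provider's CLI. The project, regions, and resource group are placeholder assumptions; the cluster names are taken from the earlier Terraform configuration. By default the script only prints the commands (DRY_RUN=1); set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash

# Placeholder values; override them through the environment.
GCP_PROJECT="${GCP_PROJECT:-my-gcp-project}"
GCP_REGION="${GCP_REGION:-us-central1}"
AWS_REGION="${AWS_REGION:-us-east-1}"
AZURE_RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-harperdb-rg}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print only, 0 = execute

# Print a command and, unless in dry-run mode, execute it.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

prepare_contexts() {
  # GKE: fetch credentials, then rename the generated gke_<project>_<location>_<name> context.
  run gcloud container clusters get-credentials management-cluster \
    --region "$GCP_REGION" --project "$GCP_PROJECT"
  run kubectl config rename-context \
    "gke_${GCP_PROJECT}_${GCP_REGION}_management-cluster" management-cluster-context
  # EKS: --alias sets the context name directly.
  run aws eks update-kubeconfig --name harperdb --region "$AWS_REGION" --alias eks-context
  # AKS: --context sets the context name directly.
  run az aks get-credentials --name harperdb-cluster-azure \
    --resource-group "$AZURE_RESOURCE_GROUP" --context aks-context
  # Show the resulting contexts.
  run kubectl config get-contexts
}

prepare_contexts
```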

Step 5: Deploying the HarperDB Cluster

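Once the Terraform configuration and the networking and security settings are in place, we can deploy the HarperDB cluster. To do this, we will run the following commands in the harperdb-cluster directory. The first initialises the working directory and downloads the providers; the second creates the resources:

$ terraform init
$ terraform apply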

This will create the HarperDB cluster across GCP, AWS, and Azure, and set up the necessary networking and security to allow the cluster nodes to communicate across the different clouds.
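
A slightly fuller version of this workflow saves the plan for review before applying it, and then checks each cluster afterwards. The following is a minimal sketch: the context names are the ones used in the join commands in Step 4, and the harperdb namespace is the one used by the Helm install commands; both are assumptions about your local setup. By default the script only prints the commands (DRY_RUN=1); set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash

# Assumed kubeconfig contexts, one per cluster.
CONTEXTS=(management-cluster-context eks-context aks-context)
DRY_RUN="${DRY_RUN:-1}"   # 1 = print only, 0 = execute

# Print a command and, unless in dry-run mode, execute it.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

deploy_and_verify() {
  local ctx
  # Initialise, validate, and apply a saved plan.
  run terraform init
  run terraform validate
  run terraform plan -out=tfplan
  run terraform apply tfplan
  # Check the nodes and the HarperDB pods in every cluster.
  for ctx in "${CONTEXTS[@]}"; do
    run kubectl --context "$ctx" get nodes
    run kubectl --context "$ctx" get pods --namespace harperdb
  done
}

deploy_and_verify
```

Applying a saved plan file means Terraform executes exactly the changes that were reviewed, which is useful when a single run touches three cloud accounts.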

Conclusion

In this article, we explored how to utilise Terraform to deploy a HarperDB cluster across GCP, AWS, and Azure using Anthos. We have demonstrated how to configure Terraform, as well as the networking and security settings required for cluster nodes to connect between clouds. With Terraform, we are able to deploy and manage our infrastructure as code, making it simple to duplicate and grow our HarperDB cluster across several cloud providers.