November 18, 2020

HarperDB Containerization Journey

Zachary Fowler
Co-Founder of HarperDB

Note: See a more up-to-date article on using HarperDB with Kubernetes here.

How I Single-Handedly Containerized HarperDB

HarperDB is a simple database that comes adequately configured, works as a distributed database verging on serverless, and fits well in microservice-oriented architectures. It enables developers to think responsibly about the data they are collecting. Companies can begin to isolate the important data from the noise and store only the information they need, where they need it. This is more feasible now than ever before, and containers are a significant contributor to that shift.

Explaining containers is out of scope for this blog; the long and short of it is that a container is an isolated set of an application's prerequisites that uses the host system's resources.


At HarperDB, I began pushing early on for things to be organized in a container-friendly manner. HarperDB on Docker Hub was one of our first release channels. Docker gave HarperDB the ability to spawn quickly, persist data on the host, or load data into the container for an ephemeral instance. The HarperDB application and the data store are not tightly coupled; the HarperDB application can point to any HarperDB datastore. Below is a quick Docker example of two HarperDB application instances pointing to the same database.

Create one container and make port 9925 available on the Docker host:

docker run -d -v /tmp/docker1/:/opt/harperdb/hdb -p 9925:9925 harperdb/hdb

Log in to the local HarperDB management Studio at http://localhost:9925 with username: HDB_ADMIN and password: password.

Create a schema, table and add a few records.

After Login Create Schema and Table with hash attribute (primary key) id
Created two records with the + button; Rosemerry and Billie
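
The same Studio steps can also be scripted. Below is a minimal sketch, assuming the instance accepts HarperDB's JSON operations (create_schema, create_table, insert) over HTTP on the same port as the Studio. The schema name dev, the table name people, and the name attribute are made up for illustration. By default the script only prints the request bodies; set RUN=1 to send them.

```shell
#!/usr/bin/env bash
# Sketch only: "dev", "people", and the "name" attribute are illustrative names,
# and the endpoint and credentials are the defaults used earlier in this post.
HDB_URL="${HDB_URL:-http://localhost:9925}"
HDB_AUTH="${HDB_AUTH:-HDB_ADMIN:password}"

# POST one JSON operation, or just print it when RUN is not set to 1.
hdb_op() {
  if [ "${RUN:-0}" = "1" ]; then
    curl -s -u "$HDB_AUTH" -H 'Content-Type: application/json' -d "$1" "$HDB_URL"
    echo
  else
    echo "$1"
  fi
}

# Schema, then table with hash attribute "id", then the two records.
hdb_op '{"operation":"create_schema","schema":"dev"}'
hdb_op '{"operation":"create_table","schema":"dev","table":"people","hash_attribute":"id"}'
hdb_op '{"operation":"insert","schema":"dev","table":"people","records":[{"id":1,"name":"Rosemerry"},{"id":2,"name":"Billie"}]}'
```

Keeping the dry run as the default makes it easy to review the payloads before anything touches the database.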

Now create a second container attached to the same data directory:

docker run -d -v /tmp/docker1/:/opt/harperdb/hdb -p 9926:9925 harperdb/hdb

Notice that the host port is 9926 because 9925 is already in use.

Log in to the new instance and create a new record, this time at http://localhost:9926.

Created Harper on localhost:9926

Refresh the instance on 9925!
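
One way to see the shared storage without the Studio is to ask both instances the same question. This is a minimal sketch, assuming the sql operation of HarperDB's HTTP API; dev.people is a placeholder for whatever schema and table were created in the Studio. The script prints the target URLs by default; set RUN=1 to send the queries.

```shell
#!/usr/bin/env bash
# Sketch only: dev.people is a placeholder for the schema and table created above.
HDB_AUTH="${HDB_AUTH:-HDB_ADMIN:password}"
QUERY='{"operation":"sql","sql":"SELECT * FROM dev.people"}'

# Build the URL of the instance published on the given host port.
instance_url() {
  echo "http://localhost:$1"
}

# Run the same query against one instance.
query_port() {
  curl -s -u "$HDB_AUTH" -H 'Content-Type: application/json' \
    -d "$QUERY" "$(instance_url "$1")"
}

for port in 9925 9926; do
  if [ "${RUN:-0}" = "1" ]; then
    echo "== $(instance_url "$port") =="
    query_port "$port"
    echo
  else
    echo "would query $(instance_url "$port")"
  fi
done
```

If both containers really mount the same data directory, the two result sets should list the same records.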



Is this useful in real life? Probably somewhere somehow, this can do something neat!

The example illustrates that HarperDB application containers are isolated instances that can mount any HarperDB storage.

Imagine a container host syncing data to a remote source like AWS S3, with an S3 data sync updating a bucket periodically. Another container host across the planet pulls down the S3 data and starts a HarperDB instance. That is for another blog.

In real life, Docker is incredible for developers and database administrators. One-off instances are useful, but clusters and application stacks are more relevant for application tiers, and that is where Docker Compose is helpful. However, Docker Compose is not as robust as Kubernetes.

It has been my most recent project to get HarperDB on Helm, the Kubernetes package manager. For updates when that happens, follow HarperDB by joining our Slack Channel and/or subscribing to our company updates.

Please go check out HarperDB on Docker Hub; it provides examples to get a lot more out of the Docker image and lists all the configurations available so far.

Kubernetes In A Digital Ocean

Kubernetes is a robust infrastructure that helps orchestrate running containers. It allows users to create large-scale deployments of single or multiple containers and to manage the containers' life cycles and the resources they rely on, i.e., storage, network configuration, CPU, and memory. HarperDB can already be deployed from the 1-click marketplace of DigitalOcean, a great cloud platform. DigitalOcean also provides 1-click Kubernetes apps, and HarperDB will be available there soon. This is a preview of HarperDB deployed on DigitalOcean with Helm.

This example assumes that you have the following preconfigured: a DigitalOcean account, and kubectl, doctl, and helm installed in your development environment.

Create a Kubernetes cluster; for simplicity, a two-node cluster is sufficient.

Create two node Kubernetes cluster

Use doctl to configure the kubectl context to point to the DigitalOcean Kubernetes cluster.

Configure kubectl context for cluster
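
For readers who prefer the command line to the control panel, the two steps above might look like the sketch below. The cluster name hdb-demo, the region nyc1, and the node size s-2vcpu-4gb are assumptions chosen for illustration, not values taken from the screenshots. The script prints the commands by default; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: cluster name, region, and node size are illustrative assumptions.
CLUSTER="${CLUSTER:-hdb-demo}"

# Echo the command in dry-run mode (default); execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Create a cluster with two nodes in the default node pool.
create_cluster() {
  run doctl kubernetes cluster create "$CLUSTER" \
    --region nyc1 --size s-2vcpu-4gb --count 2
}

# Save the cluster credentials into the local kubeconfig for kubectl.
save_kubeconfig() {
  run doctl kubernetes cluster kubeconfig save "$CLUSTER"
}

create_cluster
save_kubeconfig
# Confirm that kubectl can see both nodes.
run kubectl get nodes
```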

For security reasons, Kubernetes providers implement role-based access control (RBAC), a method of regulating access to computer or network resources based on the roles of individual users within your organization. Helm 2 depends on Tiller, a component that runs inside the cluster; the following commands give Tiller access to the resources it needs.

kubectl -n kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account=tiller

Configure tiller access control for HELM
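
Before installing a chart, it can help to confirm that Tiller actually came up. This is a small sketch, assuming the default Helm 2 setup in which helm init creates a deployment named tiller-deploy in the kube-system namespace. It prints the commands by default; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
# Echo the command in dry-run mode (default); execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Wait until the Tiller deployment reports that it is rolled out.
wait_for_tiller() {
  run kubectl -n kube-system rollout status deployment/tiller-deploy
}

# With Helm 2, this prints both the client and the Tiller (server) version.
show_versions() {
  run helm version
}

wait_for_tiller
show_versions
```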

Helm charts use a YAML configuration file for resources exposed by the Kubernetes cluster as well as application-specific resources. A few important configurations for HarperDB in values.yaml:

    repository: harperdb/hdb
    tag: latest
    pullPolicy: IfNotPresent
    type: ClusterIP
    port: 9925
    storageClassName: do-block-storage
    persistentClaim: harperdb-pvc
    storage: 5Gi
    volumeName: harperdb-ps
    username: HDB_ADMIN
    password: password
    cluster_enabled: true
    cluster_username: clustering
    cluster_password: password
    cluster_port: 1111
    node_name: hdb-cluster-00

Most of the options above, in the image and HarperDB blocks, should be self-explanatory. The volumes block requires some explanation. Each Kubernetes provider exposes its own storage parameters. The Provisioner and Class values are available in the Kubernetes dashboard, reached through the DigitalOcean dashboard.

Storage Provisioner and Class Values for values.yaml
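
The same values can also be read from the command line. This is a minimal sketch, assuming kubectl already points at the cluster: kubectl get storageclass lists each class with its provisioner, and the class name is what goes into storageClassName. It prints the commands by default; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
# Echo the command in dry-run mode (default); execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# List the storage classes with their provisioners.
list_storage_classes() {
  run kubectl get storageclass
}

# Show the details of one class; defaults to the class named in values.yaml.
describe_storage_class() {
  run kubectl describe storageclass "${1:-do-block-storage}"
}

list_storage_classes
describe_storage_class do-block-storage
```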

The Helm chart can now create an instance of HarperDB that persists information on the host and exposes it as a service on port 9925.

helm install
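
With Helm 2 (the Tiller-based release that helm init belongs to), the install in the screenshot might be invoked roughly as follows. The release name harperdb and the chart directory ./harperdb are assumptions, since the chart itself is not published in this post. The script prints the commands by default; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: release name and chart directory are hypothetical.
RELEASE="${RELEASE:-harperdb}"
CHART_DIR="${CHART_DIR:-./harperdb}"

# Echo the command in dry-run mode (default); execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Helm 2 syntax: the release name is passed with --name.
install_chart() {
  run helm install --name "$RELEASE" -f "$CHART_DIR/values.yaml" "$CHART_DIR"
}

# Show the release status, including the chart's NOTES.
show_status() {
  run helm status "$RELEASE"
}

install_chart
show_status
```

Helm 3 dropped Tiller and the --name flag, so the release name would be given as the first positional argument there.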

The NOTES at the bottom will allow your local development environment to connect to the HarperDB instance management Studio.

Expose studio through http://localhost:8080

Connect to the cluster through http://localhost:8080, which routes traffic to the Kubernetes HarperDB instance on port 9925.
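
The NOTES output is not reproduced here, but a port-forward is one common way to make localhost:8080 reach port 9925 inside the cluster. The sketch below assumes a Service named harperdb, which is hypothetical; use the name that kubectl get svc prints for the release. It prints the commands by default; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: the Service name is a hypothetical placeholder.
SERVICE="${SERVICE:-harperdb}"

# Echo the command in dry-run mode (default); execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# List Services to find the one created by the chart.
list_services() {
  run kubectl get svc
}

# Forward local port 8080 to port 9925 of the Service.
# When executed, this blocks until interrupted with Ctrl-C.
forward_studio() {
  run kubectl port-forward "svc/$SERVICE" 8080:9925
}

list_services
forward_studio
```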

Create Schema, Table and Two records

In the real world, HarperDB would not generally be exposed publicly, and deployments would run over HTTPS. Application containers or Pods in Kubernetes would access HarperDB within the Kubernetes network. Again, as a topic for another blog, the HarperDB instance could be ephemeral: the container could get a HarperDB storage copy from a remote source, then sync to the remote source as data is updated. Also, the HarperDB cluster could publish new data to another instance of HarperDB.

Q.E.D.: What Was to Be Shown

Containers are robust, HarperDB is powerful; their powers combined provide opportunities to think about data storage and data flow in innovative ways. Docker makes it easy to containerize an application. Kubernetes provides a command and control center to build the distributed infrastructure and orchestrate container resources and life cycles. Helm gives software providers better access to Kubernetes with easy-to-install deployments. HarperDB is working hard to expand its offerings to other deployment channels. If you did not know, HarperDB also offers database as a service. Your support is appreciated.

Leave comments or feedback on the original post here
