February 1, 2024

Stream DataStax Updates with MQTT, WebSockets, and Server Sent Events


In Part I and Part II of this series, we demonstrated how to stream data from external databases, namely AWS DynamoDB and MongoDB Atlas, into HarperDB. Then we used MQTT, WebSockets, and Server Sent Events to stream that data to downstream consumers (e.g., other HarperDB instances, data consumers). 

In this third and final installment, we'll demonstrate the same capabilities with DataStax. By the end of this demo, we should be able to follow along with Jaxon's original demo using all three data connections.

Part 1: Streaming Data from DataStax

DataStax is a cloud database-as-a-service provider with products based on Apache Cassandra (Astra DB) and Apache Pulsar (Astra Streaming). In this demo, we will publish data into Astra DB and use Astra Streaming to carry that data into HarperDB.

Astra Streaming Setup

To get started, sign up for an Astra DB account. Next, create a database in the Astra Portal. Since we are not using any of the generative AI workloads, we can create a Non-Vector database. Then create a new table in that database and make a note of its name, since we will need it for the credentials file.

Next, we need to grant access to our database. In the portal, open the Organizations dropdown and select Manage Organizations. On the Tokens tab, choose a role with access to our database, select Generate Token, and download the Client ID, Client Secret, and Token.

On the Astra Streaming side, we first need to create a tenant. Then we can either use the "public" topic that is created by default or create a new topic. Next, open the Sources tab in the portal and add Astra DB as one of the data sources. Here we also need to note two values: the broker service URL, in the format pulsar+ssl://broker.example.com:6651, and the topic, in the format persistent://tenant/namespace/topic-name.
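
Before pasting these two values into the credentials file, it can help to sanity-check their shape. The following is a minimal sketch, not code from the demo repository: `parseTopic` and `parseServiceUrl` are hypothetical helper names, and the checks assume only Pulsar's standard `persistent://tenant/namespace/topic` naming and a TLS `pulsar+ssl://` broker URL.

```javascript
// Hypothetical helpers: validate values copied from the Astra Streaming portal.

// Pulsar topic names have the shape <domain>://<tenant>/<namespace>/<topic>.
function parseTopic(topic) {
  const match = /^(persistent|non-persistent):\/\/([^/]+)\/([^/]+)\/([^/]+)$/.exec(topic);
  if (!match) {
    throw new Error(`Unexpected topic format: ${topic}`);
  }
  const [, domain, tenant, namespace, name] = match;
  return { domain, tenant, namespace, name };
}

// The broker service URL is expected to use TLS (pulsar+ssl://), usually on port 6651.
function parseServiceUrl(serviceUrl) {
  const url = new URL(serviceUrl);
  if (url.protocol !== 'pulsar+ssl:') {
    throw new Error(`Expected a pulsar+ssl:// URL, got ${url.protocol}`);
  }
  // Custom schemes have no default port, so fall back to 6651 when none is given.
  return { host: url.hostname, port: Number(url.port || 6651) };
}
```

A malformed value, such as a topic missing its namespace segment, then fails here before any connection is attempted.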

We also need to create a token for Pulsar. Follow these instructions to generate a Pulsar token. 

Finally, we need to download the secure connect bundle, which lets our drivers connect to the database. Follow these directions to download the bundle.

HarperDB Setup

Assuming you have completed the setup process from the previous posts, open the credentials file. We will need to modify the DataStax portion of `dbs/credentials/credentials.js`:

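```javascript
// DataStax portion of dbs/credentials/credentials.js.
// `path` is Node's built-in module and must be in scope (require('path')).
datastax: {
  tableName: 'XXXX',
  clientId: 'XXXX',
  secret: 'XXXX',
  token: 'XXXX', // Pulsar token, not the Astra DB token
  serviceUrl: 'pulsar+ssl://XXXX:6651',
  topic: 'persistent://XXXX',
  subscription: 'edgetl-demo',
  secureConnectBundle: path.join(__dirname, 'datastax.secureconnect.zip'),
},
```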

Update the configuration with the information from above. Note that `token` is the Pulsar token. Also, make sure the secure connect bundle is named `datastax.secureconnect.zip` (or update the name in the config) and is located in the same directory as `credentials.js`.
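
A quick pre-flight check can catch leftover placeholders and a misplaced bundle before `ingest.js` or `cdc.js` tries to connect. This is a hypothetical helper, not code from the repository; it assumes only the field names shown in the `datastax` block above.

```javascript
const fs = require('fs');

// Field names mirror the datastax block in credentials.js.
const REQUIRED_FIELDS = [
  'tableName', 'clientId', 'secret', 'token',
  'serviceUrl', 'topic', 'subscription', 'secureConnectBundle',
];

// Returns a list of human-readable problems; an empty list means the config looks usable.
function checkDatastaxConfig(datastax) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    const value = datastax[field];
    if (typeof value !== 'string' || value.length === 0) {
      problems.push(`${field} is missing`);
    } else if (value.includes('XXXX')) {
      problems.push(`${field} still contains the XXXX placeholder`);
    }
  }
  // The bundle path is built with __dirname, so it must exist next to credentials.js.
  const bundle = datastax.secureConnectBundle;
  if (typeof bundle === 'string' && !fs.existsSync(bundle)) {
    problems.push(`secure connect bundle not found at ${bundle}`);
  }
  return problems;
}
```

If `credentials.js` exports an object containing this block (an assumption about how the file is structured), you could call `checkDatastaxConfig(require('./credentials').datastax)` and print whatever it returns.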

Sending Simulated Data

Now that we have everything set up, we are ready to send some data. The example code includes a demonstration UI that lets you publish fake data to DataStax and then pull that data into HarperDB.

As in the previous articles, if you would like to use the UI to send data, follow the UI setup portion of the README to install it. Again, if you are running locally, you may run into CORS errors and will have to allow cross-origin requests in your browser. Alternatively, you can run the commands under `dbs/datastax` manually.

The `ingest.js` file generates records made of a random UUID and random lorem ipsum content, crafts an insert statement, and writes the data to Astra DB. The `cdc.js` file handles Change Data Capture: it uses the Astra Streaming client to read new data from Astra Streaming topics and publishes the records to HarperDB.
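
The post does not reproduce `ingest.js` or `cdc.js`, so the following is only a sketch of the two ideas, not the repository's code. It assumes a hypothetical table with `id` and `content` columns; `fakeRecord`, `buildInsert`, and `drainTopic` are illustrative names. The consumer is duck-typed to the `receive()`/`acknowledge()` methods of a `pulsar-client` Consumer, and decoding is left as a parameter because CDC messages may not be plain JSON.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical schema, e.g.:
//   CREATE TABLE demo.messages (id uuid PRIMARY KEY, content text);
const WORDS = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur', 'adipiscing', 'elit'];

// Ingest side: a random UUID plus random lorem ipsum words.
function fakeRecord(wordCount = 6) {
  const words = Array.from({ length: wordCount },
    () => WORDS[Math.floor(Math.random() * WORDS.length)]);
  return { id: randomUUID(), content: words.join(' ') };
}

// Build a parameterized CQL INSERT instead of concatenating values into the query.
// Keyspace and table come from trusted config, not user input.
function buildInsert(keyspace, table, record) {
  const columns = Object.keys(record);
  const placeholders = columns.map(() => '?').join(', ');
  return {
    query: `INSERT INTO ${keyspace}.${table} (${columns.join(', ')}) VALUES (${placeholders})`,
    params: columns.map((column) => record[column]),
  };
}
// With the cassandra-driver package this would be sent roughly as:
//   await client.execute(query, params, { prepare: true });

// CDC side: receive messages, decode them, hand each record to a sink
// (for example, a function that writes to HarperDB), then acknowledge.
async function drainTopic(consumer, decode, sink, maxMessages) {
  let handled = 0;
  while (handled < maxMessages) {
    const msg = await consumer.receive();
    await sink(decode(msg.getData()));
    // Acknowledge only after the sink succeeded, so failures are redelivered.
    await consumer.acknowledge(msg);
    handled += 1;
  }
  return handled;
}
```

Acknowledging after the write means a crash between receive and write leaves the message unacknowledged, so Pulsar can redeliver it instead of losing it.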

Using either the UI or the functions directly, try sending some data. You should see the fake data populate HarperDB Studio, as in the demo video.

Part 2: Setting up MQTT, WebSockets, and Server Sent Event Subscriptions

Subscriptions are set up in the `harperdb-config.yaml` file. Let's go through each one in more detail.

For MQTT, we will set the following config:

```yaml
mqtt:
  network:
    port: 1883
    securePort: 8883 # for TLS
  webSocket: true # will also enable WS support through the default HTTP interface/port
  requireAuthentication: true
```
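
On the client side, these settings determine which URL to dial and that credentials must be sent. The following is a hypothetical helper, not code from `MQTTWS.js`; it takes the ports from the config and assumes 9926 as HarperDB's default HTTP port for MQTT over WebSockets. The subscription topic in the usage comment is likewise only an example.

```javascript
// Hypothetical helper: map the mqtt settings to a URL and client options.
function mqttTarget({ host, transport = 'tcp', username, password }) {
  const urls = {
    tcp: `mqtt://${host}:1883`,  // network.port
    tls: `mqtts://${host}:8883`, // network.securePort
    ws: `ws://${host}:9926`,     // webSocket: true; assumed default HTTP port
  };
  if (!Object.hasOwn(urls, transport)) {
    throw new Error(`Unknown transport: ${transport}`);
  }
  // requireAuthentication: true means anonymous clients are rejected.
  if (!username || !password) {
    throw new Error('HarperDB is configured to require MQTT credentials');
  }
  return { url: urls[transport], options: { username, password } };
}

// With the mqtt npm package, roughly:
//   const { url, options } = mqttTarget({ host: 'localhost', transport: 'ws', username, password });
//   const client = mqtt.connect(url, options);
//   client.subscribe('MyTable/#'); // hypothetical topic
//   client.on('message', (topic, payload) => console.log(topic, payload.toString()));
```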

In the UI demo, open the `MQTTWS.js` file. It subscribes to HarperDB over MQTT using the settings configured above, parses incoming messages, and updates React state to show the data.
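
The "parse and update React state" step can be sketched as a pure function. This is an illustration, not the demo's code: `mergeRecord` is a hypothetical name, and it assumes each payload is JSON containing an `id` field.

```javascript
// Merge one incoming message into the array held in React state.
function mergeRecord(records, payload) {
  // MQTT clients deliver byte payloads (Buffer/Uint8Array); decode them as UTF-8.
  const text = typeof payload === 'string' ? payload : new TextDecoder().decode(payload);
  const record = JSON.parse(text);
  const index = records.findIndex((r) => r.id === record.id);
  if (index === -1) {
    return [...records, record];
  }
  // Return a new array and object so React sees changed references and re-renders.
  const next = records.slice();
  next[index] = { ...next[index], ...record };
  return next;
}

// In a component: client.on('message', (topic, payload) =>
//   setRecords((prev) => mergeRecord(prev, payload)));
```

Using the functional form of `setRecords` avoids reading stale state when several messages arrive before a re-render.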

WebSockets use HarperDB's REST interface: a WebSocket connection is handled by the `connect(incomingMessages)` method on a resource. In the `WS.js` file, the UI opens a new WebSocket connection, listens for new messages with `addEventListener`, and parses them the same way as the MQTT messages.
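
A browser-side subscriber along these lines might look like the following. It is a sketch, not the contents of `WS.js`: `subscribeWs` is a hypothetical name, the resource URL in the usage note is a placeholder, and the WebSocket constructor is injectable so the function also runs outside a browser.

```javascript
// Open a WebSocket to a HarperDB resource and pass each JSON message to onRecord.
function subscribeWs(url, onRecord, WebSocketImpl = globalThis.WebSocket) {
  const ws = new WebSocketImpl(url);
  ws.addEventListener('message', (event) => {
    let record;
    try {
      record = JSON.parse(event.data);
    } catch {
      console.warn('Skipping non-JSON message');
      return;
    }
    // Called outside the try block so errors in onRecord are not mistaken for parse errors.
    onRecord(record);
  });
  // Return a cleanup function, e.g. for a React useEffect.
  return () => ws.close();
}
```

For example, `const stop = subscribeWs('ws://localhost:9926/MyTable/', console.log);` with a hypothetical table path, and `stop()` when the component unmounts.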

Finally, for Server Sent Events, the UI creates a new EventSource from the same configuration. Note that Server Sent Events also use the REST server interface under the hood.
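
Because Server Sent Events ride on a plain HTTP GET to a REST URL, the client side is small. This is a sketch under stated assumptions, not the demo's code: `subscribeSse` is a hypothetical name, the URL in the usage note is a placeholder, and the EventSource constructor is injectable so the function can run outside a browser.

```javascript
// Subscribe to a resource over Server Sent Events and pass each JSON message to onRecord.
function subscribeSse(url, onRecord, EventSourceImpl = globalThis.EventSource) {
  const source = new EventSourceImpl(url);
  source.onmessage = (event) => {
    let record;
    try {
      record = JSON.parse(event.data);
    } catch {
      return; // ignore non-JSON messages
    }
    onRecord(record);
  };
  source.onerror = () => {
    // On transient network errors EventSource retries automatically; close() stops it.
    console.warn('SSE connection error');
  };
  return () => source.close();
}
```

For example, `subscribeSse('http://localhost:9926/MyTable/1', console.log)` with a hypothetical resource path. Unlike WebSockets, the stream is one-way, which is all a read-only dashboard needs.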

Wrapping Up

In this final installment, we streamed updates from Astra DB through Astra Streaming into HarperDB, and then used MQTT, WebSockets, and Server Sent Events to send that data to downstream consumers. If you were following along, you should now see all three external database connections working as in the demo.

Stepping back, in this series we walked through how to stream data from external databases into HarperDB and expose that data in real time to other consumers, all within HarperDB using components. The flexibility to use MQTT, WebSockets, or Server Sent Events allows you to easily expose real-time access to data without leaving the framework.