As the Internet of Things matures we continue to see design patterns come and go, with some evolving and some disappearing. The time has come: Store and Forward needs to go. This paradigm served as an initial step in shifting to the edge, but it should not survive. In this blog I’ll introduce the Store and Forward concept and explain why it falls short of other, more modern edge computing paradigms.
I’ll do my best to give an unbiased explanation of the Store and Forward design paradigm. The basic principle is that data is collected on the edge and all of it is sent to the enterprise (cloud/servers) for processing immediately. In the event that a network connection is unavailable, data is stored in a temporary queue until connectivity is restored. Edge devices are relatively simple, containing only the sensor(s), networking, and enough memory to hold the queue. Enterprise servers are very large in this paradigm, as they must be capable of handling and storing all data they receive. The simplified architecture diagram below illustrates this concept.
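To make the mechanics concrete, here is a minimal sketch of the forwarding loop described above. The sensor, the send function, and the connectivity flags are all hypothetical stand-ins; the point is simply that every reading is queued and drained to the enterprise whenever the link is up.

```python
import collections
import random
import time

def read_sensor():
    """Stand-in for a real sensor read (hypothetical)."""
    return {"ts": time.time(), "temp_f": 200 + random.uniform(-5, 5)}

def send_to_enterprise(reading, online):
    """Stand-in for a network send; succeeds only when the link is up."""
    return online

def store_and_forward(readings, connectivity):
    """Forward every reading; queue locally whenever the link is down."""
    queue = collections.deque()
    sent = []
    for reading, online in zip(readings, connectivity):
        queue.append(reading)                 # everything is queued first
        while queue and send_to_enterprise(queue[0], online):
            sent.append(queue.popleft())      # drain the backlog on a live link
    return sent, list(queue)

readings = [read_sensor() for _ in range(5)]
# Link drops for readings 3 and 4, then recovers
sent, pending = store_and_forward(readings, [True, True, False, False, True])
print(len(sent), len(pending))  # 5 0 — the backlog drains once the link returns
```

Note that nothing in this loop inspects the reading itself; every data point is destined for the enterprise regardless of its value, which is exactly the property the rest of this post takes issue with.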
As more and more sensors and devices are added, the requirements for network throughput and computing infrastructure increase linearly. In many cases, network connectivity is provided through IoT SIM cards that are priced on data throughput volume. This, in conjunction with a monthly cloud subscription, can result in quite a substantial cost for processing edge sensor data.
In addition to cost, there’s the challenge of network latency. In the simplified architecture shown above, the sensors are merely sending data up, where often a decision needs to be made and then sent right back to the edge. The network is not only a cost driver but also an additional point of failure and a major bottleneck for critical decisioning.
Before we move on to alternatives, let’s think about what edge data might look like. The edge can be many things, but it can typically be generalized into sensor data that is being collected and monitored to gain information about the operational state of a thing. That thing could be a farm, space shuttle, manufacturing line, shipping truck, or anything else in between. In all the projects I’ve worked on there’s one thing in common with this sensor data: most of that data is worthless immediately after it is generated. This is because we don’t care about the data if it tells us that all systems are operational and everything is perfect.
Sensors don’t collect data entirely on their own; they need some sort of edge compute device. Whether it’s a microcontroller (no OS) or a microcomputer (with OS), that device has the ability to make some sort of decision instead of blindly forwarding data. In most Store and Forward cases, we find that about 95% of data could be thrown out by simply setting an overly generous threshold window.
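A threshold window is only a few lines of code. The sketch below uses hypothetical readings around a 200-unit setpoint with a generous 10% window; only readings that fall outside the window leave the device.

```python
def within_threshold(value, setpoint, pct):
    """True when the reading sits inside a +/- pct window around the setpoint."""
    return abs(value - setpoint) <= setpoint * pct

# Hypothetical readings around a 200-unit setpoint, 10% window (180-220)
readings = [198.0, 201.5, 225.0, 199.0, 178.0, 200.2]
to_send = [r for r in readings if not within_threshold(r, 200.0, 0.10)]
print(to_send)  # [225.0, 178.0] — only out-of-window readings are transmitted
```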
Let’s take a car as an example: assume I’m monitoring oil temperature and the normal operating temperature is around 200 °F. If there is a Store and Forward device on my car sending data once per second and I drive for an hour, that means 3600 data points are being sent no matter what the readings were. Let’s assume that I can safely set a threshold of 10% variance of that temperature, even though I don’t actually have cause for concern until I have a 15% variance. I’m still going to end up sending some data frequently, for instance from when I start the car until it comes up to temperature. However, even if warmup takes 10 minutes, I’ve reduced the number of data points sent down to about 600, all with basic edge thresholding. This would not only save money on network cost, but also on enterprise storage and compute cost. Keep in mind this is all occurring on a device I’ve already purchased.
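We can check that ballpark with a toy model: a linear 10-minute warmup from ambient to 200 °F, then steady-state driving for the rest of the hour, filtered through the 10% threshold window above. The temperature trace is invented for illustration; the count it produces (508, since the trace crosses into the window a little before the 10-minute mark) lands in the same ballpark as the ~600 figure above.

```python
def warmup_then_steady(seconds, warmup_s=600, ambient=70.0, setpoint=200.0):
    """Hypothetical oil-temperature trace: linear warmup, then steady state."""
    for t in range(seconds):
        if t < warmup_s:
            yield ambient + (setpoint - ambient) * t / warmup_s
        else:
            yield setpoint

def points_sent(temps, setpoint=200.0, pct=0.10):
    """Count readings outside the +/- pct threshold window."""
    return sum(1 for temp in temps if abs(temp - setpoint) > setpoint * pct)

# One hour of driving at one reading per second, 10-minute warmup
print(points_sent(warmup_then_steady(3600)))  # 508 of 3600 readings sent
```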
By eliminating the edge-to-cloud-to-edge feedback loop and simply moving everything to the edge, we can enable true real-time decisioning. There will always be value in having edge data in the enterprise, but by moving the decision to the edge, the transmission and storage of enterprise data becomes a secondary priority. This is vital for any sort of critical machinery.
Thresholding, as described above, is an easy way to get started with edge decisioning with many alternative options. Monitoring rolling averages, trending, and algorithmic processing (to name a few) are all next steps. Microcomputers are so powerful today that people are running full blown machine learning models on the edge. The level of complexity required will vary with each implementation, but adding some sort of logic to the edge is a no-brainer.
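As one example of those next steps, here is a sketch of rolling-average monitoring: instead of reacting to a single noisy reading, the device acts when the windowed mean drifts outside a band. The class name, window size, and band are all illustrative.

```python
from collections import deque

class RollingAverageMonitor:
    """Flag a reading when the rolling mean drifts outside a band (illustrative)."""

    def __init__(self, window, low, high):
        self.buf = deque(maxlen=window)  # keeps only the last `window` readings
        self.low, self.high = low, high

    def update(self, value):
        self.buf.append(value)
        avg = sum(self.buf) / len(self.buf)
        return avg < self.low or avg > self.high  # True = take action / alert

monitor = RollingAverageMonitor(window=5, low=180.0, high=220.0)
stream = [200, 202, 199, 230, 231, 233, 235, 236]  # a sustained climb
alerts = [monitor.update(v) for v in stream]
print(alerts)  # [False, False, False, False, False, False, True, True]
```

Notice that the single 230 reading doesn’t trip the alert; only the sustained climb does, which is exactly the noise immunity a plain threshold lacks.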
Once those decisions are made, the results can and should be sent to the enterprise. Data can then be analyzed across all nodes for a full view of the edge. A favorite HarperDB design is to send the decision along with the data we used to make that decision up to the enterprise. This way we can rerun any algorithms and tweak them to be more accurate over time. This data is sent as a secondary priority, after any critical actions are taken, and still accounts for a significant reduction in the consumption of network and enterprise resources.
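A payload for that pattern might look something like the sketch below. The schema, node identifier, and decision string are hypothetical; the essential idea is that the decision travels with the evidence behind it, so algorithms can be rerun and tuned later.

```python
import json

def decision_payload(decision, readings):
    """Bundle an edge decision with the readings behind it (hypothetical schema)."""
    return json.dumps({
        "node_id": "edge-01",          # hypothetical device identifier
        "ts": 1700000000,              # fixed timestamp for reproducibility
        "decision": decision,
        "evidence": readings,          # the raw points used to decide
    })

payload = decision_payload("alert:oil_temp_high", [225.6, 231.0, 233.0])
print(payload)
```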
Technology advances, design patterns change, and we constantly see architectures evolving. We went from mainframes, to personal computers, to the Internet of Things. This is healthy innovation. Store and Forward was fine when we were working with dumb edge devices, but it’s time to sunset this design pattern (we can even be nice and call it “legacy”). While Store and Forward was a great stepping stone, it’s time to embrace edge decision making and the numerous benefits that come along with it.