As companies in a variety of industries continue to take advantage of Industrial Internet of Things (IIoT) applications, they’re facing questions around how and where to best process and store vast amounts of IIoT data. Options include industrial edge computing, larger regional enterprise or colocation data centers, or cloud data centers.

It’s an important question because of the sheer size and scope of investments being made in IIoT applications. McKinsey & Company estimates companies will spend between $175 billion and $215 billion on IIoT hardware by 2025, including computing hardware, sensors, firmware, and storage. Gartner predicts that 75 percent of enterprise-generated data will be stored, processed, analyzed, and acted upon at the edge by 2025. That’s a big number, but it still leaves 25 percent of data to be processed elsewhere.

That brings us back to the question of how to decide where best to store, process, and analyze all that IIoT data. To answer it, you need to examine four characteristics of the data in question, drawn from the big Vs of Big Data: volume, variety, value, and veracity. In this blog post, I’ll explain how each helps drive the decision.

Volume: how much data is being generated

The amount of data being generated is the first factor to consider. IIoT environments produce many different data streams, and their volumes vary wildly. Simple measurements such as temperature, pressure, time, and volume levels produce relatively small amounts of data, especially if they’re taken only occasionally rather than continuously.

On the other hand, a high-speed, high-resolution camera that monitors a process on a manufacturing floor to detect bottlenecks or defects may generate gigabytes of data every second. Clearly, that stream will have very different network, compute, and storage requirements than the environmental measurements.
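To put rough numbers on that contrast, here is a back-of-the-envelope estimate. Every device parameter below is an illustrative assumption, not a figure from a specific product: a temperature reading stored as a 4-byte value plus an 8-byte timestamp and sampled once a minute, versus an uncompressed 1-megapixel, 8-bit monochrome camera running at 1,000 frames per second.

```python
# Back-of-the-envelope data rates for two IIoT streams.
# All device parameters are illustrative assumptions.

def sensor_rate(bytes_per_sample: int, interval_s: float) -> float:
    """Average bytes per second for a periodically sampled sensor."""
    return bytes_per_sample / interval_s


def camera_rate(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Bytes per second for an uncompressed video stream."""
    return width * height * bytes_per_pixel * fps


# Assumed: 4-byte reading + 8-byte timestamp, sampled once per minute.
temperature = sensor_rate(bytes_per_sample=12, interval_s=60)

# Assumed: 1,000 x 1,000 pixels, 8-bit monochrome, 1,000 frames per second.
camera = camera_rate(width=1000, height=1000, bytes_per_pixel=1, fps=1000)

print(f"temperature: {temperature:.1f} B/s")
print(f"camera:      {camera / 1e9:.1f} GB/s")
print(f"ratio:       {camera / temperature:.0e}")
```

Under these assumptions the camera produces about 1 GB per second, billions of times more than the temperature sensor. Compression can shrink the camera figure considerably, but the gap remains large enough to drive very different infrastructure choices.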

Variety: different kinds of data

As the volume discussion makes clear, IIoT applications involve many different varieties of data. Not all sensor data, for example, is of the low-volume type associated with, say, temperature. Even sensors that measure liquid levels may differ dramatically from one application to the next. A sensor that measures the level of liquid in a large tank may sample only once every 10 seconds, or even less often, whereas one that checks the fill levels of syringes for a pharmaceutical company will sample far more often.

Similarly, video cameras can vary dramatically from one to another. Cameras for process analysis may run at 10,000 frames per second, while the HDTV we watch at home runs at 30 frames per second. So the characteristics of the data can vary considerably depending on the application.
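As a rough illustration, if both cameras captured frames of the same resolution and bit depth (an assumption made only for this comparison), uncompressed data volume would scale directly with frame rate:

```latex
\frac{10{,}000~\text{frames/s}}{30~\text{frames/s}} \approx 333
```

In other words, the process-analysis camera would produce on the order of 333 times as much raw data per second as a home HDTV signal.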

Value: relevance and how long you need to keep data

Value is a determination of what data you need to keep and for how long. If you measure temperature as part of a manufacturing process, does that data have any value after the process is complete? Is there a reason you need to keep it beyond today, tomorrow, or next year?

An example is an automated welding process, which involves a video camera capturing real-time images, a probe that listens to the sound the welding process makes, and temperature and humidity sensors. All of the resulting data is sent to a computer, which analyzes it and adjusts the process accordingly to produce the optimum weld. But once the weld is complete and passes quality assurance, do you really need all the data that went into producing it? Probably not.

However, for compliance purposes in certain countries, auto manufacturers need to document every weld they make and keep those records for decades, just in case there’s a problem down the road (pardon the pun) and they need to track down the issue. So they do need to keep some data about each weld, but perhaps not all of it. Record retention requirements vary by industry, of course, but it is important to identify and retain the data you need for risk management and regulatory compliance.
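One way to act on this is to reduce each weld’s raw data to a compact record at the edge and retain only that record long term. The sketch below is a minimal illustration: the record fields (weld ID, completion time, peak temperature, mean humidity, QA result) and the 30-year retention period are hypothetical, and the real fields and durations would come from the applicable regulations and your own risk assessment.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta, timezone

# Hypothetical retention period; real requirements vary by country and industry.
RETENTION = timedelta(days=365 * 30)


@dataclass(frozen=True)
class WeldRecord:
    """Compact per-weld record kept for compliance (hypothetical fields)."""
    weld_id: str
    completed_at: str
    peak_temp_c: float
    mean_humidity_pct: float
    qa_passed: bool
    retain_until: str


def summarize_weld(weld_id, completed_at, temps_c, humidity_pct, qa_passed):
    """Reduce a weld's raw sensor data to a small record worth keeping.

    The raw video and acoustic data are not stored in the record at all;
    they can be discarded once the weld passes QA.
    """
    return WeldRecord(
        weld_id=weld_id,
        completed_at=completed_at.isoformat(),
        peak_temp_c=max(temps_c),
        mean_humidity_pct=sum(humidity_pct) / len(humidity_pct),
        qa_passed=qa_passed,
        retain_until=(completed_at + RETENTION).isoformat(),
    )


record = summarize_weld(
    weld_id="W-000123",  # hypothetical identifier format
    completed_at=datetime(2020, 6, 1, 12, 0, tzinfo=timezone.utc),
    temps_c=[1450.0, 1510.5, 1498.2],
    humidity_pct=[41.0, 43.0],
    qa_passed=True,
)
print(json.dumps(asdict(record)))
```

A record like this is a few hundred bytes as JSON, which makes keeping it for decades in a regional or cloud archive far cheaper than retaining the full sensor capture.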

Veracity: getting to the truth

The final quality to consider is the veracity of the data, or whether it’s accurate. In most very large data streams, some portion of the data is likely to represent outliers or inaccuracies.

For example, think of a manufacturing process that measures the diameter of an object, perhaps a tin can. There is always going to be some vibration in the process, and that vibration creates noise: some of the data you collect represents random noise, not what you’re really trying to measure. There’s no reason to keep that data, so you need parameters in place to verify the quality of your data and filter out anything that isn’t relevant. That’s most likely done right at the edge, before larger data sets are shipped elsewhere for further processing.
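One common approach to this kind of filtering is to compare each reading against a rolling median of its neighbors and discard readings that deviate too far. The sketch below assumes diameter readings in millimeters, a 5-sample window, and a 0.5 mm deviation limit; all three are hypothetical parameters you would tune for your own process.

```python
from statistics import median


def reject_outliers(raw_mm, window=5, max_dev_mm=0.5):
    """Keep readings that stay close to the local (rolling) median.

    window and max_dev_mm are hypothetical tuning parameters. Near the
    ends of the series, the window is truncated to the available samples.
    """
    half = window // 2
    kept = []
    for i, value in enumerate(raw_mm):
        local = median(raw_mm[max(0, i - half): i + half + 1])
        if abs(value - local) <= max_dev_mm:
            kept.append(value)
    return kept


# Illustrative can-diameter readings with two vibration spikes (71.40 and 60.10).
raw = [66.02, 65.98, 66.01, 71.40, 66.00, 65.97, 60.10, 66.03, 66.01]
print(reject_outliers(raw))
```

A median is used rather than a mean because a single large spike barely moves the median, so the spike itself does not distort the baseline it is being judged against.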

Putting it all together

If you apply the big Vs to a given IIoT application, they should paint a picture of where the resulting data needs to be processed, stored, and protected, and how best to transport it.

The automated welding process, for example, happens in real time. To make the necessary control adjustments, the application requires low latency. There’s not enough time to ship the data off to some cloud platform for analytics. Analysis will have to be done locally, for example, in an industrial edge application. The data that automakers need to save long-term, however, is probably best sent to and stored in a regional data center or cloud-based facility.
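The latency part of this reasoning can be expressed as a simple placement rule: choose the most centralized tier whose round-trip time still fits the application’s latency budget. Preferring the most centralized tier is itself an assumption (that centralized capacity is easier to scale); the tier names, round-trip times, and the 10 ms budget for weld control are hypothetical placeholders, and a real decision would also weigh the volume, value, and veracity factors above.

```python
from typing import Optional

# Hypothetical round-trip times (ms) from the machine to each tier,
# ordered from most local to most centralized. Replace with measurements.
TIERS = [
    ("industrial edge", 2.0),
    ("regional data center", 20.0),
    ("cloud", 80.0),
]


def place_workload(latency_budget_ms: Optional[float]) -> str:
    """Return the most centralized tier that meets the latency budget.

    A budget of None means the workload is not latency-sensitive
    (e.g., long-term archiving), so the most centralized tier is chosen.
    """
    if latency_budget_ms is None:
        return TIERS[-1][0]
    chosen = None
    for name, rtt_ms in TIERS:
        if rtt_ms <= latency_budget_ms:
            chosen = name  # keep moving outward while the budget allows
    if chosen is None:
        raise ValueError("no tier meets the latency budget; process on the machine itself")
    return chosen


print(place_workload(10))    # weld control loop (hypothetical 10 ms budget)
print(place_workload(None))  # long-term weld records
```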

In addition to latency requirements, bandwidth cost can also be a factor. The more data you need to send, the more bandwidth you have to provision. And the bigger the data pipe, the greater the investment in network capacity and infrastructure.
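A quick way to size that pipe is to convert a stream’s data rate from bytes to bits per second and add headroom for protocol overhead and bursts. The 25 percent headroom factor and the example workloads below are assumptions chosen for illustration.

```python
def required_link_mbps(bytes_per_second: float, headroom: float = 0.25) -> float:
    """Link capacity (Mbit/s) needed to carry a stream, with extra headroom.

    headroom is an assumed allowance for protocol overhead and bursts.
    """
    bits_per_second = bytes_per_second * 8
    return bits_per_second * (1 + headroom) / 1e6


# Illustrative workloads: 100 compressed video streams at 2 MB/s each,
# versus 1,000 sensors each sending a 12-byte reading once per second.
video = required_link_mbps(100 * 2_000_000)
sensors = required_link_mbps(1_000 * 12)
print(f"video: {video:,.0f} Mbit/s, sensors: {sensors:.2f} Mbit/s")
```

Under these assumptions the video workload needs a link of about 2,000 Mbit/s, while the sensor workload needs about 0.12 Mbit/s, which is why high-volume streams are usually the first candidates for local processing.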

Sometimes, the sheer volume of data will dictate that it is best to deal with it locally at first, and then send a subset of that data to a regional or cloud facility for additional processing. An example might be a process automation application where the data required to actually run the process is processed locally, while data about the process (time required, quality assurance results, health of the machines) is sent to a cloud-based analytics application to optimize the process and track the health of the machines involved. Depending on the actual use case and the considerations above, a hybrid data architecture will often make the most sense.
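A minimal sketch of that split, with hypothetical names throughout: each cycle’s raw control data stays at the edge, and only a per-cycle summary (cycle time, QA result, and a machine-health indicator) is queued and forwarded upstream in batches. `CycleSummary`, `EdgeForwarder`, and the `upload` callback are illustrative; `upload` stands in for whatever transport you actually use and is not a specific product API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CycleSummary:
    """Per-cycle metadata worth sending upstream (hypothetical fields)."""
    cycle_id: int
    cycle_time_s: float
    qa_passed: bool
    vibration_rms_mm_s: float  # example machine-health indicator


@dataclass
class EdgeForwarder:
    """Buffers summaries at the edge and forwards them in batches."""
    upload: Callable[[List[CycleSummary]], None]  # hypothetical transport hook
    batch_size: int = 50
    _pending: List[CycleSummary] = field(default_factory=list, init=False, repr=False)

    def record(self, summary: CycleSummary) -> None:
        """Queue one cycle's summary; forward a batch once it is full."""
        self._pending.append(summary)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is queued; if upload raises, the batch stays queued."""
        if self._pending:
            self.upload(self._pending)
            self._pending = []


# Usage: collect "uploaded" batches in a list instead of making a network call.
sent_batches = []
forwarder = EdgeForwarder(upload=sent_batches.append, batch_size=2)
for i in range(5):
    # Raw control-loop data for cycle i is used and discarded locally;
    # only this summary is queued for the cloud.
    forwarder.record(CycleSummary(cycle_id=i, cycle_time_s=12.5,
                                  qa_passed=True, vibration_rms_mm_s=0.3))
forwarder.flush()
print([len(b) for b in sent_batches])
```

Batching trades a little freshness in the cloud analytics for fewer, larger transfers; the batch size is a tuning choice that depends on how current the upstream dashboards need to be.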

Finding the right balance between industrial edge computing, larger regional enterprise or colocation data centers, or cloud data centers can be tricky. If you’d like some help determining the best IT solution for your IIoT applications, join the Schneider Electric Exchange. On Exchange, you can connect with experts for advice and find qualified service providers who can help get your IIoT application from the proof of concept stage to production, where it can generate real value.