Managing the Big Data Flood

Much like constant downfalls of water, there’s lots of data being generated — and it needs to go somewhere.

Oct 10, 2019

Freak floods caused by downpours can have devastating effects, but what’s that got to do with big data? Much like constant downpours of water, there’s lots of data being generated, and it needs to go somewhere.

Just as a huge volume of water won’t be managed effectively without the correct infrastructure in place, neither will huge volumes of data. Yet ironically, you can use big data to prepare for flooding and other environmental or operational risks. Operational risks refer to any kind of disruption to normal processes that could lead to a loss of customers and revenue. While you can’t anticipate every possible eventuality — like a freak downpour — you can focus on getting the level of risk in your facility down to a tolerable level.

Accidents, in their broader sense, are rare events that happen when several risk management barriers fail in succession. However, post-incident investigations often show that several near misses occurred before the accident took place. While these failures may not have been observable to the human eye, they could have been detected through rigorous data collection and analysis.

So, what exactly is ‘big data’? Given the buzzword’s widespread use across industries, it is not surprising that the exact meaning of the term is often subjective. According to the IBM Institute for Business Value, it really all comes down to four Vs: volume, variety, velocity and veracity.

Volume

Big data is, unsurprisingly, big. It relies on massive datasets, with volumes in the petabytes and zettabytes commonly referenced. However, these large datasets aren’t as difficult to collect as you might imagine.

New technologies are increasing the size of the datasets that every device, facility and process generates. Data volumes are growing at an exponential rate and bringing new challenges with them. Factories, for example, are becoming overloaded with data. Every machine, process and system on the factory floor generates data while the plant is operating. As a result, many of these facilities have become data rich but information poor.

For example, technologies allow plant managers to extract data on the condition of mechanical equipment, such as a motor. However, tracking huge reams of data on the condition of a motor will only go so far. Data is only useful if you act on it. What’s the solution? Keep your data collection streamlined by investing in a data management system. This gives you insight into the data you care about, which you can then use in accordance with your risk management plan.
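As a rough illustration of turning raw motor data into information, the sketch below condenses a batch of vibration samples into a single root-mean-square (RMS) level and an alert flag. The function names, the sample values and the alert threshold are all assumptions made for this example; they do not come from any particular data management system.

```python
import math


def rms(samples):
    """Root-mean-square of a batch of vibration samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def summarise(samples, alert_threshold):
    """Reduce a batch of raw samples to the two values a plant manager acts on.

    `alert_threshold` is a hypothetical limit taken from the risk management plan.
    """
    level = rms(samples)
    return {"rms": level, "alert": level > alert_threshold}


# Hypothetical raw readings from a motor vibration sensor.
raw = [0.5, -0.5, 0.5, -0.5]
summary = summarise(raw, alert_threshold=0.4)
print(summary)  # {'rms': 0.5, 'alert': True}
```

Storing the summary rather than every raw sample is one way to keep collection streamlined, although the right level of detail depends on the risks you are trying to manage.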

Velocity

The second ‘V’ in the big data flood is velocity. Velocity refers to the accelerating speed at which data is being generated and the lag time between when data is generated and when it is accessible for decision making. Faster analysis leads to faster responses. However, today’s data is created at such a rate that it exceeds the capability of many existing systems.

Consider motor condition monitoring as an example. You may be tracking 500 vibration data points per second to check a motor’s performance, but if your vibration analysis system can only analyze 200 data points per second, you have a problem. Ultimately, you need an entire big data infrastructure that is capable of processing this data quickly.
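To see why the gap matters, the arithmetic can be sketched in a few lines. This assumes both rates stay constant and that the system starts with no backlog; the function name is made up for illustration.

```python
def backlog_after(seconds, arrival_rate=500, processing_rate=200):
    """Unanalyzed data points after `seconds`, assuming constant rates
    and an empty queue at the start."""
    surplus_per_second = max(0, arrival_rate - processing_rate)
    return surplus_per_second * seconds


print(backlog_after(1))   # 300
print(backlog_after(60))  # 18000
```

With the figures from the example, the backlog grows by 300 data points every second, so after a single minute 18,000 points are still waiting to be analyzed and the lag before a decision can be made keeps increasing.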

Variety

As more technologies begin to generate data, this information is becoming more diverse. From vibration analysis and condition monitoring to enterprise data such as market trends and product lifecycle management (PLM) records, organizations are finding they need to integrate increasingly complex data types from an array of systems.

This often requires the vertical integration of several different systems. Using this more complicated integration model, condition monitoring data could identify when an industrial part is showing signs of failure, then automatically cross-check inventory data to see if a replacement part is in stock. If a replacement part isn’t available, the system could make an even more intelligent decision by automatically reordering the part through an enterprise resource planning (ERP) system. However, this is just one example.
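The replacement part workflow described above could be sketched roughly as follows. Everything here is hypothetical: the `Inventory` and `ErpSystem` classes are in-memory stand-ins for real systems, and the part name, vibration values and failure threshold are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    """Stand-in for an inventory system (hypothetical)."""
    stock: dict = field(default_factory=dict)

    def in_stock(self, part_id):
        return self.stock.get(part_id, 0) > 0


@dataclass
class ErpSystem:
    """Stand-in for an ERP purchasing interface (hypothetical)."""
    orders: list = field(default_factory=list)

    def place_order(self, part_id):
        self.orders.append(part_id)


def handle_reading(part_id, vibration_level, failure_threshold, inventory, erp):
    """Decide what to do with one condition monitoring reading."""
    if vibration_level <= failure_threshold:
        return "healthy"
    # The part shows signs of failure: cross-check the inventory data.
    if inventory.in_stock(part_id):
        return "replace_from_stock"
    # No spare available: reorder through the ERP system.
    erp.place_order(part_id)
    return "reordered"


inventory = Inventory(stock={"motor-bearing": 0})
erp = ErpSystem()
action = handle_reading("motor-bearing", 7.2, 5.0, inventory, erp)
print(action, erp.orders)  # reordered ['motor-bearing']
```

In a real plant each of these steps would call a separate system, which is exactly where the integration effort lies.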

Plant managers may find themselves wanting to collect all data types, even data that isn’t useful to them, and store it in archives for historical analysis. Businesses need to keep their risk mitigation goals at the forefront of their data collection, rather than collecting data for data’s sake.

Veracity

The final ‘V’ is veracity, the reliability of a particular data type. The key issue here is that the other dimensions of big data (volume, velocity and variety) challenge the capacity of many existing systems.

Consider the replacement part example. The scenario sounds ideal, but how can you integrate the condition monitoring data of a legacy motor with availability data for its replacement parts when that data belongs to a third party?

We suggest forming a relationship with an obsolete parts supplier, such as EU Automation. To achieve true veracity, your data infrastructure must be able to make intelligent decisions, without hitting a wall when it requires action outside of the factory walls.

Don’t let your huge reams of data flood your facility. There’s no value in generating data for data’s sake. For the most effective analysis, make sure you have all four dimensions of big data in place: volume, variety, velocity and veracity.