On a recent customer call, a prospect told us, “We are not at a place right now to be prioritizing the quality of the data over the exercise of aggregation and streamlining the data collection.” This exemplifies a strange cognitive dissonance, in which some companies say, ‘Yes, we do need to aggregate and store the data, but we’re not concerned with its quality.’ That is, we need the data, but we don’t need it to be usable. What? Isn’t the accuracy and reliability of data the crucial (and rather large) detail in using it?
This has all the makings of a Data Catastrophe: ignoring the crucial factor that impacts every application based on that data. To avoid such a catastrophe, you need a comprehensive Data Strategy. Yet half of the technology leaders in a Pulse by Gartner poll said the lack of a clearly articulated data strategy is the #1 challenge inhibiting the progress of data-driven projects, ahead of technology, ROI, funding, or talent.
In which data functions have you made investments? I’d venture to say some but not all. In another Pulse by Gartner poll, two-thirds of respondents said data quality is causing the most pain for their company. So why do we focus so much on security, aggregation, and/or analytics? Data Quality is intrinsic to a successful data strategy: it is the infrastructure layer between the historian and the uses of data. It can be the crucial design flaw in your entire strategy if it is done improperly, or ignored in favor of other data functions whose ROI is higher and easier to calculate.
Data quality issues should be treated with the same severity as an engineering failure in a road, bridge, or building. Some notable plant incidents show how severely data quality failures can impact performance. For example:
- The 2005 explosion at BP’s Texas City refinery was caused by the failure of a single sensor, which told operators the isomerization column was not filling, and by a series of unwise gut decisions. Operators continued to pump in hydrocarbons, leading to over-pressurization, and a spark then caused the deadly explosion. [Engineering Catastrophes season 5, episode 3]
- In the last 11 minutes before the Milford Haven refinery explosion, two operators had to recognize and act upon 275 alarms, an average of 25 per minute. Another 1,765 alarms were displayed as high priority, even though many of them were informational only.
- In 2019, a Top 20 Chemicals company lost €160M from unplanned outages and lost production. 80% of this loss was attributed to the failure to accurately detect sensor data issues. Operators found it difficult to trust the data and relied on experience instead.
- As nations drive towards a future of net zero, cities and companies must accurately report their greenhouse gas emissions to regulatory agencies. In 2020, GHG emissions were under-reported by an average of 23.5%, leading to significant fines.
In each of these examples, an established, well-defined data strategy in which data quality is guaranteed would have eliminated these “failures”. Operators require reliable, accurate data on which to base data-driven decisions. Data scientists need to focus on higher-value functions like data aggregation, insight, and analytics instead of cleaning data. Executives need accurate data on which to base their business decisions. It begins with Data Integrity, and it’s more than just quality.