Customers ask APERIO!

Data Quality

  • APERIO’s data quality ensures the accuracy and reliability of operational data across your industrial data value chain. It is the necessary software layer between the source of the data and the applications of that same data. Without accurate, trusted data, any function that uses this data is worthless and any operational or business decisions based on this data are unreliable. Learn more.

    APERIO offers more than just a data quality or asset health check. We ensure your data is accurate, reliable, and complete, before using it in any application. See Applications below to learn more.

  • Inaccurate or unreliable data is not necessarily the result of data being entered into a system without controls. Beyond bad data entry, the quality of the data itself can be in question: for example, the structure of the data, inconsistent units of measurement, out-of-range values, or abrupt changes to the data, to name a few. Review APERIO’s machine learning engines to see all the types of anomalies they can detect.

  • Companies have failed to solve data quality issues for decades because it’s such a huge problem. It’s just too difficult to automatically detect anomalies at scale and in real time while avoiding false positives. We asked customers why, with all the technology available today, they can’t do it. They consistently say that it’s too expensive to manually manage and sanitize 1–2 million tags, and that they don’t have the resources to do this at scale. Learn more.

  • At APERIO, we define data quality excellence by whether data is accurate, consistent, complete, valid, integral, and timely.

    At APERIO, we believe you cannot improve what you do not measure. That’s why we calculate several metrics that reflect the quality of your data. As well, we guide you through a contextual workflow for remediation based on these metrics and the criticality of the data. See question #5 for metrics we calculate.

  • APERIO DataWise calculates several data quality metrics, including Data Quality Index (DQI), event severity, and event duration, plus an anomaly matrix by engine type and DQI trends over time. DQI is a measure of the quality of your data as assessed by the APERIO DataWise engines, which look for anomalies across your data set, site, or enterprise.

    In addition to measurements, APERIO DataWise also provides workflow, root cause analysis, and pattern recognition tools to drill deeper into the data, gain insight, and receive recommended corrective actions. You can also prioritize actions according to their impact on performance.
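    APERIO does not publish the DQI formula, so the following is only a hypothetical sketch of how a severity- and duration-weighted quality index could be computed for one channel; the names and weighting scheme are assumptions, not APERIO’s implementation.

```python
from dataclasses import dataclass

@dataclass
class AnomalyEvent:
    severity: float    # 0.0 (minor) .. 1.0 (critical) -- assumed scale
    duration_s: float  # seconds the anomaly persisted

def data_quality_index(events, window_s):
    """Toy DQI: 100 minus the severity-weighted fraction of the
    observation window covered by anomaly events, floored at 0."""
    if window_s <= 0:
        raise ValueError("window must be positive")
    penalty = sum(e.severity * e.duration_s for e in events) / window_s
    return max(0.0, 100.0 * (1.0 - penalty))

# One critical one-hour event in a 24-hour window scores roughly 95.8.
dqi = data_quality_index([AnomalyEvent(1.0, 3600)], window_s=24 * 3600)
```

    A real index would also account for event type and channel criticality; this sketch only shows why both severity and duration appear alongside DQI in the metrics above.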


  • Any application of data, from the control room to the corporate office, is at risk if the quality of operational data is poor. Operators may be unable to assess the priority of alarms and make data-driven decisions in real time, and data scientists may lack the good data needed to build advanced models or AI tools that improve performance. See who needs APERIO DataWise.

  • Because APERIO DataWise is needed wherever operational data is stored or used, the use cases are numerous. The top 5 customer applications include:

    • Data Health: Assess the quality of a chosen data set, per DQI, severity, and duration. Apply tools to remediate issues and improve data quality. This can also include a historian data check.
    • Advanced Models: Assess and improve the quality of data fed to advanced models, such as analytics, plant performance, and predictive models.
    • Abnormal Equipment Behavior: By looking across millions of sensor data streams, detect and alert on equipment failure before it occurs, with actionable insights on preventative next steps.
    • Sensor Drift: A common problem in plant operations but extraordinarily difficult to detect; APERIO can consistently detect which sensor is drifting and alert on it.
    • Reliable Data Reporting: Ensure the data you are reporting to stakeholders (shareholders, employees, community) is reliable and accurate.
  • No, APERIO can determine the data quality of all operational data, including process data, lab quality data, profile data, maintenance data, SCADA data, MES data, and ERP data: any data stored in a historian, time series database, and/or data lake. See how APERIO DataWise for PI is used to assess the DQI of PI historian data, or how Seeq customers use APERIO DataWise for Seeq to ensure the data in their models is of the highest quality.

  • No. APERIO DataWise is completely agnostic when it comes to data sources. It can connect to operational data stored in process historians, time series databases, data lakes, and other technologies and platforms. OSIsoft PI was selected as the first historian-based product, APERIO DataWise for PI, because of its wide use across our customer base. Other historians could easily include AVEVA Wonderware, AspenTech InfoPlus.21, Canary, GE Proficy, Emerson DeltaV, etc.

Technical Specifications

  • Please contact us for a complete list of current and future engines.

    The engines deploy fully automatically for each time series data stream (channel), without input from the user. There is no need to indicate training data, thresholds, limits, ranges, operating states, etc. Based on the last three weeks of data, the engines build the first model for each signal, then continuously reiterate these models as new data arrives (that is, the first models rely exclusively on archive data, and the iterations rely more heavily, but not necessarily exclusively, on snapshot data). The time from DataWise deployment until the engines have built models can be parallelized as much as you would like, so it becomes a matter of dollars rather than time. As an example, 100K channels would cost $100 per year on AWS; each channel takes around 3 minutes to process upon first connection and is processed continuously thereafter.
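    The lifecycle described above (a first model fit on archive data, then continuous reiteration as snapshot data arrives) can be sketched as follows, using a simple running mean/variance baseline as an assumed stand-in for APERIO’s proprietary engines:

```python
class ChannelModel:
    """Hypothetical stand-in for a per-channel model: fit once on
    archive data, then updated incrementally with each new snapshot."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha  # weight given to each new snapshot
        self.mean = 0.0
        self.var = 0.0

    def fit_archive(self, samples):
        """First model: built exclusively from archive data."""
        n = len(samples)
        self.mean = sum(samples) / n
        self.var = sum((x - self.mean) ** 2 for x in samples) / n

    def update(self, x):
        """Reiterate the model as snapshot data arrives (EWMA update)."""
        d = x - self.mean
        self.mean += self.alpha * d
        self.var = (1 - self.alpha) * (self.var + self.alpha * d * d)

    def is_anomalous(self, x, k=4.0):
        """Flag values more than k standard deviations from baseline."""
        return abs(x - self.mean) > k * (self.var ** 0.5)
```

    Because each channel’s model is independent, fitting can be distributed across as many workers as desired, which is why the first build is a matter of compute cost rather than elapsed time.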

  • Patented Machine Learning defines normal time series behavior based on historical data via the ‘Fingerprinting’ process.

    1. Automatically builds the digital ‘fingerprint’ for each data input.
    2. Unsupervised ML engines monitor single and multi-sensor time series data streams.
    3. Anomaly Detection engines process live data streams and assess authenticity against individual fingerprints.
    4. A fingerprint mismatch generates an alert in real time, pinpointing the problematic input and indicating the root cause of the anomaly.
  • By default, APERIO DataWise’s machine learning engines are unsupervised. This allows for a clean view of the data without added assumptions, predefined rules, or best practices. However, as you become more familiar with your process, you can make the engines more supervised per specific priorities or inherent knowledge. This can make the tool more intelligent according to your needs.

  • Some examples of manually-set rules could include:

    • Setting up ranges for numerical values
    • Performing validity checks (e.g., string/integer type, string length, precision of a value, etc.)
    • Applying timeliness constraints

    As a starting point, the system applies the default ML policy. The engines were developed by APERIO’s algorithm engineers to fit a very wide range of signals and operations out of the box. However, in certain circumstances it can make sense to deviate from this default ML policy. Currently, we collaborate directly with you on these settings changes to ensure that the changes yield the expected outcomes. In the future, we will allow direct adjustments to these settings within the user interface.

    In terms of aligning the ML models to the operations, we offer a ‘Control Group’ function. This functionality ensures that plant process state (e.g. on/off/ramp up/etc.) is reflected in the ML models, and you won’t see false positives during downtime, for example. The control groups can be defined automatically, based on asset hierarchies for example, or defined manually by the user (in bulk).
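    The manually-set rules listed above amount to simple predicates layered on top of the default ML policy. A minimal sketch, with thresholds and type rules chosen purely for illustration:

```python
def range_check(value, lo, hi):
    """Range rule: the value must fall inside [lo, hi]."""
    return lo <= value <= hi

def validity_check(value, expected_type=float, max_decimals=3):
    """Validity rule: correct type, and no more than max_decimals
    decimal places of precision."""
    if not isinstance(value, expected_type):
        return False
    return round(value, max_decimals) == value

def timeliness_check(sample_ts, now_ts, max_age_s=60.0):
    """Timeliness rule: the sample arrived within max_age_s seconds
    and not from the future."""
    return 0 <= now_ts - sample_ts <= max_age_s
```

    Rules like these complement, rather than replace, the unsupervised engines: they encode inherent process knowledge the models cannot infer from the data alone.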

  • APERIO DataWise for PI is a subset of machine learning anomaly detection engines specifically for identifying missing, stale, or bad data stored in PI historians. Learn more.

    APERIO DataWise for Seeq is a different subset of machine learning anomaly detection engines specifically for assessing the DQI of data before using the data in Seeq calculations. Learn more.


  • APERIO DataWise was built for the cloud, but works on-premises or as a hybrid of the two. APERIO works with all public cloud vendors: AWS, Azure, Google, etc.

  • APERIO DataWise deployment is fast, taking minutes. Once deployed, APERIO DataWise can connect to any number and type of a company’s data sources. It then builds digital fingerprint models for each channel and determines correlated channels based on historical data. This process, where the engines train and run, takes a few hours. Even for more than a million tags, you are up and running within a week.

  • APERIO DataWise is the necessary software layer between the source of the data—historians, data lakes, SCADA systems—and the applications of that same data—visualization, analytics, optimization, predictive maintenance.

  • The APERIO light agent is built with performance in mind, so the strain APERIO DataWise adds is minimal. Depending on the number of sensors, sample rate, and CPU cores, we typically see a 1-2% CPU increase per 100,000 sensors for a 1-5 second sample rate.

  • Assuming it’s raw data, it should be possible (and cost-effective for all sides, as the number of integration points would be lower), and we can work together to implement it. We’d only need to know which component(s) you’re using to push the data from the edge to Azure, and then explore the option of fetching the data directly from Azure.

  • Customers can assign an admin, who can grant/limit certain users’ access rights. Today, this is done in the backend through the APERIO support team. An enterprise user management module is on the roadmap for mid-2023.


  • APERIO DataWise provides an interactive user interface for end users to manage root cause analysis of detected events quickly and accurately. You can use this monitoring dashboard, or you can augment an existing dashboard with APERIO results without extra screens. This includes common historians like PI and analytics tools such as Seeq. You can alert on and investigate events within APERIO DataWise, or integrate with PI notifications as well as 3rd-party tools like SIEM and BI.

  • While it is challenging to distinguish whether the data remains stable or the data has stopped flowing, we have integrated similar solutions for customers. For example, the MQTT Sparkplug B specification by the Eclipse Foundation is designed precisely for this scenario (and is implemented by some historians, e.g., Canary). As long as there is some sort of indication that data has stopped at the source (either a control tag, a virtual control tag, or any mechanism similar to the “death certificates” of Sparkplug B), APERIO can differentiate the two.
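    A simplified illustration of that differentiation, assuming the source exposes some “stopped” indication (such as a Sparkplug B NDEATH message delivered via the MQTT broker’s last-will mechanism); the classification logic here is a sketch, not APERIO’s implementation:

```python
def classify_channel(recent_values, death_certificate_seen):
    """Distinguish a flat-but-live signal from one whose source died.

    A run of identical values is ambiguous on its own: the process may
    simply be stable. An explicit 'death certificate' from the source
    resolves the ambiguity.
    """
    if death_certificate_seen:
        return "stopped"  # source declared dead: data flow has ceased
    if len(recent_values) > 1 and len(set(recent_values)) == 1:
        return "stale"    # candidate only; may just be a stable process
    return "ok"
```

    Without the source-side indication, a control tag or virtual control tag plays the same role as the death certificate flag above.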

  • In addition to the high level overviews in the Dashboard, users can drill down to the individual sensor level and see the details of each data issue (including type of issue, duration, issue severity, ML model parameters, etc.). Furthermore, insights are provided at various levels of aggregation for different types of user objectives (e.g., resolving issues, communicating issues to colleagues, tracking issues over time, etc.).

  • This will be an automated feature in the near future. Today, we offer this as a service through our Customer Success team.

  • APERIO DataWise offers both features:

    (1)  In its “Advisory” feature, the system runs pattern recognition to create clusters of sensors (channels) that are affected by the same events. The user can use this intelligence to rank issues according to which will have the largest potential positive impact on data quality.

    (2)  You can also create clusters of channels manually via the “Cases” feature.
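    The Advisory-style grouping can be sketched as clustering channels by shared event and ranking clusters by size as a rough proxy for remediation impact; the data shapes here are assumptions for illustration.

```python
from collections import defaultdict

def cluster_by_event(event_log):
    """Group channels affected by the same event and rank the
    resulting clusters largest-first.

    event_log: iterable of (event_id, channel) pairs.
    """
    clusters = defaultdict(set)
    for event_id, channel in event_log:
        clusters[event_id].add(channel)
    return sorted(clusters.values(), key=len, reverse=True)
```

    Fixing the event behind the largest cluster improves the most channels at once, which is the ranking intuition behind the Advisory feature.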

  • There is no limit on labels.

  • By default, anyone and everyone can mark, but you can easily set it up so that it is limited to certain users.

  • Our 2023 product roadmap includes “Scopes”. Instead of seeing all channels by default (although filterable), users can define groups of channels that are relevant to their role/responsibilities as scope(s). The end user will receive key data quality metrics (e.g., DQI) on an ongoing basis for each scope. Further, this can be used to easily limit what certain users can see.

    In terms of setup, administrators or end users can define Scopes. This can be done by selecting individual channels, through asset hierarchies represented in the source databases, automatically according to which applications are using the channels (e.g., all channels used in Seeq), or according to which access rights a user already has in a database, etc.

Unlock the value of superior data

The future of your industrial data value chain is at your fingertips. Improve data accuracy, security, and value, for smarter business decisions based on real-time, trusted, superior data.
