The Basics of the Art and Science of Mining Maintenance Data

There are many different ways to collect datasets in order to determine machine and equipment health. Several popular approaches available to engineering and maintenance professionals include measuring vibration, temperature and ultrasonic data, as well as data from thermal images.

Collecting datasets is necessary for a number of efficiency-centered maintenance processes and activities such as:

  • Reliability-centered maintenance (RCM), which is used to establish safe minimum levels of maintenance and to improve operating procedures in order to optimize equipment uptime and Overall Equipment Effectiveness (OEE); or
  • Failure Modes and Effects Analysis (FMEA), including the analysis of how maintenance processes affect it. FMEA helps identify possible failures in a design, a process or even a service. In FMEA, the root causes of each failure mode are thoroughly analyzed and assessed to predict future failures.

FMEA approaches are employed when designing or analyzing potential failures of a process or service. This step is implemented with the active involvement of maintenance engineers.

Accordingly, it’s important to note that the remedial efforts that produce the greatest value are the planned ones (such as those carried out during preventive or predictive maintenance).

In principle, RCM and FMEA are analytical methods that can greatly benefit from the analysis of maintenance-related datasets. Analyzing such datasets helps classify parts in terms of their failure probability or even predict asset longevity.

Maintenance datasets can be analyzed in order to extract maintenance knowledge, such as rules that signify the high likelihood of an asset’s failure.

Data analysis for automatic classification, production of end-of-life (EOL) predictions and extraction of rules falls in the realm of data mining and machine learning.

Machine learning hinges on the selection of a proper method and model for the problem at hand. For example, automated classification is based on the use of the collected datasets in conjunction with some classification method. It uses past data about the status of assets (e.g., failed, malfunctioning, working normally) in order to identify the most likely health status of an asset.  

There are several classification techniques and algorithms such as decision trees, which use a tree-like graph for modeling classification decisions. Decision trees represent a list of if-then-else statements on the observed data, which ultimately lead to a classification decision.

Because of their simple design, decision trees are quite easy to understand but may not be accurate enough. Consequently, data scientists and statisticians may opt to use other, more effective classification methods.
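To make the if-then-else structure of a decision tree concrete, here is a minimal Python sketch. The feature names, thresholds and function name are hypothetical and chosen only for illustration; in practice the tree would be learned from historical maintenance data rather than written by hand.

```python
# Hypothetical thresholds and feature names, for illustration only.
# A real decision tree would be learned from past asset data.
def classify_asset(vibration_mm_s: float, temperature_c: float) -> str:
    """Walk a tiny hand-written decision tree and return a health status."""
    if vibration_mm_s > 7.0:
        # High vibration: temperature separates the two worst states.
        if temperature_c > 90.0:
            return "failed"
        return "malfunctioning"
    # Low vibration: only overheating is treated as a problem.
    if temperature_c > 90.0:
        return "malfunctioning"
    return "working normally"


# Made-up (vibration, temperature) readings for three assets.
readings = [(2.1, 65.0), (8.4, 72.0), (9.3, 95.5)]
statuses = [classify_asset(v, t) for v, t in readings]
print(statuses)
```

Each path from the first test to a return statement corresponds to one branch of the tree, which is why such models are easy to read and explain to maintenance teams.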

Popular Maintenance Data Mining Methodologies

The selection of a proper machine learning and data mining model is crucial for the credible estimation of parameters, including failure probabilities and EOL. To best achieve these results, workers traditionally employ disciplined methodologies to analyze data and evaluate alternative data mining models.

The most popular methodologies for analyzing and mining maintenance datasets are:

  • CRISP-DM (Cross Industry Standard Process for Data Mining)
  • KDD (Knowledge Discovery in Databases)
  • SEMMA (Sample, Explore, Modify, Model, and Assess)

These methodologies are iterative, which means that they are subject to a design, build, deploy and evaluate cycle. This cycle can be applied multiple times in order to boost continuous improvement.

These methodologies are also cross-sector, meaning that they are not only used for industrial maintenance, but can be used in different industrial sectors and applications.

By using these approaches, a user can evaluate the performance of a given model on the supplied datasets, prior to the final selection and field deployment of a data mining algorithm.

Overview of CRISP-DM

CRISP-DM is the most popular of the three approaches. It is an iterative methodology comprising six major phases, which are sequential in the sense that each is based on the outcomes of the previous one.

Due to the outcome-based nature of CRISP-DM, it is possible—and in most cases required—to revert from one phase to a previous one. In the case of mining maintenance datasets, the six phases of CRISP-DM are:

1. Understanding the Maintenance Question 

This initial phase sets the scene and decides the scope of the data mining activities. It establishes the requirements and goals of the maintenance data mining process, including the expected result.
To this end, the target maintenance-related question has to be formulated. This could be the prediction of a machine’s EOL, based on vibration and ultrasonic data or even the estimation of a failure probability, based on temperature data. In addition to formulating the target maintenance question, a preliminary plan to resolve this question is developed, including the datasets to be used and the models that should be explored.

2. Understanding the Maintenance Datasets

As part of this phase, datasets are collected and reviewed. For the success of the data mining process, it is very important to inspect the datasets to identify any quality problems. It also helps determine which models could be effective or ineffective. Even though every problem is different, experienced data scientists can prioritize the methods to be tested and evaluated simply by reviewing the available data.

3. Preparing the Maintenance Datasets to Be Used

In this phase, the final maintenance datasets to be used for extracting and evaluating the data mining model are prepared. This may involve several transformations to the raw data collected by the sensors, including:
- filtering information (e.g., selecting specific attributes);
- transforming data into different formats;
- combining datasets (e.g., joining datasets from different sensing modalities); and
- cleaning the data (e.g., getting rid of empty or incomplete fields).
The ultimate objective of this phase is to ensure that the data is ready for data modeling tools.
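
As a rough illustration of the transformations listed above, the following Python sketch filters attributes, joins two sensing modalities and drops incomplete records. All records, field names and the prepare function are hypothetical; real sensor exports will differ in format and volume.

```python
# Hypothetical raw records from two sensing modalities.
vibration_rows = [
    {"asset_id": "P-1", "timestamp": 1, "vibration": 2.3, "operator": "a"},
    {"asset_id": "P-1", "timestamp": 2, "vibration": None, "operator": "b"},
    {"asset_id": "P-2", "timestamp": 1, "vibration": 6.8, "operator": "a"},
]
temperature_rows = [
    {"asset_id": "P-1", "timestamp": 1, "temperature": 61.0},
    {"asset_id": "P-1", "timestamp": 2, "temperature": 64.5},
    {"asset_id": "P-2", "timestamp": 1, "temperature": 88.0},
]


def prepare(vib_rows, temp_rows):
    """Filter, join and clean the raw rows into modeling-ready records."""
    # Index temperature readings by (asset, time) so they can be joined.
    temp_by_key = {
        (r["asset_id"], r["timestamp"]): r["temperature"] for r in temp_rows
    }
    prepared = []
    for row in vib_rows:
        key = (row["asset_id"], row["timestamp"])
        # Filtering: keep only the attributes of interest ("operator" is dropped).
        record = {
            "asset_id": row["asset_id"],
            "timestamp": row["timestamp"],
            "vibration": row["vibration"],
            # Combining: attach the matching temperature reading, if any.
            "temperature": temp_by_key.get(key),
        }
        # Cleaning: discard records with empty or missing fields.
        if all(value is not None for value in record.values()):
            prepared.append(record)
    return prepared


prepared = prepare(vibration_rows, temperature_rows)
print(prepared)
```

Here the second vibration row is discarded because its reading is missing, leaving two complete records.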

4. Maintenance Data Modeling

As already outlined, there are a variety of models that can be used for classification, prediction or even rules extraction. The purpose of this phase is to apply some of the available methods, while also calibrating them by tuning their parameters. Note that each selected model may require different datasets. Therefore, it is common to go back to the data preparation phase in order to select alternative datasets as needed.
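
Tuning a model's parameters can be as simple as trying several candidate values and keeping the one that fits the historical data best. The sketch below assumes a deliberately simple one-rule model (an asset is predicted to fail when its vibration exceeds a threshold); the data, the candidate values and the function name are hypothetical.

```python
# Hypothetical labelled history: (vibration reading, did the asset fail?).
history = [
    (1.8, False), (2.5, False), (3.9, False), (4.4, False),
    (5.2, True), (6.1, True), (7.3, True),
]


def accuracy(threshold, samples):
    """Share of samples where 'vibration > threshold' matches the failure label."""
    correct = sum((vib > threshold) == failed for vib, failed in samples)
    return correct / len(samples)


# Candidate parameter values to try (a small grid search).
candidates = [2.0, 3.0, 4.0, 5.0, 6.0]
best_threshold = max(candidates, key=lambda t: accuracy(t, history))
print(best_threshold, accuracy(best_threshold, history))
```

Note that this sketch scores each candidate on the same data used to choose it, which tends to be optimistic. The evaluation phase that follows is where a model is checked against its target objectives.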

5. Data Models Evaluation

Following the development of the data model(s) in the previous phase, this stage thoroughly evaluates the selected models against the target objectives. The evaluation first considers the performance of each model. For example, a model is tested to determine whether it can produce EOL predictions that are very close to the known EOL of assets. However, apart from evaluating a model’s performance, it is also important to assess (at a higher level) whether the business objectives can be met. The assessment helps determine if a model can be moved to production. It is quite common for the data mining team to return from this phase to the first step of the process in order to reformulate the data-driven maintenance target at hand.
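
One common way to quantify how close predicted EOL values are to known ones is the mean absolute error. The following sketch compares two candidate models on that basis. The EOL figures, model names and acceptance limit are all hypothetical, and the limit stands in for whatever business objective the maintenance team has agreed on.

```python
# Hypothetical known EOL values and model predictions, in operating hours.
known_eol = [1200.0, 950.0, 1430.0, 800.0]
predictions = {
    "model_a": [1150.0, 1000.0, 1400.0, 900.0],
    "model_b": [1300.0, 700.0, 1500.0, 820.0],
}

# Hypothetical business requirement: average error of at most 75 hours.
MAX_ACCEPTABLE_ERROR_HOURS = 75.0


def mean_absolute_error(actual, predicted):
    """Average absolute gap between known and predicted EOL values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


errors = {
    name: mean_absolute_error(known_eol, preds)
    for name, preds in predictions.items()
}
best_model = min(errors, key=errors.get)
ready_for_production = errors[best_model] <= MAX_ACCEPTABLE_ERROR_HOURS
print(errors, best_model, ready_for_production)
```

With these made-up numbers, model_a has the lower error (57.5 hours against 110.0) and falls within the assumed limit, so it would be the candidate for deployment.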

6. Field Deployment

This phase is concerned with the deployment of successful data mining models in the field. This stage is not confined to the integration of algorithms within platforms and systems like Asset Management (AM) and Enterprise Resource Planning (ERP) systems. In this step, users also decide on the best way to present the information to maintenance teams, which can include reports and visualizations.

Overview of KDD and SEMMA

KDD and SEMMA are less popular than CRISP-DM, but they are still used in several data mining settings and applications. They are also staged and iterative. In particular, KDD’s main stages include:

  1. Selection, which creates the maintenance dataset to be used as a basis for the data mining process.
  2. Pre-processing, which includes data cleaning and pre-processing activities in order to ensure that the maintenance data to be used are consistent.
  3. Data Transformation, which focuses on the transformation of the datasets in order to focus on specific maintenance related values and attributes (e.g., vibration values, timestamps, asset status etc.) that are likely to contain maintenance knowledge.
  4. Data Mining, which is the stage where patterns of interest are sought in-line with the business/maintenance target such as predicting the status or the lifetime of an asset.
  5. Interpretation/Evaluation, which is the stage where the effectiveness of the chosen method is evaluated from a business perspective.

Likewise, SEMMA also comprises five data preparation and processing phases, including the following activities:

  1. Sampling, which focuses on sampling the data towards deriving a dataset that contains significant information, yet can be managed effectively. It’s quite similar to KDD’s selection phase.
  2. Exploration, which is devoted to searching the data for unanticipated trends and anomalies in order to ensure that the data are properly understood, like the pre-processing phase of KDD.
  3. Modification, which includes the process of creating, selecting, and transforming variables to focus the model selection process. It is very similar to KDD’s data transformation phase.
  4. Modeling, which aims at modeling the data towards searching for a combination of data that reliably predicts a desired outcome. It is essentially like KDD’s data mining process.
  5. Assessment, which is the phase where the outcomes of the selected data modeling method, along with its performance are assessed, comparable to the interpretation and evaluation phase of KDD.

One can easily observe a direct mapping between the stages and activities of KDD and SEMMA, as well as their pertinence to the phases of the CRISP-DM. This reveals the similarities of the three methodologies.

Building the Data Mining Team

One of the major challenges of the data mining process is to assemble the right team that will be in charge of deployment. Indeed, the implementation of the methodology outlined above requires at least the involvement of experts from three different disciplines:

  1. Maintenance experts and engineers, who will be in charge of formulating the business problem and the maintenance questions that should be answered based on the data analysis. Maintenance experts judge whether the selected data model serves their business objectives as part of FMEA, RCM and other processes. Maintenance workers with a deep knowledge of field processes are likely to be included to provide insights on the interpretation of the data. They can also help evaluate the ultimate result of the deployment, including reports and visualizations.
  2. Data mining experts and data scientists, who will be in charge of selecting, applying and evaluating machine learning and data mining models. Data scientists will play a major role in the inspection of the data, as a means of identifying which models could be effective for the problem at hand.
  3. IT developers and experts, who will undertake all programming tasks, including tasks that involve data preparation and transformation. They can also handle data access and model deployment in various IT systems, business information systems and databases.

Maintenance workers will remain at the heart of data-driven maintenance processes. Nevertheless, effective mining of maintenance data also requires additional experts, such as data scientists, in disciplines where there is still a big talent gap, including statistics, machine learning and deep learning.

The practice of mining maintenance data and deriving hidden patterns of maintenance is both art and science.

The presented methodologies provide a sound basis for understanding the scientific part. However, they also reveal the importance of a maintenance data scientist’s creativity for processes like inspecting the data, selecting suitable data mining and machine learning models, as well as evaluating the business relevance of the results.

This is the reason why a good multi-disciplinary team is required, including people with field experience. Is your organization ready to efficiently process maintenance data to help achieve maintenance excellence?