6 Methodology

The following section outlines the methodology of the research project, detailing the work to be done at each step, the underlying motivation, and the contribution of each step to the research objectives. As highlighted in Figure 6.1, the research methodology will follow a linear flow, with each procedure building upon the work done in the previous one. The first step introduces the data and the methods by which it was collected. The second step details the pre-processing required to prepare the data for analysis. After pre-processing, a descriptive analysis will be carried out on each data set. The fourth step details the procedures used to build models evaluating the reliability of the equipment. The final step details the creation of a predictive model that integrates data from all available sources.

Figure 6.1: Overview of research methodology

6.1 Data collection

Although the data used in the subsequent analysis was not collected by the researcher, the following section details the methods of and motivation for its collection, in addition to describing the data sources themselves.

6.1.1 Plant stoppage event data

The available stoppage data consists of a repository of 157 Microsoft Excel documents, collected in 2015, 2016, and 2017, each listing all of the stoppage events that occurred during a given week. As displayed in Figure 6.2, whenever a stoppage occurs within the plant, a maintenance engineer records the date, time, equipment, and plant section, along with a brief description of the cause of the stoppage or its solution. The group responsible for resolving the stoppage is also noted.

The stoppages for each day are listed under a common heading, and additional stoppages are appended to the list as they occur. In this way, every stoppage occurring within the plant is cataloged and can be reviewed by opening the historical report for the respective week. In their current format, the reports provide a summary of the stoppage events for a given week but are not intended for statistical analysis.

Figure 6.2: A sample of one of the weekly stoppage event reports

6.1.2 Vibration measurement data

As mentioned previously, vibration observations were recorded only for a subset of equipment items deemed vital by the plant. The measurements were taken with a handheld probe, without interrupting the machine or causing downtime, and entered into an electronic record. The purpose of collecting vibration measurements is to provide an indication of the health status of the equipment, a practice referred to as condition monitoring.

The vibration source data consisted of a repository of tables and graphs displayed in an HTML file, as depicted in Figure 6.3. Each recorded vibration measurement contained the individual equipment code, the component being monitored (fan, motor, etc.), the type of vibration, the amount of vibration, and the percentage change in vibration since the previous measurement. Under this structure, each piece of equipment may consist of several components, with different types of vibration recorded from each component.

Figure 6.3: A sample of the observed vibration measurement data reports

6.1.3 Monthly production data

In addition to stoppage event records, the plant also maintains a record of the amount of material produced or processed by each area of the plant. This figure is recorded as the total production for each month, measured in tons. The plant has also provided the maximum throughput, or capacity rate, of each plant section, measured in tons of material per hour.

6.2 Data pre-processing

As each data source was primarily recorded to obtain a high-level overview of the operational status of the plant, the format of each source makes it unsuitable for statistical analysis. The following section details the procedures necessary for aggregating, cleaning, and transforming each dataset in preparation for analysis.

6.2.1 Plant stoppage event data

Because the reports were designed to provide a high-level synopsis of plant availability during each week, there is significant value to be realized from aggregating them and structuring the data so that management can understand how the reliability of the system, subsystems, and components has evolved over the past three years.

To begin pre-processing the stoppage event data, the records must first be collected from each of the 157 files and aggregated into one worksheet. The pandas library for Python will be used extensively to combine all of the files into a single dataframe and to systematically strip out stylistic formatting and non-informational headings.
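A minimal sketch of this aggregation step is given below. It assumes, hypothetically, that each weekly workbook holds one sheet, that each day's heading is a row whose only filled cell is a date in the first column, and that stoppage rows follow the column order in `COLUMNS`. The function names and column names are illustrative, not taken from the actual reports.

```python
from datetime import date
from pathlib import Path

import pandas as pd

# Hypothetical column order of a stoppage row; the real reports may differ.
COLUMNS = ["start", "end", "equipment", "section", "comments", "group"]


def tidy_weekly_report(raw: pd.DataFrame, source: str = "") -> pd.DataFrame:
    """Turn one raw weekly sheet (read with header=None) into tidy stoppage rows.

    Assumption: a day heading is a row whose only non-empty cell is a date in
    the first column; every other row after the first heading is a stoppage.
    """
    raw = raw.dropna(how="all").reset_index(drop=True)
    first = raw.iloc[:, 0]
    is_heading = (
        first.notna()
        & (raw.notna().sum(axis=1) == 1)
        & first.map(lambda v: isinstance(v, date))
    )
    # Carry each day heading down onto the stoppage rows listed beneath it.
    day = pd.to_datetime(first.where(is_heading)).ffill()
    records = raw.loc[~is_heading].iloc[:, : len(COLUMNS)].copy()
    records.columns = COLUMNS
    records.insert(0, "day", day[~is_heading].to_numpy())
    records["source_file"] = source
    # Rows above the first heading (report titles, column labels) have no day.
    return records.dropna(subset=["day"]).reset_index(drop=True)


def aggregate_reports(folder: str) -> pd.DataFrame:
    """Stack the tidy rows of every .xlsx report in a folder."""
    frames = [
        tidy_weekly_report(pd.read_excel(path, header=None), path.name)
        for path in sorted(Path(folder).glob("*.xlsx"))
    ]
    return pd.concat(frames, ignore_index=True)
```

Reading `.xlsx` files through `pd.read_excel` requires the openpyxl engine; if some reports are older `.xls` files, the glob pattern and engine would need to change.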

In addition to equipment code, plant section and interval time stamps, each stoppage record contains a Comments field in which the person who recorded the stoppage could insert additional information about the event. This would allow the employee to provide information otherwise not recorded in the other fields, potentially describing the failure mechanism of the equipment or the maintenance action performed in response to the failure, among other possibilities.

The ability to extract additional information about each failure event will allow for further classification of stoppage events, such as identifying whether a piece of equipment failed because of a faulty bearing or a leaking seal. More detailed information about the failure mode of a piece of equipment can improve the effectiveness of prescribing specific maintenance interventions, as maintenance engineers will have a narrower scope of inspection, ideally resulting in shorter maintenance durations and less equipment downtime.

Furthermore, insight into the maintenance action taken in response to equipment failure can be used to identify the effect of that action on equipment reliability. For example, a complete history of failure mechanisms and maintenance actions may reveal a difference in effectiveness between replacing and repairing equipment in response to failure. This history may also identify the extent to which a specific maintenance action is effective against a specific failure mode, or on certain equipment. Perhaps repair, or another maintenance action, is largely ineffective on certain equipment, or perhaps only a finite number of repairs are effective, after which a replacement must occur.

It is therefore advantageous to extract as much meaningful information about each stoppage from the Comments field as possible, as doing so should improve the modelling capabilities and allow for more accurate predictions regarding the timing and specifics of maintenance interventions. As the Comments field is entirely unstructured, and its content may vary depending on the stoppage category, plant section, individual employee, etc., text mining will be useful for understanding the meaning, value, and patterns of the available information. The tidytext package for R provides extensive functionality for text mining based on the concepts outlined by Robinson and Silge (2017).
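The tidytext workflow tokenizes text into one word per row, removes stop words, and counts terms. As a rough Python analogue (not tidytext itself), the sketch below does the same with the standard library and then tags each comment with failure modes using a keyword dictionary. The stop-word list and the `FAILURE_MODES` dictionary are hypothetical placeholders; in practice they would be built by inspecting the term counts themselves.

```python
import re
from collections import Counter

# Small illustrative stop-word list; tidytext ships much larger lexicons.
STOP_WORDS = {"the", "a", "an", "and", "of", "on", "to", "was", "in", "at", "for"}

# Hypothetical keyword dictionary mapping comment vocabulary to failure modes.
FAILURE_MODES = {
    "bearing": {"bearing", "bearings"},
    "seal": {"seal", "seals", "leak", "leaking"},
    "belt": {"belt", "belts"},
}


def tokenize(comment: str) -> list:
    """Lower-case a comment, split it into alphabetic tokens, drop stop words."""
    words = re.findall(r"[a-z]+", comment.lower())
    return [w for w in words if w not in STOP_WORDS]


def term_frequencies(comments) -> Counter:
    """Count tokens across all comments (one token per row, then count)."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts


def tag_failure_modes(comment: str) -> set:
    """Return every failure mode whose keywords appear in the comment."""
    tokens = set(tokenize(comment))
    return {mode for mode, keywords in FAILURE_MODES.items() if tokens & keywords}
```

Note that the regular expression keeps only letters, so equipment codes or part numbers mentioned in a comment are discarded; those would need a separate pattern if they carry information.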

6.2.2 Vibration measurement data

The vibration data was provided in a large HTML document, in which the observed readings are presented in a combination of data tables and graphical plots. In order to transform the data into a usable format, the Python package BeautifulSoup will be used to parse the HTML file and extract the source data from consecutive table elements. As the graphical plots represent the same data from the tables, they will be ignored. After extracting the observed vibration readings into a single data frame, human input errors will be corrected for the date, equipment code, and measurement type.
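The sketch below illustrates the table-extraction step. To stay dependency-free it uses the standard library's `html.parser` in place of BeautifulSoup; the logic is the same: walk consecutive `<table>` elements, collect the text of each `<td>`/`<th>` cell, and ignore everything else, including the plots. It assumes, hypothetically, that every table begins with an identical header row; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <table> as a list of rows of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> element
        self._row = None   # cells of the <tr> currently open, else None
        self._cell = None  # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace and line breaks inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside table cells (plot labels, scripts) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Return (header, data_rows) pooled over all tables in the document."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    header, rows = None, []
    for table in parser.tables:
        if table:
            header = header or table[0]
            rows.extend(table[1:])  # assumes row 0 of each table is its header
    return header, rows
```

With BeautifulSoup, the same loop would iterate over `soup.find_all("table")` and each row's `find_all(["td", "th"])`, calling `get_text(strip=True)` on every cell.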

In addition to being recorded at irregular intervals, the measurements were taken by hand, so observations made at essentially the same time have differing time stamps. In order to combine readings from the same equipment and sub-component taken on the same day, only the date will be retained and the exact time of day ignored.
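A sketch of this date-level alignment, assuming long-format readings with the hypothetical columns `timestamp`, `equipment`, `component`, `vib_type`, and `value`: truncating each timestamp to its calendar date lets readings of different vibration types taken minutes apart fall into the same row. Averaging duplicate readings of the same type on the same day is an assumed rule, not one stated in the source.

```python
import pandas as pd


def combine_same_day(readings: pd.DataFrame) -> pd.DataFrame:
    """One row per equipment, component, and date; one column per vibration type."""
    # Drop the time of day so near-simultaneous readings share a key.
    out = readings.assign(date=pd.to_datetime(readings["timestamp"]).dt.normalize())
    wide = out.pivot_table(
        index=["equipment", "component", "date"],
        columns="vib_type",
        values="value",
        aggfunc="mean",  # assumed rule for repeated readings of one type
    )
    return wide.reset_index()
```

Vibration types not measured on a given day appear as missing values, which makes the irregular measurement schedule explicit in the resulting table.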

6.2.3 Monthly production data

Plant production values were recorded for 10 sub-sections of the plant, and presented in monthly summary reports. Aggregating this data source will involve manually opening each report, copying the monthly total, and entering it into a new table. Once aggregated, the final table will contain one row for each sub-section, with the columns representing the total production tonnage for each month.

6.3 Descriptive analysis

Following the pre-processing of all three data sets, an exploratory statistical analysis will be performed on each. This data exploration will yield preliminary insights into what kind of information the data sets contain, what information can be extracted, the completeness of the data, and any limitations for future modelling. In addition to describing the data sets themselves, the exploratory phase will serve to identify several critical sections and items of equipment to be examined more closely in the subsequent analyses.

6.3.1 Plant stoppage event data

After the weekly stoppage reports are processed and combined into a single data source spanning the three-year period, valuable insights can be realized from the historical record. This allows maintenance engineering to track downtime and availability over a much longer time frame. When only a week's worth of information is viewed at a time, identifying trends, and therefore improving the reliability and availability of production equipment, is very difficult. It is also difficult to distinguish between more reliable and less reliable systems, that is, between systems that need maintenance and those that do not, which may cause maintenance engineering to misidentify problem equipment or even entire systems.

As the stoppage event data provides the largest source of information regarding the performance of the cement plant, it will serve as the basis for a criticality analysis. Because each stoppage event is classified by event category, equipment, and plant section, visualizing the distribution of failure and non-failure events throughout the plant will aid in identifying critical areas of focus. Plant sections or specific equipment in which failure events are large contributors to overall downtime will be identified as the areas in which the largest reduction in downtime can be achieved. The total number of failure events per section or item of equipment provides a second measure of criticality.
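One possible form of the criticality ranking is a Pareto table: failure downtime and failure counts are summed per equipment item (or plant section), sorted, and expressed as cumulative shares of total failure downtime. The sketch assumes a hypothetical `category` column whose value `"failure"` marks failure events and a `downtime_h` column already derived from the interval time stamps.

```python
import pandas as pd


def criticality_table(stoppages: pd.DataFrame, by: str = "equipment") -> pd.DataFrame:
    """Rank items (or sections) by failure downtime with cumulative shares."""
    failures = stoppages[stoppages["category"] == "failure"]
    table = failures.groupby(by).agg(
        failure_count=("downtime_h", "count"),
        downtime_h=("downtime_h", "sum"),
    )
    table = table.sort_values("downtime_h", ascending=False)
    table["downtime_share"] = table["downtime_h"] / table["downtime_h"].sum()
    # Cumulative share shows how few items account for most failure downtime.
    table["cumulative_share"] = table["downtime_share"].cumsum()
    return table
```

Passing `by="section"` gives the same view at plant-section level, and sorting by `failure_count` instead of `downtime_h` gives the count-based view of criticality.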

6.3.2 Vibration measurement data

In addition to the value provided by stoppage and production records, the vibration readings may serve as an adequate indicator of the overall mechanical health of an item of equipment prior to experiencing a failure. A descriptive analysis of this data should facilitate a better understanding of the extent to which these variables represent equipment health, and how best to incorporate them into future models. Given that the objective is to build models using an integration of all data sources, and not all plant equipment is monitored for vibrations, the descriptive analysis will also indicate which equipment is most frequently monitored, identifying candidates for future integrated models.

A descriptive analysis of this data may include boxplots to assess and compare the range of values across variables, time series plots, and correlation analysis. Additionally, given the nine possible vibration variables, dimensionality reduction techniques such as principal component analysis (PCA) may prove useful.
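A compact PCA sketch using the singular value decomposition of the standardized data matrix is shown below. It assumes the nine vibration variables have been arranged as the columns of complete rows (for example one row per component and date), with no missing values and no constant columns; the function name `pca` is illustrative.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of standardized data; rows are observations, columns are variables.

    Returns the component scores, the loadings (one row per component), and
    the share of total variance explained by every component.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so variables measured on different scales contribute equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = Z @ Vt[:n_components].T
    return scores, Vt[:n_components], explained
```

If the first few entries of `explained` account for most of the variance, the corresponding scores could stand in for the nine raw variables as covariates in later models.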

6.3.3 Monthly production data

A descriptive analysis of the monthly production data will aim to provide insight into the distribution of workload between subsections of the plant. Such knowledge may provide significant explanatory power for reliability models in the forthcoming analysis. As the data consists of monthly figures, time series plots may be the most useful method for descriptive analysis.
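Because the plant also reports a capacity rate per section, one assumed way to compare workload across sub-sections is to express monthly tonnage as a share of the theoretical maximum, that is, the capacity rate multiplied by the hours in the month. This utilization measure is an illustration only, not a figure the plant reports, and the function name is hypothetical.

```python
import calendar


def utilization(production_t, capacity_tph, year, month):
    """Share of theoretical monthly capacity actually used by a plant section.

    production_t: total monthly production in tons
    capacity_tph: maximum throughput of the section in tons per hour
    Assumes the section could, in principle, run every hour of the month.
    """
    hours_in_month = calendar.monthrange(year, month)[1] * 24
    return production_t / (capacity_tph * hours_in_month)
```

For example, a section rated at 200 t/h that produced 74,400 t in January 2016 (744 hours) ran at a utilization of 0.5.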

6.4 Data integration

Following the criticality analysis of the stoppage event records, a section of the plant will be identified as the focus of further analysis. Ideally, the criticality analysis will reveal a specific type of equipment about which sufficient data has been recorded such that it will be possible to model the recurrent failure events, while accounting for the condition of the equipment between failures, using other data, such as vibration measurements.

In order to achieve the main objective of the study, predictive modelling, it is imperative that the three types of data are integrated, or fused. For a specific item of critical equipment, the integrated data will contain a record of all stoppage events, production figures, and vibration measurements associated with that equipment.

This means that in addition to identifying the gaps between failure events, the integrated data will also account for maintenance interventions and non-failure stoppages that occur during these gap times. Furthermore, the data will contain the full production rate history, such that the predictive models can use the changes in production to better estimate equipment reliability. Finally, the integrated data will also include all observed vibration measurements in order to assess the condition of the equipment between failures.

There is no established procedure for data integration, as it depends entirely on the structure of the available data as well as the intended use. Here, the aim of the integration is to use both the production and vibration observations to augment the stoppage records. Each data set contains both an equipment identifier and a time variable, which will serve as the reference values for the integration. Given the failure records for a certain item of equipment, the timestamps of the failure events can be referenced against the production records to create time-dependent covariates for production. The procedure will be repeated for the vibration data, using timestamps to identify all vibration measurements observed between failure events and incorporating them into the integrated data as additional time-dependent covariates. In this form, the integrated data will represent all available information about the performance and condition of an item of equipment between failure events.
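A sketch of this integration for one item of equipment is given below, assuming all times have already been converted to a common unit (e.g. days since the start of observation). The timeline is cut at every failure and every covariate observation, producing the (start, stop] counting-process rows commonly used for time-dependent covariates in extended Cox models. Carrying the last observed value forward between measurements is an assumption, as are the function and field names.

```python
from bisect import bisect_right


def to_counting_process(failures, end, covariates):
    """Build (start, stop] rows for one item of equipment.

    failures:   sorted failure times, all > 0 and <= end
    end:        time at which observation stops (right-censoring time)
    covariates: dict mapping a covariate name to a sorted list of (time, value)
    Each interval carries the most recent value observed at or before its start
    (last observation carried forward); event = 1 if the interval ends in failure.
    """
    failure_set = set(failures)
    cut_points = {t for obs in covariates.values() for t, _ in obs if 0 < t < end}
    cut_points |= failure_set | {end}
    rows, start, episode = [], 0.0, 1
    for stop in sorted(cut_points):
        if stop <= start:
            continue
        row = {"episode": episode, "start": start, "stop": stop,
               "event": int(stop in failure_set)}
        for name, obs in covariates.items():
            i = bisect_right([t for t, _ in obs], start)
            row[name] = obs[i - 1][1] if i else None  # None: nothing observed yet
        rows.append(row)
        if row["event"]:
            episode += 1  # the next interval belongs to the next gap time
        start = stop
    return rows
```

Gap times, as used in some recurrent-event models, follow by subtracting the time of the previous failure from `start` and `stop` within each episode.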

6.5 Reliability models

After several critical items of equipment have been identified through the criticality analysis, basic reliability models using only the failure event history will be built with survival analysis techniques. Although they do not yet incorporate all of the data sources, the basic models will still provide preliminary insights into the reliability characteristics of the critical equipment. Tests for trend, distributional assessments, and model estimation using non-parametric, semi-parametric, and fully parametric methods will be performed.
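The specific tests and estimators are not yet fixed; as an illustration, the sketch below shows two common choices. The Laplace test for a time-truncated failure history (observation ending at time T, with n failures at times t_1, ..., t_n) compares the mean failure time with T/2; under a homogeneous Poisson process the statistic is approximately standard normal, and a clearly positive value suggests deterioration. The Kaplan-Meier estimator gives a non-parametric survival curve for times between failures with right censoring.

```python
import math
from collections import Counter


def laplace_trend(failure_times, T):
    """Laplace test statistic for failures observed on (0, T] (time-truncated)."""
    n = len(failure_times)
    return (sum(failure_times) / n - T / 2) / (T / math.sqrt(12 * n))


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate as (time, S(t)) pairs at each observed failure time.

    durations: times between failures (or until censoring)
    observed:  1 if the duration ended in a failure, 0 if right-censored
    """
    failures = Counter(d for d, e in zip(durations, observed) if e)
    leaving = Counter(durations)  # failures plus censorings at each time
    at_risk, surv, curve = len(durations), 1.0, []
    for t in sorted(leaving):
        if failures[t]:
            surv *= 1 - failures[t] / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve
```

Evenly spread failures give a statistic near zero, while failures bunched towards the end of the observation window push it well above 1.96, the usual 5% two-sided threshold.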

6.6 Integrated predictive models

The final element of the analysis will extend the previously established models to assess whether future failure events can be predicted from all available data sources. The survival analysis methods to be used include extended Cox models and parametric accelerated failure time (AFT) models. The use of these models for predicting equipment reliability and mean time between failures (MTBF) will also be demonstrated.
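For reference, the two model families can be written as follows, taking the Weibull distribution as one common (assumed) choice of AFT baseline. The MTBF expression holds for fixed covariate values x and assumes that each repair returns the equipment to an as-good-as-new state, so that successive times between failures follow the same distribution.

```latex
% Extended Cox model with time-dependent covariates x(t)
h\bigl(t \mid \mathbf{x}(t)\bigr) = h_0(t)\,\exp\!\bigl\{\boldsymbol{\beta}^{\top}\mathbf{x}(t)\bigr\}

% Weibull AFT model; W follows the standard (minimum) extreme-value distribution
\log T = \mu + \boldsymbol{\gamma}^{\top}\mathbf{x} + \sigma W,
\qquad \eta(\mathbf{x}) = \exp\!\bigl(\mu + \boldsymbol{\gamma}^{\top}\mathbf{x}\bigr),
\qquad k = 1/\sigma

R(t \mid \mathbf{x}) = \exp\!\Bigl\{-\bigl(t/\eta(\mathbf{x})\bigr)^{k}\Bigr\},
\qquad
\mathrm{MTBF}(\mathbf{x}) = \int_{0}^{\infty} R(t \mid \mathbf{x})\,dt
  = \eta(\mathbf{x})\,\Gamma\!\left(1 + \frac{1}{k}\right)
```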

In addition to predicting equipment reliability, classification models such as artificial neural networks (ANNs) and logistic regression may be used with the integrated dataset to assess whether future maintenance actions can be predicted. ANN models are highly flexible classifiers, while logistic regression provides a simple interpretation of covariate effects; both may prove insightful.
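A minimal logistic-regression sketch, fitted by Newton-Raphson (iteratively reweighted least squares) with NumPy, shows where the interpretability comes from: each fitted coefficient, exponentiated, is an odds ratio for the predicted maintenance action per unit change in a covariate. The covariates (for example a vibration level) and the function name are hypothetical. The sketch has no safeguard for perfectly separable data, where the estimates diverge, and ANNs are not sketched.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: (n, p) covariates; y: (n,) 0/1 outcomes (e.g. replacement vs repair).
    Returns [intercept, coef_1, ..., coef_p] on the log-odds scale.
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        w = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * w[:, None])          # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```

`np.exp(beta[1:])` then gives the odds ratios, which is the simple interpretation of covariate effects referred to above.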

In short, the integrated predictive models will assess the extent to which the available data can be used to improve the understanding of the reliability characteristics of specific equipment. The integrated predictive models will be used to make inference about reliability in addition to providing examples of how the models can be used in practice.