1 Introduction

The mining industry was an early adopter of sensor-based technologies for monitoring various processes, especially in mineral processing plants. Even for a moderately sized mine, the economic benefits of sensor usage can amount to millions of dollars in annual gains [2]. However, sensors are prone to errors that cause direct losses (poor data quality) and indirect losses (reduced mineral recovery). The US economy, for instance, suffers an estimated loss of 20 billion dollars annually due to production losses (3–8%) attributed to sensor faults in the oil industry alone [3]. In this context, sensor fault detection has become a very important aspect of plant economics. Sensor error detection and recalibration have many benefits, such as reduced equipment downtime, increased production, and improved overall safety in the industry. In a study conducted on a crude distillation unit, sensor “biases” in non-control variables were found to be causing $7.36 million in losses; it was also found that, by using data reconciliation methods, the loss could be reduced to $7.12 million [4]. The influence of sensor faults on instrumentation costs is significant, as these costs comprise 2–8% of the total fixed costs of any process plant [5]. Sensor errors that are high in magnitude (“gross errors”) are more common and include white noise, failures (flat-outs), short faults, etc. These are obvious and can easily be detected through classical statistical approaches that generally depend on linear relations between variables, and they can be fixed through regular maintenance or calibration. In contrast, their subtle counterparts, such as a small “bias,” usually result from sensors straying or drifting between calibrations and are therefore hard to identify. For highly non-linear and non-stationary processes like mineral processing operations, these subtle errors are indistinguishable from process fluctuations.
In particular, bias errors are often present in the data as an offset value added to the original (true) reading.

Data quality plays an important role in industrial decision-making. The quality of data must be established beyond reasonable doubt before it can be utilized, and in this context, accuracy is the most desirable requirement for any measurement [6]. Periodic calibration plays a vital role here and is required to maintain sensor accuracy. While most industry standards emphasize accuracy [7, 8], they do not specify the frequency of calibration. Federal laws are stringent for instruments critical to safety, like gas detectors, and require more frequent calibrations [9]. Other sensors, however, are calibrated once a year across industries [1]. This could be one reason why sensor biases arising between calibration intervals (less than a year apart) are often ignored or overlooked by industry operators, causing production losses over time. The losses are even more significant when such errors occur in sensors that monitor critical industrial processes, for instance, temperature-monitoring sensors in gold stripping vessels. Calibration is a time-consuming process and at times results in equipment downtime and production losses; additionally, it requires expert skillsets. For these reasons, industry operators are less motivated to conduct frequent recalibrations. In a survey covering many sectors of industry, including chemical, power, energy, and manufacturing, 56% of respondents said they calibrated their instruments no more than once a year [1]. In this context, one way to encourage recalibration and improve its frequency in industry is to devise in situ error detection methods that do not require physical removal of sensors from the host equipment for comparison with standard devices or well-calibrated sensors.

Several researchers have used statistical methods to detect sensor errors. Least-squares estimation (LSE) is one of the data-fitting or estimation-based techniques frequently used for flagging faulty readings in sensor networks, and linear models have been used extensively for sensor fault detection in the recent past [10]. The “error” in these network models is defined as the difference between the observed sensor data and the fit. The approach works well for data variables that exhibit linear relations and follow Gaussian distributions; it is less effective when multiple sensors are present, the sensors are dependent on each other, and their statistical interrelations are non-linear. Fitting a large set of data usually compromises the sensitivity of the fit to small errors. For this reason, estimation-based methods are well suited to finding high-magnitude gross errors but are at a disadvantage in detecting low-magnitude bias errors. In time series-based methods, error is generally estimated by observing the difference between forecasted and actual (observed) data trends, using techniques like moving averages [11], exponential smoothing [12], etc. When the observed data is highly variable in nature, as in mineral processing industries, it is very difficult to forecast the trends accurately, which is a disadvantage in the context of detecting calibration bias errors. Time series-based methods are capable of finding short-duration faults but have proved disadvantageous for finding long-duration faults like noise or bias. In the mining and mineral processing industry, it is common to find differences between observed (erroneous) and actual data. To correct the errors and improve data quality, data reconciliation methods are used in process industries [13, 14]; however, since they depend on linear relations between input and output variables, they fail to detect data errors in mineral processing circuits.
Among non-linear approaches, artificial neural networks (ANNs) are promising to some extent. They mimic the behavior of complex non-linear systems by learning from a training data (input) set [10]. The difference between the value predicted by the neural network and the actual reading is considered the error, and sensor faults are identified based on its magnitude; several researchers have used ANNs to quantify sensor errors in this way [15, 16]. However, training ANNs on ever-changing process-related input data is time-consuming: once the process changes, the ANN must be retrained in order to predict the sensor readings, which is cumbersome. In contrast to mainstream linear statistical models, such as classical descriptive and inferential statistics, Bayesian statistics use prior knowledge of known constraints to predict the behavior of a targeted variable or a sensor error. By comparing predicted sensor readings to their normal readings, some researchers were able to calculate the probabilities of possible (classified) error types [17, 18]. The disadvantage of Bayesian models is that there is no correct way to choose a prior, and the choice can heavily influence the posterior model.

From the literature, it is evident that for highly non-linear and non-stationary processes, developing analytical models based on classical or fundamental statistical methods is difficult. Moreover, sensor error validation for such processes is equally tough due to the excessive production of false alarms [10]. Data-mining techniques have proved successful in such cases and have a wide range of industrial applications. They are generally used to extract knowledge from huge datasets and rely heavily on algorithms to process (filter and classify) large amounts of data; computer algorithms increase data preparation and processing rates. A typical data-mining process involves cleaning large amounts of sensor data for outliers, filtering the data of interest, calculating statistics that measure the magnitude of errors, and ultimately applying criteria (the data-mining technique) to detect errors. For instance, “association”-based techniques exploit relations between variables; “classification”-based techniques study each class of items (types of errors); “clustering”-based techniques examine and group one or more attributes of the classes; and “pattern recognition”-based techniques identify trends or regular occurrences [19]. Techniques like clustering and pattern recognition are used to predict sensor data trends and can detect anomalies like bias [10, 20]. Due to the complexity of industrial data and underlying processes, a combination of data-mining techniques is often used in error detection. For instance, “prediction”-based techniques use a combination of classification, pattern recognition, and association (an example is forecasting a company’s stock performance), and “decision trees” use classification and prediction together to find various sensor faults [21].
In another approach, “rough set” theory, used to discover structural relationships within imprecise and noisy data, was combined with artificial neural networks to identify sensor faults in a heating, ventilating, and air conditioning (HVAC) system [22]. Oftentimes, industry problems are complex and require several combinations of these techniques and algorithms (“hybrid methods”), or entirely innovative approaches specific to a particular problem. Even though data-mining-based techniques are highly applicable to non-linear processes like mineral processing operations, the bulk of the literature is devoted to approaches for finding large (“gross”) errors in the sensor data. In general, bias errors below ± 2% (over the true reading) are indistinguishable from process fluctuations. For this reason, detection at the ± 2% level is valuable for process operations like Pogo Mine, hence the motivation for this research [23]. The data-mining techniques used in the research are broadly a combination of classification, decision-tree, and pattern recognition methods. Apart from generic data reading and filtering techniques, no existing algorithms were modified or reused to accomplish this research; the algorithm(s) developed and presented here are completely novel.

2 Preliminary Research

In an attempt to detect bias errors, the authors previously conducted experiments on sensor data collected from a semi-autogenous grinding (SAG) mill circuit at the Fort Knox mine and a carbon stripping circuit at the Pogo Mine of Alaska. To glean information from large amounts of sensor data, data characterization methods like aggregation, signal processing methods like the Fast Fourier Transform (FFT), and a purpose-developed data-mining method, peak-readings count and sensitivity analysis (PRCSA), were used [23]. Aggregation methods were used to reduce the size of the data streams by approximating (averaging) the readings at certain intervals; aggregation helps reveal the underlying trends without compromising the overall accuracy of the data [24]. When data exhibits a certain level of periodicity, like the temperature cycles in carbon stripping circuits, the FFT can be useful for observing its overall trends. The FFT allows one to view large amounts of data in the “frequency domain,” i.e., in terms of the frequencies of certain trends or cycles, which is a more manageable and concise form for such large data sets. However, an important requirement for FFT methods to succeed is a high level of consistent periodicity in the data. Data collected from mineral processing circuits is often dynamic and non-stationary due to ever-changing ore characteristics; the cycles change often, and, therefore, the FFT methods were ineffective when applied.
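The aggregation step described above can be sketched as follows. This is a minimal Python illustration (the authors' implementations used Matlab® and Fortran®), assuming 10-min readings averaged into hourly blocks; the data values are made up for demonstration, not Pogo data.

```python
# Sketch of the aggregation step: average fixed-size blocks of raw
# 10-min readings into coarser (hourly) values. Illustrative only.

def aggregate(readings, block_size):
    """Average consecutive blocks of `block_size` readings.

    Trailing readings that do not fill a complete block are dropped,
    so every aggregated value averages the same number of samples.
    """
    n_blocks = len(readings) // block_size
    return [
        sum(readings[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]

# Six 10-min readings -> one hourly average.
raw = [270.0, 272.0, 274.0, 276.0, 278.0, 280.0]
print(aggregate(raw, 6))  # [275.0]
```

Averaging over fixed blocks preserves the underlying trend while shrinking the stream by the block factor, which is the point made in the text.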

The PRCSA method chiefly depends on observing trends of sensor data in terms of certain “responsive statistics,” which are explained below. The method is implemented with an algorithm that first aggregates the sensor data streams and then applies filters (or cut-offs) to observe cyclical data trends within desired operational ranges. The desired (optimal) operating temperature for strip vessels in a carbon-in-pulp (CIP) circuit is around 280 °F (hence the cut-off), and the number of temperature cycles or peaks above this value is of interest. As an example, suppose the PRCSA algorithm normally observes up to 10 cycles above the cut-off in a span of 220 h for clean data. If it observes 15 cycles (a drastic 50% increase) in the presence of an induced bias (like + 10%), the algorithm flags an anomaly or error. Broadly, the percentage increase in such (and similar) statistics is the basis for bias detection. However, the method depends on comparing a clean data set (calibrated set) to its bias-induced version (biased set). This is impossible in real-life situations, where only one data stream, either clean or biased, is available per sensor. Moreover, the parameters observed by sensors in any industrial circuit are made interdependent by the underlying processes they monitor, a situation aptly described as a “multi-sensor environment.” For instance, the heat of the barren solution in a CIP circuit can drive the temperature in the strip vessels, so there might be a statistic that describes the temperature sensor reading in a strip vessel as a function of the heat sensor reading in the barren solution (an interrelation). In the absence of a biased set, such interrelations can be exploited to suspect anomalies or errors. Due to these shortcomings, the preliminary experiments conducted in the past were not successful in finding bias even as high as 10% in magnitude.
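The PRCSA counting idea above can be sketched as follows. This is an illustrative Python rendering under stated assumptions: the cut-off, the example streams, and the 50% flagging rule are taken from the worked example in the text, and the helper names are invented for this sketch.

```python
# Sketch of the PRCSA idea: count temperature peaks above a cut-off
# in a clean baseline and in a test stream, and flag an anomaly when
# the count grows by at least a preset percentage.

def count_peaks(temps, cutoff):
    """Count upward crossings of `cutoff` (one per peak)."""
    count, above = 0, False
    for t in temps:
        if t > cutoff and not above:
            count += 1
        above = t > cutoff
    return count

def prcsa_flag(baseline_count, observed_count, pct_increase=50):
    """Flag when the observed peak count grows by >= pct_increase %."""
    return observed_count >= baseline_count * (1 + pct_increase / 100)

clean = [270, 285, 270, 286, 270]              # 2 peaks above 280
biased = [270, 285, 270, 286, 270, 287, 270]   # 3 peaks (a 50% increase)
print(prcsa_flag(count_peaks(clean, 280), count_peaks(biased, 280)))  # True
```

The sketch also makes the method's core weakness visible: `prcsa_flag` needs both a clean baseline count and a test count, which is exactly the comparison that is unavailable in practice.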

3 Carbon Stripping Circuit at Pogo Mine

A carbon stripping circuit at the Pogo Mine of Alaska was chosen for the studies beyond the PRCSA method. The stripping circuit sensor data collected from the Pogo mill is very large: 200,000 readings a year per sensor if collected at one-min intervals. The methodology explored in this paper exploits sensor relations in terms of certain statistics (explained in the subsequent paragraphs) in a multi-sensor environment and thus overcomes the drawbacks of previous methods like PRCSA. Pogo Mine is one of the major gold producers in Alaska; the mine produced 245,494 troy ounces (269,342 oz) of gold in 2016, according to the Alaska Department of Natural Resources [25]. The Pogo mill processes up to 3175 t (3500 tons) of gold ore daily. The carbon stripping circuit is an integral part of the Pogo mill facility, whose flow comprises a crusher, semi-autogenous grinding (SAG) mill, ball mill, flotation circuit, leaching tanks, carbon-in-pulp (CIP) circuit, stripping circuit, and an electro-winning circuit. The schematic of the CIP circuit with the strategic placement of various sensors is shown in Fig. 1. The two strip vessels (1 and 2) are operated in tandem. Pogo uses “pressurized Zadra stripping,” a method in which eluant is circulated at a certain pressure (448 kPa or 65 psi) and temperature (138 °C or 280 °F) to maximize gold separation. A typical strip cycle lasts 11 h and breaks down as follows: loading the vessel (1 h), circulating the eluant (8 h), carbon cooling (1 h), and unloading carbon from the vessel (half an hour) [26]. Maintaining the optimum temperature range in the strip vessels, i.e., 132–138 °C (270–280 °F), makes monitoring of the strip vessels important, and detection of the sensors’ errors is thus crucial. False “optimal temperature” readings from S1 and S2 can result in poor process control, which in turn detrimentally affects gold recoveries. Therefore, S1 and S2 are the two important sensors of focus for this study.

Fig. 1
figure 1

Pogo stripping circuit schematic diagram with sensor placements [23]

4 Experimental Methodology

In order to observe the seasonal trends of various parameters of interest, the stripping circuit sensor data (raw data) was collected in 10-min average intervals for a period of 12 months, from January 1 to December 31, 2015 (Table 1). Since the raw data measurements were in US customary units, the original data analysis is based on these units; however, the SI system is followed in this paper, with original units in parentheses. Raw data streams for the sensors of interest over a 24 h span are shown in Fig. 2. For the purpose of experimentation, the raw data collected was assumed error-free, hence called the “clean set.” Since “bias” is the focus of the study, it is artificially induced into the data set to create the “biased set.” Bias is expressed as a percentage over the true reading (Eq. 1). As explained in the previous section, finding bias as low as ± 2% is valuable for Pogo Mine and is hence the goal of the research.

$$ \mathrm{Biased\ reading},\ {x}_{\mathrm{bias}}=\mathrm{True\ reading}\ \left({x}_{\mathrm{true}}\right)+\mathrm{bias}\ \left({e}_{\mathrm{bias}}\right) $$
(1)
Table 1 A snapshot of raw sensor data collected at 10-min average intervals
Fig. 2
figure 2

Behavior of various sensors in 24 h time span

Bias detection experiments are conducted on one sensor at a time (multiple sensors having errors at the same time are beyond the scope of this paper). This mirrors the industry-standard practice of comparing a sensor of calibration interest to a well-calibrated sensor or a standard device/measurement [8]. Since the strip vessel sensors are the focus of this study, S1 was chosen for the bias experimentation. Once bias is induced, it stays until the end of the data set (the biased set); for instance, if a + 2% bias was injected into S1 on March 12, all S1 data from March 12 onward was corrupted with a + 2% bias. Algorithms were designed to detect the seeded bias, and, to test the algorithms’ efficiency, bias was introduced at random times of the year (mimicking real-life situations).
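The bias-seeding step above (Eq. 1, applied from the induction point onward) can be sketched as follows. This is an illustrative Python sketch, not the authors' Matlab®/Fortran® code; the function name and sample values are invented for demonstration.

```python
# Sketch of seeding a calibration bias per Eq. 1: from the induction
# index onward, every reading carries a fixed percentage offset over
# the true value, i.e., x_bias = x_true + e_bias with
# e_bias = (bias_pct / 100) * x_true.

def induce_bias(clean, start_index, bias_pct):
    """Return a biased copy of `clean`, corrupting all readings at
    or after `start_index`; earlier readings stay untouched."""
    return [
        x * (1 + bias_pct / 100.0) if i >= start_index else x
        for i, x in enumerate(clean)
    ]

s1_clean = [100.0, 100.0, 100.0, 100.0]
s1_biased = induce_bias(s1_clean, start_index=2, bias_pct=2.0)
print([round(x, 2) for x in s1_biased])  # [100.0, 100.0, 102.0, 102.0]
```

In an experiment, `start_index` would correspond to a randomly chosen day of the year, matching the in-text example of a + 2% bias injected on March 12.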

Data “filtering” is one important aspect of the developed algorithms. Filtering out unwanted data helps ensure that unimportant data features are not absorbed into the algorithm. Corrupted or unreliable data readings were removed using “cut-off” or “cleansing threshold” values (Thcleanse) before being analyzed. The data stream is further filtered based on the optimum strip vessel operational temperature ranges maintained by Pogo: data below this range is eliminated at different “threshold (Th)” values to carefully examine the strip vessel temperature cycles or “peaks.” A strip vessel “peak” (P) is a special case of its cycle and is defined as the continuous rise and maintenance of a sensor’s temperature above a certain threshold of choice. The algorithms observe (capture into array tables or a database) peaks in terms of a “peak start-time” and a “peak end-time,” noted at the moments when the temperature rises above the threshold and when it falls back below the threshold, respectively (Fig. 3).
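The peak-capture logic just described can be sketched as follows. A minimal Python illustration in which times are simple sample indices and the threshold is in the reading's units; the actual algorithms store start/end timestamps in array tables, and the sample data here is invented.

```python
# Sketch of peak capture: a "peak" is a run of readings held above a
# chosen threshold; record its start-time (rise above) and end-time
# (fall back below).

def find_peaks(times, temps, threshold):
    """Return (start_time, end_time) pairs for each run of readings
    strictly above `threshold`."""
    peaks, start = [], None
    for t, temp in zip(times, temps):
        if temp > threshold and start is None:
            start = t                      # temperature rose above threshold
        elif temp <= threshold and start is not None:
            peaks.append((start, t))       # temperature fell back below
            start = None
    if start is not None:                  # stream ended while still above
        peaks.append((start, times[-1]))
    return peaks

times = [0, 1, 2, 3, 4, 5, 6]
temps = [268, 272, 281, 283, 276, 282, 271]
print(find_peaks(times, temps, 280))  # [(2, 4), (5, 6)]
```

Raising or lowering `threshold` reproduces the effect shown in Fig. 3: higher thresholds truncate the cycles into shorter, fewer peaks.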

Fig. 3
figure 3

Peaks vs thresholds in a clean set of S1 sensor data

4.1 Multiple Ratio Function Analysis

In order to mitigate the disadvantages of previous methods like PRCSA, the interrelations (explained earlier) between sensors were exploited with multiple ratio function analysis (MRFA), an innovative data-mining or hybrid method developed for this research. MRFA is the basis for the two powerful algorithms developed (in progression) and presented in this paper. It is obvious from the circuit diagram (Fig. 1) that the overall heat, or simply “heat,” in the barren solution drives the temperatures in the strip vessels (through continuous circulation). If this relation can be mathematically expressed and represented by a ratio statistic called a “ratio function,” or simply a “ratio,” it becomes easy to observe trends in the sensors’ data by simply observing the barren solution temperature and the strip vessel temperatures. For instance, if the strip vessel-1 temperature to heat ratio consistently ranges between 1 and 4 when the sensors are well calibrated and suddenly changes to the 1–3 range, that would be an anomaly. The mathematical representation of the stripping circuit process is as follows.

The heat to the strip vessels is supplied by the circulation of a hot “barren solution” through a series of four heat exchangers and their corresponding boilers. Sensors H1, H2, H3, and H4 measure the outlet temperatures of these four heat exchangers. The sum of all the heat sensors’ measurements, simply called “heat,” directly affects the temperature measurements of S1 and S2; in other words, the S1 and S2 measurements are a function of “heat.” In a similar fashion, increased barren and glycol flows (BARNFL and GLYFL, respectively) indicate more heat supplied to the strip vessels (driving S1 and S2 readings to high values). For operational efficiency and high gold recoveries, one strip vessel is operated for a production cycle while the other vessel is prepared (loaded with activated carbon) for the next cycle; thus, the two vessels operate in tandem to ensure continuity of gold recovery. This behavior, in the form of temperature cycles, is represented by the corresponding sensors in Fig. 2 (S1, red line; S2, green line). It can be observed that in a span of 24 h, S1 and S2 each complete a cycle that lasts approximately 11 h. This behavior is normal when the underlying process is normal and the sensors are well calibrated. This relation between S1 and S2, if disrupted, indicates some anomaly, which the algorithms also exploit in addition to the “ratio functions” (explained in the subsequent paragraphs). A schematic diagram of the various sensor interrelations in the stripping circuit is provided in Fig. 4. In this context, the “ratio function” is an important aspect of the detection process.
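The “heat” statistic described above can be sketched as a reading-by-reading sum of the four exchanger outlet temperatures. A Python illustration with made-up values, not Pogo data.

```python
# Sketch of the "heat" statistic: the sum of the four heat exchanger
# outlet temperatures (H1..H4) over matching readings.

def total_heat(h1, h2, h3, h4):
    """Sum the four outlet-temperature streams reading-by-reading."""
    return [a + b + c + d for a, b, c, d in zip(h1, h2, h3, h4)]

print(total_heat([35.0], [34.0], [36.0], [35.0]))  # [140.0]
```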

Fig. 4
figure 4

Various sensors and their interrelations

The relation in the form of a ratio function is defined in Eq. 2; S1 and heat are chosen for illustration purpose.

$$ {Ratio}^{\ast }=f\left[{T}_{\mathrm{peak}-\mathrm{ave}}\ (Heat),{T}_{\mathrm{peak}-\mathrm{ave}}\ (S1)\right] $$
(2)

For sensor S2, the relation is expressed in Eq. 3:

$$ {Ratio}^{\ast }=f\left[{T}_{\mathrm{peak}-\mathrm{ave}}\ (Heat),{T}_{\mathrm{peak}-\mathrm{ave}}\ (S2)\right] $$
(3)

*This is not a “simple ratio,” since several conditions have to be met (as detailed below) in order for the function to exist. Due to this reason, “ratio function” is used mostly throughout the text for distinction. However, for simplicity at times when “ratio” is used, it should be understood as ratio function.

Since S1 and S2 monitor similar processes, it is always helpful to observe the trends of these sensors by comparison. One way to do so is by comparing the ratio functions they form with heat (S1: heat vs. S2: heat). To achieve this, the algorithm captures tandem cycles of S1 and S2 when they occur within a 24 h window of each other (normal behavior). Cycles occurring outside this window could be indicative of process (or strip vessel) downtimes due to maintenance, repair, etc. In this context, it should be noted that the algorithms implement the 24 h window criterion only to examine the relevant cycles.
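The 24 h window criterion can be sketched as follows. An illustrative Python sketch with assumed conventions: peaks are (start, end) pairs in hours, and each S1 peak is paired with the first S2 peak starting within 24 h after it (the "matching forward" rule formalized in the list below Fig. 5).

```python
# Sketch of the 24 h "matching forward" criterion: pair an S1 peak
# with the first S2 peak whose start falls within 24 h after the S1
# peak starts. Peaks are (start_hour, end_hour) tuples.

def match_forward(s1_peaks, s2_peaks, window_h=24):
    matches = []
    for s1_start, s1_end in s1_peaks:
        for s2_start, s2_end in s2_peaks:
            if 0 <= s2_start - s1_start <= window_h:
                matches.append(((s1_start, s1_end), (s2_start, s2_end)))
                break  # take the first S2 peak inside the window
    return matches

s1 = [(0, 9), (30, 39)]
s2 = [(11, 20), (70, 79)]   # the second S2 peak is outside any window
print(match_forward(s1, s2))  # [((0, 9), (11, 20))]
```

S1 peaks with no S2 partner inside the window are simply dropped, mirroring how the algorithm skips over downtime periods.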

The following are some of the key features of the ratio function as depicted in Fig. 5.

  1)

    The process of capturing S1 and S2 peaks (and related data) within a 24 h span and comparing their ratio functions is called “matching forward.” For instance, ratios computed from the sensor data collected during an S1 peak are matched to ratios computed from the sensor data collected during an S2 peak that appears within 24 h after S1 (forward of S1 when the data is viewed in a graph like Fig. 5). The term “matching,” when used in the subsequent sections, should be understood as “matching forward,” since “matching backward” was not used at all.

  2)

    The peaks are truncated cycles resulting from the use of a preset “truncation threshold” (trunc Th). The trunc Th value is set where the ratio function’s sensitivity to bias is high; this is explained in the subsequent sections.

  3)

    Based on the peak-start and peak-end times of S1 and S2, the corresponding heat exchanger sensor peaks (H1 through H4, occurring in the same time frames) are captured and the total “heat” is calculated.

  4)

    The average temperature (Tpeak-ave) value for each peak is calculated.

  5)

    Lastly, the Tpeak-ave ratios of S1 to heat and S2 to heat are calculated.

  6)

    All the peaks are at least 1 h in duration; this avoids very small peaks (noise or small disruptions), which can distract from the focus of the study.
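The steps above (peak averages, minimum duration, and the ratio itself) can be sketched as follows. An illustrative Python sketch; the minimum-sample rule assumes 10-min readings (6 samples per hour), and the temperatures are made-up values, not Pogo data.

```python
# Sketch of the ratio function computation: given the readings
# captured within a peak, compute the peak-average temperature
# (Tpeak-ave) and the S-to-heat ratio, discarding peaks shorter than
# 1 h (fewer than 6 readings at 10-min spacing).

MIN_SAMPLES = 6  # 1 h of 10-min readings

def peak_average(readings):
    return sum(readings) / len(readings)

def ratio_function(s_peak, heat_peak, min_samples=MIN_SAMPLES):
    """Return Tpeak-ave(S) / Tpeak-ave(Heat), or None for peaks that
    are too short to be considered."""
    if len(s_peak) < min_samples or len(heat_peak) < min_samples:
        return None
    return peak_average(s_peak) / peak_average(heat_peak)

s1_peak = [280.0] * 6     # a 1 h S1 peak holding 280 F
heat_peak = [140.0] * 6   # the total "heat" over the same window
print(ratio_function(s1_peak, heat_peak))  # 2.0
```

The "ratio function" label in the text is justified by the guard conditions: the value exists only when a valid matched peak of sufficient duration is available, so it is not a simple ratio of raw readings.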

Fig. 5
figure 5

Ratio functions and matching

Statistics of a peak can also include the maximum value of the peak readings (peak-max) and the average value of the peak readings (peak-ave), which are good indicators of temperature changes or sensitivities to bias. Revisiting Fig. 4, multiple ratio function relations between various sensors can be exploited to identify a biased sensor. For instance, if bias is induced in the S1 sensor data while the other sensors are kept well calibrated, the ratios of S1 to all other sensors’ readings and of S2 to all other sensors’ readings can be compared to find the bias in S1. Since S1 and S2 are tandem processes, in an error-free state the “ratio functions” they form with other sensors show similarities; in the presence of bias, the “ratio functions” the biased sensor (S1) forms with other sensors show a significant departure from those of a sensor with no bias (S2). As noted earlier, this approach is clearly meritorious compared with PRCSA, where a clean set of data is always required for baseline statistics. MRFA is designed as a single test, called the heat test, that exploits only the ratio relations between the strip vessel sensors (S1 and S2) and heat. The ratios that S1 and S2 form with other parameters, like barren flow and glycol flow, are exploited through additional tests added to MRFA at later stages to produce a more efficient and automated algorithm, MRFA with automation (MRFAA).

4.2 Algorithm Development

Two algorithms were developed in progression, based on MRFA analysis, to accomplish the research. The first is the MRFA algorithm, which has disadvantages in terms of speed and the capability to conduct multiple tests to find bias with high success rates. When automation and multiple-test capabilities are added, it becomes the MRFAA algorithm (MRFA with automation), which is the final product of this research and responsible for the results presented. The major steps and the progression of algorithm development are depicted in a flowchart (Fig. 6), and the detailed progression, with corresponding flowcharts, is presented in the subsequent sections. Briefly, the data flow and major steps are as follows: the algorithm reads the data (step 1); prepares the data, i.e., cleans it of outliers or corrupted records, stores the data of interest, and induces artificial bias in the S1 data set (step 2); identifies and stores the S1 and S2 cycles based on the matching criteria (step 3); calculates the S1: heat and S2: heat ratios and cycle statistics like the peak-average value, and creates a data table or array with the ratios of the peak-average values (step 4); matches S1 and S2 peaks (step 5); compares the biased and clean datasets in table form (step 6); and plots the S1: heat and S2: heat ratios in graphical form to facilitate observation of biased data trends in comparison to the clean (no bias) S2 ratio data (step 7). Steps 6 and 7 are part of the MRFA algorithm; when they are replaced by a cross-score mechanism and alarm logic (automatic detection of bias), the algorithm becomes the MRFAA algorithm (Fig. 6).

Fig. 6
figure 6

Flowchart depicting progression of MRFAA algorithm development

4.3 MRFA Algorithm

For the purpose of the algorithm description, S1 is assumed to have bias. The flowchart with detailed steps and descriptions is given in Fig. 7. From the previous descriptions and the flowchart, it can be understood that the MRFA algorithm conducts one test, called the heat ratio test. The following describes how bias is observed with this test, which also helps in understanding how multiple tests are performed with MRFAA. In Fig. 8, it can be observed that the S1 bias: heat ratios (red line), when compared with the clean data ratios (blue line, S2: heat), exhibit an upward or downward trend (or simply a “shift” in position) on the Y-axis. Three scenarios are depicted in the three corresponding graphs of the figure. The middle graph depicts the trends of the S1: heat ratios (red line) against the S2: heat ratios (blue line) when S1 and S2 are both clean: broadly, the lines representing the ratios (red/blue) intertwine consistently, exhibiting similar trends and depicting the tandem nature of the strip vessel cycles. When a bias of + 2% is introduced (see the 35th peak on the X-axis), the red line suddenly shifts upwards after the event (a positive shift on the Y-axis); the shift is downwards, or negative, when a − 2% bias is introduced (see the bottom graph). This shift is more obvious at higher magnitudes of bias (> 2%), which is the reason gross errors can be detected easily with classical methods; it is, however, very difficult to detect at the 2% magnitude level, which the MRFA algorithm achieved. Once the shift is observed, the algorithm can help trace the peak number at which it occurred and the corresponding time stamp.

Fig. 7
figure 7

Flowchart for the MRFA algorithm

Fig. 8
figure 8

Comparison of positive (+ 2%) and negative (− 2%) bias effects on S1 sensor clean data set

After conducting several heat ratio tests (or simply heat tests) at different thresholds, it was observed that at a truncation threshold (trunc Th) of 135 °C (275 °F), the S1 sensor “shift” has the most sensitivity to bias. This is called the “effective threshold” (eff Th). The algorithm stores this threshold value to apply in future experiments, which makes finding bias much easier and faster. However, plotting graphs and observing trends to trace exactly when the bias occurred is a tedious process and should be automated for reasonably swift results. Several tests were added to the MRFA algorithm in this context.
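The effective-threshold selection can be sketched as follows. This is an assumption-laden Python illustration: the paper found 275 °F empirically, and the sensitivity measure used here (mean absolute shift between biased and clean ratio sequences) is a stand-in chosen for this sketch, not the authors' stated criterion.

```python
# Sketch of selecting the "effective threshold": sweep candidate
# truncation thresholds and keep the one where the ratio shift
# between biased and clean runs is largest. The sensitivity measure
# (mean absolute ratio difference) is an assumption for illustration.

def shift_sensitivity(clean_ratios, biased_ratios):
    diffs = [abs(b - c) for c, b in zip(clean_ratios, biased_ratios)]
    return sum(diffs) / len(diffs)

def effective_threshold(runs):
    """`runs` maps a candidate threshold (deg F) to a pair of
    (clean_ratios, biased_ratios); return the most sensitive one."""
    return max(runs, key=lambda th: shift_sensitivity(*runs[th]))

runs = {
    270: ([2.00, 2.01], [2.02, 2.03]),   # small shift
    275: ([2.00, 2.01], [2.06, 2.07]),   # largest shift -> effective
    280: ([2.00, 2.01], [2.01, 2.02]),
}
print(effective_threshold(runs))  # 275
```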

4.4 MRFAA Algorithm

One more drawback of the MRFA algorithm is its inability to detect errors during certain periods when the strip vessel temperatures ran below the threshold of choice, due to process adjustments to ore characteristics (after September 2015). For this reason, the truncation threshold strategy proved disadvantageous for capturing peaks throughout the year. To compensate for the drawbacks of the MRFA algorithm, several changes were made, as outlined below. The improved algorithm (MRFAA) is automated to find the bias and its time of occurrence and is more reliable. The following are the additions to MRFA, in terms of small algorithms, tests, and features, that contribute to MRFAA’s improved speed of execution.

  1)

    Cross-score algorithm: as discussed in the previous sections, a shift in the biased data ratios (S1: heat) is indicative of the presence of bias and is thus an important variable to observe. A “cross-score” algorithm that detects the shift in terms of a “score” was developed as part of the MRFAA algorithm; it assists the various tests in detecting bias automatically after it is introduced in the data. It finds the relative position of the S1: heat ratio to the S2: heat ratio (red line to blue line in Fig. 9) and assigns a score. For instance, at a particular matched peak (X-axis in Fig. 9), if the red line is observed above the blue line, a positive score of “1” is assigned; otherwise, a score of “− 1” is assigned to that peak. These values are then populated in a table or array. At any given peak number (on the X-axis again), the sum of the scores for the past 10 peaks is indicative of the relative position (up or down) of the red line with respect to the blue line (i.e., of the biased vs. clean ratios). This moving sum of the past 10 scores at a given matched peak number is the “cross-score” corresponding to that matched peak number; thus, cross-scores range between − 10 and + 10. If the lines cross evenly, a zero score is possible, indicating no bias. On the other hand, continuously positive scores indicate S1 ratios on top of S2 ratios, indicating positive bias in S1 (and vice versa for negative bias). Using this algorithm, the MRFAA algorithm continuously checks the cross-score at each peak, starting from the beginning of the year in the input dataset, against “cross-score thresholds.” The cross-score thresholds are simply arbitrary levels that indicate a certain level of bias in the sensors or operation; higher cross-scores indicate a larger shift and thus higher bias. The moment the cross-scores start to grow past a certain threshold, the corresponding peak start-time is noted as the “bias identification date,” along with the cross-score threshold.
The threshold is automatically changed by the algorithm. The aim is to find a threshold value at which bias identification is possible at the earliest time after induction date (with high success rate).

  2)

    Multiple tests were added to MRFA in addition to the heat test. For instance, the BARNFL test compares S1: BARNFL to S2: BARNFL temperatures. All tests feature the cross-score algorithm, i.e., ratio tests such as heat, BARNFL, and GLYFL, and value tests such as the "ave value test" and "max value test." The various tests involved are represented in tree form in Fig. 10.

  3)

    A "dynamic thresholding" strategy is included to compensate for the disadvantages of the "truncation threshold" strategy of the MRFA algorithm. The strategy dynamically adjusts the threshold based on the peak-max value (the maximum temperature registered during a peak).
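The cross-score mechanism described in item 1) above can be sketched as follows. This is a minimal illustration, not the original Fortran implementation; the function names and the `window` parameter name are ours:

```python
def cross_scores(s1_ratios, s2_ratios, window=10):
    """At each matched peak, score +1 if the S1 (potentially biased) ratio
    lies above the S2 (clean) ratio, -1 otherwise, then take a moving sum
    over the past `window` peaks (10 in the paper)."""
    scores = [1 if r1 > r2 else -1 for r1, r2 in zip(s1_ratios, s2_ratios)]
    return [sum(scores[max(0, i - window + 1):i + 1]) for i in range(len(scores))]

def first_detection(cross, threshold):
    """Return the index of the first matched peak whose cross-score
    magnitude exceeds the cross-score threshold (the peak whose start-time
    would be noted as the "bias identification date"), or None."""
    for peak, score in enumerate(cross):
        if abs(score) > threshold:
            return peak
    return None
```

With a persistent positive offset in S1, the scores are all +1, so the moving sum climbs to +10 and stays there; with unbiased, evenly crossing lines, the sum hovers near zero and never clears the threshold.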

Fig. 9

Depiction of cross-score calculation mechanism

Fig. 10

Classification of multiple tests

In a typical experiment, bias is introduced on a random day and stays until the end of the year from that day, mimicking a real-life industrial situation. A test designed in this context needs to find the bias after it occurs (or, here, after it is introduced). Ideally it should find the bias on the very day it was introduced, which is difficult due to process variabilities and the algorithms' logical response. If a test finds the bias after it is introduced, the result is a "success" and the designated alarm is a "true alarm" (value = 1). When a test identifies a bias introduction date earlier than the date the bias was actually introduced, it is called a "false alarm" or "failure" (value = 0). The time-till-find-days (TTFD) value is an absolute value that indicates the difference in days between the bias "introduction date" and the "bias identification date." Only the alarm mechanism indicates whether a test is true (always desired) or false (not desired) at a particular cross-score threshold value.
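The alarm and TTFD bookkeeping described above can be sketched as follows. The function name and day-index representation are illustrative; following the paper, a test that never fires is assigned the symbolic TTFD of 365 and (per the false-alarm counts reported in the Discussion) is grouped with the failures:

```python
def evaluate_test(introduction_day, identified_day, year_days=365):
    """Classify one test outcome as (alarm, ttfd).
    alarm = 1 ("true alarm") if bias is identified on or after the
    introduction day; 0 ("false alarm"/failure) if flagged before it.
    A test that checked every matched peak without firing gets TTFD = 365."""
    if identified_day is None:          # complete failure to identify bias
        return 0, year_days
    alarm = 1 if identified_day >= introduction_day else 0
    ttfd = abs(identified_day - introduction_day)  # TTFD is an absolute value
    return alarm, ttfd
```

For example, bias induced on day 92 and identified on day 134 yields a true alarm with TTFD = 42, while an identification date of day 80 would be a false alarm.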

The flowchart for the algorithm is shown in Fig. 11. As indicated in the algorithm development section previously, the flowchart is shown as an extension of the MRFA algorithm, from Step 5 (Match S1, S2 Peaks) onwards. Given the previous descriptions of the various aspects of algorithm development, the flowchart is straightforward to follow. However, it should be noted that MRFAA is a Fortran®-based algorithm as opposed to the Matlab®-based MRFA. Since the input data consist of large arrays of numerical values, Fortran proved to be a better choice for faster execution and data manipulation. The following are some salient features of the MRFAA algorithm.

  1)

    Fully automated, from the choice of date for bias induction to the writing of results to the output file of choice (text or Excel).

  2)

    Ability to select different thresholds: the algorithm can switch between dynamic and truncation thresholds if necessary, and a combination of the two is also possible. A new automated test can be added with a few lines of code.

  3)

    Ability to validate the user choice of test by calculating performance statistics for the success (true alarms) and failure rates (false alarms).

  4)

    Ability to identify either positive or negative bias.

Fig. 11

Flowchart for MRFAA algorithm

5 Results

This section presents results from various experiments (tests) with the MRFAA algorithm, along with validation in terms of success rates. Using MRFAA, multiple bias identification tests were conducted by inducing bias (± 2%) on each day of the year (365 days) in the S1 sensor dataset. Each test was then validated by capturing TTFD values. The performance of each test in terms of success (true alarm) and failure (false alarm) rates was also captured (see Table 2). The aim was always to detect bias at the earliest possible date after induction with a high success rate (true alarms). The various tests were conducted with a combination of a dyn Th, i.e., 90% of the peak-max value of any matched peak, and a trunc Th of 255 °F (the larger of the two is selected as the dynamic threshold for a peak). In this context, the trunc Th is used only to prevent the dynamic threshold from capturing peaks with too-low readings (generally < 124 °C or 255 °F). Moreover, too-low values can compromise the sensitivity of the "ratios" to bias, thus preventing detection by the algorithm. For each cross-score threshold (ranging between − 10 and 10), bias was introduced on the first day of the year (2015) in the dataset, and the tests captured the bias identification date and the TTFD for that threshold. The process was repeated for all 365 days of the year. When a test (cross-score algorithm) completely failed to identify bias, a value of 365 was assigned as the TTFD. This symbolically indicates that the test attempted to detect bias by checking all the matched peaks (representing the whole year) and failed. A snapshot of various tests and results is presented in Table 2 for a typical run. A + 2% and a − 2% bias were introduced, in two separate cases, on each day of the year 2015 in the S1 data to observe the effect on TTFD values. The success rates for the "alarm" tests and their corresponding TTFD performances are shown in Fig. 12.
The results from the analysis where a + 2% bias was induced in the S1 data are presented in this paper; however, it should be noted that similar trends were observed for − 2% bias. The tests with minimum TTFD values and corresponding cross-score thresholds can be observed (highlighted text) in Table 2.
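The threshold selection used in these runs (the larger of 90% of the peak-max value and the 255 °F truncation floor) can be sketched as a one-line rule; the function and parameter names are illustrative:

```python
def dynamic_threshold(peak_max_f, trunc_th_f=255.0, fraction=0.90):
    """Working threshold for a matched peak: 90% of the maximum
    temperature registered during the peak (dyn Th), floored at the
    255 °F (~124 °C) truncation threshold (trunc Th)."""
    return max(fraction * peak_max_f, trunc_th_f)
```

A hot peak of 300 °F gets a threshold of 270 °F (the dynamic term dominates), while a cooler peak of 260 °F is floored at 255 °F, which is what keeps low-value peaks, whose ratios are insensitive to bias, from being captured.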

Table 2 Multiple tests with dynamic thresholding (90% of peak-max value): results report for day 92 (of 365); S1 bias-induced period (start and end date/time) April 2, 2015 12:00 AM–December 30, 2015 12:00 AM; Induced bias, + 2%; truncation threshold (Trunc Ths1) = 255 °F
Fig. 12

Combined test: alarm and TTFD performances

A "combined test" is designed to capture the minimum TTFD value across all tests together at a particular threshold. For instance, it can be observed from Table 2 that when bias is induced on April 2, 2015, the test that finds it successfully (alarm = 1) with the minimum TTFD (42.42 days) is the heat test at a cross-score threshold of 5 or 6. The combined test is also capable of providing the success (true alarm) and failure (false alarm) rates of all the tests together. After conducting all possible tests for the year, it was found that in 75% of the cases, the algorithms together find the bias within 39.5 days at cross-score thresholds of 5 and 6 (Fig. 13), whereas the heat test alone did it within 33 days; hence, it is considered the most effective test of all.
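The combined test's selection of the best-performing test can be sketched as follows; the test names and the input dictionary structure are illustrative, not from the paper's Fortran code:

```python
def combined_test(results_by_test):
    """Among all tests that raised a true alarm (alarm == 1) at a given
    cross-score threshold, return the name and TTFD of the test with the
    minimum TTFD; (None, None) if no test succeeded."""
    successes = {name: ttfd
                 for name, (alarm, ttfd) in results_by_test.items()
                 if alarm == 1}
    if not successes:
        return None, None
    best = min(successes, key=successes.get)
    return best, successes[best]
```

For the Table 2 example, feeding in hypothetical per-test (alarm, TTFD) pairs where the heat test succeeds at 42.42 days would return the heat test as the combined-test winner.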

Fig. 13

Combined test performance

6 Discussion

When it comes to algorithm development, the main disadvantage of the MRFA algorithm is that it cannot perform the bias-finding process automatically, which is time consuming and a major hindrance to conducting multiple tests on multiple days. The MRFA algorithm's efficiency depends mainly on the sensitivity (change in values) of the ratios to bias at higher temperatures, i.e., between 132 and 138 °C (270–280 °F). The "truncation threshold" value was designed on that premise and eliminates peaks (and ratios) that are less sensitive to bias. The disadvantage of establishing the truncation threshold at higher values is that it eliminates some peaks that are essential in identifying bias; for instance, the peaks from the last quarter of 2015, when temperatures ran low in response to ore characteristics. MRFA is chiefly a heat ratio test, whereas MRFAA facilitates multiple tests. The dynamic threshold strategy of MRFAA adjusts the threshold to the maximum value of the peak readings and compensates for the disadvantages of the truncation threshold; 90% of the peak-maximum value is selected in this research. Regarding the programming language of choice, the Fortran-based MRFAA algorithm is more efficient than MRFA due to Fortran's superior handling of array data and module functions.

It can be observed from the combined test results that all the tests together have a success (true alarm) rate of 95% at cross-score thresholds of 5 and 6. Of the 365 tests conducted by inducing bias on each day of the year, 347 showed true alarm performance and 18 failed. This is a high success rate, which indicates the reliability of the test; it is important for a test to identify true bias far more often than it falsely identifies bias. Among the true alarms, in 75% of the cases all tests together (the combined test) find the bias within 39.5 days, and within 63 days in 95% of the cases. Of the 18 false alarms, in 75% of the cases the tests completely failed (TTFD = 365). At cross-score thresholds of 7 or 8, the true alarm success rate is 90% (327 of 365 tests); however, the TTFD performance at these thresholds (242.6 days for 75% of the cases) is poor compared with thresholds 5 and 6. For this reason, cross-score thresholds of 5 and 6 are the best choices for finding bias in future datasets. In this context, finding the thresholds at which the earliest bias detection is possible is crucial, specifically for nonlinear or ever-fluctuating processes like mineral processing. Once these thresholds are detected and established by the algorithm, error detection in future datasets is uncomplicated. Given that a combined cycle of the strip vessel sensors (S1 and S2) takes a day to complete, identifying the bias within a month is very valuable for industry and helps operators improve their calibration frequencies, especially when the general standard frequency across industries is 1 year [1].

The performance of the tests when no bias is present in the data was also observed for validation. When cross-score values are as low as − 2 to 2, in 95% of the cases all tests failed to detect bias. This is understandable (hence, validated) since S1 and S2 (when no bias is present) exhibit perfectly tandem cyclical behavior, and due to the canceling effect, the cross-score values stay low (− 2 to 2). It was observed that when the cross-scores sway towards high positive values (> 4) or low negative values (< − 4), the algorithm detects bias. Regarding the individual tests, the heat test was capable of detecting bias within 33 days after induction in 75% of the cases, and within 55 days in 95% of the cases, at cross-score thresholds of 5 and 6. It should be pointed out that TTFD is also impacted by the number of days the strip vessels are operational, i.e., the algorithm may be similarly effective in two instances but report very different TTFD values if at least one strip vessel was not operated for a few days in one of the instances. The "max value test" identified bias within 44 days in 75% of the cases at thresholds of 7 and 8. The average value test could find bias within 65 days in 50% of the cases. This could be attributed to the minimizing effect of the "average" on peak readings and bias, which is disadvantageous for sensitivity at very low magnitudes of bias (2%); the average test may be better suited to errors of higher magnitude, but for subtle errors like bias it is at a disadvantage. The barren flow (BARNFL) and glycol flow (GLYFL) tests did not show promise and were less responsive to the dynamic thresholding strategy. The ratios of flows to strip vessel temperatures are very small and produced low cross-score values; at such low cross-scores, the algorithm is less effective and less sensitive to thresholds in finding bias.

7 Conclusions

Sensor calibration errors of higher magnitude (gross errors) can be identified with classical statistical methods. However, subtle bias errors of magnitude as low as ± 2% are indistinguishable from process fluctuations, particularly for nonlinear processes. Hence, their detection is hard but valuable for mineral processing operations like Pogo Mine. Data-mining-based methods are promising in dealing with nonlinear processes, but the majority of past research was devoted to finding gross errors. The need for detection of subtle errors is addressed by the innovative algorithms developed for this research. Detection of errors and calibration of sensors require physical intervention and cause downtimes; hence, in situ early detection methods are warranted to help industry operators avoid losses in mineral recovery. Identification of bias as low as ± 2% in magnitude is crucial for Pogo in maintaining the optimum operation of the strip vessels. It can be concluded from the results that bias as low as 2% in magnitude can be identified in a reasonable amount of time with the data-mining-based automated algorithm (MRFAA) developed for this research, without physical intervention or removal of a sensor for comparison against a well-calibrated sensor or standard device. Of all the tests the algorithm can perform, the heat test is the most effective, capable of identifying the bias within 33 days (33 strip vessel cycles) after induction, 75% of the time, at cross-score thresholds of 5 and 6. Unfortunately, the average calibration frequency across industries is once a year; therefore, being able to identify sensor straying in about a month encourages operators to increase that frequency. Increased frequency means better sensor accuracy and process control, and hence a major economic benefit.

The algorithms developed in this paper are based on exploiting statistical interrelations between sensors' readings in a carbon stripping circuit at Pogo Mine, in the form of "ratio function" relations. The results from the "combined test" performed by the MRFAA algorithm proved that all the tests together have a high success rate (95%) in terms of true alarms at cross-score thresholds of 5 and 6. Some of the tests (GLYFL and BARNFL) were not very effective due to their poor response to bias. Overall, it can be concluded that at cross-score thresholds of 5 and 6, all the tests together have a high chance of detecting bias in any future dataset at Pogo Mine. Finding these thresholds is key to the detection of subtle errors in nonlinear processes like mineral processing operations. In a similar fashion, optimum cross-score thresholds can be quickly established for comparable industrial processes by using the automated algorithm and tests. The algorithm and the data-mining concepts used in this research are perhaps the first of their kind for the mining industry, and the flexibility of the algorithm allows room for additional new tests in the future to improve its TTFD performance. Quantification of the economic benefits of early bias detection is beyond the scope of this research; however, operators could use the algorithm's alarms to initiate calibration exercises. The limitations of the algorithms lie in their assumptions, i.e., that only one sensor is problematic at a time, that there is a period of time when all sensors are calibrated (when data are collected and parameters are tuned), and that the subtle error takes the form of a step change (even if it is as small as 2%). Despite these limitations, the algorithm's automatic detection capabilities, flexibility to implement constraints, and ability to accommodate new tests support its scalability and generalization to industries beyond mining and mineral processing.