From Sensor Readings to Predictions: On the Process of Developing Practical Soft Sensors

  • Marcin Budka
  • Mark Eastwood
  • Bogdan Gabrys
  • Petr Kadlec
  • Manuel Martin Salvador
  • Stephanie Schwan
  • Athanasios Tsakonas
  • Indrė Žliobaitė
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8819)


Automatic data acquisition systems provide large amounts of streaming data generated by physical sensors. This data forms the input to computational models (soft sensors) that are routinely used for monitoring and control of industrial processes, traffic patterns, the environment and natural hazards, and many other domains. The majority of these models assume that the data arrives in a cleaned and pre-processed form, ready to be fed directly into a predictive model. In practice, to ensure appropriate data quality, most of the modelling effort goes into preparing raw sensor readings for use as model inputs. This study analyzes the process of data preparation for predictive models built on streaming sensor data. We present data preparation as a four-step process, identify the key challenges in each step, and provide recommendations for handling them. The discussion focuses on approaches that are less commonly used but that, in our experience, can contribute particularly well to solving practical soft sensor tasks. Our arguments are illustrated with a case study from the chemical production industry.
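The abstract does not spell out the four steps, so the following is not the authors' method. It is a minimal, hypothetical sketch, assuming a single numeric sensor stream, of the kind of online preparation that sits between raw readings and a soft sensor's inputs: holding the last valid value when a reading is missing, clipping outliers at an assumed 3-sigma threshold, and normalizing with running statistics updated incrementally (Welford's algorithm). The class name `StreamingSensorPreprocessor` and its parameters `z_max` and `warmup` are illustrative assumptions.

```python
import math
from typing import Optional


class StreamingSensorPreprocessor:
    """Hypothetical per-sensor preprocessor for a stream of raw readings.

    Illustrative steps (an assumption, not the chapter's four-step process):
    1. replace a missing reading (None or NaN) with the last valid value;
    2. clip values further than `z_max` running standard deviations from the mean;
    3. update the running mean and variance incrementally (Welford's algorithm);
    4. return the z-score-normalized value for use as a model input.
    """

    def __init__(self, z_max: float = 3.0, warmup: int = 10):
        self.z_max = z_max
        self.warmup = warmup  # valid readings to collect before clipping starts
        self.n = 0            # number of valid (non-imputed) readings seen
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations from the mean
        self.last_valid: Optional[float] = None

    def _std(self) -> float:
        # Sample standard deviation; 0.0 until at least two readings exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def process(self, x: Optional[float]) -> Optional[float]:
        imputed = False
        # Step 1: hold the last valid value for a missing reading.
        if x is None or math.isnan(x):
            if self.last_valid is None:
                return None  # nothing to impute from yet
            x = self.last_valid
            imputed = True
        if not imputed:
            # Step 2: clip outliers once enough history has been collected.
            std = self._std()
            if self.n >= self.warmup and std > 0:
                lo = self.mean - self.z_max * std
                hi = self.mean + self.z_max * std
                x = min(max(x, lo), hi)
            self.last_valid = x
            # Step 3: Welford update; imputed values are kept out of the
            # statistics so that sensor dropouts do not shrink the variance.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        # Step 4: normalize with the current running statistics.
        std = self._std()
        return (x - self.mean) / std if std > 0 else 0.0


# Usage: one missing reading (None) and one spike (55.0) in a stable signal.
pre = StreamingSensorPreprocessor(z_max=3.0, warmup=5)
readings = [10.0, 10.2, 9.9, 10.1, 10.0, None, 55.0]
features = [pre.process(r) for r in readings]
```

In this run the missing reading is replaced by the previous value 10.0, and the spike of 55.0 is clipped to roughly the running mean plus three standard deviations before it enters the statistics. Because every update is incremental, each reading costs constant time and memory, which is the property that matters for streaming data.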







Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Marcin Budka (1)
  • Mark Eastwood (2)
  • Bogdan Gabrys (1)
  • Petr Kadlec (3)
  • Manuel Martin Salvador (1)
  • Stephanie Schwan (3)
  • Athanasios Tsakonas (1)
  • Indrė Žliobaitė (4)

  1. Bournemouth University, UK
  2. Coventry University, UK
  3. Evonik Industries, Germany
  4. Aalto University and HIIT, Finland
