From Sensor Readings to Predictions: On the Process of Developing Practical Soft Sensors
Automatic data acquisition systems provide large amounts of streaming data generated by physical sensors. These data form the input to computational models (soft sensors) routinely used for monitoring and control of industrial processes, traffic patterns, the environment, natural hazards, and many other applications. Most of these models assume that the data arrive cleaned and pre-processed, ready to be fed directly into a predictive model. In practice, ensuring appropriate data quality means that most of the modelling effort goes into preparing raw sensor readings for use as model inputs. This study analyzes the process of data preparation for predictive models with streaming sensor data. We present data preparation as a four-step process, identify the key challenges in each step, and provide recommendations for handling them. The discussion focuses on approaches that are less commonly used but that, in our experience, can contribute particularly well to solving practical soft sensor tasks. Our arguments are illustrated with a case study from the chemical production industry.
Keywords: Root Mean Square Error, Feature Selection, Partial Least Squares, Partial Least Squares Regression, Data Preparation