Visualisation of High Dimensional Data by Use of Genetic Programming: Application to On-line Infrared Spectroscopy Based Process Monitoring
Practical data mining and process monitoring problems require the analysis of high-dimensional data. In most cases it is very informative to map the hidden structure of complex data into a low-dimensional space and visualise it. Industrial applications require projections that are easy to implement, interpretable and accurate. Nonlinear functions (aggregates) are useful for this purpose. A pair of these functions realises feature selection and transformation, but finding the proper model structure is a complex nonlinear optimisation problem. We present a Genetic Programming (GP) based algorithm that generates aggregates represented in a tree structure. Results show that the developed tool can be used effectively to build an on-line spectroscopy-based process monitoring system; the two-dimensional mapping of a high-dimensional spectral database can represent different operating ranges of the process.
Keywords: Genetic programming; Nonlinear data projection; High dimensional data; Visualisation
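To make the idea of tree-structured aggregates concrete, the following minimal Python sketch shows how a pair of expression trees can map a high-dimensional sample to a two-dimensional point, and how the variables appearing in the trees amount to an implicit feature selection. It is an illustration only, not the authors' implementation: the function set, the nested-tuple tree encoding, and the names `random_tree`, `evaluate`, `used_features` and `project` are all assumptions. The GP search itself (fitness measure, selection, crossover, mutation) is not specified in the abstract and is not shown.

```python
import operator
import random


def protected_div(a, b):
    """Division that returns 1.0 for a near-zero denominator (a common GP convention)."""
    return a / b if abs(b) > 1e-9 else 1.0


# Hypothetical function set; the paper's actual set is not given in the abstract.
FUNCTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": protected_div,
}


def random_tree(n_features, depth, rng):
    """Grow a random expression tree over variable indices 0..n_features-1."""
    if depth == 0 or rng.random() < 0.3:
        return ("var", rng.randrange(n_features))
    name = rng.choice(sorted(FUNCTIONS))
    return (
        name,
        random_tree(n_features, depth - 1, rng),
        random_tree(n_features, depth - 1, rng),
    )


def evaluate(tree, x):
    """Evaluate one tree on a single sample x (a sequence of floats)."""
    if tree[0] == "var":
        return x[tree[1]]
    name, left, right = tree
    return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))


def used_features(tree):
    """Return the set of variable indices a tree depends on (the selected features)."""
    if tree[0] == "var":
        return {tree[1]}
    return used_features(tree[1]) | used_features(tree[2])


def project(pair, x):
    """Map a sample to 2D using a pair of aggregate trees."""
    return (evaluate(pair[0], x), evaluate(pair[1], x))


if __name__ == "__main__":
    rng = random.Random(0)
    n_features = 100  # e.g. number of wavelengths in a spectrum (assumed)
    pair = (random_tree(n_features, 3, rng), random_tree(n_features, 3, rng))
    spectrum = [rng.random() for _ in range(n_features)]
    print("2D point:", project(pair, spectrum))
    print("selected features:", sorted(used_features(pair[0]) | used_features(pair[1])))
```

In a full GP loop, many such pairs would be generated and recombined, and each candidate would be scored by some measure of projection quality; which measure the authors use cannot be determined from the abstract alone.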
The financial support of the TAMOP-4.2.2/B-10/1-2010-0025 and TAMOP-4.2.2.A-11/1/KONV-2012-0071 projects is gratefully acknowledged.