Coupling the PAELLA Algorithm to Predictive Models
This paper explores a novel use of the PAELLA algorithm. PAELLA was originally developed for outlier detection and data cleaning, so it is usually seen as a discriminant tool that sorts observations into two groups: core observations and outliers. A closer look at the information contained in its output, however, reveals ample opportunities in the context of data-driven predictive models. The experiments reported here use the information contained in the occurrence vector and investigate how best to take advantage of it. The results of each successive experiment lead to a sensible use case in which this information proves extremely useful: probabilistic sampling regression.
Keywords: Probabilistic sampling, Outlier detection
We gratefully acknowledge the financial support of the Spanish Ministerio de Economía, Industria y Competitividad through grant DPI2016-79960-C3-2-P. We would also like to express our gratitude to the Castilla y León Supercomputing Center, whose cooperation allowed us to run around one million neural network trainings for the experiments reported in this paper.