Abstract
Two new preprocessing systems, developed using an evolutionary methodology, are proposed for use with artificial neural networks (ANNs). These systems can markedly improve the performance of standard ANN models on prediction and classification problems involving complex datasets characterized by nonlinear relations between the variables. The Training and Testing (T&T) systems are robust data resampling techniques that arrange the source sample into subsamples that all possess a similar probability density function. In this way, the data are split into two or more subsamples so that ANN models can be trained, tested, and validated more effectively. The Input Selection (IS) system is an evolutionary wrapper system that reduces the amount of data while conserving as much of the information available in the dataset as possible. The performance of these systems was tested in a classification task carried out on two well-known datasets. The classification accuracy of a standard back-propagation ANN model trained on a random subset is compared with that of the same model trained on subsamples selected by the T&T systems, with IS used simultaneously to select the variables. The results show a significant enhancement of the standard ANN's classification ability when the proposed preprocessing systems are applied.
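To make the resampling idea concrete, the following is a minimal sketch of an evolutionary search for a two-way split whose halves have similar distributions. It is not the T&T algorithm described in the chapter: the abstract does not specify the fitness function or the evolutionary operators, so this sketch assumes a simple proxy (the gap between per-feature means and standard deviations of the two subsamples) and a plain elitist, mutation-only search. The names `moment_gap`, `balanced_mask`, `mutate`, and `evolve_split` are hypothetical, and the data are synthetic.

```python
import random
import statistics


def moment_gap(data, mask):
    """Proxy for distribution mismatch (an assumption, not the chapter's fitness):
    sum over features of |mean difference| + |stdev difference| between halves."""
    a = [row for row, m in zip(data, mask) if m]
    b = [row for row, m in zip(data, mask) if not m]
    if len(a) < 2 or len(b) < 2:
        return float("inf")
    gap = 0.0
    for j in range(len(data[0])):
        col_a = [r[j] for r in a]
        col_b = [r[j] for r in b]
        gap += abs(statistics.fmean(col_a) - statistics.fmean(col_b))
        gap += abs(statistics.stdev(col_a) - statistics.stdev(col_b))
    return gap


def balanced_mask(n, rng):
    """Random assignment with n // 2 records marked True (first subsample)."""
    mask = [i < n // 2 for i in range(n)]
    rng.shuffle(mask)
    return mask


def mutate(mask, rng):
    """Swap one record between the two subsamples, keeping their sizes fixed."""
    child = mask[:]
    i = rng.choice([k for k, m in enumerate(child) if m])
    j = rng.choice([k for k, m in enumerate(child) if not m])
    child[i], child[j] = child[j], child[i]
    return child


def evolve_split(data, pop_size=20, generations=200, seed=0):
    """Elitist search: keep the better half of the population, refill by mutation."""
    rng = random.Random(seed)
    population = [balanced_mask(len(data), rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: moment_gap(data, m))
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda m: moment_gap(data, m))


# Synthetic two-feature dataset, for illustration only.
demo_rng = random.Random(42)
data = [[demo_rng.gauss(0, 1), demo_rng.expovariate(1.0)] for _ in range(60)]
best = evolve_split(data)
print(round(moment_gap(data, best), 4))
```

Because the best individuals always survive, the returned split is never worse, under this proxy, than the best random split in the initial population. A wrapper-style variant would replace `moment_gap` with the error of a model trained on one subsample and tested on the other, which is more expensive to evaluate.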
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Buscema, M., Mancini, A., Breda, M. (2013). Preprocessing Tools for Nonlinear Datasets. In: Buscema, M., Tastle, W. (eds) Intelligent Data Mining in Law Enforcement Analytics. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-4914-6_8
DOI: https://doi.org/10.1007/978-94-007-4914-6_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-4913-9
Online ISBN: 978-94-007-4914-6
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)