Preprocessing Tools for Nonlinear Datasets

Buscema, Massimo; Mancini, Alessandra; Breda, Marco

doi:10.1007/978-94-007-4914-6_8

Abstract

Two new preprocessing systems, developed on the basis of evolutionary methodology, are proposed for use with artificial neural networks (ANNs). They substantially improve the performance of standard ANN models on prediction and classification problems involving complex datasets characterized by nonlinear relations among the variables. The Training and Testing (T&T) systems are robust data resampling techniques that arrange the source sample into subsamples that all share a similar probability density function. The data are thereby split into two or more subsamples so that ANN models can be trained, tested, and validated more effectively. The Input Selection (IS) system is an evolutionary wrapper system that reduces the amount of data while conserving as much as possible of the information available in the dataset. The performance of these systems was tested in a classification task carried out on two well-known datasets. The classification accuracy of a standard back-propagation ANN trained on a random subset is compared with that of the same model trained on subsamples selected by the T&T systems, with IS simultaneously used to select the variables. The results show a significant improvement in the classification ability of the standard ANN when the proposed preprocessing systems are applied.
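
The resampling idea behind T&T can be illustrated with a minimal sketch. It is not the authors' algorithm: a plain random search stands in for the evolutionary search, and the gap between per-feature means and standard deviations is only a crude, assumed proxy for "similar probability density function". The names `distribution_gap` and `matched_split` are hypothetical.

```python
import random
import statistics


def distribution_gap(a, b):
    """Crude proxy for how differently two subsamples are distributed:
    the summed per-feature gaps in mean and standard deviation."""
    gap = 0.0
    for col_a, col_b in zip(zip(*a), zip(*b)):
        gap += abs(statistics.fmean(col_a) - statistics.fmean(col_b))
        gap += abs(statistics.pstdev(col_a) - statistics.pstdev(col_b))
    return gap


def matched_split(rows, n_trials=200, seed=0):
    """Try n_trials random half/half splits and keep the one whose two
    halves look most alike (assumption: random search replaces the
    evolutionary optimizer described in the chapter)."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    half = len(rows) // 2
    best = None
    for _ in range(n_trials):
        rng.shuffle(indices)
        a = [rows[i] for i in indices[:half]]
        b = [rows[i] for i in indices[half:]]
        gap = distribution_gap(a, b)
        if best is None or gap < best[0]:
            best = (gap, a, b)
    return best


if __name__ == "__main__":
    # Synthetic two-feature data, purely for illustration.
    data_rng = random.Random(1)
    rows = [(data_rng.gauss(0, 1), data_rng.gauss(5, 2)) for _ in range(40)]
    gap, train, test = matched_split(rows)
    print(f"best gap: {gap:.4f}, sizes: {len(train)}/{len(test)}")
```

A single random split corresponds to `n_trials=1`; raising the number of trials can only keep or lower the gap for a given seed, which is the sense in which a searched split is "more alike" than an arbitrary one.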

Author information

Authors and Affiliations

Semeion Research Center of Sciences of Communication, via Sersale 117, Rome, Italy
Massimo Buscema, Alessandra Mancini & Marco Breda

Corresponding author

Correspondence to Massimo Buscema.

Editor information

Editors and Affiliations

Semeion Research Center of Sciences of C, Via Sersale 117, Rome, 00128, Roma, Italy
Massimo Buscema
Ithaca College, Park Business Center 424, Ithaca, 14850, New York, USA
William J. Tastle

About this chapter

Cite this chapter

Buscema, M., Mancini, A., Breda, M. (2013). Preprocessing Tools for Nonlinear Datasets. In: Buscema, M., Tastle, W. (eds) Intelligent Data Mining in Law Enforcement Analytics. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-4914-6_8

DOI: https://doi.org/10.1007/978-94-007-4914-6_8
Published: 11 September 2012
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-4913-9
Online ISBN: 978-94-007-4914-6
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)