Preprocessing Tools for Nonlinear Datasets

Intelligent Data Mining in Law Enforcement Analytics

Abstract

Two new preprocessing systems, both based on evolutionary methods, are proposed for use with artificial neural networks (ANNs). They substantially improve the performance of standard ANN models on prediction and classification problems involving complex datasets with nonlinear relations among the variables. Training and testing (T&T) systems are robust data resampling techniques that arrange the source sample into subsamples that all possess a similar probability density function. In this way, the data is split into two or more subsamples so that ANN models can be trained, tested, and validated more effectively. The input selection (IS) system is an evolutionary wrapper system that reduces the amount of data while conserving as much as possible of the information available in the dataset. The performance of these systems was tested in a classification task carried out on two well-known datasets. The classification accuracy of a standard back-propagation ANN trained on a random subset is compared with that of the same model trained on subsamples selected by the T&T systems while IS simultaneously selects the variables. The results show a significant enhancement of the standard ANN's classification ability when the proposed preprocessing systems are applied.
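
The idea of arranging a sample into subsamples with similar distributions can be illustrated with a small sketch. The Python code below is not the T&T algorithm of this chapter, whose details are not given in the abstract. It is a minimal evolutionary search, under stated assumptions, for a balanced two-way split: the hypothetical `distribution_gap` function uses per-feature differences in mean and standard deviation as a crude proxy for "similar probability density function", and the hypothetical `evolve_split` function, along with its parameters `generations`, `pop_size`, and `seed`, is introduced here purely for illustration.

```python
import random
import statistics


def distribution_gap(rows, mask):
    """Crude dissimilarity between the two subsamples defined by mask.

    Assumption: summed per-feature differences in mean and standard
    deviation stand in for a comparison of probability density functions.
    """
    a = [r for r, m in zip(rows, mask) if m]
    b = [r for r, m in zip(rows, mask) if not m]
    if len(a) < 2 or len(b) < 2:
        return float("inf")  # stdev needs at least two points per side
    gap = 0.0
    for j in range(len(rows[0])):
        col_a = [r[j] for r in a]
        col_b = [r[j] for r in b]
        gap += abs(statistics.fmean(col_a) - statistics.fmean(col_b))
        gap += abs(statistics.stdev(col_a) - statistics.stdev(col_b))
    return gap


def evolve_split(rows, generations=200, pop_size=20, seed=0):
    """Search for a balanced split mask with a small distribution gap.

    Illustrative elitist scheme: keep the better half of the population
    and refill it with mutated copies of the survivors.
    """
    rng = random.Random(seed)
    n = len(rows)

    def random_mask():
        mask = [i < n // 2 for i in range(n)]
        rng.shuffle(mask)
        return mask

    def mutate(mask):
        # Swap one record from each side so both subsample sizes are kept.
        child = mask[:]
        i = rng.choice([k for k, m in enumerate(child) if m])
        j = rng.choice([k for k, m in enumerate(child) if not m])
        child[i], child[j] = child[j], child[i]
        return child

    population = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: distribution_gap(rows, m))
        survivors = population[: pop_size // 2]
        offspring = [mutate(rng.choice(survivors))
                     for _ in range(pop_size - len(survivors))]
        population = survivors + offspring
    return min(population, key=lambda m: distribution_gap(rows, m))


if __name__ == "__main__":
    # Toy data with a nonlinear relation between two variables.
    data = [[float(i), float(i * i)] for i in range(40)]
    split = evolve_split(data, generations=50, seed=1)
    print("gap of evolved split:", distribution_gap(data, split))
```

Because the survivors are carried over unchanged, the best gap in the population never increases from one generation to the next. A real system would need a richer fitness function than this two-moment proxy, for example one based on how well a model trained on one subsample predicts the other.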

Corresponding author

Correspondence to Massimo Buscema.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Buscema, M., Mancini, A., Breda, M. (2013). Preprocessing Tools for Nonlinear Datasets. In: Buscema, M., Tastle, W. (eds) Intelligent Data Mining in Law Enforcement Analytics. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-4914-6_8
