How to Perform Data Mining: The “Persons Arrested” Dataset

  • Massimo Buscema


This paper shows by example how to apply nonlinear auto-associative systems to data analysis. Because it is meant as a worked example, we present the data and equations in a style that is unusual for such a paper.

Nonlinear auto-associative systems often fall under the generic name of unsupervised Artificial Neural Networks (ANNs). These systems, however, are a powerful set of data mining techniques and deserve more than a generic name. We propose to call this set of ANNs “Auto-poietic ANNs” (that is, systems that organize their behavior by themselves).

Auto-poietic ANNs are a complex mix of different topologies, learning rules, signal dynamics, and cost functions. Their mathematics can therefore differ greatly from one to another, and so can their capability to discover hidden connections within the same dataset. This is both the strength and the weakness of these algorithms.

All Auto-poietic ANNs can determine how each (independent) variable in a dataset is associated with the others, including nonlinear associations involved in parallel many-to-many relationships. But because each algorithm has its own specific mathematics, applying them to the same dataset can yield different findings. Consequently, when we apply different Auto-poietic ANNs to the same sample of data, their learning processes can produce different frames of associations among the same set of variables. The question at this point is: which of these frames is better grounded? If the dataset represents a real situation, which of the resulting frames should we follow to organize a productive strategy of manipulation in the real world?
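
The point that different algorithms can return different frames of associations for the same variables can be illustrated with a much simpler stand-in. The sketch below does not implement any Auto-poietic ANN (this abstract gives none of their equations). It assumes two ordinary association measures, Pearson and Spearman correlation, applied to a small synthetic table, and summarizes each resulting frame with a minimum spanning tree. The variable names `v1` to `v4`, the data values, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt

# Synthetic data, NOT the "Persons Arrested" dataset (assumption for illustration).
data = {
    "v1": [1, 2, 3, 4, 5, 6],
    "v2": [1, 8, 27, 64, 125, 216],   # v1 cubed: monotonic but nonlinear
    "v3": [2, 1, 4, 3, 6, 5],
    "v4": [6, 1, 5, 2, 4, 3],
}

def pearson(a, b):
    """Linear correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def ranks(values):
    """Ranks 1..n of the values (this sketch assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result

def spearman(a, b):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(a), ranks(b))

def association_frame(table, measure):
    """Map each unordered pair of variables to its absolute association."""
    names = sorted(table)
    return {(u, v): abs(measure(table[u], table[v]))
            for u, v in combinations(names, 2)}

def minimum_spanning_tree(frame):
    """Kruskal's algorithm, using distance = 1 - association strength."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    tree = set()
    for (u, v), strength in sorted(frame.items(), key=lambda kv: 1.0 - kv[1]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # edge joins two components
            parent[root_u] = root_v
            tree.add((u, v))
    return tree

linear_frame = association_frame(data, pearson)
rank_frame = association_frame(data, spearman)
linear_tree = minimum_spanning_tree(linear_frame)
rank_tree = minimum_spanning_tree(rank_frame)

print("Pearson frame: ", {k: round(s, 3) for k, s in linear_frame.items()})
print("Spearman frame:", {k: round(s, 3) for k, s in rank_frame.items()})
print("Pearson tree:  ", sorted(linear_tree))
print("Spearman tree: ", sorted(rank_tree))
```

Even in this toy setting the two measures disagree on how strongly `v1` and `v2` are linked, because only the rank-based measure treats a monotonic nonlinear relation as a perfect association. This is a small-scale version of the question of which frame to trust.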

At the end of this paper we propose a new method that creates a complex fusion of different Auto-poietic ANNs, which we name the Models Fusion Methodology (MFM).


Keywords: Artificial Neural Network; Hidden Layer; Minimum Spanning Tree; Sexual Offence; Pruning Algorithm



Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. Semeion Research Center, Via Sersale, Rome, Italy
