How to Perform Data Mining: The “Persons Arrested” Dataset
This paper presents an example of how to apply nonlinear auto-associative systems to data analysis. For this reason, we present data and equations in a style that is unusual for a paper of this kind.
Nonlinear auto-associative systems are often grouped under the generic name of unsupervised Artificial Neural Networks (ANNs). These systems, however, form a powerful set of data-mining techniques and deserve more than a generic name. We propose to call this set of ANNs “Auto-poietic ANNs”, that is, systems that organize their own behavior.
Auto-poietic ANNs are a complex mix of different topologies, learning rules, signal dynamics, and cost functions. Consequently, their mathematics can differ greatly from one network to another, and so can their ability to discover hidden connections within the same dataset. This is both the strength and the weakness of these algorithms.
In fact, every Auto-poietic ANN can determine, within a dataset, how each (independent) variable is associated with the others, including the nonlinear associations that arise in parallel many-to-many relationships. However, because each algorithm has its own specific mathematics, applying them to the same dataset can yield different findings. When we apply different Auto-poietic ANNs to the same sample of data, their learning processes can therefore produce different frames of associations among the same set of variables. The question then becomes: which of these frames is better grounded? If the dataset describes a real situation, which of the resulting frames should we follow to organize a productive strategy of manipulation in the real world?
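The divergence described above can be made concrete with a small sketch. It assumes, hypothetically, that two Auto-poietic ANNs have each produced a symmetric matrix of association strengths among the same four variables. The variable names A, B, C, D and all numeric values are invented for illustration and do not come from the “Persons Arrested” dataset. Each matrix is reduced to its strongest connecting links with a spanning tree: Kruskal's algorithm run in order of decreasing strength, which is equivalent to a minimum spanning tree over the negated strengths. This gives one "frame of associations" per model, and the two frames can then be compared edge by edge. This is only an illustration of how frames can disagree, not the authors' Models Fusion Methodology.

```python
from itertools import combinations


def max_spanning_tree(names, weights):
    """Return the edges of a maximum spanning tree of an association matrix.

    Kruskal's algorithm: visit variable pairs from strongest to weakest
    association and keep a pair only if it joins two not-yet-connected
    groups. This equals a minimum spanning tree over the negated strengths.
    """
    n = len(names)
    parent = list(range(n))  # union-find forest, one root per group

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only the upper triangle is read, so the matrix is assumed symmetric.
    edges = sorted(
        ((weights[i][j], i, j) for i, j in combinations(range(n), 2)),
        reverse=True,
    )
    tree = set()
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # no cycle: merge the two groups
            parent[root_i] = root_j
            tree.add(frozenset((names[i], names[j])))
            if len(tree) == n - 1:  # a spanning tree has n - 1 edges
                break
    return tree


def as_text(edges):
    """Render a set of edges as sorted 'X-Y' strings for printing."""
    return sorted("-".join(sorted(edge)) for edge in edges)


# Hypothetical association strengths among four hypothetical variables,
# as two different Auto-poietic ANNs might report them (values invented).
names = ["A", "B", "C", "D"]
model_1 = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.8, 0.3],
    [0.2, 0.8, 0.0, 0.7],
    [0.1, 0.3, 0.7, 0.0],
]
model_2 = [
    [0.0, 0.9, 0.6, 0.7],
    [0.9, 0.0, 0.2, 0.1],
    [0.6, 0.2, 0.0, 0.3],
    [0.7, 0.1, 0.3, 0.0],
]

tree_1 = max_spanning_tree(names, model_1)
tree_2 = max_spanning_tree(names, model_2)
print(as_text(tree_1))           # ['A-B', 'B-C', 'C-D']
print(as_text(tree_2))           # ['A-B', 'A-C', 'A-D']
print(as_text(tree_1 & tree_2))  # ['A-B']
```

With these invented values, the first model yields a chain and the second a star centred on A. The two frames agree only on the A-B link. Deciding which of the remaining links is better grounded is the question raised above.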
At the end of this paper, we propose a new method for building a complex fusion of different Auto-poietic ANNs, which we call the Models Fusion Methodology (MFM).
Keywords: Artificial Neural Network; Hidden Layer; Minimum Spanning Tree; Sexual Offence; Pruning Algorithm