Granular Computing Techniques for Classification and Semantic Characterization of Structured Data
We propose a system that automatically synthesizes a classification model and a set of interpretable decision rules defined over a set of symbols, which correspond to frequent substructures of the input dataset. Given a preprocessing procedure that maps every input element into a fully labeled graph, the system solves the classification problem in the graph domain. The extracted rules can then semantically characterize the classes of the problem at hand. The structured data considered in this paper are images from classification datasets, which provide an effective proving ground for studying the system's ability to extract interpretable classification rules. For this input domain, the preprocessing procedure is based on a flexible segmentation algorithm whose behavior is defined by a set of parameters. The core inference engine uses a parametric graph edit dissimilarity measure. A genetic algorithm selects suitable values for the parameters in order to synthesize a classification model based on interpretable rules that maximize the generalization capability of the model. Decision rules are defined over a set of information granules in the graph domain, identified by a frequent substructures miner. We compare the system with two other state-of-the-art graph classifiers, highlighting its main strengths and limitations.
Keywords: Granular computing; Automatic semantic interpretation; Frequent substructures miner; Graph matching; Graph classification; Evolutionary optimization; Watershed segmentation
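To make the idea of decision rules defined over symbols more concrete, the following is a minimal sketch, not the system described in the abstract. It assumes discrete node and edge labels, restricts substructures to single edges, and uses exact label matching in place of the parametric graph edit dissimilarity. The graph encoding, the function names, and the toy labels ("blue", "green", "red", "adj") are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumed encoding: a graph is a (node_labels, edges) pair, where
# node_labels maps node id -> label and edges is a list of
# (u, v, edge_label) tuples describing undirected edges.


def edge_patterns(graph):
    """Count the single-edge substructures occurring in one graph."""
    node_labels, edges = graph
    counts = Counter()
    for u, v, edge_label in edges:
        # Sort endpoint labels so an undirected edge maps to one pattern.
        a, b = sorted((node_labels[u], node_labels[v]))
        counts[(a, edge_label, b)] += 1
    return counts


def mine_frequent_symbols(graphs, min_support):
    """Keep patterns that occur in at least min_support (a fraction) of graphs."""
    support = Counter()
    for g in graphs:
        # Each graph contributes at most once to a pattern's support.
        support.update(set(edge_patterns(g)))
    threshold = min_support * len(graphs)
    return sorted(p for p, s in support.items() if s >= threshold)


def symbolic_histogram(graph, symbols):
    """Embed a graph as a vector of occurrence counts, one per symbol."""
    counts = edge_patterns(graph)
    return [counts[s] for s in symbols]


# Toy "images": nodes stand for segmented regions, edges for adjacency.
g1 = ({0: "blue", 1: "green"}, [(0, 1, "adj")])
g2 = ({0: "blue", 1: "green", 2: "green"}, [(0, 1, "adj"), (1, 2, "adj")])
g3 = ({0: "red", 1: "red"}, [(0, 1, "adj")])
graphs = [g1, g2, g3]

symbols = mine_frequent_symbols(graphs, min_support=0.3)
histograms = [symbolic_histogram(g, symbols) for g in graphs]


def rule_blue_next_to_green(histogram):
    """Hypothetical interpretable rule: a blue region touches a green one."""
    return histogram[symbols.index(("blue", "adj", "green"))] >= 1
```

In this sketch each histogram entry counts how often one symbol occurs in a graph, so a rule such as `rule_blue_next_to_green` can be read directly in terms of the substructure it tests. Mining larger substructures and matching them inexactly would require a subgraph miner and a dissimilarity measure, both of which are outside the scope of this example.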
Compliance with Ethical Standards
Conflict of Interest
Filippo Maria Bianchi, Simone Scardapane, Antonello Rizzi, Aurelio Uncini, and Alireza Sadeghian declare that they have no conflict of interest.
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for whom identifying information is included in this article.
Human and Animal Rights
This article does not contain any studies with human or animal subjects performed by any of the authors.