Abstract
Taxonomies are essential tools for fast information retrieval and classification of knowledge. Many existing techniques for automatic taxonomy generation strongly depend on the specific properties of a particular domain and are consequently hard to apply to other domains. Some attempts have been made to design taxonomies for multiple domains. Unfortunately, these approaches yield high hierarchical classification error rates on some datasets. The automatic design of a taxonomy requires the ability to measure the similarity between classes. More precisely, two classes being close intuitively implies that some elements of one class are scattered in the neighborhood of some elements of the other class. This paper uses that observation to propose a new generic technique for automatic taxonomy generation. A topological analysis of the neighborhood of each instance is first performed. The results of this analysis are used to initialize and train a hidden Markov model for each class. The model of a given class c captures the frequencies of the classes found in the neighborhood of the instances of c, from the most dominant class to the least dominant. The similarities between these models are finally used to derive a taxonomy. Hierarchical classification experiments conducted on 20 datasets from various domains yielded an average accuracy of \(97.22\%\) with a standard deviation of \(4.11\%\). Comparison results revealed that the proposed approach outperforms existing work, with accuracy gains reaching \(38.62\%\) for one dataset.
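The neighborhood analysis described in the abstract can be illustrated with a minimal sketch. It covers only the first step (counting which classes appear around the instances of each class) and does not train hidden Markov models or build the taxonomy. The function name `neighborhood_profiles`, the parameter `k`, and the use of Euclidean k-nearest neighbors are assumptions made for illustration; the abstract does not specify how the neighborhood is defined.

```python
from collections import Counter
import math


def neighborhood_profiles(points, labels, k=3):
    """Hypothetical helper: for each class c, count the classes seen among
    the k nearest neighbours of every instance of c, then normalise the
    counts to frequencies ordered from most to least dominant class."""
    counts = {c: Counter() for c in set(labels)}
    for i, p in enumerate(points):
        # Rank all other instances by Euclidean distance to p (assumption:
        # the neighbourhood is the set of k nearest instances).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            counts[labels[i]][labels[j]] += 1

    profiles = {}
    for c, counter in counts.items():
        total = sum(counter.values())
        # most_common() orders classes from most to least frequent.
        profiles[c] = [(other, n / total) for other, n in counter.most_common()]
    return profiles


if __name__ == "__main__":
    # Two well-separated toy classes: each profile contains only its own class.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    lbl = ["a", "a", "a", "b", "b", "b"]
    print(neighborhood_profiles(pts, lbl, k=2))
```

In this sketch, two classes whose profiles assign each other a large frequency would be candidates for grouping under the same parent node, which is the intuition the abstract states before introducing the per-class models.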
Cite this article
Iloga, S., Romain, O. & Tchuenté, M. An efficient generic approach for automatic taxonomy generation using HMMs. Pattern Anal Applic 24, 243–262 (2021). https://doi.org/10.1007/s10044-020-00918-0
DOI: https://doi.org/10.1007/s10044-020-00918-0