TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data

Irfan, Rabia; Khan, Sharifullah; Rajpoot, Kashif; Qamar, Ali Mustafa

doi:10.1631/FITEE.1700517

Published: 07 August 2018

Volume 19, pages 763–782, (2018)

Frontiers of Information Technology & Electronic Engineering

Rabia Irfan¹ (ORCID: orcid.org/0000-0002-7789-5338),
Sharifullah Khan¹,
Kashif Rajpoot¹,² &
Ali Mustafa Qamar¹,³


Abstract

A taxonomy represents the concepts that exist in data and is generated to organize and access large volumes of data effectively. It must evolve continuously to reflect changes in the data. Existing automatic taxonomy generation techniques do not handle the evolution of data; therefore, the taxonomies they generate do not truly represent the data. Data evolution can be handled either by regenerating the taxonomy from scratch or by allowing the taxonomy to evolve incrementally whenever the data change. The former approach is not economical in terms of time and resources. The proposed taxonomy incremental evolution (TIE) algorithm is a novel attempt to handle data that evolve over time. It serves as a layer over an existing clustering-based taxonomy generation technique and allows an existing taxonomy to evolve incrementally. The algorithm was evaluated on research articles selected from the computing domain. The taxonomy evolved with the algorithm took considerably less time to produce, and had better quality per unit time, than a taxonomy regenerated from scratch.
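The contrast between regenerating a taxonomy and letting it evolve incrementally can be illustrated with a small sketch. This is not the TIE algorithm, whose details the abstract does not give; it is a hypothetical toy that assumes a taxonomy stored as a tree of term-count centroids, cosine similarity as the matching measure, and an arbitrary threshold of 0.3. The names `Node`, `cosine`, and `insert` are invented for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Node:
    """Hypothetical taxonomy node: a label, a term centroid, and its documents."""
    label: str
    terms: Counter = field(default_factory=Counter)
    docs: list = field(default_factory=list)
    children: list = field(default_factory=list)


def insert(root, doc_id, text, threshold=0.3):
    """Place one new document into an existing tree without rebuilding it.

    Assumption: descend towards the most similar child while its similarity
    is at least `threshold`; otherwise open a new concept node.
    """
    vec = Counter(text.lower().split())
    node = root
    while True:
        node.terms.update(vec)  # keep centroids on the path up to date
        scored = [(cosine(vec, c.terms), c) for c in node.children]
        score, best = max(scored, key=lambda sc: sc[0], default=(0.0, None))
        if best is None or score < threshold:
            break
        node = best
    if node is root or node.children:
        # No sufficiently similar concept here: create a new one,
        # labelled by the document's most frequent term.
        label = vec.most_common(1)[0][0] if vec else doc_id
        node.children.append(Node(label, Counter(vec), [doc_id]))
    else:
        # Reached a matching leaf concept: attach the document to it.
        node.docs.append(doc_id)
    return root


# Usage: three documents arrive one at a time; the tree is never regenerated.
root = Node("root")
insert(root, "d1", "clustering clustering documents")
insert(root, "d2", "clustering documents hierarchical")
insert(root, "d3", "ontology ontology evolution")
```

Each insertion touches only the nodes on one root-to-leaf path, which is the intuition behind incremental evolution being cheaper than regeneration; a real technique would also need to handle relabeling, merging, and splitting of concepts.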

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, 44000, Pakistan
Rabia Irfan, Sharifullah Khan, Kashif Rajpoot & Ali Mustafa Qamar
School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK
Kashif Rajpoot
Department of Computer Science, College of Computer, Qassim University, Al Mulaida, Buraydah, 52344, Saudi Arabia
Ali Mustafa Qamar

Corresponding author

Correspondence to Rabia Irfan.

Additional information

A preliminary version was presented at the 15th IEEE International Conference on Machine Learning and Applications, Anaheim, CA, USA, December 18–20, 2016.

About this article

Cite this article

Irfan, R., Khan, S., Rajpoot, K. et al. TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data. Frontiers Inf Technol Electronic Eng 19, 763–782 (2018). https://doi.org/10.1631/FITEE.1700517

Received: 04 August 2017
Accepted: 03 December 2017
Published: 07 August 2018
Issue Date: June 2018
DOI: https://doi.org/10.1631/FITEE.1700517

CLC number

TP312
