Abstract
Hierarchical clustering is becoming increasingly important in data analytics because of the exponential growth of digital content. This content is often unorganized, and little prior domain knowledge about it is available. One of the main challenges in organizing such large collections is the computational complexity involved. To address this, we propose an approach that improves the efficiency of the traditional agglomerative hierarchical clustering method used to organize the data. It does so by using a disjoint-set data structure and a variation of Kruskal’s algorithm for minimum spanning trees. The disjoint sets represent the clusters, and the elements of each set are the records. This representation makes it efficient to merge two clusters and to locate the records in any cluster. To evaluate the approach, the algorithm is tested on a sample input of 50,000 records of unorganized e-books. The experimental results show that e-resources can be clustered efficiently without compromising clustering performance.
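The abstract does not spell out the variation of Kruskal’s algorithm that is used, so the following is only a minimal sketch of the general idea: single-linkage agglomerative clustering driven by Kruskal-style edge processing over a disjoint-set structure. The names `DisjointSet` and `single_linkage`, the stopping rule (stop at `k` clusters), and the toy one-dimensional data are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


class DisjointSet:
    """Union-find with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Locate the root, then point every node on the path directly at it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        # Merge the sets containing a and b; return False if already merged.
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def single_linkage(points, k, dist):
    """Merge the closest clusters first (Kruskal order) until k remain.

    Assumption: all pairwise distances are materialised, which is only
    practical for small inputs.
    """
    n = len(points)
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    ds = DisjointSet(n)
    merges = []  # (record_i, record_j, distance) for each merge performed
    remaining = n
    for d, i, j in edges:
        if remaining <= k:
            break
        if ds.union(i, j):
            merges.append((i, j, d))
            remaining -= 1
    groups = {}
    for idx in range(n):
        groups.setdefault(ds.find(idx), []).append(idx)
    return sorted(groups.values()), merges


# Toy data (hypothetical): five records on a line, clustered into two groups.
records = [0, 10, 15, 100, 102]
clusters, merges = single_linkage(records, k=2, dist=lambda a, b: abs(a - b))
print(clusters)
print(merges)
```

Here each set in the disjoint-set structure plays the role of a cluster, `union` performs a merge, and `find` identifies the cluster a record belongs to. The recorded `merges` list gives the order of agglomeration, from which a dendrogram could be built.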
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Agarwal, A., Roul, R.K. (2018). A Novel Hierarchical Clustering Algorithm for Online Resources. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 708. Springer, Singapore. https://doi.org/10.1007/978-981-10-8636-6_49
DOI: https://doi.org/10.1007/978-981-10-8636-6_49
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8635-9
Online ISBN: 978-981-10-8636-6
eBook Packages: Engineering, Engineering (R0)