A Novel Hierarchical Clustering Algorithm for Online Resources

Agarwal, Amit; Roul, Rajendra Kumar

doi:10.1007/978-981-10-8636-6_49

Conference paper
First Online: 05 November 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 708)

Abstract

Hierarchical clustering is becoming increasingly important in data analytics because of the exponential growth of digital content. This content is often unorganized, and little prior domain knowledge about it is available. One of the main challenges in organizing such large collections is the computational complexity involved. To address this, we propose an approach that improves the efficiency of the traditional agglomerative hierarchical clustering method used to organize the data. It does so by using a disjoint-set data structure and a variation of Kruskal’s algorithm for minimum spanning trees. The disjoint sets represent the clusters, and the elements of each set are the records. This representation makes it efficient to merge two clusters and to locate the records in any cluster. To evaluate the approach, the algorithm is tested on a sample input of 50,000 records of unorganized e-books. The experimental results show that e-resources can be clustered efficiently without compromising clustering performance.
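
The abstract does not spell out the authors' variation of Kruskal’s algorithm, so the following is only a minimal sketch of the general idea it describes: clusters are held as disjoint sets, candidate merges are processed in order of increasing distance, and merging stops once a target number of clusters remains. With all pairwise distances as edges, this corresponds to textbook single-linkage clustering. The names `DisjointSet`, `single_link_clusters`, `k`, and `dist` are assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations


class DisjointSet:
    """Union-find with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))  # each record starts as its own cluster
        self.size = [1] * n

    def find(self, x):
        # Locate the representative (root) of x's cluster.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        # Merge the clusters containing a and b; return False if already merged.
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]
        return True


def single_link_clusters(points, k, dist):
    """Kruskal-style single-linkage clustering, stopped at k clusters.

    Hypothetical helper: `dist` is any symmetric distance function.
    Returns the clusters (as lists of record indices) and the merge history.
    """
    n = len(points)
    # All pairwise edges, sorted by distance (the Kruskal ordering).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    ds = DisjointSet(n)
    merges = []
    remaining = n
    for d, i, j in edges:
        if remaining <= k:
            break
        if ds.union(i, j):  # skip edges inside an existing cluster
            merges.append((i, j, d))
            remaining -= 1
    clusters = {}
    for idx in range(n):
        clusters.setdefault(ds.find(idx), []).append(idx)
    return sorted(clusters.values()), merges
```

For example, `single_link_clusters([0, 2, 3, 20, 21], 2, lambda a, b: abs(a - b))` groups the first three records and the last two. The sketch enumerates every pair of records, which is only practical for small inputs; how the paper scales to 50,000 records is not stated in the abstract.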

Notes

1. http://www.nltk.org/
2. http://www.gutenberg.org/dirs/

Author information

Authors and Affiliations

BITS-Pilani, K.K. Birla Goa Campus, Pilani, India
Amit Agarwal & Rajendra Kumar Roul

Corresponding author

Correspondence to Amit Agarwal.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Pankaj Kumar Sa
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Sambit Bakshi
Department of Computer Engineering and Informatics, University of Patras, Patras, Greece
Ioannis K. Hatzilygeroudis
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Manmath Narayan Sahoo

About this paper

Cite this paper

Agarwal, A., Roul, R.K. (2018). A Novel Hierarchical Clustering Algorithm for Online Resources. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 708. Springer, Singapore. https://doi.org/10.1007/978-981-10-8636-6_49

DOI: https://doi.org/10.1007/978-981-10-8636-6_49
Published: 05 November 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8635-9
Online ISBN: 978-981-10-8636-6
eBook Packages: Engineering, Engineering (R0)