A Novel Hierarchical Clustering Algorithm for Online Resources

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 708)

Abstract

Hierarchical clustering is becoming more important in data analytics because of the exponential growth of digital content. This content is often unorganized, and little prior domain knowledge is available. One of the main challenges in organizing such large collections is the computational complexity involved. To address it, we propose an approach that improves the efficiency of the traditional agglomerative hierarchical clustering method used to organize the data. The approach uses a disjoint-set data structure and a variation of Kruskal’s algorithm for minimum spanning trees. The disjoint sets represent the clusters, and the elements of each set are the records. This representation makes it efficient to merge two clusters and to locate the records in any cluster. To evaluate the approach, we test the algorithm on a sample input of 50,000 records of unorganized e-books. The experimental results show that e-resources can be clustered efficiently without compromising clustering performance.
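The abstract does not give the details of the authors' variation, so the following is only a minimal sketch of the general idea it names: agglomerative (single-linkage) clustering driven by a Kruskal-style pass over distance-sorted pairs, with a disjoint-set structure holding the clusters. The names `DisjointSet` and `cluster`, the stopping rule (a target number of clusters `k`), and the all-pairs edge list are assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


class DisjointSet:
    """Union-find with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.count = n  # current number of clusters

    def find(self, x):
        # Locate the representative of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        # Merge the sets containing a and b; return False if already merged.
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.count -= 1
        return True


def cluster(points, distance, k):
    """Single-linkage clustering via a Kruskal-style edge scan.

    Assumption: merging stops once k clusters remain. Returns the
    clusters (as sorted lists of record indices) and the merge history.
    """
    n = len(points)
    # All pairwise distances, cheapest first (as in Kruskal's algorithm).
    edges = sorted(
        (distance(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    ds = DisjointSet(n)
    merges = []  # (record_i, record_j, distance) for each merge performed
    for d, i, j in edges:
        if ds.count <= k:
            break
        if ds.union(i, j):
            merges.append((i, j, d))
    # Group records by their set representative to list each cluster.
    groups = {}
    for i in range(n):
        groups.setdefault(ds.find(i), []).append(i)
    return sorted(groups.values()), merges


if __name__ == "__main__":
    data = [0.0, 1.0, 1.5, 10.0, 11.0]
    clusters, history = cluster(data, lambda a, b: abs(a - b), k=2)
    print(clusters)
    print(history)
```

In this sketch each merge is close to constant time thanks to the disjoint-set structure, but building and sorting every pairwise distance is still quadratic in the number of records. How the paper's variation handles that step for 50,000 records is not stated in the abstract.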

Notes

  1. http://www.nltk.org/

  2. http://www.gutenberg.org/dirs/

Author information

Corresponding author

Correspondence to Amit Agarwal.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Agarwal, A., Roul, R.K. (2018). A Novel Hierarchical Clustering Algorithm for Online Resources. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 708. Springer, Singapore. https://doi.org/10.1007/978-981-10-8636-6_49

  • DOI: https://doi.org/10.1007/978-981-10-8636-6_49

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8635-9

  • Online ISBN: 978-981-10-8636-6

  • eBook Packages: Engineering, Engineering (R0)
