A Novel Clustering Approach Using Hadoop Distributed Environment

Vadaparthi, Nagesh; Srinivas Rao, P.; Srinivas, Y.; Athmaja, M.

doi:10.1007/978-981-287-338-5_9


Part of the book series: SpringerBriefs in Applied Sciences and Technology

Abstract

Information retrieval plays a vital role by allowing users to retrieve documents of interest based on a relevance score. Such systems can be implemented on either distributed or parallel systems to achieve high throughput. When such a framework is deployed in a cloud, grouping relevant documents is essential for retrieving the documents of interest, so efficient and scalable clustering is required to process huge volumes of documents. Apache Hadoop, through its MapReduce programming model, is well suited to handling such volumes while providing scalability. This paper therefore proposes a novel approach capable of clustering bulk data with high throughput. It also demonstrates the need for a parallel caching approach to obtain effective results.
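The abstract does not name the clustering algorithm the authors run on Hadoop. As a minimal sketch of how clustering is commonly expressed in the MapReduce model, the code below assumes K-means: a map step assigns each point to its nearest centroid, a shuffle groups the pairs by cluster id, and a reduce step recomputes each centroid. Everything here (function names, the in-process "shuffle", the iteration loop) is hypothetical and simulates the Hadoop phases in plain Python. It is not the authors' method.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    # Index of the closest centroid by Euclidean distance.
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_phase(points, centroids):
    # Mapper (hypothetical): emit (cluster_id, (point, 1)) for every input record.
    for p in points:
        yield nearest(p, centroids), (p, 1)


def shuffle(pairs):
    # Stand-in for the framework's shuffle/sort: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer (hypothetical): sum the point vectors and counts, emit the new centroid.
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for vec, n in values:
        for d in range(dim):
            total[d] += vec[d]
        count += n
    return key, tuple(t / count for t in total)


def kmeans_mapreduce(points, centroids, max_iter=20):
    # Driver: one MapReduce "job" per iteration until the centroids stop changing.
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        groups = shuffle(map_phase(points, centroids))
        new = list(centroids)  # a cluster that receives no points keeps its old centroid
        for key, values in groups.items():
            _, centroid = reduce_phase(key, values)
            new[key] = centroid
        if new == centroids:
            break
        centroids = new
    return centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans_mapreduce(pts, [(0, 0), (10, 10)]))
```

On a real cluster, every mapper needs the current centroids in each iteration, and shipping this small shared state to all nodes (for example, through Hadoop's distributed cache) is a common design choice. The abstract does not say whether this corresponds to the paper's "parallel caching" approach.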



Author information

Authors and Affiliations

MVGR College of Engineering, Vizianagaram, India
Nagesh Vadaparthi & P. Srinivas Rao
GIT, GITAM University, Visakhapatnam, India
Y. Srinivas
Tata Consultancy Services, Hyderabad, India
M. Athmaja


Corresponding author

Correspondence to Nagesh Vadaparthi.

Editor information

Editors and Affiliations

C.R. Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, India
Naresh Babu Muppalaneni
Annamacharya Institute of Technology and Sciences, Kadapa, India
Vinit Kumar Gunjan


About this chapter

Cite this chapter

Vadaparthi, N., Srinivas Rao, P., Srinivas, Y., Athmaja, M. (2015). A Novel Clustering Approach Using Hadoop Distributed Environment. In: Muppalaneni, N., Gunjan, V. (eds) Computational Intelligence Techniques for Comparative Genomics. SpringerBriefs in Applied Sciences and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-287-338-5_9


DOI: https://doi.org/10.1007/978-981-287-338-5_9
Published: 02 December 2014
Publisher Name: Springer, Singapore
Print ISBN: 978-981-287-337-8
Online ISBN: 978-981-287-338-5
eBook Packages: Engineering, Engineering (R0)
