An effective density based approach to detect complex data clusters using notion of neighborhood difference

Nagaraju, S.; Kashyap, Manish; Bhattacharya, Mahua

doi:10.1007/s11633-016-1038-7

Research Article
Published: 29 December 2016

Volume 14, pages 57–67 (2017)

International Journal of Automation and Computing

S. Nagaraju¹,
Manish Kashyap¹ &
Mahua Bhattacharya¹

Abstract

Density-based clustering is widely used because it is easy to implement and can detect arbitrarily shaped clusters in the presence of noisy data points without prior knowledge of the number of clusters. Density-based spatial clustering of applications with noise (DBSCAN) was the first algorithm proposed in the literature to use the density-based notion for cluster detection. Many real data sets today contain adjacent and nested clusters of differing density, and DBSCAN is not suited to detecting such clusters because it relies on two global density parameters: the neighborhood radius N_rad and the minimum number of points in a neighborhood N_pts. The performance of DBSCAN therefore depends on these initial parameter settings. For DBSCAN to work properly, the neighborhood radius must be less than the distance between two clusters; otherwise the algorithm merges them and reports a single cluster. In this paper: 1) We propose an improved version of DBSCAN that detects adjacent clusters of varying density by combining the concept of neighborhood difference with the density-based notion, without adding much computational complexity to the original DBSCAN algorithm. 2) We validate our experimental results using the space density indexing (SDI) internal cluster measure, recently proposed by one of our authors, to demonstrate the quality of the proposed clustering method. Our experimental results suggest that the proposed method is effective in detecting adjacent and nested clusters of variable density.
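
The limitation of global parameters described in the abstract can be illustrated with a minimal sketch of textbook DBSCAN. This is not the authors' proposed neighborhood-difference method, which the abstract does not specify. The function names `region_query` and `dbscan`, the one-dimensional layout of the points, and the parameter values are all assumptions chosen for illustration; `n_rad` and `n_pts` stand in for N_rad and N_pts.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, n_rad):
    """Indices of all points within n_rad of points[i] (the point itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= n_rad]


def dbscan(points, n_rad, n_pts):
    """Textbook DBSCAN with one global radius and one global point threshold."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region_query(points, i, n_rad)
        if len(seeds) < n_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        k = 0
        while k < len(seeds):  # grow the cluster from its core points
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors = region_query(points, j, n_rad)
            if len(neighbors) >= n_pts:
                seeds.extend(neighbors)
        cluster += 1
    return labels


# Hypothetical data: a dense group (spacing 1) next to a sparse group (spacing 5),
# separated by a gap of 4, which is smaller than the sparse group's own spacing.
dense = [(float(x), 0.0) for x in range(10)]
sparse = [(float(x), 0.0) for x in (13, 18, 23, 28, 33)]
points = dense + sparse

tight = dbscan(points, n_rad=2.5, n_pts=3)  # small radius: sparse group is lost as noise
wide = dbscan(points, n_rad=5.0, n_pts=3)   # large radius: the gap of 4 is bridged

print(tight)
print(wide)
```

With the small radius the dense group forms one cluster and every sparse point is labelled noise; with the large radius the sparse points link up, but the radius also exceeds the gap, so both groups collapse into a single cluster. In this toy layout no single global radius recovers both groups, which is the situation the abstract describes for adjacent clusters of varying density.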

Author information

Authors and Affiliations

Visual Information Processing Lab, Indian Institute of Information Technology and Management, Gwalior, India
S. Nagaraju, Manish Kashyap & Mahua Bhattacharya

Corresponding author

Correspondence to S. Nagaraju.

Additional information

Recommended by Guest Editor Dongbing Gu

S. Nagaraju received the B.Tech. degree in electronics and communication engineering from Maharaja Institute of Technology, India in 2012. He is an M.Tech. student in the CSE Department at the Indian Institute of Information Technology and Management, Gwalior, India.

His research interests include image processing, computer vision and computer graphics.

ORCID iD: 0000

Manish Kashyap received the B.Tech. and M.Tech. degrees in electronics and communication engineering from Jaypee Institute of Information Technology University, India in 2009 and 2013, respectively. He is a research scholar in the field of medical image processing at the Indian Institute of Information Technology and Management, India.

His research interests include image processing, computer vision, computer graphics, soft computing and neural networks.

Mahua Bhattacharya received the B.Tech. and M.Tech. degrees from the Institute of Radio Physics and Electronics, University of Calcutta, India, and the Ph.D. degree in medical image processing from University of Calcutta, India in 2001. Her areas of specialization are medical image processing, pattern recognition, computer vision and soft computing. She has been an associate professor since December 2006 at ABV Indian Institute of Information Technology and Management, a Ministry of Human Resource and Development (MHRD) institute of the Government of India. She worked as a research scientist at Indian Statistical Institute, Calcutta from 1995 to 2000. She was a recipient of the Frank George award (WOSC, the World Organization of Systems and Cybernetics, UK) for her paper entitled "Cybernetic Approach To Medical Technology: Application To Cancer Screening And Other Diagnostics". She has published more than 130 papers in international journals, conference proceedings and book chapters. She has supervised more than 100 M.Tech. scholars and 6 Ph.D. scholars. She has been an invited speaker at various international and national forums and has served as program chair, session chair, advisory technical committee member and workshop organizer for international conferences. She is a reviewer for IEEE, Elsevier, Springer and Wiley journals. She is also principal investigator of various government-sponsored research projects. Recently, she was elected president of the International Neural Network Society, India Chapter, and was included as an editorial board member of the journal Neural Computing and Applications, Springer.

Her research interests include applications of computer vision in agriculture and the study of cell or tissue morphology and deformation.

About this article

Cite this article

Nagaraju, S., Kashyap, M. & Bhattacharya, M. An effective density based approach to detect complex data clusters using notion of neighborhood difference. Int. J. Autom. Comput. 14, 57–67 (2017). https://doi.org/10.1007/s11633-016-1038-7

Received: 16 January 2016
Accepted: 16 May 2016
Published: 29 December 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s11633-016-1038-7
