An Incremental Density-Based Clustering Technique for Large Datasets

  • Saif ur Rehman
  • Muhammed Naeem Ahmed Khan
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 85)

Abstract

Data mining, also known as knowledge discovery in databases, is a statistical analysis technique used to find hidden patterns and identify untapped value in large datasets. Clustering is a principal data discovery technique in data mining that partitions a dataset into subsets, or clusters, so that data values in the same cluster share common characteristics or attributes. Many clustering techniques have been proposed that can identify arbitrarily shaped clusters, where a cluster is defined as a dense region separated from other clusters by low-density regions; among them, DBSCAN is a prime density-based clustering algorithm. DBSCAN can discover clusters of arbitrary shape and size, even in databases that contain noise and outliers. Many researchers have attempted to overcome deficiencies in the original DBSCAN, such as its difficulty in identifying patterns within datasets of varied densities and its high computational complexity; hence a number of augmented forms of the DBSCAN algorithm are available. We present an incremental density-based clustering technique, based on the fundamental DBSCAN clustering algorithm, that aims to reduce its computational complexity. Our proposed algorithm can be used in different knowledge domains such as image processing, classification of patterns in GIS maps, x-ray crystallography, and information security.
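
For readers unfamiliar with the density-based notion of a cluster used in the abstract, the following is a minimal sketch of the baseline DBSCAN idea only. It is not the incremental algorithm proposed in this chapter, which the abstract does not specify. The parameter names `eps` and `min_pts`, the helper `region_query`, and the sample points are illustrative assumptions.

```python
from math import dist

NOISE = -1  # label for points that belong to no dense region


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, nothing more to do
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels


# Hypothetical data: two dense squares and one isolated outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

This sketch runs a linear scan for every neighborhood query, so it takes quadratic time in the number of points. That cost is the kind of computational burden the abstract refers to.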

Keywords

Clustering Techniques, DBSCAN, Data Mining, Statistical Analysis, Knowledge Discovery in Databases



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Saif ur Rehman (1)
  • Muhammed Naeem Ahmed Khan (1)
  1. Department of Computer Science, SZABIST, Islamabad, Pakistan
