A new type of distance metric and its use for clustering

  • Original Paper
  • Evolving Systems

Abstract

To address high-dimensional problems, this paper introduces a new ‘direction-aware’ distance. The distance combines two components: (1) the traditional Euclidean distance and (2) an angular/directional divergence derived from the cosine similarity. It therefore combines the advantages of the Euclidean metric and the cosine similarity while remaining defined over the Euclidean space domain. The direction-aware distance has a wide range of applicability and can serve as an alternative distance measure in various traditional clustering approaches to improve their handling of high-dimensional problems. The paper also proposes a new evolving clustering algorithm based on this distance. Numerical examples on benchmark datasets show that the direction-aware distance can effectively improve the clustering quality of the k-means algorithm on high-dimensional problems, and that the proposed evolving clustering algorithm is an effective tool for processing high-dimensional data streams.
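
The abstract does not state the formula, so the following is only a minimal sketch of how a Euclidean component and a cosine-derived angular component might be combined. The choice of sqrt(1 - cos θ) as the angular divergence, the root-sum-of-squares combination, and the weights `w_euclid` and `w_angle` are all assumptions made for illustration and may differ from the definition given in the full paper.

```python
import math


def euclidean(x, y):
    """Traditional Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def angular_divergence(x, y):
    """Angular component derived from cosine similarity.

    Assumption: sqrt(1 - cos(theta)) is used here; the paper may use
    a different scaling.
    """
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("direction is undefined for a zero vector")
    cos_theta = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.sqrt(1.0 - cos_theta)


def direction_aware_distance(x, y, w_euclid=1.0, w_angle=1.0):
    """Hypothetical combination: weighted root-sum-of-squares of both parts."""
    d_e = w_euclid * euclidean(x, y)
    d_a = w_angle * angular_divergence(x, y)
    return math.sqrt(d_e ** 2 + d_a ** 2)


if __name__ == "__main__":
    # Same direction, different magnitude: only the Euclidean part contributes.
    print(direction_aware_distance([1.0, 0.0], [2.0, 0.0]))
    # Orthogonal unit vectors: both parts contribute.
    print(direction_aware_distance([1.0, 0.0], [0.0, 1.0]))
```

Under these assumptions, two vectors pointing the same way differ only by their Euclidean separation, while vectors of similar magnitude but different direction are pushed further apart by the angular term. The weights let a user trade off the two components.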

Corresponding author

Correspondence to Plamen P. Angelov.

Cite this article

Gu, X., Angelov, P.P., Kangin, D. et al. A new type of distance metric and its use for clustering. Evolving Systems 8, 167–177 (2017). https://doi.org/10.1007/s12530-017-9195-7

  • DOI: https://doi.org/10.1007/s12530-017-9195-7
