A new type of distance metric and its use for clustering

Gu, Xiaowei; Angelov, Plamen P.; Kangin, Dmitry; Principe, Jose C.

doi:10.1007/s12530-017-9195-7

Original Paper
Published: 26 July 2017

Volume 8, pages 167–177, (2017)

Evolving Systems

Xiaowei Gu¹ (ORCID: orcid.org/0000-0001-9116-4761),
Plamen P. Angelov¹,
Dmitry Kangin¹ &
Jose C. Principe²

Abstract

To address high-dimensional problems, this paper introduces a new ‘direction-aware’ metric. The new distance combines two components: (1) the traditional Euclidean distance and (2) an angular/directional divergence derived from the cosine similarity. The metric therefore combines the advantages of the Euclidean metric and the cosine similarity while remaining defined over the Euclidean space domain, so it can take advantage of both measures without leaving that domain. The direction-aware distance is widely applicable and can replace the distance measure in various traditional clustering approaches to improve their handling of high-dimensional problems. The paper also proposes a new evolving clustering algorithm that uses this distance. Numerical examples with benchmark datasets show that the direction-aware distance can effectively improve the clustering quality of the k-means algorithm on high-dimensional problems. They also demonstrate that the proposed evolving clustering algorithm is an effective tool for processing high-dimensional data streams.
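The abstract describes the metric only through its two ingredients. The sketch below illustrates the idea under stated assumptions and is not the authors' exact definition. It assumes the angular term is sqrt(1 − cos θ) and that the two components are combined as a weighted root of the sum of squares. The weights `lambda_e` and `lambda_a` and all function names are hypothetical, so check the paper's full text for the actual formula and weighting. The sketch also shows, as an illustration only, how such a distance could replace the Euclidean distance in the assignment step of k-means.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    """Standard Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def angular_divergence(x: Vector, y: Vector) -> float:
    """ASSUMED angular term sqrt(1 - cos(theta)), derived from cosine similarity.

    Ranges from 0 (same direction) to sqrt(2) (opposite directions).
    """
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    cos_theta = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    # Clamp to [-1, 1] so rounding error cannot make 1 - cos_theta negative.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.sqrt(1.0 - cos_theta)


def direction_aware_distance(x: Vector, y: Vector,
                             lambda_e: float = 1.0,
                             lambda_a: float = 1.0) -> float:
    """Hypothetical combination: weighted root-sum-of-squares of both terms.

    lambda_e and lambda_a are illustrative weights, not values from the paper.
    """
    d_e = euclidean(x, y)
    d_a = angular_divergence(x, y)
    return math.sqrt(lambda_e * d_e ** 2 + lambda_a * d_a ** 2)


def assign_clusters(points: Sequence[Vector], centroids: Sequence[Vector],
                    lambda_e: float = 1.0,
                    lambda_a: float = 1.0) -> List[int]:
    """k-means assignment step with the Euclidean distance swapped out.

    Each point receives the index of its nearest centroid under the
    sketch distance above.
    """
    labels = []
    for p in points:
        dists = [direction_aware_distance(p, c, lambda_e, lambda_a)
                 for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


if __name__ == "__main__":
    anchor = (1.0, 0.0)
    # Both candidates lie at Euclidean distance 1 from the anchor.
    print(direction_aware_distance(anchor, (2.0, 0.0)))  # same direction
    print(direction_aware_distance(anchor, (1.0, 1.0)))  # rotated 45 degrees
    print(assign_clusters([(1.0, 0.0), (0.0, 1.0), (3.0, 0.1)],
                          [(2.0, 0.0), (0.0, 2.0)]))
```

In the example, both candidates are at the same Euclidean distance from the anchor. The one pointing in the same direction still comes out closer, because its angular term is zero. This is the kind of directional information a purely Euclidean measure ignores. Because the angular term relies on cosine similarity, this sketch rejects zero vectors. A practical implementation would need its own policy for them.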

Author information

Authors and Affiliations

School of Computing and Communications, Lancaster University, Lancaster, B24, InfoLab21, Bailrigg, Lancaster, LA1 4WA, UK
Xiaowei Gu, Plamen P. Angelov & Dmitry Kangin
Computational NeuroEngineering Laboratory, Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA
Jose C. Principe

Corresponding author

Correspondence to Plamen P. Angelov.

About this article

Cite this article

Gu, X., Angelov, P.P., Kangin, D. et al. A new type of distance metric and its use for clustering. Evolving Systems 8, 167–177 (2017). https://doi.org/10.1007/s12530-017-9195-7

Received: 04 May 2017
Accepted: 27 June 2017
Published: 26 July 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s12530-017-9195-7
