BorderShift: toward optimal MeanShift vector for cluster boundary detection in high-dimensional data

  • Xiaofeng Cao (corresponding author)
  • Baozhi Qiu
  • Guandong Xu
Theoretical Advances

Abstract

We present a cluster boundary detection scheme that exploits MeanShift and the Parzen window in high-dimensional space. To reduce the interference of noise in Parzen window density estimation, we first replace the fixed-size sliding window with a kNN window. We then take the density of each sample as the weight of its drift vector, which further improves the stability of the MeanShift vector. According to the vector models, this vector can be used to separate boundary points from core points, noise points, and isolated points in multi-density data sets. Under these circumstances, the proposed BorderShift algorithm does not need multiple iterations to reach the optimal detection result. Instead, the Shift value developed for each data point helps to obtain it in a linear way. Experimental results on both synthetic and real data sets demonstrate that BorderShift achieves a higher F-measure than other algorithms.
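
The idea in the abstract (a kNN window, density-weighted drift vectors, and a single per-point Shift value) can be illustrated with a rough sketch. This is not the authors' BorderShift implementation: the abstract gives no formulas, so the density estimate (inverse mean kNN distance), the choice to weight each drift vector by the neighbour's density, the normalisation by the kNN radius, and the names `knn_indices_and_distances` and `shift_values` are all assumptions made for illustration.

```python
import numpy as np


def knn_indices_and_distances(X, k):
    """Indices and distances of the k nearest neighbours of every row of X.

    Brute force, O(n^2 * d) memory; fine for a small illustration only.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)                 # a point is not its own neighbour
    idx = np.argsort(D, axis=1)[:, :k]          # kNN window instead of a fixed radius
    dist = np.take_along_axis(D, idx, axis=1)   # sorted ascending per row
    return idx, dist


def shift_values(X, k=8):
    """One-pass, density-weighted mean-shift magnitude for every point.

    Assumption: density is approximated by the inverse mean kNN distance,
    and each drift vector x_j - x_i is weighted by the density of x_j.
    """
    X = np.asarray(X, dtype=float)
    idx, dist = knn_indices_and_distances(X, k)
    density = 1.0 / (dist.mean(axis=1) + 1e-12)
    offsets = X[idx] - X[:, None, :]            # (n, k, d) drift vectors
    w = density[idx]                            # (n, k) neighbour densities
    mean_shift = (w[:, :, None] * offsets).sum(axis=1) / w.sum(axis=1, keepdims=True)
    # Assumption: divide by the kNN radius so values are comparable across densities.
    return np.linalg.norm(mean_shift, axis=1) / (dist[:, -1] + 1e-12)


# Usage: on an 11 x 11 grid the centre point has symmetric neighbours,
# while a corner point has neighbours on one side only.
g = np.arange(11.0)
grid = np.array([(x, y) for x in g for y in g])
s = shift_values(grid, k=8)
print("centre:", s[60], "corner:", s[0])
```

In this sketch a point surrounded evenly by neighbours gets a Shift value near zero, while a point on the edge of the data gets a clearly larger one, so ranking or thresholding the values once would flag boundary candidates without iterating the mean-shift update. How noise and isolated points are separated from boundary points is not specified in the abstract and is not modelled here.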

Keywords

Cluster boundary, MeanShift, Parzen window, High-dimensional space

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. School of Information Engineering, Zhengzhou University, Zhengzhou, China
  2. Advanced Analytics Institute, University of Technology Sydney, Sydney, Australia
