Hubness-Based Fuzzy Measures for High-Dimensional k-Nearest Neighbor Classification

  • Nenad Tomašev
  • Miloš Radovanović
  • Dunja Mladenić
  • Mirjana Ivanović
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6871)

Abstract

High-dimensional data are by their very nature often difficult to handle by conventional machine-learning algorithms, which is usually characterized as an aspect of the curse of dimensionality. However, it was shown that some of the arising high-dimensional phenomena can be exploited to increase algorithm accuracy. One such phenomenon is hubness, which refers to the emergence of hubs in high-dimensional spaces, where hubs are influential points included in many k-neighbor sets of other points in the data. This phenomenon was previously used to devise a crisp weighted voting scheme for the k-nearest neighbor classifier. In this paper we go a step further by embracing the soft approach, and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express fuzziness of elements appearing in k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method, as well as the standard kNN classifier.
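
To make the notion of hubness concrete, the following minimal sketch counts how often each point appears in the k-neighbor sets of the other points (its k-occurrence count). It is an illustration only, not the paper's method: the function names `k_occurrences` and `sample_gaussian`, the Euclidean distance, the brute-force search, and the synthetic Gaussian data are all assumptions made for this example.

```python
import math
import random


def k_occurrences(points, k):
    """For each point, count how many other points list it among their
    k nearest neighbors (brute force, Euclidean distance; illustrative)."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank every other point by its distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        # Each of the k closest points gains one occurrence.
        for j in others[:k]:
            counts[j] += 1
    return counts


def sample_gaussian(n, dim, seed=0):
    """Hypothetical helper: n synthetic points with i.i.d. standard normal
    coordinates, used here only to have some data to count over."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    # Compare the largest k-occurrence count in low and high dimensions.
    for dim in (2, 100):
        counts = k_occurrences(sample_gaussian(200, dim), k=5)
        print(f"dim={dim}: max k-occurrence = {max(counts)}")
```

Points with unusually large counts are the hubs in the sense described above. The counts always sum to n·k, so a few very large counts necessarily mean many points that rarely or never appear as neighbors.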

Keywords

  • Neighborhood Size
  • Fuzzy Approach
  • Fuzzy Measure
  • Neighbor List
  • Fuzzy Estimate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Nenad Tomašev (1)
  • Miloš Radovanović (2)
  • Dunja Mladenić (1)
  • Mirjana Ivanović (2)
  1. Artificial Intelligence Laboratory, Institute Jožef Stefan, Ljubljana, Slovenia
  2. Department of Mathematics and Informatics, University of Novi Sad, Novi Sad, Serbia
