Hubness-Based Fuzzy Measures for High-Dimensional k-Nearest Neighbor Classification

Tomašev, Nenad; Radovanović, Miloš; Mladenić, Dunja; Ivanović, Mirjana

doi:10.1007/978-3-642-23199-5_2


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6871)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition


Abstract

High-dimensional data are by their very nature often difficult for conventional machine-learning algorithms to handle, which is usually characterized as an aspect of the curse of dimensionality. However, it has been shown that some of the high-dimensional phenomena that arise can be exploited to increase algorithm accuracy. One such phenomenon is hubness, which refers to the emergence of hubs in high-dimensional spaces, where hubs are influential points included in many k-neighbor sets of other points in the data. This phenomenon was previously used to devise a crisp weighted voting scheme for the k-nearest neighbor classifier. In this paper we go a step further by embracing the soft approach, and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express the fuzziness of elements appearing in the k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in an improvement over the crisp weighted method, as well as the standard kNN classifier.
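The abstract does not give the paper's actual fuzzy measures, so the following Python sketch is only an illustration of the general idea, not the authors' method. It counts how often each training point appears in the k-neighbor sets of other points (its k-occurrence count, where hubs have high counts), records the classes of the points it is a neighbor of, and turns those class-wise counts into fuzzy votes. The function names, the additive `smoothing` parameter, and the simple summation of votes are all assumptions made for this example.

```python
from collections import Counter
from math import dist


def knn_lists(points, k):
    """Indices of the k nearest other points for every point (Euclidean)."""
    n = len(points)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: (dist(points[i], points[j]), j))[:k]
        for i in range(n)
    ]


def k_occurrences(neighbors):
    """Number of k-neighbor sets in which each point appears."""
    counts = [0] * len(neighbors)
    for nbrs in neighbors:
        for j in nbrs:
            counts[j] += 1
    return counts


def class_occurrences(neighbors, labels):
    """Per point, how often it is a neighbor of points from each class."""
    occ = [Counter() for _ in labels]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            occ[j][labels[i]] += 1
    return occ


def fuzzy_votes(occ, classes, smoothing=1.0):
    """Smoothed class memberships per point (smoothing is an assumption)."""
    votes = []
    for counter in occ:
        total = sum(counter.values()) + smoothing * len(classes)
        votes.append({c: (counter[c] + smoothing) / total for c in classes})
    return votes


def classify(query, points, votes, classes, k):
    """Sum the fuzzy votes of the query's k nearest points and normalize."""
    nearest = sorted(range(len(points)),
                     key=lambda j: (dist(query, points[j]), j))[:k]
    totals = {c: sum(votes[j][c] for j in nearest) for c in classes}
    norm = sum(totals.values())
    return {c: totals[c] / norm for c in classes}


# Toy data: two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
neighbors = knn_lists(points, k=2)
votes = fuzzy_votes(class_occurrences(neighbors, labels), ["a", "b"])
print(k_occurrences(neighbors))
print(classify((0.5, 0.5), points, votes, ["a", "b"], k=2))
```

The returned dictionary gives a membership per class rather than a single label, which is the sense in which a fuzzy scheme can provide a measure of confidence in the prediction. The brute-force neighbor search is quadratic in the number of points and is used here only to keep the sketch self-contained.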



Author information

Authors and Affiliations

Artificial Intelligence Laboratory, Institute Jožef Stefan, Jamova 39, 1000, Ljubljana, Slovenia
Nenad Tomašev & Dunja Mladenić
Department of Mathematics and Informatics, University of Novi Sad, Trg D. Obradovića 4, 21000, Novi Sad, Serbia
Miloš Radovanović & Mirjana Ivanović


Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Kohlenstraße 2, 04107, Leipzig, Germany
Petra Perner


About this paper

Cite this paper

Tomašev, N., Radovanović, M., Mladenić, D., Ivanović, M. (2011). Hubness-Based Fuzzy Measures for High-Dimensional k-Nearest Neighbor Classification. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_2


DOI: https://doi.org/10.1007/978-3-642-23199-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23198-8
Online ISBN: 978-3-642-23199-5
eBook Packages: Computer Science, Computer Science (R0)
