Abstract
The K-nearest neighbors (KNN) algorithm is one of the best-known algorithms in data mining. It consists of computing the distance between a query and every item in a reference set. In this paper, we present an approach to standardizing variables that avoids assumptions about the presence of outliers or the number of classes. Our method computes, for each variable, the rank of each value within the dataset and uses these ranks to standardize the variables. We then calculate a dissimilarity index between the standardized data, called the Rank-Based Dissimilarity Index (RBDI), which we use in place of the Euclidean distance to find the K nearest neighbors. Finally, we combine the Euclidean distance and the RBDI, exploiting the advantages of both dissimilarity indices: the Euclidean distance respects the Euclidean geometry of the data space, whereas RBDI is not constrained by distance or geometry in that space. We evaluate our approach on multidimensional open datasets.
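The pipeline described in the abstract (rank-based standardization, a rank-based dissimilarity, and a combination with the Euclidean distance) can be sketched as follows. This is a hedged illustration only: the abstract does not give the exact RBDI formula or the combination rule, so the mean absolute rank difference used for `rbdi` and the mixing parameter `alpha` are assumptions introduced here for illustration, not the authors' definitions.

```python
import numpy as np

def rank_standardize(X):
    """Replace each value by its rank within its own column, scaled to
    [0, 1]. Illustrative: the paper's exact standardization may differ."""
    n = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # 0..n-1 per column
    return ranks / (n - 1)

def rbdi(q_ranks, R):
    """Assumed rank-based dissimilarity: mean absolute difference between
    the query's rank profile and each row of the rank-standardized
    reference set. The actual RBDI definition may differ."""
    return np.abs(R - q_ranks).mean(axis=1)

def combined_knn(query, X, k=3, alpha=0.5):
    """Find the k nearest neighbors of `query` in `X`, mixing a scaled
    Euclidean distance with the rank-based index via the assumed
    weight `alpha`."""
    R = rank_standardize(X)
    # Rank-standardize the query against the reference set: for each
    # variable, the fraction of reference values below the query value.
    q_ranks = (X < query).mean(axis=0)
    d_euc = np.linalg.norm(X - query, axis=1)
    d_euc = d_euc / (d_euc.max() + 1e-12)  # bring to a comparable range
    d = alpha * d_euc + (1 - alpha) * rbdi(q_ranks, R)
    return np.argsort(d)[:k]

# Small usage example: the query coincides with the third reference row,
# so that row should come back as the nearest neighbor.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0]])
neighbors = combined_knn(np.array([2.0, 2.0]), X, k=3)
```

Scaling the Euclidean distances before mixing is a design choice made here so that both terms lie in a comparable range; the ranks are already in [0, 1] by construction.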
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Boucetta, C., Hussenet, L., Herbin, M. (2023). Improved Euclidean Distance in the K Nearest Neighbors Method. In: Krieger, U.R., Eichler, G., Erfurth, C., Fahrnberger, G. (eds) Innovations for Community Services. I4CS 2023. Communications in Computer and Information Science, vol 1876. Springer, Cham. https://doi.org/10.1007/978-3-031-40852-6_17
Print ISBN: 978-3-031-40851-9
Online ISBN: 978-3-031-40852-6