Parallel k-Most Similar Neighbor Classifier for Mixed Data

  • Guillermo Sanchez-Diaz
  • Anilu Franco-Arcega
  • Carlos Aguirre-Salado
  • Ivan Piza-Davila
  • Luis R. Morales-Manilla
  • Uriel Escobar-Franco
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7435)

Abstract

This paper presents a parallelization of the incremental algorithm inc-k-msn for mixed data and similarity functions that do not satisfy metric properties. The algorithm is suitable for processing large data sets because it traverses the training data set only once and keeps in main memory only the k most similar neighbors found up to step t. Several experiments with synthetic and real data are presented.
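The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' inc-k-msn implementation: the `similarity` function, the record layout, the majority vote, and the chunking scheme are all assumptions made for illustration. Each worker scans its share of the training set once and holds only k candidates in a bounded heap; the partial results are then merged. Nothing here relies on the triangle inequality, so the similarity function need not be a metric.

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def similarity(a, b):
    """Hypothetical mixed-data similarity: fraction of features that agree.
    Numeric features agree when they differ by at most 1.0 (assumed threshold);
    all other features agree when they are equal."""
    agree = 0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            agree += abs(x - y) <= 1.0
        else:
            agree += x == y
    return agree / len(a)


def k_msn_stream(query, records, k):
    """Single pass over (features, label) records, keeping only the k most
    similar seen so far in a min-heap of (similarity, index, label)."""
    heap = []
    for i, (features, label) in enumerate(records):
        s = similarity(query, features)
        if len(heap) < k:
            heapq.heappush(heap, (s, i, label))
        elif s > heap[0][0]:
            # The new record beats the weakest stored neighbor: swap it in.
            heapq.heapreplace(heap, (s, i, label))
    return heap


def parallel_k_msn(query, training, k, workers=2):
    """Split the training set, run one streaming pass per chunk, merge the
    partial neighbor lists, and classify by majority vote (assumed rule)."""
    chunks = [training[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda c: k_msn_stream(query, c, k), chunks))
    candidates = [item for heap in partial for item in heap]
    best = heapq.nlargest(k, candidates, key=lambda t: t[0])
    votes = Counter(label for _, _, label in best)
    return votes.most_common(1)[0][0]


# Toy training set: (numeric, categorical, categorical) features with a label.
training = [
    ((1.0, "red", "yes"), "A"),
    ((1.5, "red", "no"), "A"),
    ((9.0, "blue", "no"), "B"),
    ((8.5, "blue", "yes"), "B"),
    ((1.2, "red", "yes"), "A"),
]
query = (1.1, "red", "yes")
print(parallel_k_msn(query, training, k=3, workers=2))
```

The merge step is valid because the global k most similar neighbors are always contained in the union of the per-chunk k most similar neighbors. Threads are used here only to keep the sketch short; in CPython a process pool or a distributed setup would be needed for an actual speedup on CPU-bound similarity functions.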

Keywords

K-most similar neighbor, K-nearest neighbor, classification, parallel algorithms

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Guillermo Sanchez-Diaz (1)
  • Anilu Franco-Arcega (2)
  • Carlos Aguirre-Salado (1)
  • Ivan Piza-Davila (3)
  • Luis R. Morales-Manilla (4)
  • Uriel Escobar-Franco (4)

  1. Universidad Autonoma de San Luis Potosi, San Luis Potosi, Mexico
  2. Universidad Autonoma del Estado de Hidalgo, Pachuca, Mexico
  3. Instituto Tecnologico y de Estudios Superiores de Occidente, Tlaquepaque, Mexico
  4. Universidad Politecnica de Tulancingo, Tulancingo, Mexico
