Reduction of Dimension and Size of Data Set by Parallel Fast Simulated Annealing

  • Piotr Kulczycki
  • Szymon Łukasik
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 530)

Abstract

A universal method of dimension and sample-size reduction, designed for exploratory data analysis procedures, constitutes the subject of this chapter. The dimension is reduced by applying a linear transformation, subject to the requirement that it influence the mutual locations of the sample elements as little as possible. For this purpose, an original version of the heuristic Parallel Fast Simulated Annealing method was used. In addition, those elements whose location changes significantly as a result of the transformation may be eliminated, or assigned smaller weights, for further analysis. Besides reducing the sample size, this also improves the quality of the applied knowledge-extraction methodology. Experimental research confirmed the usefulness of the procedure in a broad range of exploratory data analysis problems such as clustering, classification, and identification of outliers.
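The idea described above can be illustrated with a minimal serial sketch: a linear projection matrix is perturbed by simulated annealing so that pairwise distances in the reduced space stay close to the original ones, and elements whose distances change most are then down-weighted. This is only an illustrative approximation, not the authors' Parallel Fast Simulated Annealing algorithm: the function names (`anneal_projection`, `element_weights`), the cooling schedule, and the Gaussian perturbations (standing in for the Cauchy neighbor generation of Fast SA) are all assumptions made for the sketch.

```python
import numpy as np

def pairwise_dists(X):
    # Full Euclidean distance matrix of the rows of X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def stress(A, X, D):
    # Distance-preservation error of the linear map x -> A x,
    # compared against the original distance matrix D.
    return ((pairwise_dists(X @ A.T) - D) ** 2).sum()

def anneal_projection(X, k, iters=2000, t0=1.0, seed=0):
    """Search for a k-dimensional linear projection of X that
    approximately preserves pairwise distances (illustrative SA)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    D = pairwise_dists(X)
    A = rng.standard_normal((k, d))
    e = stress(A, X, D)
    best_A, best_e = A.copy(), e
    for i in range(iters):
        t = t0 / (1 + i)  # fast-annealing-style cooling T(i) = T0/(1+i)
        # Gaussian neighbor; Fast SA would draw Cauchy-distributed steps.
        cand = A + t * rng.standard_normal((k, d))
        ec = stress(cand, X, D)
        # Metropolis acceptance: always take improvements,
        # sometimes accept worse states while the temperature is high.
        if ec < e or rng.random() < np.exp(-(ec - e) / max(t, 1e-12)):
            A, e = cand, ec
        if ec < best_e:
            best_A, best_e = cand.copy(), ec
    return best_A

def element_weights(X, A):
    # Elements whose distances to the others changed most under the
    # projection get smaller weights for subsequent analysis.
    D, Dp = pairwise_dists(X), pairwise_dists(X @ A.T)
    err = np.abs(Dp - D).mean(axis=1)
    return 1.0 / (1.0 + err)
```

Elements with a weight near 1 were mapped almost isometrically; those with small weights could equally be dropped outright, which is the sample-size reduction discussed in the abstract.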

Keywords

Dimension reduction · Sample size reduction · Linear transformation · Simulated annealing · Data analysis and mining


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Systems Research Institute, Centre of Information Technology for Data Analysis Methods, Polish Academy of Sciences, Warsaw, Poland
  2. Department of Automatic Control and Information Technology, Cracow University of Technology, Cracow, Poland
