Sparse Linear Combination of SOMs for Data Imputation: Application to Financial Database

  • Antti Sorjamaa
  • Francesco Corona
  • Yoan Miche
  • Paul Merlin
  • Bertrand Maillet
  • Eric Séverin
  • Amaury Lendasse
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5629)

Abstract

This paper presents a new methodology for missing value imputation in a database. The methodology combines the outputs of several Self-Organizing Maps in order to obtain an accurate filling for the missing values. The maps are combined using MultiResponse Sparse Regression and the Hannan-Quinn Information Criterion. The new combination methodology removes the need for any lengthy cross-validation procedure, thus speeding up the computation significantly. Furthermore, the accuracy of the filling is improved, as demonstrated in the experiments.
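The selection step mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the maps have already been ranked (for example by a sparse regression such as MRSR), that the residual sum of squares of each nested model is known, and that the Hannan-Quinn criterion takes its common regression form N ln(RSS/N) + 2k ln(ln N). The function names and all numbers are hypothetical.

```python
import math


def combine_estimates(estimates, weights):
    """Fill one missing value as a weighted sum of per-map estimates.

    A zero weight means the corresponding map is left out of the
    sparse combination.
    """
    return sum(w * e for w, e in zip(weights, estimates))


def hannan_quinn(rss, n_samples, n_params):
    """Hannan-Quinn criterion, assumed form: N*ln(RSS/N) + 2*k*ln(ln(N))."""
    fit_term = n_samples * math.log(rss / n_samples)
    penalty = 2.0 * n_params * math.log(math.log(n_samples))
    return fit_term + penalty


def select_model_size(rss_by_size, n_samples):
    """Return (k, scores), where k minimizes the criterion.

    rss_by_size[k - 1] is the residual sum of squares of the model
    that uses the k top-ranked maps.
    """
    scores = [
        hannan_quinn(rss, n_samples, k)
        for k, rss in enumerate(rss_by_size, start=1)
    ]
    best_index = min(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1, scores


# Hypothetical residuals for models using 1, 2, 3 and 4 ranked maps.
rss_by_size = [40.0, 22.0, 21.5, 21.4]
best_k, scores = select_model_size(rss_by_size, n_samples=100)

# Hypothetical estimates from four maps; only the first two are kept.
filled = combine_estimates([1.0, 2.0, 5.0, 9.0], [0.5, 0.5, 0.0, 0.0])
```

With these made-up residuals the criterion selects two maps: the third and fourth maps reduce the error too little to offset the penalty term. Because the criterion is computed once per candidate model size, no cross-validation loop is needed in this sketch.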

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Antti Sorjamaa (1)
  • Francesco Corona (1)
  • Yoan Miche (1)
  • Paul Merlin (2)
  • Bertrand Maillet (2)
  • Eric Séverin (3)
  • Amaury Lendasse (2)

  1. Department of Information and Computer Science, Helsinki University of Technology, Finland
  2. A.A. Advisors-QCG (ABN AMRO) – Variances, CES/CNRS and EIF, University of Paris-1
  3. Department GEA, University of Lille 1, France
