Nearest Neighbour in Least Squares Data Imputation Algorithms for Marketing Data

  • Ito Wasito
Part of the Springer Optimization and Its Applications book series (SOIA, volume 92)


Marketing research relies on multivariate data to solve problems such as market segmentation, estimating the purchasing power of a market sector, and modeling attrition. In many cases, the data collected or supplied for these purposes have a number of missing entries. This paper is devoted to an empirical evaluation of methods for imputing missing data within the so-called nearest neighbour least-squares approximation approach, a non-parametric, computationally efficient multidimensional technique. We contribute to each of the two components of the experimental setting: (a) an empirical evaluation of the nearest neighbour least-squares data imputation algorithm for marketing research, and (b) experimental comparisons with the expectation–maximization (EM) algorithm and multiple imputation (MI) on real marketing data sets. Specifically, we review “global” methods for least-squares data imputation and propose extensions to them based on the nearest neighbours (NN) approach. The results suggest that the NN least-squares data imputation algorithm almost always outperforms the EM algorithm and is comparable to the multiple imputation approach.
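To make the idea of a "global" least-squares imputation and its nearest neighbour extension concrete, the following is a minimal sketch and not the chapter's exact algorithm. It assumes that missing entries are coded as NaN, that the global method is an iterated truncated-SVD fit started from column means, and that neighbours are ranked by mean squared difference over commonly observed entries. The function names `ils_impute` and `nn_ils_impute` and the parameters `rank`, `k`, `n_iter`, and `tol` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ils_impute(X, rank=1, n_iter=200, tol=1e-9):
    """Global least-squares imputation sketch: iterated truncated SVD."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial fill: column means of the observed entries (0 if a column is empty).
    counts = (~missing).sum(axis=0)
    sums = np.where(missing, 0.0, X).sum(axis=0)
    col_means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        # Best rank-`rank` least-squares approximation of the current completion.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed entries; overwrite only the missing ones.
        new = np.where(missing, approx, X)
        change = np.linalg.norm(new - filled)
        filled = new
        if change < tol:
            break
    return filled


def nn_ils_impute(X, k=5, rank=1):
    """Nearest neighbour variant sketch: fit each incomplete row locally."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    result = X.copy()
    for i in np.flatnonzero(missing.any(axis=1)):
        # Mean squared difference over entries observed in both rows.
        both = ~missing[i] & ~missing
        diff = np.where(both, X - X[i], 0.0)
        n_common = both.sum(axis=1)
        dist = np.full(len(X), np.inf)
        ok = n_common > 0
        dist[ok] = (diff[ok] ** 2).sum(axis=1) / n_common[ok]
        dist[i] = np.inf  # a row is not its own neighbour
        neighbours = np.argsort(dist)[:k]
        # Run the global method on the target row plus its k neighbours only.
        block = X[np.r_[i, neighbours]]
        completed = ils_impute(block, rank=rank)
        result[i, missing[i]] = completed[0, missing[i]]
    return result
```

Calling `nn_ils_impute(X, k=3, rank=1)` on a NaN-coded data matrix returns a completed copy in which observed entries are unchanged. The local fit uses only rows similar to the target row, which is the motivation for the NN extension when the data are not well described by a single low-rank model.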


Least squares, Nearest neighbours, Singular value decomposition, Missing data, Marketing data



The author gratefully acknowledges many comments by reviewers that have been very helpful in improving the presentation.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Faculty of Computer Science, University of Indonesia, Depok, Indonesia
