Abstract
Incomplete data are often neglected when designing machine learning methods. A popular workaround among practitioners is a preprocessing step that fills in (imputes) the missing components. These preprocessing algorithms are designed independently of the machine learning method applied afterwards, which may lead to sub-optimal results. An alternative is to redesign classical machine learning methods so that they handle missing data directly. In this paper, we propose a variant of the forward stagewise regression (FSR) algorithm for incomplete data. The original FSR is an iterative procedure for estimating the parameters of sparse linear models. The proposed method, named forward stagewise regression for incomplete datasets with GMM (FSIG), models the missing components as random variables following a Gaussian mixture distribution. In FSIG, the main steps of FSR are adapted to deal with the intrinsic uncertainty of incomplete samples. The performance of FSIG was evaluated in an extensive set of experiments, and FSIG outperformed classical methods in most of the tested cases.
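For readers unfamiliar with FSR, the following is a minimal NumPy sketch of the classical incremental forward stagewise procedure on complete data, in its textbook form. It is not FSIG: it does not handle missing values and does not involve a Gaussian mixture model. The function name `forward_stagewise` and the parameters `eps` (step size) and `n_steps` (iteration budget) are illustrative choices, not names taken from the paper.

```python
import numpy as np


def forward_stagewise(X, y, eps=0.01, n_steps=5000, tol=1e-6):
    """Classical incremental forward stagewise regression on complete data.

    Illustrative sketch only (not FSIG): no missing values are handled.
    Returns (coef, intercept) on the original scale of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Standardize predictors and center the response.
    x_mean = X.mean(axis=0)
    x_std = X.std(axis=0)
    Xs = (X - x_mean) / x_std
    y_mean = y.mean()

    r = y - y_mean                 # current residual
    beta = np.zeros(X.shape[1])    # coefficients on the standardized scale

    for _ in range(n_steps):
        # Inner products between each predictor and the residual.
        corr = Xs.T @ r
        j = int(np.argmax(np.abs(corr)))
        # Stop once no predictor is (numerically) correlated with the residual.
        if abs(corr[j]) < tol:
            break
        # Take a tiny step on the most correlated predictor only.
        delta = eps * np.sign(corr[j])
        beta[j] += delta
        r -= delta * Xs[:, j]

    # Map coefficients back to the original (unstandardized) scale.
    coef = beta / x_std
    intercept = y_mean - x_mean @ coef
    return coef, intercept
```

Because each iteration moves only one coefficient by `eps`, predictors that are never selected keep a coefficient of exactly zero; stopping after few steps therefore yields a sparse model, while running long enough approaches the least-squares fit.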
Acknowledgements
The authors would like to thank the Brazilian National Council for Scientific and Technological Development (CNPq) for financial support (Grant 302289/2019-4).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Veras, M.B.A., Mesquita, D.P.P., Mattos, C.L.C. et al. A sparse linear regression model for incomplete datasets. Pattern Anal Applic 23, 1293–1303 (2020). https://doi.org/10.1007/s10044-019-00859-3
DOI: https://doi.org/10.1007/s10044-019-00859-3