Abstract
We present a factorization framework for analyzing the data of a regression learning task with two peculiarities. First, each input can be split into two parts that represent semantically significant entities. Second, the performance of regressors is very low. The basic idea of the approach is to learn the ordering relations of the target variable instead of its exact value. Each part of the input is mapped into a common Euclidean space in such a way that the distance between the two mapped parts represents their interaction. The factorization approach yields reliable models from which it is possible to rank the features according to their responsibility for the variation of the target variable. Additionally, the Euclidean representation of the data provides a visualization in which metric properties have clear semantics. We illustrate the approach with a case study: the analysis of a dataset on the variations of Body Mass Index for Age of children after a Food Aid Program deployed in poor rural communities in Southern México. In this case, the two parts of each input are the vectorial representations of children and of their diets. In addition to discovering latent information, the mapping of inputs allows us to visualize children and diets in a common metric space.
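The idea of embedding both parts of an input in a common space and learning only ordering relations can be illustrated with a minimal sketch. It is not the authors' formulation, which this page does not show. It assumes linear maps `W` (for the child vector `x`) and `V` (for the diet vector `z`), a score equal to the negative squared Euclidean distance between the two embedded points, and a pairwise hinge loss minimized by stochastic gradient steps. All function and variable names are hypothetical.

```python
import numpy as np

def score(W, V, x, z):
    """Assumed scoring rule: negative squared distance between the
    embedded child (W @ x) and the embedded diet (V @ z)."""
    d = W @ x - V @ z
    return -float(d @ d)

def pairwise_hinge_step(W, V, better, worse, lr=0.01, margin=1.0):
    """One SGD step on a pairwise hinge loss (an assumption of this sketch).

    `better` and `worse` are (x, z) tuples; the pair `better` is known to
    have the higher target value, so it should outscore `worse` by `margin`.
    W and V are updated in place. Returns the loss before the update.
    """
    (xb, zb), (xw, zw) = better, worse
    loss = margin - score(W, V, xb, zb) + score(W, V, xw, zw)
    if loss <= 0:
        return 0.0  # ordering already satisfied with the required margin
    db = W @ xb - V @ zb
    dw = W @ xw - V @ zw
    # loss = margin + ||db||^2 - ||dw||^2, differentiated w.r.t. W and V
    grad_W = 2.0 * (np.outer(db, xb) - np.outer(dw, xw))
    grad_V = 2.0 * (np.outer(dw, zw) - np.outer(db, zb))
    W -= lr * grad_W
    V -= lr * grad_V
    return loss

# Toy usage: 2-dimensional inputs embedded in a 2-dimensional common space.
W = np.eye(2)
V = np.eye(2)
close_pair = (np.array([1.0, 0.0]), np.array([1.0, 0.0]))
far_pair = (np.array([1.0, 0.0]), np.array([0.0, 1.0]))
# Suppose the data say far_pair has the higher target value.
first_loss = pairwise_hinge_step(W, V, better=far_pair, worse=close_pair)
second_loss = pairwise_hinge_step(W, V, better=far_pair, worse=close_pair)
print(first_loss, second_loss)
```

Because only the order of the two scores enters the loss, the exact target values are never fitted. That matches the motivation stated above for tasks where regressors perform poorly.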
Acknowledgments
The research reported here is supported in part under grant TIN2011-23558 from the MICINN (Ministerio de Ciencia e Innovación, Spain). Edna Gamboa was supported by a Ph.D. grant from CONACYT (Consejo Nacional de Ciencia y Tecnología, México). The paper was written while Antonio Bahamonde was visiting Cornell University with Grants of Movilidad Campus de Excelencia Internacional (Universidad de Oviedo) and of Programa Nacional de Movilidad de Recursos Humanos del Plan Nacional de Investigación (Ministerio de Educación, Cultura y Deporte, Spain). The dataset was gathered in a project supported by Ministerio de Desarrollo Social de México.
Cite this article
Díez, J., Gamboa, E., González de Cossío, T. et al. Analysis of nutrition data by means of a matrix factorization method. Prog Artif Intell 3, 119–127 (2015). https://doi.org/10.1007/s13748-015-0062-0
DOI: https://doi.org/10.1007/s13748-015-0062-0