An integrated algorithm for three-way compositional data

  • Michele Gallo
  • Violetta Simonacci
  • Maria Anna Di Palma

Abstract

Compositional data with a three-way structure are not uncommon in the social sciences. The CANDECOMP/PARAFAC model is one of the most suitable techniques for modeling such arrays without confounding the variability of the different modes. Estimating parameters in this setting can be particularly difficult because compositional data are multicollinear by definition and because, for socio-economic data in general, the exact number of latent variables is harder to determine. The most widely used fitting procedure in the literature is the PARAFAC-ALS algorithm, which, however, is sensitive to both difficulties: multicollinearity and the use of the wrong number of factors. In this work, an integrated PARAFAC-ALS algorithm initialized with SWATLD steps is proposed as an effective solution to these deficiencies. The approach is tested on simulated multicollinear data in comparison with standard ALS. It proves more robust against over-factoring and temporary degeneracies, converges faster even in the presence of collinearity, and still provides a least-squares solution.
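To make the setting concrete, the following is a minimal sketch, assuming Python with NumPy, of plain PARAFAC-ALS applied to centred log-ratio (clr) coordinates. It is not the authors' integrated algorithm: the SWATLD steps are not reproduced, the function names are illustrative, and the `init` argument is a hypothetical hook marking where starting loadings from another procedure could be supplied. The clr coordinates of each composition sum to zero, which is one simple way to see the built-in collinearity mentioned above.

```python
import numpy as np


def clr(X, axis=1):
    """Centred log-ratio transform along `axis` (parts must be strictly positive)."""
    logX = np.log(X)
    return logX - logX.mean(axis=axis, keepdims=True)


def khatri_rao(U, V):
    """Column-wise Kronecker product, shape (U.shape[0] * V.shape[0], R)."""
    R = U.shape[1]
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, R)


def parafac_als(X, rank, init=None, n_iter=500, tol=1e-10, seed=0):
    """Plain PARAFAC-ALS for a three-way array X of shape (I, J, K).

    `init` is a hypothetical hook: an optional (A, B, C) tuple of starting
    loadings, e.g. obtained from a few steps of another algorithm.
    Returns the loadings A, B, C and the residual sum of squares per sweep.
    """
    I, J, K = X.shape
    if init is None:
        rng = np.random.default_rng(seed)
        A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    else:
        A, B, C = (np.array(M, dtype=float) for M in init)

    # Unfoldings whose column order matches the row order of khatri_rao.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)

    def solve(Z, Xn):
        # Least-squares update: the M minimising ||Xn - M Z^T||_F.
        return np.linalg.lstsq(Z, Xn.T, rcond=None)[0].T

    losses = []
    for _ in range(n_iter):
        A = solve(khatri_rao(B, C), X1)
        B = solve(khatri_rao(A, C), X2)
        C = solve(khatri_rao(A, B), X3)
        losses.append(np.sum((X3 - C @ khatri_rao(A, B).T) ** 2))
        # Stop when the relative decrease of the loss becomes negligible.
        if len(losses) > 1 and losses[-2] - losses[-1] <= tol * losses[-2]:
            break
    return A, B, C, losses


# Synthetic example: 8 units, 5-part compositions, 4 occasions (made-up data).
rng = np.random.default_rng(1)
raw = rng.uniform(0.1, 1.0, size=(8, 5, 4))
comp = raw / raw.sum(axis=1, keepdims=True)   # close each composition to sum 1
Y = clr(comp, axis=1)                         # clr coordinates sum to zero
A, B, C, losses = parafac_als(Y, rank=2)
```

Because every update is an exact least-squares solve, the recorded loss cannot increase from one sweep to the next; slow stretches in that sequence are where alternative starting values, passed through `init`, would be expected to matter.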

Keywords

Alternating least-squares, Collinearity, Compositions, Log-ratio coordinates, Three-way data, SWATLD

Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  • Michele Gallo (1)
  • Violetta Simonacci (1)
  • Maria Anna Di Palma (1)
  1. University of Naples “L’Orientale”, DISUS, Naples, Italy
