Clusterwise analysis for multiblock component methods

  • Stéphanie Bougeard
  • Hervé Abdi
  • Gilbert Saporta
  • Ndèye Niang
Regular Article

Abstract

Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. Specifically, we consider multiblock PLS and multiblock redundancy analysis, two particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they provide suboptimal results when the observations actually come from different populations. A strategy to mitigate this problem, presented in this article, is to use a technique such as clusterwise regression to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters that each have their own set of regression coefficients. This combination of clustering and regression improves the overall quality of the prediction and facilitates interpretation. In addition, the minimization of a well-defined criterion by means of a sequential algorithm ensures that the algorithm converges monotonically. Finally, the proposed methods are distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.
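
The clusterwise idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it swaps multiblock PLS and multiblock redundancy analysis for ordinary least squares on a single block of predictors, and it uses a batch alternation (refit every cluster, then reassign every observation) rather than the sequential algorithm the abstract mentions. The function name `clusterwise_ols`, its parameters, and the simulated data are all hypothetical, chosen for illustration only.

```python
import numpy as np


def clusterwise_ols(X, y, n_clusters, n_iter=50, seed=0):
    """Illustrative clusterwise least-squares regression (not the paper's method).

    Alternates between fitting one linear model per cluster and reassigning
    each observation to the cluster whose model predicts it best.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.column_stack([np.ones(n), X])        # add an intercept column
    labels = rng.integers(n_clusters, size=n)    # random initial partition
    coefs = np.zeros((n_clusters, Xb.shape[1]))
    history = []                                 # criterion value per iteration

    for _ in range(n_iter):
        # Refit step: least-squares fit within each non-empty cluster.
        # An empty cluster keeps its previous coefficients.
        for k in range(n_clusters):
            mask = labels == k
            if mask.any():
                coefs[k] = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0]

        # Assignment step: squared residual of every observation under
        # every cluster model, then pick the best-fitting cluster.
        sq_res = (y[:, None] - Xb @ coefs.T) ** 2          # shape (n, K)
        new_labels = sq_res.argmin(axis=1)
        history.append(sq_res[np.arange(n), new_labels].sum())

        if np.array_equal(new_labels, labels):   # partition is stable
            break
        labels = new_labels

    return labels, coefs, history


# Simulated example (assumed data): two groups with opposite regression lines.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
true_group = rng.integers(2, size=200)
slopes = np.array([2.0, -2.0])
intercepts = np.array([1.0, -1.0])
y = intercepts[true_group] + slopes[true_group] * X[:, 0] + 0.1 * rng.normal(size=200)

labels, coefs, history = clusterwise_ols(X, y, n_clusters=2)
```

Each step can only lower or preserve the total sum of squared residuals stored in `history`, which mirrors the monotone convergence claimed in the abstract. Like any alternating scheme of this kind, the sketch may stop at a local optimum, so several random starts are usually advisable. Unlike the component-based methods of the article, plain least squares is not suited to clusters in which the explanatory variables outnumber the observations.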

Keywords

Multiblock component method · Clusterwise regression · Typological regression · Cluster analysis · Dimension reduction

Mathematics Subject Classification

62H30 62H25 91C20 

Notes

Acknowledgements

The authors are grateful to two anonymous reviewers for their valuable suggestions that greatly improved the clarity and the relevance of this article.

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Stéphanie Bougeard (1)
  • Hervé Abdi (2)
  • Gilbert Saporta (3)
  • Ndèye Niang (3)

  1. Department of Epidemiology, Anses (French agency for food, environmental and occupational health safety), Ploufragan, France
  2. The University of Texas at Dallas, Richardson, USA
  3. CEDRIC CNAM, Paris Cedex 03, France
