Advances in Data Analysis and Classification, Volume 12, Issue 2, pp 285–313

# Clusterwise analysis for multiblock component methods

• Stéphanie Bougeard
• Hervé Abdi
• Gilbert Saporta
• Ndèye Niang
Regular Article

## Abstract

Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. Multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they provide suboptimal results when the observations actually come from different populations. A strategy to mitigate this problem, presented in this article, is to use a technique such as clusterwise regression in order to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters that have their own sets of regression coefficients. This combination of clustering and regression improves the overall quality of the prediction and facilitates the interpretation. In addition, the minimization of a well-defined criterion by means of a sequential algorithm ensures that the algorithm converges monotonically. Finally, the proposed method is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated by means of a simulation study and a (simulated) example from marketing.
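The alternating logic behind clusterwise regression can be illustrated with a minimal sketch. It is not the multiblock method of the article: ordinary least squares stands in for multiblock PLS or multiblock redundancy analysis, and the function name `clusterwise_ols`, its parameters, and the random initial partition are assumptions made for illustration only. The sketch alternates between fitting one regression per cluster and reassigning each observation to the cluster whose model gives it the smallest squared residual, and it records the criterion (the total squared residual) after each reassignment.

```python
import numpy as np

def clusterwise_ols(X, y, n_clusters, n_iter=50, seed=0):
    """Illustrative clusterwise regression with plain OLS (hypothetical helper).

    Alternates per-cluster least-squares fits with reassignment of each
    observation to its best-fitting cluster model.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])       # design matrix with intercept
    labels = rng.integers(n_clusters, size=n)   # random initial partition
    coefs = np.zeros((n_clusters, Xd.shape[1]))
    history = []                                # criterion after each reassignment
    for _ in range(n_iter):
        # Step 1: refit the regression of every cluster that has enough members;
        # a cluster that is too small keeps its previous coefficients.
        for k in range(n_clusters):
            members = labels == k
            if members.sum() >= Xd.shape[1]:
                coefs[k] = np.linalg.lstsq(Xd[members], y[members], rcond=None)[0]
        # Step 2: squared residual of every observation under every cluster model.
        sq_res = (y[:, None] - Xd @ coefs.T) ** 2
        new_labels = sq_res.argmin(axis=1)
        history.append(sq_res[np.arange(n), new_labels].sum())
        if np.array_equal(new_labels, labels):  # partition is stable: stop
            break
        labels = new_labels
    return labels, coefs, history
```

Because each of the two steps can only lower the criterion or leave it unchanged, the values stored in `history` are non-increasing, which mirrors the monotone convergence mentioned above. The result depends on the initial partition, so in practice such a procedure is typically run from several random starts.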

## Keywords

Multiblock component method · Clusterwise regression · Typological regression · Cluster analysis · Dimension reduction

## Mathematics Subject Classification

62H30 62H25 91C20

## Notes

### Acknowledgements

The authors are grateful to two anonymous reviewers for their valuable suggestions that greatly improved the clarity and the relevance of this article.


## Authors and Affiliations

• Stéphanie Bougeard (1)
• Hervé Abdi (2)
• Gilbert Saporta (3)
• Ndèye Niang (3)

1. Department of Epidemiology, Anses (French agency for food, environmental and occupational health safety), Ploufragan, France
2. The University of Texas at Dallas, Richardson, USA
3. CEDRIC CNAM, Paris Cedex 03, France