# Clusterwise analysis for multiblock component methods


## Abstract

Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. Specifically, we consider multiblock PLS and multiblock redundancy analysis, two multiblock component methods suited to the case where one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they provide suboptimal results when the observations actually come from different populations. The strategy presented in this article to alleviate this problem is to use a technique such as clusterwise regression to identify homogeneous clusters of observations. This approach yields two new methods, each of which produces clusters with their own sets of regression coefficients. Combining clustering and regression in this way improves the overall quality of the prediction and facilitates interpretation. In addition, the algorithm minimizes a well-defined criterion by means of a sequential procedure, which ensures that it converges monotonically. Finally, the proposed approach is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated by a simulation study and a (simulated) example from marketing.
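The alternating scheme behind clusterwise regression, and the reason a sequential algorithm of this kind converges monotonically, can be illustrated with a minimal sketch. It assumes plain ordinary least squares within each cluster, swapped in for the multiblock PLS and multiblock redundancy analysis fits used in this article, so it is not the proposed method itself; the function name `clusterwise_regression` and its parameters are hypothetical. Each iteration (1) refits one regression per cluster and (2) reassigns every observation to the cluster whose model gives it the smallest squared residual. Neither step can increase the total within-cluster residual sum of squares, so the recorded criterion is non-increasing.

```python
import numpy as np


def clusterwise_regression(X, Y, n_clusters, n_iter=50, seed=0):
    """Hypothetical sketch: alternating clusterwise least-squares regression.

    Minimizes the total within-cluster residual sum of squares
        sum_k sum_{i in cluster k} ||y_i - [1, x_i] B_k||^2
    by alternating two steps, neither of which can increase the criterion.
    OLS is an assumed stand-in for the article's multiblock component fits.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Y = Y.reshape(n, -1)                        # allow a single response vector
    Xc = np.hstack([np.ones((n, 1)), X])        # add an intercept column
    labels = rng.integers(n_clusters, size=n)   # random initial partition
    B = np.zeros((n_clusters, Xc.shape[1], Y.shape[1]))
    history = []
    for _ in range(n_iter):
        # Step 1: refit one least-squares model per cluster (partition fixed).
        for k in range(n_clusters):
            idx = labels == k
            if idx.any():  # an empty cluster keeps its previous coefficients
                B[k] = np.linalg.lstsq(Xc[idx], Y[idx], rcond=None)[0]
        # Step 2: move each observation to the cluster whose model fits it best.
        resid = np.stack(
            [((Y - Xc @ B[k]) ** 2).sum(axis=1) for k in range(n_clusters)],
            axis=1,
        )
        new_labels = resid.argmin(axis=1)
        history.append(resid[np.arange(n), new_labels].sum())
        if np.array_equal(new_labels, labels):  # partition stable: converged
            break
        labels = new_labels
    return labels, B, history
```

Because the result depends on the random initial partition, such k-means-like procedures are typically run from several seeds, keeping the solution with the lowest criterion. Unlike the component-based fits in the article, ordinary least squares is not well suited to clusters with more explanatory variables than observations (`np.linalg.lstsq` then returns a minimum-norm solution).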

## Keywords

Multiblock component method, Clusterwise regression, Typological regression, Cluster analysis, Dimension reduction

## Mathematics Subject Classification

62H30, 62H25, 91C20

## Notes

### Acknowledgements

The authors are grateful to two anonymous reviewers for their valuable suggestions that greatly improved the clarity and the relevance of this article.
