Clustering non-linear interactions in factor analysis


Factor analysis is a powerful tool for dimensionality reduction in multivariate studies. This study extends the factor model with non-linear interactions. Our main contribution is to present two approaches for clustering the non-linear interactions, which leads to new models that are not restricted to the extreme scenarios where all non-null interactions are different or all are the same. The first strategy to handle the clusters involves a finite mixture of degenerate components. The second option is specified via the Dirichlet process. A comprehensive simulation study is developed to explore the performance of the proposals. A sensitivity analysis is carried out to evaluate the advantages of estimating a smoothness parameter defined in the covariance function of the Gaussian process that establishes the non-linearity of the interactions. In terms of application, the methodology is illustrated with the analysis of gene expression levels from four breast cancer data sets. The genes belonging to disjoint genome regions, with copy number alteration, are connected to the main factors, and their non-linear interactions are estimated and clustered. A joint investigation and comparison of these four breast cancer data sets is rarely found in the literature.




References

  1. Affymetrix: Statistical algorithms reference guide. Affymetrix Technical Report (2001). Accessed 3 July 2020

  2. Carvalho, M.C., Chang, J., Lucas, J.E., Nevins, J.R., Wang, Q., West, M.: High-dimensional sparse factor modelling: applications in gene expression genomics. J. Am. Stat. Assoc. 103, 1438–1456 (2008)

  3. Chin, K., De Vriers, S., Fridlyand, J., Spellman, P.T., Roydasgupta, R., Kuo, W.L., Lapuk, A., Neve, R.M., Qian, Z., Ryder, T., Chen, F., Feiler, H., Tokuyasu, T., Esserman, L., Albertson, D.G., Waldman, F.M., Gray, J.W.: Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 10, 529–541 (2006)

  4. Eddelbuettel, D.: Seamless R and C++ integration with Rcpp, vol. 64. Springer, New York (2013)

  5. Eddelbuettel, D., Francois, R.: Rcpp: seamless R and C++ integration. J. Stat. Softw. 40(8), 1–18 (2011). Accessed 3 July 2020

  6. Eddelbuettel, D., Sanderson, C.: RcppArmadillo: accelerating R with high-performance C++ linear algebra. Comput. Stat. Data Anal. 71, 1054–1063 (2014)

  7. Gamerman, D., Lopes, H.F.: Markov chain Monte Carlo: stochastic simulation for Bayesian inference, vol. 68, 2nd edn. Chapman and Hall/CRC, Boca Raton (2006)

  8. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B.: Bayesian Data Analysis. Texts in Statistical Science, 3rd edn. Chapman and Hall/CRC, Boca Raton (2013)

  9. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984)

  10. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970)

  11. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P.: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264 (2003)

  12. Ishwaran, H., James, L.: Gibbs sampling methods for stick-breaking priors. J. Am. Stat. Assoc. 96, 161–173 (2001)

  13. Johnson, R.A., Wichern, D.W.: Applied Multivariate Statistical Analysis, 6th edn. Pearson/Prentice Hall, Upper Saddle River (2007)

  14. Lucas, J.E., Carvalho, C., Wang, Q., Bild, A., Nevins, J.R., West, M.: Sparse statistical modelling in gene expression genomics. In: Muller, K.D.P., Vannucci, M. (eds.) Bayesian Inference for Gene Expression and Proteomics, pp. 155–176. Cambridge University Press, Cambridge (2006)

  15. Lucas, J.E., Kung, H.N., Chin, J.T.: Cross-study projections of genomics biomarkers: an evaluation in cancer genomics. PLoS Comput. Biol. 6, e1000920 (2010)

  16. Mayrink, V.D., Lucas, J.E.: Sparse latent factor model with interactions: analysis of gene expression. Ann. Appl. Stat. 7(2), 799–822 (2013)

  17. Mayrink, V.D., Lucas, J.E.: Supplement to sparse latent factor model with interactions: analysis of gene expression. Ann. Appl. Stat. (2013)

  18. Mayrink, V.D., Lucas, J.E.: Bayesian factor model for the detection of coherent patterns in gene expression data. Braz. J. Probab. Stat. 29(1), 1–33 (2015)

  19. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equations of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953)

  20. Miller, D.L., Smeds, J., George, J., Vega, V.B., Vergara, L., Ploner, A., Pawitan, Y., Hall, P., Klaar, S., Liu, E.T., Bergh, J.: An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc. Natl. Acad. Sci. USA 112, 13550–13555 (2005)

  21. Pollack, J.R., Sorlie, T., Perou, C.M., Rees, C.A., Jeffrey, S.S., Lonning, P.E., Tibshirani, R., Botstein, D., Dale, A.L.B., Brown, P.O.: Microarrays analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc. Natl. Acad. Sci. USA 99(20), 12963–12968 (2002)

  22. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2020). Accessed 3 July 2020

  23. Roberts, G.O., Gelman, A., Gilks, W.R.: Weak convergence and optimal scaling of random walk Metropolis algorithm. Ann. Appl. Probab. 7(1), 110–120 (1997)

  24. Rueda, O.M., Uriarte, R.D.: Flexible and accurate detection of genomic copy number changes from aCGH. PLoS Comput. Biol. 3(6), e122 (2007)

  25. Sethuraman, J.: A constructive definition of the Dirichlet process prior. Stat. Sin. 2, 639–650 (1994)

  26. Sotiriou, C., Wirapati, P., Loi, S., Harris, A., Fox, S., Smeds, J., Nordgren, H., Farmer, P., Praz, V., Kains, B.H., Desmedt, C., Larsimont, D., Cardoso, F., Peterse, H., Nuyten, D., Buyse, M., Vijver, M.J.V.D., Bergh, J., Piccart, M., Delorenzi, M.: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J. Natl. Cancer Inst. 98, 262–272 (2006)

  27. Spiegelhalter, D.J., Best, N.G., van der Linde, B.P.C.A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B 64, 583–639 (2002)

  28. Wang, Y., Klijn, J.G.M., Zhang, Y., Sieuwert, A.M., Look, M.P., Yang, F., Talantov, D., Timmermans, M., Gelder, M.E.M.V., Jatkoe, T., Berns, E.M.J.J., Atkins, D., Foekens, J.A.: Gene expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671–679 (2005)

  29. Watanabe, S.: Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 11, 3571–3594 (2010)

  30. West, M.: Bayesian factor regression models in the large p, small n paradigm. In: Bernardo, J., Bayarri, M., Berger, J., Dawid, A., Heckerman, D., Smith, A., West, M. (eds.) Bayesian Statistics, vol. 7, pp. 723–732. Oxford University Press, Oxford (2003)

  31. Wu, Z., Irizarry, R.A., Gentleman, R., Murillo, F.M., Spencer, F.: A model based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. 99, 909–917 (2004)


The authors would like to thank two anonymous referees for their constructive comments leading to an improved version of this paper. The first author is also grateful to Fundação de Amparo à Pesquisa de Minas Gerais (FAPEMIG) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for supporting this research.

Author information



Corresponding author

Correspondence to Vinícius Diniz Mayrink.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


Appendix A: Full posterior conditional distributions

\((F^{*}_{r}\mid \alpha , \lambda , \sigma ^{2}, z, X) \sim N_{n}(M_{F^{*}_{r}}, V_{F^{*}_{r}})\)   where

$$\begin{aligned} V_{F^{*}_{r}}= & {} \left[ \displaystyle {\left( \sum _{i=1}^{m} \frac{z_{ir}}{\sigma ^{2}_{i}} \right) } I_{n} + K^{-1}(\lambda ,\phi ) \right] ^{-1} \; \hbox {and}\\ M_{F^{*}_{r}}= & {} V_{F^{*}_{r}} \left[ \displaystyle { \sum _{i=1}^{m} \frac{z_{ir}}{\sigma ^{2}_{i}}} \left( X^{\top }_{i\bullet } -\lambda ^{\top } \alpha ^{\top }_{i\bullet } \right) \right] . \end{aligned}$$
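As an illustration of the two expressions above, the following is a minimal sketch of one draw of \(F^{*}_{r}\), assuming NumPy is available. The function name `sample_f_star`, the array layout (X is m by n, alpha is m by L, lam is L by n) and the choice to receive \(K(\lambda ,\phi )\) as a precomputed matrix are assumptions made for the example; they are not taken from the authors' implementation.

```python
import numpy as np

def sample_f_star(X, alpha, lam, sigma2, z_r, K, rng):
    """Draw F*_r from N_n(M, V) using the moments written above.

    X: (m, n) data, alpha: (m, L) loadings, lam: (L, n) factor scores,
    sigma2: (m,) variances, z_r: (m,) 0/1 indicators of cluster r,
    K: (n, n) covariance matrix K(lambda, phi) supplied by the caller.
    """
    n = X.shape[1]
    w = z_r / sigma2                          # z_ir / sigma_i^2 for each row i
    precision = w.sum() * np.eye(n) + np.linalg.inv(K)
    V = np.linalg.inv(precision)
    V = (V + V.T) / 2.0                       # remove round-off asymmetry
    resid = X - alpha @ lam                   # row i is X_i. - alpha_i. lambda
    M = V @ (w @ resid)                       # V times the weighted sum of rows
    return rng.multivariate_normal(M, V), M, V
```

For large n, a Cholesky-based solve would usually be preferred over the explicit inverses; they are kept here only to mirror the notation.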

For the probability \(\rho ^{*}_{ir}\) in (10), consider \(Q_{0}=\mathbf {0}\) and   \(Q_{r} = \displaystyle {-\frac{1}{2\sigma ^{2}_{i}} \left[ F^{*}_{r} F^{*\top }_{r} -2F^{*}_{r} \left( X^{\top }_{i\bullet } -\lambda ^{\top } \alpha ^{\top }_{i\bullet }\right) \right] }\),   with   \(r = 0, 1, \ldots , R\).
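The quantity \(Q_{r}\) can be evaluated directly from its definition. The sketch below also normalizes a set of weights under the assumption that \(\rho ^{*}_{ir} \propto \rho _{ir}\exp (Q_{r})\); expression (10) is not reproduced in this appendix, so that proportionality, like the function names `q_value` and `normalized_weights`, should be read as an assumption of the example. The prior weights are assumed to be strictly positive.

```python
import math

def q_value(f_star, resid, sigma2_i):
    """Q_r = -(1 / (2 sigma_i^2)) [F*_r F*_r^T - 2 F*_r resid^T].

    f_star: the vector F*_r (all zeros for r = 0, which gives Q_0 = 0).
    resid:  the vector X_i. - alpha_i. lambda.
    """
    ff = sum(f * f for f in f_star)
    fr = sum(f * e for f, e in zip(f_star, resid))
    return -(ff - 2.0 * fr) / (2.0 * sigma2_i)

def normalized_weights(prior_weights, q_values):
    """Normalize prior_weights[r] * exp(q_values[r]) on the log scale.

    Subtracting the largest log term before exponentiating avoids overflow.
    """
    logs = [math.log(p) + q for p, q in zip(prior_weights, q_values)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```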

\((\sigma ^{2}_{i} \mid \alpha _{i \bullet }, \lambda , F_{i \bullet }, X_{i \bullet }) \sim \text{ IG }(A,B)\) where \(A = a + n/2\) and

$$\begin{aligned} B=\displaystyle {\frac{1}{2}\left[ X_{i\bullet }X^{\top }_{i\bullet }-2\alpha _{i\bullet }\lambda (X^{\top }_{i\bullet }-F^{\top }_{i\bullet }) - 2F_{i\bullet }X^{\top }_{i\bullet } + F_{i\bullet }F^{\top }_{i\bullet } + \alpha _{i\bullet }\lambda \lambda ^{\top }\alpha ^{\top }_{i\bullet }\right] + b}. \end{aligned}$$
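Expanding the bracket shows that B equals b plus half the squared norm of \(X_{i\bullet } - \alpha _{i\bullet }\lambda - F_{i\bullet }\), which is how the sketch below computes it. The example assumes that IG(A, B) denotes the inverse-gamma distribution with shape A and scale B, so that the reciprocal of a Gamma variate with shape A and rate B has the required law. The function names are hypothetical.

```python
import random

def sigma2_posterior_params(x_row, f_row, alpha_row, lam, a, b):
    """Return (A, B) for the inverse-gamma full conditional of sigma_i^2.

    x_row, f_row: length-n lists for X_i. and F_i.
    alpha_row: length-L list for alpha_i.; lam: L lists of length n.
    """
    n = len(x_row)
    sq = 0.0
    for j in range(n):
        fitted = sum(alpha_row[l] * lam[l][j] for l in range(len(alpha_row)))
        sq += (x_row[j] - fitted - f_row[j]) ** 2
    return a + n / 2.0, b + sq / 2.0

def sample_inverse_gamma(A, B, rng):
    # random.gammavariate takes (shape, scale); a Gamma with rate B has
    # scale 1 / B, and its reciprocal is inverse-gamma with shape A, scale B.
    return 1.0 / rng.gammavariate(A, 1.0 / B)
```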

In order to update \(q_{il}^{*}\) and \(\alpha _{il}\), consider the \(N(M_{\alpha _{il}},V_{\alpha _{il}})\) such that \(V_{\alpha _{il}} = \left[ \displaystyle {\frac{1}{w} } +\displaystyle { \frac{1}{\sigma ^{2}_{i}} }\sum _{j=1}^{n}\lambda ^{2}_{lj}\right] ^{-1}\) and \(M_{\alpha _{il}} = V_{\alpha _{il}}\left[ \displaystyle {\frac{1}{\sigma ^{2}_{i}} }\sum _{j=1}^{n}\lambda _{lj}\left( X_{ij}-F_{ij}-\sum _{l^{*}\ne l}\alpha _{il^{*}}\lambda _{l^{*}j} \right) \right] \).
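The two moments can be computed as in the following sketch, which uses plain Python lists and a hypothetical function name `alpha_moments`. Only the Gaussian component is covered; the way \(q_{il}^{*}\) combines with it is defined in the main text and is not reproduced here.

```python
def alpha_moments(x_row, f_row, alpha_row, lam, l, sigma2_i, w):
    """Return (M, V) of the normal component used to update alpha_il.

    x_row, f_row: length-n lists for X_i. and F_i.
    alpha_row: length-L list with the current alpha_i.
    lam: L lists of length n; l: zero-based index of the factor.
    w: prior variance appearing as 1 / w in V.
    """
    n = len(x_row)
    L = len(alpha_row)
    ss = sum(lam[l][j] ** 2 for j in range(n))
    V = 1.0 / (1.0 / w + ss / sigma2_i)
    cross = 0.0
    for j in range(n):
        # Contribution of all other factors l* != l at column j.
        others = sum(alpha_row[k] * lam[k][j] for k in range(L) if k != l)
        cross += lam[l][j] * (x_row[j] - f_row[j] - others)
    M = V * cross / sigma2_i
    return M, V
```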

The construction of posterior weights via stick-breaking process takes into account:   \((\mathcal {V}_{ir} \mid F, z, \rho , \lambda , \phi ) \sim \text{ Beta }(z_{ir} + 1, \sum _{s=r+1}^{R}z_{is} + \tau )\).
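A minimal sketch of this step is given below. It draws each \(\mathcal {V}_{ir}\) from the Beta distribution stated above and then applies the usual stick-breaking product, weight r equal to \(\mathcal {V}_{ir}\prod _{l<r}(1-\mathcal {V}_{il})\). The function names are hypothetical, and the treatment of the last stick under truncation (often fixed at 1 so that the weights sum to one) is left open because it is not specified in this appendix.

```python
import random

def sample_sticks(z_i, tau, rng):
    """Draw V_i1..V_iR, with V_ir ~ Beta(z_ir + 1, sum_{s>r} z_is + tau).

    z_i: list of 0/1 indicators z_i1..z_iR; tau must be positive.
    """
    sticks = []
    for r in range(len(z_i)):
        tail = sum(z_i[r + 1:])               # sum over s = r+1, ..., R
        sticks.append(rng.betavariate(z_i[r] + 1.0, tail + tau))
    return sticks

def sticks_to_weights(sticks):
    """Turn stick proportions into weights V_r * prod_{l<r} (1 - V_l)."""
    weights, remaining = [], 1.0
    for v in sticks:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights
```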

The full conditional distribution of \(\phi \) is:

$$\begin{aligned} p(\phi \mid \alpha , \lambda , F, \sigma ^2, z, X)&\propto p(X \mid \alpha ,\lambda , F, \sigma ^{2})~ p(F \mid \lambda , z)~ p(\phi ) \\&\propto \left\{ \prod _{r=0}^{R}\prod _{i=1}^{m}\left[ N_{n}(F^{\top }_{i\bullet }\mid M_{F^{\top }_{i\bullet }}, V_{F^{\top }_{i\bullet }}) \right] ^{z_{ir}}\right\} p(\phi ), \end{aligned}$$

where \(p(\phi )\) is the density of the U(0.1, 0.5) distribution. We have \(M_{F_{i\bullet }^\top } = M_{F_r^*}\) and \(V_{F^{\top }_{i\bullet }} = V_{F^*_r}\) when \(F_{i\bullet }^\top = F_{r}^*\).

The full conditional distribution of \(\lambda _{\bullet j}\) is given by:

$$\begin{aligned} p(\lambda _{\bullet j} \mid \alpha , \lambda _{-\left\{ \bullet j\right\} }, F, \sigma ^2_{i}, X)&\propto p(X \mid \alpha ,\lambda , F, \sigma ^{2})~ p(F^{*}_{1},F^{*}_{2}, \ldots , F^{*}_{R} \mid \lambda , z_{i})~ p(\lambda _{\bullet j}) \\&\propto N_{L}(\lambda _{\bullet j} \mid M_{\lambda _j},V_{\lambda _j})\left| K(\lambda , \phi )\right| ^{-\sum ^{R}_{r=1} z_{ir}/2} \\&\quad \times \exp \left\{ -\frac{1}{2}\sum ^{R}_{r=1}z_{ir}F^{*}_{r}K(\lambda ,\phi )^{-1}F^{*\top }_{r}\right\} , \end{aligned}$$

where \(V_{\lambda _j}=\left[ \alpha ^{\top }D^{-1}\alpha + I_{L}\right] ^{-1}\)   and   \(M_{\lambda _j}=V_{\lambda _j}\left[ \alpha ^{\top }D^{-1}(X_{\bullet j}-F_{\bullet j})\right] \). The term \(\lambda _{-\left\{ \bullet j\right\} }\) indicates the matrix \(\lambda \) without the j-th column.
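The Gaussian part of this expression has moments that are simple to compute, as in the sketch below. It assumes NumPy and takes D to be the diagonal matrix with entries \(\sigma ^{2}_{1}, \ldots , \sigma ^{2}_{m}\), which is an assumption since D is not defined in this appendix. The function name is hypothetical, and the remaining factors involving \(K(\lambda ,\phi )\) are not handled here.

```python
import numpy as np

def lambda_gaussian_moments(X_col, F_col, alpha, sigma2):
    """Return (M, V) of the N_L term in the full conditional of lambda_.j.

    X_col, F_col: (m,) columns X_.j and F_.j; alpha: (m, L) loadings;
    sigma2: (m,) variances, taken as the diagonal of D (an assumption).
    """
    At_Dinv = alpha.T * (1.0 / sigma2)        # alpha^T D^{-1}, shape (L, m)
    L = alpha.shape[1]
    V = np.linalg.inv(At_Dinv @ alpha + np.eye(L))
    M = V @ (At_Dinv @ (X_col - F_col))
    return M, V
```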

Appendix B: Short description of some goodness-of-fit measurements

Let \(\theta \) be a generic vector of unknown parameters associated with the model with likelihood \(p(Y|\theta )\). In this case, \(Y = \{Y_1, Y_2, \cdots , Y_n\}\) represents the set of observed data and n is the sample size. Suppose that an MCMC algorithm was applied to sample from the target posterior distribution \(p(\theta |Y)\). As a result, \(\theta ^{(s)}\) is the value generated in the s-th MCMC iteration after the burn-in period, for \(s = 1, \ldots , S\). Assume that \(\bar{\theta }\) is the posterior mean of \(\theta \). The three measurements considered in this paper to compare models in terms of goodness-of-fit are summarized as follows:

  • The DIC is a widely used criterion for model selection in the Bayesian context. According to [27] this quantity is calculated by \(2\bar{D}(\theta )-D(\bar{\theta })\), where \(\bar{D}(\theta ) = -2 \sum _{s=1}^{S} \ln [p(Y|\theta ^{(s)})]/S\) and \(D(\bar{\theta }) = -2\ln [p(Y|\bar{\theta })]\).

  • The WAIC criterion is obtained through the difference \(\hat{\text{ lppd }} - \hat{p}_{\tiny \text{ WAIC }}\). The first term is the estimated log pointwise predictive density given by \(\hat{\text{ lppd }} = \sum _{i=1}^{n} \ln [\sum _{s=1}^{S} p(Y_i|\theta ^{(s)})/S]\). The second term is the estimated effective number of parameters, given by \(\hat{p}_{\tiny \text{ WAIC }} = \sum _{i=1}^{n} V_{s=1}^{S}[\ln p(Y_i|\theta ^{(s)})]\), where \(V_{s=1}^{S}[a^{(s)}] = \sum _{s=1}^{S} (a^{(s)}-\bar{a})^2/(S-1)\) and \(\bar{a} = \sum _{s=1}^{S} a^{(s)}/S\). See [29] for more details.

  • The LPML is a model selection criterion based on the so-called conditional predictive ordinate (CPO). For the i-th observation, we calculate \(\hat{\text{ CPO }}_i = S [\sum _{s=1}^{S} 1/ p(Y_i|\theta ^{(s)})]^{-1}\). The target result is given by \(\sum _{i=1}^{n} \ln \hat{\text{ CPO }}_i\). See [8] for additional details.
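The three measurements above can be computed from stored pointwise log-likelihood values, as in the following sketch. It assumes that `loglik[s][i]` holds \(\ln p(Y_i|\theta ^{(s)})\), that the observations are conditionally independent given \(\theta \) (so that \(\ln p(Y|\theta )\) is the sum over i), and that \(\ln p(Y|\bar{\theta })\) is supplied separately as `loglik_at_mean`. These names are hypothetical. The WAIC is returned on the scale defined above, \(\hat{\text{ lppd }} - \hat{p}_{\tiny \text{ WAIC }}\), without any further rescaling.

```python
import math

def log_sum_exp(values):
    """Stable evaluation of ln(sum(exp(v))) for a list of numbers."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def dic(loglik, loglik_at_mean):
    """DIC = 2 * D_bar - D(theta_bar), with D = -2 * log-likelihood."""
    S = len(loglik)
    d_bar = -2.0 * sum(sum(row) for row in loglik) / S
    d_at_mean = -2.0 * loglik_at_mean
    return 2.0 * d_bar - d_at_mean

def waic(loglik):
    """lppd minus p_WAIC, following the definitions in the list above."""
    S, n = len(loglik), len(loglik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n):
        col = [loglik[s][i] for s in range(S)]
        lppd += log_sum_exp(col) - math.log(S)
        mean = sum(col) / S
        p_waic += sum((v - mean) ** 2 for v in col) / (S - 1)
    return lppd - p_waic

def lpml(loglik):
    """Sum over i of ln CPO_i, with CPO_i = S / sum_s [1 / p(Y_i | theta_s)]."""
    S, n = len(loglik), len(loglik[0])
    total = 0.0
    for i in range(n):
        neg = [-loglik[s][i] for s in range(S)]
        total += math.log(S) - log_sum_exp(neg)
    return total
```

Working on the log scale avoids the underflow that would occur if the densities \(p(Y_i|\theta ^{(s)})\) were exponentiated and averaged directly.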

About this article

Cite this article

Amorim, E.d.C., Mayrink, V.D. Clustering non-linear interactions in factor analysis. METRON (2020).


  • Mixture
  • Dirichlet process
  • Gene expression
  • Breast cancer
  • Microarray