Abstract
Advances in modern science and technology have made it possible to collect large amounts of matrix data in fields such as biomedicine. Matrix data modeling has been studied extensively and has moved beyond the naive approach of flattening the matrix into a vector. However, existing matrix modeling methods focus mainly on homogeneous data and cannot handle the heterogeneity frequently encountered in the biomedical field, where samples from the same study belong to several underlying subgroups and different subgroups follow different models. In this paper, we focus on regression-based heterogeneity analysis. We propose a heterogeneity analysis framework for matrix data that combines matrix bilinear sparse decomposition with penalized fusion techniques, enabling data-driven subgroup detection, including determination of both the number of subgroups and subgroup membership. We provide a rigorous theoretical analysis, establishing asymptotic consistency for subgroup detection, the number of subgroups, and the regression coefficients. Extensive numerical studies on simulated and real data demonstrate the superior performance of the proposed method in analyzing heterogeneous matrix data.
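As a rough illustration of the penalized-fusion idea described above, and not the paper's exact estimator, the following Python sketch evaluates a fused objective for sample-specific matrix coefficients. Several choices here are assumptions made for illustration: each coefficient matrix is taken to be rank one (B_i = u_i v_i^T), the loss is least squares, the fusion penalty is the minimax concave penalty (MCP) applied to pairwise Frobenius distances, and the names `mcp`, `fused_objective`, and `subgroups` are hypothetical. Samples whose coefficient matrices are fused to the same value are read off as one subgroup, which is how both the number of subgroups and the membership can be determined from the data.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Minimax concave penalty, evaluated elementwise at |t|."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2 * gamma),
                    gamma * lam**2 / 2)

def fused_objective(X, y, U, V, lam, gamma=3.0):
    """Least-squares loss with rank-one coefficients B_i = u_i v_i^T
    (an illustrative assumption) plus an MCP penalty on every pairwise
    Frobenius distance ||B_i - B_j||_F."""
    n = len(y)
    B = np.einsum("ip,iq->ipq", U, V)            # shape (n, p, q)
    fitted = np.einsum("ipq,ipq->i", X, B)       # inner products <X_i, B_i>
    loss = 0.5 * np.sum((y - fitted) ** 2) / n
    diffs = B[:, None] - B[None, :]              # shape (n, n, p, q)
    norms = np.linalg.norm(diffs, axis=(2, 3))   # Frobenius norm per pair
    upper = np.triu_indices(n, k=1)              # count each pair once
    return loss + np.sum(mcp(norms[upper], lam, gamma))

def subgroups(B, tol=1e-8):
    """Assign the same label to samples whose coefficient matrices
    coincide up to tol; the number of labels is the number of subgroups."""
    labels = -np.ones(len(B), dtype=int)
    k = 0
    for i in range(len(B)):
        if labels[i] < 0:
            same = np.linalg.norm(B - B[i], axis=(1, 2)) <= tol
            labels[same & (labels < 0)] = k
            k += 1
    return labels
```

In this sketch, pairs of samples with identical coefficient matrices contribute nothing to the penalty, while pairs that are far apart contribute a constant, so the penalty encourages exact fusion within subgroups without shrinking well-separated subgroups toward each other.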
Data and code availability
Data and code to reproduce the numerical results are available on GitHub at https://github.com/Zhang-Fengchuan/Matrix-heterogeneity-linear-regression.git and https://github.com/Zhang-Fengchuan/Matrix-heterogeneity-logistic-regression.git.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Nos. 12171454 and U19B2940), the Fundamental Research Funds for the Central Universities, the Beijing Natural Science Foundation (JQ20029), the National Key R&D Program of China (2022YFC3502502), and the National Natural Science Foundation of China (No. 82071000).
Author information
Contributions
SZ and MR conceived the study. FZ and MR wrote the main manuscript text and supplementary information, planned and carried out the simulations, and developed the theory. SL provided the real data and guided the real data analysis. All authors reviewed the manuscript and contributed to the final version.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, F., Zhang, S., Li, SM. et al. Matrix regression heterogeneity analysis. Stat Comput 34, 95 (2024). https://doi.org/10.1007/s11222-024-10401-z
DOI: https://doi.org/10.1007/s11222-024-10401-z