Regression and subgroup detection for heterogeneous samples

Abstract

Regression analysis of heterogeneous samples with subgroup structure is essential to the development of precision medicine. In practice, this task is often challenging because the subgroup labels are not known in advance. Detecting subgroups with similar characteristics therefore becomes critical, and it often determines the accuracy of the regression analysis. In this article, we investigate a new framework for detecting subgroups that have similar characteristics in feature space and similar treatment effects. The key idea is to incorporate K-means clustering into the concave pairwise fusion regression framework, so that the regression and subgroup detection tasks are performed simultaneously. Our method is specifically tailored to situations where the sample is not homogeneous, in the sense that the response variables in different regions of feature space are generated through different mechanisms.
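
To make the key idea concrete, below is a minimal illustrative sketch in Python. It is not the authors' concave pairwise fusion estimator; it simply alternates between a K-means-style assignment of samples to subgroups and a per-subgroup least-squares fit, so that clustering and regression inform each other. The number of subgroups K, the weight alpha, and all function names are assumptions made for illustration.

    # Minimal sketch (not the paper's estimator): alternate between a K-means-style
    # assignment of samples to subgroups and per-subgroup least-squares regression.
    # K, alpha, and all names below are illustrative assumptions.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression

    def subgroup_regression(X, y, K=2, alpha=0.5, n_iter=20, seed=0):
        # Initialize subgroup labels from plain K-means on the features.
        labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(X)
        # Fallback fits on the full sample so every model can predict from the start.
        models = [LinearRegression().fit(X, y) for _ in range(K)]
        for _ in range(n_iter):
            # Fit one regression model per current subgroup (skip degenerate ones).
            for k in range(K):
                idx = labels == k
                if idx.sum() > X.shape[1]:
                    models[k].fit(X[idx], y[idx])
            # Reassign each sample by a weighted mix of squared feature distance to
            # the subgroup center and squared residual under that subgroup's model.
            centers = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                                else X.mean(axis=0) for k in range(K)])
            feat_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            resid = np.array([(y - m.predict(X)) ** 2 for m in models]).T
            new_labels = np.argmin(alpha * feat_dist + (1 - alpha) * resid, axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        return labels, models

    # Toy data: two regimes whose responses come from different coefficient vectors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    regime = X[:, 0] > 0
    y = np.where(regime, X @ np.array([2.0, 0.0, 1.0]),
                 X @ np.array([-1.0, 3.0, 0.0])) + rng.normal(scale=0.1, size=200)
    labels, models = subgroup_regression(X, y, K=2)

In this toy example the response is generated by different mechanisms on different sides of the feature space, mirroring the heterogeneity described above; the reassignment step trades off closeness in feature space against the fit of the subgroup-specific regression, which is the spirit of combining K-means clustering with regression-based subgroup detection.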

Acknowledgements

The authors thank the Associate Editor and two anonymous reviewers for their helpful comments and valuable suggestions on earlier versions of this article. The authors also thank Professor Shujie Ma for her constructive comments on our work during the LICAS 2019 meeting. This research was supported by the Fundamental Research Funds for the Central Universities, the Beijing Natural Science Foundation (No. 1204031), and the National Natural Science Foundation of China (No. 11901013).

Author information

Correspondence to Yanping Qiu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 217 KB)

About this article

Cite this article

Liang, B., Wu, P., Tong, X. et al. Regression and subgroup detection for heterogeneous samples. Comput Stat (2020). https://doi.org/10.1007/s00180-020-00965-5

Keywords

  • Concave fusion
  • Heterogeneous problem
  • K-means clustering
  • Regression
  • Subgroup detection