Abstract
In this paper, we investigate a non-parametric approach to comparing two groups in microarray data. The approach uses a threshold penalized-distance likelihood function, built from a penalty and a suitable threshold distance, and is applicable when the sample size is small or when the data are not normally distributed. We also use this function to classify new data; the classification is based only on the objects identified as differing between the two groups, not on all objects. Finally, we present a real-data application to illustrate our methods.
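The abstract does not spell out the threshold penalized-distance likelihood, so the sketch below is not the authors' method. It is a minimal sketch of a conventional non-parametric pipeline with the same two-stage shape, offered only as a point of comparison: a per-object Wilcoxon-Mann-Whitney rank-sum test (cf. Fay & Proschan, 2010), Benjamini-Hochberg control of the false discovery rate (Benjamini & Hochberg, 1995) to pick the objects that differ, and a nearest-centroid rule restricted to those objects to classify a new sample. The function names, the normal approximation without continuity correction, the FDR level `q`, and the toy data are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks of `values`; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Wilcoxon-Mann-Whitney p-value (normal approximation,
    tie-corrected variance, no continuity correction)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u <= 0:  # all values tied: no evidence of a difference
        return 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvals, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k])


def select_features(group1, group2, q=0.05):
    """Test each object (column) separately and keep those passing BH at level q.
    Each group is a list of samples; each sample is a list of m object values."""
    m = len(group1[0])
    pvals = [mann_whitney_p([s[j] for s in group1], [s[j] for s in group2])
             for j in range(m)]
    return benjamini_hochberg(pvals, q)


def nearest_centroid(group1, group2, selected, new_sample):
    """Return 1 or 2: the group whose mean over the selected objects is closer
    (squared Euclidean distance) to new_sample. Ties go to group 1."""
    def centroid(group):
        return [sum(s[j] for s in group) / len(group) for j in selected]
    x = [new_sample[j] for j in selected]
    d1 = sum((a - b) ** 2 for a, b in zip(x, centroid(group1)))
    d2 = sum((a - b) ** 2 for a, b in zip(x, centroid(group2)))
    return 1 if d1 <= d2 else 2


if __name__ == "__main__":
    # Hypothetical toy data: object 0 is shifted between groups, objects 1-2 are not.
    group1 = [[float(i), float(i), float(i)] for i in range(8)]
    group2 = [[float(i + 10), float(i), float(i)] for i in range(8)]
    selected = select_features(group1, group2, q=0.05)
    print(selected)
    print(nearest_centroid(group1, group2, selected, [15.0, 3.0, 3.0]))
```

Running the demo prints `[0]` and then `2`: only object 0 is selected, and the new sample is assigned to group 2 using that object alone, which mirrors the abstract's point that classification uses only the objects identified as different. The rank-based test makes no normality assumption, but this sketch relies on a large-sample normal approximation; with very small groups an exact permutation distribution would be the safer choice.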
Data availability
We used Efron’s microarray prostate data (the singh2002 data set, available in the “sda” package for R).
References
Bayati, M., Ghoreishi, S. K., & Wu, J. (2021). Bayesian analysis of restricted penalized empirical likelihood. Computational Statistics, 36(2), 1321–1339.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300.
Bhattacharya, A., Pati, D., Pillai, N. S., & Dunson, D. B. (2015). Dirichlet–Laplace priors for optimal shrinkage. Journal of the American Statistical Association, 110(512), 1479–1490.
Campbell, M. J., & Shantikumar, S. (2016). Parametric and non-parametric tests for comparing two or more groups. HealthKnowledge. Accessed 2020.
Churchill, G. A. (2004). Using ANOVA to analyze microarray data. BioTechniques, 37(2), 173–177.
Efron, B. (2008). Microarrays, empirical Bayes and the two-groups model. Statistical Science, 23, 1–22.
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499.
Fay, Michael P., & Proschan, Michael A. (2010). Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Statistics Surveys, 4, 1–39.
Gao, L., Wang, J., Zhao, Y., Liu, J., Cai, D., Zhang, X., et al. (2021). Identification of sulforaphane regulatory network in hepatocytes by microarray data analysis based on GEO database. Bioscience Reports, 41(2), 26.
Ghoreishi, S. K., Ghoreishi, G. S., & Wu, J. (2022). Penalized-distance likelihood functions in sparse and non-sparse high-dimensional. Journal of Statistical Theory and Practice (to appear).
Johnstone, I. M., & Silverman, B. W. (2004). Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences. The Annals of Statistics, 32(4), 1594–1649.
Kumar, M., Rath, N. K., Swain, A., & Rath, S. K. (2015). Feature selection and classification of microarray data using MapReduce based ANOVA and K-nearest neighbor. Procedia Computer Science, 1, 54.
Nueda, M. J., Conesa, A., Westerhuis, J. A., Hoefsloot, H. C., Smilde, A. K., Talón, M., & Ferrer, A. (2007). Discovering gene expression patterns in time course microarray experiments by ANOVA-SCA. Bioinformatics, 23(14), 1792–1800.
Stretch, C., Khan, S., Asgarian, N., Eisner, R., Vaisipour, S., Damaraju, S., et al. (2013). Effects of sample size on differential gene expression, rank order and prediction accuracy of a gene signature. PLoS One, 8(6), e65380.
Tarca, A. L., Romero, R., & Draghici, S. (2006). Analysis of microarray experiments of gene expression profiling. American Journal of Obstetrics and Gynecology, 195(2), 373–388.
Tinker, A. V., Boussioutas, A., & Bowtell, D. D. (2006). The challenges of gene expression microarrays for the study of human cancer. Cancer Cell, 9(5), 333–339.
Zhao, Y. Y., & Lin, J. G. (2019). Estimation and test of jump discontinuities in varying coefficient models with empirical applications. Computational Statistics & Data Analysis, 139, 145–163.
Zhao, Y. Y., Lin, J. G., Huang, X. F., & Wang, H. X. (2016). Adaptive jump-preserving estimates in varying-coefficient models. Journal of Multivariate Analysis, 149, 65–80.
Funding
No funding was received for this work.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethical conduct
This work applies our methodology only to a well-known, publicly available dataset, so there are no ethical concerns to declare.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ghoreishi, S.K., Wu, J. & Ghoreishi, G.S. Non-parametric comparison and classification of two large-scale populations. J. Korean Stat. Soc. 52, 234–247 (2023). https://doi.org/10.1007/s42952-022-00198-w
DOI: https://doi.org/10.1007/s42952-022-00198-w