Data analysis methods for defining biomarkers from omics data

Li, Chao; Gao, Zhenbo; Su, Benzhe; Xu, Guowang; Lin, Xiaohui

doi:10.1007/s00216-021-03813-7

Review
Published: 24 December 2021

Volume 414, pages 235–250, (2022)

Analytical and Bioanalytical Chemistry

Abstract

Omics mainly comprises genomics, epigenomics, transcriptomics, proteomics, and metabolomics. The rapid development of omics technologies has opened up new ways to study disease diagnosis and prognosis and to define prospective information about complex diseases. Because omics data are usually large and complex, the methods used to analyze the data and to identify important information are crucial in omics studies. In this review, we focus on advances over the last decade in methods for biomarker discovery from omics data, and categorize them as individual feature analysis, combinatorial feature analysis, and network analysis. We also discuss the challenges and perspectives in this field.

Abbreviations

ANN: Artificial neural network
ANOVA: Analysis of variance
ATSD-DN: Analyzing time-series data based on dynamic networks
AUC: Area under the receiver operating characteristic curve
AUCTSP: AUC-based TSP
BPCA: Bayesian principal component analysis
CFC-CM: Construct feature combinations and a classification model
Chi-TSG: Chi-square statistic-based top-scoring genes
CRV: Carcinogenesis relevance value
DCEN: Differential co-expression network
DFS: Deep feature selection
DiSNEP: Disease-specific network enhancement prioritization
DN: Differential network
DNB: Dynamic network biomarker
DNB-HC: Defining network biomarkers based on horizontal comparison
DNN: Deep neural network
EMDN: Epigenetic module based on differential networks
ERGS: Effective range-based gene selection
GA: Genetic algorithm
GEDFN: Graph-embedded deep feedforward networks
GGM: Gaussian graphical modeling
GNFS: Gene-network-based feature set
GO: Gene ontology
GSNFS: Gene subnetwork-based feature selection
HCC: Hepatocellular carcinoma
HFS-SLPEE: Hierarchical feature selection and second learning probability error ensemble model
IFSER: Improved feature selection based on effective range
IG: Information gain
ImRml: Information maximization and redundancy minimization through feature interaction
INDEED: Integrated differential expression and differential network analysis
ISFLA: Improved shuffled frog leaping algorithm
kNN: k-Nearest neighbors
kNN-TN: kNN truncation
k-TSP: k Top-scoring pairs
LASSO: Least absolute shrinkage and selection operator
LC-k-TSP: Linear combination of k top-scoring pairs
l-DNB: Landscape dynamic network biomarker
LOD: Limit of detection
LOPC: Low-order partial correlation
MI: Mutual information
MIC: Maximal information coefficient
MIMAGA: Hybrid feature selection algorithm based on mutual information maximization and the adaptive genetic algorithm
missForest: Nonparametric missing value imputation using random forest
MPeMR: Minimum projection error minimum redundancy
N-CSI: Network-based metabolic feature selection method based on combinational significance index
ND: Network diffusion
NFSM: Network-based feature selection method
NS-kNN: No-skip kNN
PB-DSN: Potential biomarkers based on differential subnetworks
PCA: Principal component analysis
PermFIT: Permutation-based feature importance test
PLS-DA: Partial-least-squares discriminant analysis
PNN: Probabilistic neural network
PPI: Protein–protein interaction
QC: Quality control
RNGCS: Reduced number of genes for combination selection
SDAE: Stacked denoising autoencoder
SE1DCNN: Sample expansion-based one-dimensional convolutional neural network
SESAE: Sample expansion-based stacked autoencoder
SFLA: Shuffled frog leaping algorithm
SR: SpectralRank
SU-HAS: Symmetrical uncertainty filter and harmony search algorithm wrapper
SVD: Singular value decomposition
SVM: Support vector machine
SVM-RFE: Support vector machine-recursive feature elimination
T2DM: Type 2 diabetes mellitus
TSN: Top-scoring ‘N’
TSP: Top-scoring pair
TST: Top-scoring triplet
UGFS: Unsupervised graph-based feature selection
VH-k-TSP: Vertical and horizontal k-TSP

References

Chen L, Wu J. Systems biology for complex diseases. J Mol Cell Biol. 2012;4(3):125–6. https://doi.org/10.1093/jmcb/mjs022.
Article PubMed Google Scholar
Fu WJ, Stromberg AJ, Viele K, Carroll RJ, Wu G. Statistics and bioinformatics in nutritional sciences: analysis of complex data in the era of systems biology. J Nutr Biochem. 2010;21(7):561–72. https://doi.org/10.1016/j.jnutbio.2009.11.007.
Article CAS PubMed PubMed Central Google Scholar
Kim EY, Lee JW, Lee MY, Kim SH, Mok HJ, Ha K, et al. Serum lipidomic analysis for the discovery of biomarkers for major depressive disorder in drug-free patients. Psychiatry Res. 2018;265:174–82. https://doi.org/10.1016/j.psychres.2018.04.029.
Article CAS PubMed Google Scholar
Fatai AA, Gamieldien J. A 35-gene signature discriminates between rapidly- and slowly-progressing glioblastoma multiforme and predicts survival in known subtypes of the cancer. BMC Cancer. 2018;18(1):1–13. https://doi.org/10.1186/s12885-018-4103-5.
Article CAS Google Scholar
Usai MG, Goddard ME, Hayes BJ. LASSO with cross-validation for genomic selection. Genet Res. 2009;91(6):427–36. https://doi.org/10.1017/S0016672309990334.
Article CAS Google Scholar
Geman D, d'Avignon C, Naiman DQ, Winslow RL. Classifying gene expression profiles from pairwise mRNA comparisons. Stat Appl Genet Mol Biol. 2004;3:Article19. https://doi.org/10.2202/1544-6115.1071
Luo P, Yin P, Hua R, Tan Y, Li Z, Qiu G, et al. A Large-scale, multicenter serum metabolite biomarker identification study for the early detection of hepatocellular carcinoma. Hepatology. 2018;67(2):662–75. https://doi.org/10.1002/hep.29561.
Article CAS PubMed Google Scholar
Yang B, Li M, Tang W, Liu W, Zhang S, Chen L, et al. Dynamic network biomarker indicates pulmonary metastasis at the tipping point of hepatocellular carcinoma. Nat Commun. 2018;9(1):678. https://doi.org/10.1038/s41467-018-03024-2.
Article CAS PubMed PubMed Central Google Scholar
Zuo Y, Cui Y, Di Poto C, Varghese RS, Yu G, Li R, et al. INDEED: Integrated differential expression and differential network analysis of omic data for biomarker discovery. Methods. 2016;111:12–20. https://doi.org/10.1016/j.ymeth.2016.08.015.
Article CAS PubMed PubMed Central Google Scholar
Chen YL, Zhang Y, Wang J, Chen N, Fang W, Zhong J, et al. A 17 gene panel for non-small-cell lung cancer prognosis identified through integrative epigenomic-transcriptomic analyses of hypoxia-induced epithelial-mesenchymal transition. Mol Oncol. 2019;13(7):1490–502. https://doi.org/10.1002/1878-0261.12491.
Article CAS PubMed PubMed Central Google Scholar
Ein-Dor L, Zuk O, Domany E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA. 2006;103(15):5923–8. https://doi.org/10.1073/pnas.0601231103.
Article CAS PubMed PubMed Central Google Scholar
Ward PS, Thompson CB. Metabolic reprogramming: a cancer hallmark even warburg did not anticipate. Cancer Cell. 2012;21(3):297–308. https://doi.org/10.1016/j.ccr.2012.02.014.
Article CAS PubMed PubMed Central Google Scholar
Beloribi-Djefaflia S, Vasseur S, Guillaumond F. Lipid metabolic reprogramming in cancer cells. Oncogenesis. 2016;5: e189. https://doi.org/10.1038/oncsis.2015.49.
Article CAS PubMed PubMed Central Google Scholar
Lee JY, Styczynski MP. NS-kNN: a modified k-nearest neighbors approach for imputing metabolomics data. Metabolomics. 2018;14(12):153. https://doi.org/10.1007/s11306-018-1451-8.
Article CAS PubMed PubMed Central Google Scholar
Moorthy K, Mohamad MS, Deris S. A review on missing value imputation algorithms for microarray gene expression data. Curr Bioinform. 2014;9:18–22. https://doi.org/10.2174/1574893608999140109120957.
Article CAS Google Scholar
Gromski PS, Xu Y, Kotze HL, Correa E, Ellis DI, Armitage EG, et al. Influence of missing values substitutes on multivariate analysis of metabolomics data. Metabolites. 2014;4(2):433–52. https://doi.org/10.3390/metabo4020433.
Article CAS PubMed PubMed Central Google Scholar
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17:520–5. https://doi.org/10.1093/bioinformatics/17.6.520.
Article CAS PubMed Google Scholar
Shah JS, Rai SN, DeFilippis AP, Hill BG, Bhatnagar A, Brock GN. Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies. BMC Bioinformatics. 2017;18(1):114. https://doi.org/10.1186/s12859-017-1547-6.
Article CAS PubMed PubMed Central Google Scholar
Stekhoven DJ, Buhlmann P. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112–8. https://doi.org/10.1093/bioinformatics/btr597.
Article CAS PubMed Google Scholar
Nishanth KJ, Ravi V. Probabilistic neural network based categorical data imputation. Neurocomputing. 2016;218:17–25. https://doi.org/10.1016/j.neucom.2016.08.044.
Article Google Scholar
Gromski PS, Xu Y, Hollywood KA, Turner ML, Goodacre R. The influence of scaling metabolomics data on model classification accuracy. Metabolomics. 2014;11(3):684–95. https://doi.org/10.1007/s11306-014-0738-7.
Article CAS Google Scholar
van den Berg RA, Hoefsloot HC, Westerhuis JA, Smilde AK, van der Werf MJ. Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics. 2006;7:142. https://doi.org/10.1186/1471-2164-7-142.
Article CAS PubMed PubMed Central Google Scholar
Keun HC, Ebbels TMD, Antti H, Bollard ME, Beckonert O, Holmes E, et al. Improved analysis of multivariate data by variable stability scaling: application to NMR-based metabolic profiling. Anal Chem Acta. 2003;490(1–2):265–76. https://doi.org/10.1016/S0003-2670(03)00094-1.
Article CAS Google Scholar
Luo P, Yin P, Zhang W, Zhou L, Lu X, Lin X, et al. Optimization of large-scale pseudotargeted metabolomics method based on liquid chromatography-mass spectrometry. J Chromatogr A. 2016;1437:127–36. https://doi.org/10.1016/j.chroma.2016.01.078.
Article CAS PubMed Google Scholar
Zhao Y, Hao Z, Zhao C, Zhao J, Zhang J, Li Y, et al. A novel strategy for large-scale metabolomics study by calibrating gross and systematic errors in gas chromatography-mass spectrometry. Anal Chem. 2016;88(4):2234–42. https://doi.org/10.1021/acs.analchem.5b0391.
Article CAS PubMed Google Scholar
Thonusin C, IglayReger HB, Soni T, Rothberg AE, Burant CF, Evans CR. Evaluation of intensity drift correction strategies using MetaboDrift, a normalization tool for multi-batch metabolomics data. J Chromatogr A. 2017;1523:265–74. https://doi.org/10.1016/j.chroma.2017.09.023.
Article CAS PubMed PubMed Central Google Scholar
Ferreira AJ, Figueiredo MAT. Efficient feature selection filters for high-dimensional data. Pattern Recogn Lett. 2012;33(13):1794–804. https://doi.org/10.1016/j.patrec.2012.05.019.
Article Google Scholar
Liu R, Wang X, Aihara K, Chen L. Early diagnosis of complex diseases by molecular biomarkers, network biomarkers, and dynamical network biomarkers. Med Res Rev. 2014;34(3):455–78. https://doi.org/10.1002/med.21293.
Article PubMed Google Scholar
Mathys H, Davila-Velderrain J, Peng Z, Gao F, Mohammadi S, Young JZ, et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature. 2019;570(7761):332–7. https://doi.org/10.1038/s41586-019-1195-2.
Article CAS PubMed PubMed Central Google Scholar
Mi X, Zou B, Zou F, Hu J. Permutation-based identification of important biomarkers for complex diseases via machine learning models. Nat Commun. 2021;12(1):3008. https://doi.org/10.1038/s41467-021-22756-2.
Article PubMed PubMed Central Google Scholar
Chandra B, Gupta M. An efficient statistical feature selection approach for classification of gene expression data. J Biomed Inform. 2011;44(4):529–35. https://doi.org/10.1016/j.jbi.2011.01.001.
Article CAS PubMed Google Scholar
Wang J, Zhou S, Yi Y, Kong J. An improved feature selection based on effective range for classification. ScientificWorldJournal. 2014;2014: 972125. https://doi.org/10.1155/2014/972125.
Article PubMed PubMed Central Google Scholar
Laing EE, Moller-Levet CS, Dijk DJ, Archer SN. Identifying and validating blood mRNA biomarkers for acute and chronic insufficient sleep in humans: a machine learning approach. Sleep. 2019;42(1). https://doi.org/10.1093/sleep/zsy186
Li Y, Chen C-Y, Wasserman WW, editors. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters. International Conference on Research in Computational Molecular Biology; 2015; Cham: Springer International Publishing.
Kohavi R, John GH. Wrappers for feature subset selection. Artif Intell. 1997;97:273–324. https://doi.org/10.1016/S0004-3702(97)00043-X.
Article Google Scholar
Lv J, Peng Q, Chen X, Sun Z. A multi-objective heuristic algorithm for gene expression microarray data classification. Expert Syst Appl. 2016;59:13–9. https://doi.org/10.1016/j.eswa.2016.04.020.
Article Google Scholar
Hu B, Dai Y, Su Y, Moore P, Zhang X, Mao C, et al. Feature selection for optimized high-dimensional biomedical data using an improved shuffled frog leaping algorithm. IEEE/ACM Trans Comput Biol Bioinform. 2018;15(6):1765–73. https://doi.org/10.1109/TCBB.2016.2602263.
Article PubMed Google Scholar
Lu H, Chen J, Yan K, Jin Q, Xue Y, Gao Z. A hybrid feature selection algorithm for gene expression data classification. Neurocomputing. 2017;256:56–62. https://doi.org/10.1016/j.neucom.2016.07.080.
Article Google Scholar
Qiao Y, Xiong Y, Gao H, Zhu X, Chen P. Protein-protein interface hot spots prediction based on a hybrid feature selection strategy. BMC Bioinformatics. 2018;19(1):14. https://doi.org/10.1186/s12859-018-2009-5.
Article CAS PubMed PubMed Central Google Scholar
Shreem SS, Abdullah S, Nazri MZA. Hybrid feature selection algorithm using symmetrical uncertainty and a harmony search algorithm. Int J Syst Sci. 2014;47(6):1312–29. https://doi.org/10.1080/00207721.2014.924600.
Article Google Scholar
Civelek M, Lusis AJ. Systems genetics approaches to understand complex traits. Nat Rev Genet. 2014;15(1):34–48. https://doi.org/10.1038/nrg3575.
Article CAS PubMed Google Scholar
Chopra P, Lee J, Kang J, Lee S. Improving cancer classification accuracy using gene pairs. PLoS ONE. 2010;5(12): e14305. https://doi.org/10.1371/journal.pone.0014305.
Article CAS PubMed PubMed Central Google Scholar
Huang X, Zeng J, Zhou L, Hu C, Yin P, Lin X. A new strategy for analyzing time-series data using dynamic networks: identifying prospective biomarkers of hepatocellular carcinoma. Sci Rep. 2016;6:32448. https://doi.org/10.1038/srep32448.
Article CAS PubMed PubMed Central Google Scholar
Netzer M, Weinberger KM, Handler M, Seger M, Fang X, Kugler KG, et al. Profiling the human response to physical exercise: a computational strategy for the identification and kinetic analysis of metabolic biomarkers. J Clin Bioinform. 2011;1(1):34. https://doi.org/10.1186/2043-9113-1-34.
Article CAS Google Scholar
Xing P, Chen Y, Gao J, Bai L, Yuan Z. A fast approach to detect gene-gene synergy. Sci Rep. 2017;7(1):16437. https://doi.org/10.1038/s41598-017-16748-w.
Article CAS PubMed PubMed Central Google Scholar
Chen Y, Cao D, Gao J, Yuan Z. Discovering pair-wise synergies in microarray data. Sci Rep. 2016;6:30672. https://doi.org/10.1038/srep30672.
Article CAS PubMed PubMed Central Google Scholar
Sreevani Murthy CA, Chanda B. Generation of compound features based on feature interaction for classification. Exp Syst Appl. 2018;108:61–73. https://doi.org/10.1016/j.eswa.2018.04.033.
Article Google Scholar
Murthy CA. Bridging feature selection and extraction: compound feature generation. IEEE Trans Knowl Data Eng. 2017;29(4):757–70. https://doi.org/10.1109/tkde.2016.2619712.
Article Google Scholar
Tan AC, Naiman DQ, Xu L, Winslow RL, Geman D. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics. 2005;21(20):3896–904. https://doi.org/10.1093/bioinformatics/bti631.
Article CAS PubMed Google Scholar
Lin X, Afsari B, Marchionni L, Cope L, Parmigiani G, Naiman D, et al. The ordering of expression among a few genes can provide simple cancer biomarkers and signal BRCA1 mutations. BMC Bioinformatics. 2009;10:256. https://doi.org/10.1186/1471-2105-10-256.
Article CAS PubMed PubMed Central Google Scholar
Magis AT, Price ND. The top-scoring ‘N’ algorithm: a generalized relative expression classification method from small numbers of biomolecules. BMC Bioinformatics. 2012;13:227. https://doi.org/10.1186/1471-2105-13-227.
Article PubMed PubMed Central Google Scholar
Kagaris D, Khamesipour A, Yiannoutsos CT. AUCTSP: an improved biomarker gene pair class predictor. BMC Bioinformatics. 2018;19(1):244. https://doi.org/10.1186/s12859-018-2231-1.
Article CAS PubMed PubMed Central Google Scholar
Khamesipour A, Kagaris D. Speeding up the discovery of combinations of differentially expressed genes for disease prediction and classification. Comput Methods Programs Biomed. 2019;170:69–80. https://doi.org/10.1016/j.cmpb.2019.01.004.
Article PubMed Google Scholar
Wang H, Zhang H, Dai Z, Chen M, Yuan Z. TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection. BMC Med Genomics. 2013;6:S3. https://doi.org/10.1186/1755-8794-6-S1-S3.
Article PubMed PubMed Central Google Scholar
Huang X, Lin X, Zhou L, Su B. Analyzing omics data by pair-wise feature evaluation with horizontal and vertical comparisons. J Pharm Biomed Anal. 2018;157:20–6. https://doi.org/10.1016/j.jpba.2018.04.052.
Article CAS PubMed Google Scholar
Lin X, Zhang Y, Li C, Wang J, Luo P, Zhou H. A new data analysis method based on feature linear combination. J Biomed Inform. 2019;94: 103173. https://doi.org/10.1016/j.jbi.2019.103173.
Article PubMed Google Scholar
Chen F, Xue J, Zhou L, Wu S, Chen Z. Identification of serum biomarkers of hepatocarcinoma through liquid chromatography/mass spectrometry-based metabonomic method. Anal Bioanal Chem. 2011;401(6):1899–904. https://doi.org/10.1007/s00216-011-5245-3.
Article CAS PubMed PubMed Central Google Scholar
Andersen AH, Rayens WS, Liu Y, Smith CD. Partial least squares for discrimination in fMRI data. Magn Reson Imaging. 2012;30(3):446–52. https://doi.org/10.1016/j.mri.2011.11.001.
Article PubMed PubMed Central Google Scholar
Lin X, Huang X, Zhou L, Ren W, Zeng J, Yao W, et al. The robust classification model based on combinatorial features. IEEE/ACM Trans Comput Biol Bioinform. 2019;16(2):650–7. https://doi.org/10.1109/TCBB.2017.2779512.
Article PubMed Google Scholar
Ochs MF, Farrar JE, Considine M, Wei Y, Meshinchi S, Arceci RJ. Outlier analysis and top scoring pair for integrated data analysis and biomarker discovery. IEEE/ACM Trans Comput Biol Bioinform. 2014;11(3):520–32. https://doi.org/10.1109/TCBB.2013.153.
Article PubMed PubMed Central Google Scholar
Hu JX, Thomas CE, Brunak S. Network biology concepts in complex disease comorbidities. Nat Rev Genet. 2016;17(10):615–29. https://doi.org/10.1038/nrg.2016.87.
Article CAS PubMed Google Scholar
Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011;12(1):56–68. https://doi.org/10.1038/nrg2918.
Article CAS PubMed PubMed Central Google Scholar
Jin G, Zhou X, Wang H, Zhao H, Cui K, Zhang XS, et al. The knowledge-integrated network biomarkers discovery for major adverse cardiac events. J Proteome Res. 2008;7:4013–21. https://doi.org/10.1021/pr8002886.
Article CAS PubMed PubMed Central Google Scholar
Miryala SK, Anbarasu A, Ramaiah S. Discerning molecular interactions: a comprehensive review on biomolecular interaction databases and network analysis tools. Gene. 2018;642:84–94. https://doi.org/10.1016/j.gene.2017.11.028.
Article CAS PubMed Google Scholar
Oughtred R, Stark C, Breitkreutz BJ, Rust J, Boucher L, Chang C, et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 2019;47(D1):D529–41. https://doi.org/10.1093/nar/gky1079.
Article CAS PubMed Google Scholar
Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49(D1):D605–12. https://doi.org/10.1093/nar/gkaa1074.
Article CAS PubMed Google Scholar
Mardinoglu A, Agren R, Kampf C, Asplund A, Uhlen M, Nielsen J. Genome-scale metabolic modelling of hepatocytes reveals serine deficiency in patients with non-alcoholic fatty liver disease. Nat Commun. 2014;5:3083. https://doi.org/10.1038/ncomms4083.
Article CAS PubMed Google Scholar
Jahagirdar S, Saccenti E. On the Use of Correlation and MI as a Measure of Metabolite-Metabolite Association for Network Differential Connectivity Analysis. Metabolites. 2020;10(4). https://doi.org/10.3390/metabo10040171
Singh AJ, Ramsey SA, Filtz TM, Kioussi C. Differential gene regulatory networks in development and disease. Cell Mol Life Sci. 2018;75(6):1013–25. https://doi.org/10.1007/s00018-017-2679-6.
Article CAS PubMed Google Scholar
Chen L, Liu R, Liu ZP, Li M, Aihara K. Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers. Sci Rep. 2012;2:342. https://doi.org/10.1038/srep00342.
Article CAS PubMed PubMed Central Google Scholar
Liu X, Chang X, Leng S, Tang H, Aihara K, Chen L. Detection for disease tipping points by landscape dynamic network biomarkers. Natl Sci Rev. 2019;6(4):775–85. https://doi.org/10.1093/nsr/nwy162.
Article CAS PubMed Google Scholar
Li M, Zeng T, Liu R, Chen L. Detecting tissue-specific early warning signals for complex diseases based on dynamical network biomarkers: study of type 2 diabetes by cross-tissue analysis. Brief Bioinform. 2014;15(2):229–43. https://doi.org/10.1093/bib/bbt027.
Article CAS PubMed Google Scholar
Liu X, Liu ZP, Zhao XM, Chen L. Identifying disease genes and module biomarkers by differential interactions. J Am Med Inform Assoc. 2012;19(2):241–8. https://doi.org/10.1136/amiajnl-2011-000658.
Article PubMed Google Scholar
Lui TW, Tsui NB, Chan LW, Wong CS, Siu PM, Yung BY. DECODE: an integrated differential co-expression and differential expression analysis of gene expression data. BMC Bioinformatics. 2015;16:182. https://doi.org/10.1186/s12859-015-0582-4.
Article PubMed PubMed Central Google Scholar
Krumsiek J, Suhre K, Illig T, Adamski J, Theis FJ. Gaussian graphical modeling reconstructs pathway reactions from high-throughput metabolomics data. BMC Syst Biol. 2011;5(1):21. https://doi.org/10.1186/1752-0509-5-21.
Article CAS PubMed PubMed Central Google Scholar
Zuo Y, Yu G, Tadesse MG, Ressom HW. Biological network inference using low order partial correlation. Methods. 2014;69(3):266–73. https://doi.org/10.1016/j.ymeth.2014.06.010.
Article CAS PubMed PubMed Central Google Scholar
Ideker T, Krogan NJ. Differential network biology. Mol Syst Biol. 2012;8:565. https://doi.org/10.1038/msb.2011.99.
Article PubMed PubMed Central Google Scholar
Savino A, Provero P, Poli V. Differential co-expression analyses allow the identification of critical signalling pathways altered during tumour transformation and progression. Int J Mol Sci. 2020;21(24). https://doi.org/10.3390/ijms21249461
Hsu CL, Juan HF, Huang HC. Functional analysis and characterization of differential coexpression networks. Sci Rep. 2015;5:13295. https://doi.org/10.1038/srep13295.
Article CAS PubMed PubMed Central Google Scholar
Siska C, Bowler R, Kechris K. The discordant method: a novel approach for differential correlation. Bioinformatics. 2016;32(5):690–6. https://doi.org/10.1093/bioinformatics/btv633.
Article CAS PubMed Google Scholar
Huang X, Lin X, Zeng J, Wang L, Yin P, Zhou L, et al. A computational method of defining potential biomarkers based on differential sub-networks. Sci Rep. 2017;7(1):14339. https://doi.org/10.1038/s41598-017-14682-5.
Article CAS PubMed PubMed Central Google Scholar
Su B, Luo P, Yang Z, Yu P, Li Z, Yin P, et al. A novel analysis method for biomarker identification based on horizontal relationship: identifying potential biomarkers from large-scale hepatocellular carcinoma metabolomics data. Anal Bioanal Chem. 2019;411(24):6377–86. https://doi.org/10.1007/s00216-019-02011-w.
Article CAS PubMed Google Scholar
Wang Q, Su B, Dong L, Jiang T, Tan Y, Lu X, et al. Liquid chromatography-mass spectrometry-based nontargeted metabolomics predicts prognosis of hepatocellular carcinoma after curative resection. J Proteome Res. 2020;19(8):3533–41. https://doi.org/10.1021/acs.jproteome.0c00344.
Article CAS PubMed Google Scholar
Fang C, Su B, Jiang T, Li C, Tan Y, Wang Q, et al. Prognosis prediction of hepatocellular carcinoma after surgical resection based on serum metabolic profiling from gas chromatography-mass spectrometry. Anal Bioanal Chem. 2021;413(12):3153–65. https://doi.org/10.1007/s00216-021-03281-z.
Article CAS PubMed Google Scholar
Wang YC, Chen BS. A network-based biomarker approach for molecular investigation and diagnosis of lung cancer. BMC Med Genomics. 2011;4(1):2. https://doi.org/10.1186/1755-8794-4-2.
Article PubMed PubMed Central Google Scholar
Allahyar A, Ubels J, de Ridder J. A data-driven interactome of synergistic genes improves network-based cancer outcome prediction. PLoS Comput Biol. 2019;15(2): e1006657. https://doi.org/10.1371/journal.pcbi.1006657.
Article CAS PubMed PubMed Central Google Scholar
Ruan P, Wang S. DiSNEP: a Disease-Specific gene Network Enhancement to improve Prioritizing candidate disease genes. Brief Bioinform. 2021;22(4). https://doi.org/10.1093/bib/bbaa241
Koutrouli M, Karatzas E, Paez-Espino D, Pavlopoulos GA. A guide to conquer the biological network era using graph theory. Front Bioeng Biotechnol. 2020;8:34. https://doi.org/10.3389/fbioe.2020.00034.
Article PubMed PubMed Central Google Scholar
Wang C, Chen L, Yang Y, Zhang M, Wong G. Identification of bladder cancer prognostic biomarkers using an ageing gene-related competitive endogenous RNA network. Oncotarget. 2017;8:111742–53. https://doi.org/10.18632/oncotarget.22905.
Article PubMed PubMed Central Google Scholar
Bernier M, Croteau E, Castellano CA, Cunnane SC, Whittingstall K. Spatial distribution of resting-state BOLD regional homogeneity as a predictor of brain glucose uptake: a study in healthy aging. Neuroimage. 2017;150:14–22. https://doi.org/10.1016/j.neuroimage.2017.01.055.
Article CAS PubMed Google Scholar
Cai S, Huang K, Kang Y, Jiang Y, von Deneen KM, Huang L. Potential biomarkers for distinguishing people with Alzheimer’s disease from cognitively intact elderly based on the rich-club hierarchical structure of white matter networks. Neurosci Res. 2019;144:56–66. https://doi.org/10.1016/j.neures.2018.07.005.
Article PubMed Google Scholar
Li S, Chen X, Liu X, Yu Y, Pan H, Haak R, et al. Complex integrated analysis of lncRNAs-miRNAs-mRNAs in oral squamous cell carcinoma. Oral Oncol. 2017;73:1–9. https://doi.org/10.1016/j.oraloncology.2017.07.026.
Article CAS PubMed Google Scholar
Henni K, Mezghani N, Gouin-Vallerand C. Unsupervised graph-based feature selection via subspace and pagerank centrality. Expert Syst Appl. 2018;114:46–53. https://doi.org/10.1016/j.eswa.2018.07.029.
Article Google Scholar
Ahmed H, Howton TC, Sun Y, Weinberger N, Belkhadir Y, Mukhtar MS. Network biology discovers pathogen contact points in host protein-protein interactomes. Nat Commun. 2018;9(1):2312. https://doi.org/10.1038/s41467-018-04632-8.
Article CAS PubMed PubMed Central Google Scholar
Wei B, Liu J, Wei D, Gao C, Deng Y. Weighted k-shell decomposition for complex networks based on potential edge weights. Physica A. 2015;420:277–83. https://doi.org/10.1016/j.physa.2014.11.012.
Article Google Scholar
Xu S, Wang P, Zhang CX, Lu J. Spectral learning algorithm reveals propagation capability of complex networks. IEEE Trans Cybern. 2019;49(12):4253–61. https://doi.org/10.1109/TCYB.2018.2861568.
Article PubMed Google Scholar
Di Nanni N, Gnocchi M, Moscatelli M, Milanesi L, Mosca E. Gene relevance based on multiple evidences in complex networks. Bioinformatics. 2020;36(3):865–71. https://doi.org/10.1093/bioinformatics/btz652.
Article CAS PubMed Google Scholar
Ning Z, Feng C, Song C, Liu W, Shang D, Li M, et al. Topologically inferring active miRNA-mediated subpathways toward precise cancer classification by directed random walk. Mol Oncol. 2019;13(10):2211–26. https://doi.org/10.1002/1878-0261.12563.
Article CAS PubMed PubMed Central Google Scholar
Isik Z, Ercan ME. Integration of RNA-Seq and RPPA data for survival time prediction in cancer patients. Comput Biol Med. 2017;89:397–404. https://doi.org/10.1016/j.compbiomed.2017.08.028.
Article CAS PubMed Google Scholar
Wei PJ, Wu FX, Xia J, Su Y, Wang J, Zheng CH. Prioritizing cancer genes based on an improved random walk method. Front Genet. 2020;11:377. https://doi.org/10.3389/fgene.2020.00377.
Article CAS PubMed PubMed Central Google Scholar
Doungpan N, Engchuan W, Meechai A, Fong S, Chan JH. Gene-Network-Based Feature Set (GNFS) for expression-based cancer classification. Journal of Medical Imaging and Health Informatics. 2016;6(4):1093–101. https://doi.org/10.1166/jmihi.2016.1806.
Article Google Scholar
Doungpan N, Engchuan W, Chan JH, Meechai A. GSNFS: Gene subnetwork biomarker identification of lung cancer expression data. BMC Med Genomics. 2016;9(Suppl 3):70. https://doi.org/10.1186/s12920-016-0231-4.
Article CAS PubMed PubMed Central Google Scholar
Ma X, Liu Z, Zhang Z, Huang X, Tang W. Multiple network algorithm for epigenetic modules via the integration of genome-wide DNA methylation and gene expression data. BMC Bioinformatics. 2017;18(1):72. https://doi.org/10.1186/s12859-017-1490-6.
Article CAS PubMed PubMed Central Google Scholar
Liu ZP, Gao R. Detecting pathway biomarkers of diabetic progression with differential entropy. J Biomed Inform. 2018;82:143–53. https://doi.org/10.1016/j.jbi.2018.05.006.
Article PubMed Google Scholar
Al-Harazi O, Al Insaif S, Al-Ajlan MA, Kaya N, Dzimiri N, Colak D. Integrated genomic and network-based analyses of complex diseases and human disease network. J Genet Genomics. 2016;43(6):349–67. https://doi.org/10.1016/j.jgg.2015.11.002.
Article PubMed Google Scholar
Sajjadi SJ, Qian X, Zeng B, Adl AA. Network-based methods to identify highly discriminating subsets of biomarkers. IEEE/ACM Trans Comput Biol Bioinform. 2014;11(6):1029–37. https://doi.org/10.1109/TCBB.2014.2325014.
Article PubMed Google Scholar
Zhang X, Gao L, Liu ZP, Chen L. Identifying module biomarker in type 2 diabetes mellitus by discriminative area of functional activity. BMC Bioinformatics. 2015;16:92. https://doi.org/10.1186/s12859-015-0519-y.
Article CAS PubMed PubMed Central Google Scholar
Kori M, Gov E, Arga KY. Novel genomic biomarker candidates for cervical cancer as identified by differential co-expression network analysis. OMICS: A Journal of Integrative Biology. 2019;23(5):261–73. https://doi.org/10.1089/omi.2019.0025.
Article CAS PubMed Google Scholar
Monaco A, Pantaleo E, Amoroso N, Bellantuono L, Lombardi A, Tateo A, et al. Identifying potential gene biomarkers for Parkinson’s disease through an information entropy based approach. Phys Biol. 2020;18(1):016003. https://doi.org/10.1088/1478-3975/abc09a.
Das J, Gayvert KM, Bunea F, Wegkamp MH, Yu H. ENCAPP: elastic-net-based prognosis prediction and biomarker discovery for human cancers. BMC Genomics. 2015;16:263. https://doi.org/10.1186/s12864-015-1465-9.
Date Y, Kikuchi J. Application of a deep neural network to metabolomics studies and its performance in determining important variables. Anal Chem. 2018;90(3):1805–10. https://doi.org/10.1021/acs.analchem.7b03795.
Danaee P, Ghaeini R, Hendrix DA. A deep learning approach for cancer detection and relevant gene identification. In: Biocomputing 2017. World Scientific; 2016. p. 219–29. https://doi.org/10.1142/9789813207813_0022.
Schulte-Sasse R, Budach S, Hnisz D, Marsico A. Graph convolutional networks improve the prediction of cancer driver genes. In: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions. Cham: Springer International Publishing; 2019. https://doi.org/10.1007/978-3-030-30493-5_60.
Liu J, Wang X, Cheng Y, Zhang L. Tumor gene expression data classification via sample expansion-based deep learning. Oncotarget. 2017;8:109646–60.
Kong Y, Yu T. A graph-embedded deep feedforward network for disease outcome classification and feature selection using gene expression data. Bioinformatics. 2018;34(21):3727–37. https://doi.org/10.1093/bioinformatics/bty429.
Meng Y, Jin M. HFS-SLPEE: A novel hierarchical feature selection and second learning probability error ensemble model for precision cancer diagnosis. Front Cell Dev Biol. 2021;9:696359. https://doi.org/10.3389/fcell.2021.696359.
Shi Z, Wen B, Gao Q, Zhang B. Feature selection methods for protein biomarker discovery from proteomics or multiomics data. Mol Cell Proteomics. 2021;20:100083. https://doi.org/10.1016/j.mcpro.2021.100083.
Kassaporn D, Thomas S, Jutarop P, Puangrat Y, Raynoo T, Anchalee T, et al. Discovery and qualification of serum protein biomarker candidates for cholangiocarcinoma diagnosis. J Proteome Res. 2019;18(9):3305–16. https://doi.org/10.1021/acs.jproteome.9b00242.

Funding

This study was supported by the Fundamental Research Funds for the Central Universities (DUT21YG115) and the National Natural Science Foundation of China (No. 21876169).

Author information

Authors and Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
Chao Li, Zhenbo Gao, Benzhe Su & Xiaohui Lin
CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
Chao Li & Guowang Xu

Corresponding author

Correspondence to Xiaohui Lin.

Ethics declarations

Ethics approval

Not applicable.

Source of biological material

Not applicable.

Statement on animal welfare

Not applicable.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Published in the topical collection celebrating ABC's 20th Anniversary. Chao Li and Zhenbo Gao contributed equally to this paper.

About this article

Cite this article

Li, C., Gao, Z., Su, B. et al. Data analysis methods for defining biomarkers from omics data. Anal Bioanal Chem 414, 235–250 (2022). https://doi.org/10.1007/s00216-021-03813-7

Received: 15 September 2021
Revised: 26 November 2021
Accepted: 29 November 2021
Published: 24 December 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s00216-021-03813-7
