Data analysis methods for defining biomarkers from omics data

  • Review
  • Published:
Analytical and Bioanalytical Chemistry

Abstract

Omics mainly comprises genomics, epigenomics, transcriptomics, proteomics and metabolomics. The rapid development of omics technologies has opened up new ways to study disease diagnosis and prognosis and to define prospective information about complex diseases. Because omics data are usually large and complex, the methods used to analyze the data and to extract important information are crucial in omics studies. In this review, we focus on advances over the last decade in biomarker discovery methods based on omics data, and categorize them as individual feature analysis, combinatorial feature analysis and network analysis. We also discuss the challenges and perspectives in this field.




Abbreviations

ANN: Artificial neural network
ANOVA: Analysis of variance
ATSD-DN: Analyzing time-series data based on dynamic networks
AUC: Area under the receiver operating characteristic curve
AUCTSP: AUC-based TSP
BPCA: Bayesian principal component analysis
CFC-CM: Construct feature combinations and a classification model
Chi-TSG: Chi-square statistic-based top-scoring genes
CRV: Carcinogenesis relevance value
DCEN: Differential co-expression network
DFS: Deep feature selection
DiSNEP: Disease-specific network enhancement prioritization
DN: Differential network
DNB: Dynamic network biomarker
DNB-HC: Defining network biomarkers based on horizontal comparison
DNN: Deep neural network
EMDN: Epigenetic module based on differential networks
ERGS: Effective range-based gene selection
GA: Genetic algorithm
GEDFN: Graph-embedded deep feedforward networks
GGM: Gaussian graphical modeling
GNFS: Gene-network-based feature set
GO: Gene ontology
GSNFS: Gene subnetwork-based feature selection
HCC: Hepatocellular carcinoma
HFS-SLPEE: Hierarchical feature selection and second learning probability error ensemble model
IFSER: Improved feature selection based on effective range
IG: Information gain
ImRml: Information maximization and redundancy minimization through feature interaction
INDEED: Integrated differential expression and differential network analysis
ISFLA: Improved shuffled frog leaping algorithm
kNN: k-Nearest neighbors
kNN-TN: kNN truncation
k-TSP: k Top-scoring pairs
LASSO: Least absolute shrinkage and selection operator
LC-k-TSP: Linear combination of k top-scoring pairs
l-DNB: Landscape dynamic network biomarker
LOD: Limit of detection
LOPC: Low-order partial correlation
MI: Mutual information
MIC: Maximal information coefficient
MIMAGA: Hybrid feature selection algorithm based on mutual information maximization and the adaptive genetic algorithm
missForest: Nonparametric missing value imputation using random forest
MPeMR: Minimum projection error minimum redundancy
N-CSI: Network-based metabolic feature selection method based on combinational significance index
ND: Network diffusion
NFSM: Network-based feature selection method
NS-kNN: No-skip kNN
PB-DSN: Potential biomarkers based on differential subnetworks
PCA: Principal component analysis
PermFIT: Permutation-based feature importance test
PLS-DA: Partial-least-squares discriminant analysis
PNN: Probabilistic neural network
PPI: Protein–protein interaction
QC: Quality control
RNGCS: Reduced number of genes for combination selection
SDAE: Stacked denoising autoencoder
SE1DCNN: Sample expansion-based one-dimensional convolutional neural network
SESAE: Sample expansion-based stacked autoencoder
SFLA: Shuffled frog leaping algorithm
SR: SpectralRank
SU-HAS: Symmetrical uncertainty filter and harmony search algorithm wrapper
SVD: Singular value decomposition
SVM: Support vector machine
SVM-RFE: Support vector machine-recursive feature elimination
T2DM: Type 2 diabetes mellitus
TSN: Top-scoring ‘N’
TSP: Top-scoring pair
TST: Top-scoring triplet
UGFS: Unsupervised graph-based feature selection
VH-k-TSP: Vertical and horizontal k-TSP

References

  1. Chen L, Wu J. Systems biology for complex diseases. J Mol Cell Biol. 2012;4(3):125–6. https://doi.org/10.1093/jmcb/mjs022.

    Article  PubMed  Google Scholar 

  2. Fu WJ, Stromberg AJ, Viele K, Carroll RJ, Wu G. Statistics and bioinformatics in nutritional sciences: analysis of complex data in the era of systems biology. J Nutr Biochem. 2010;21(7):561–72. https://doi.org/10.1016/j.jnutbio.2009.11.007.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Kim EY, Lee JW, Lee MY, Kim SH, Mok HJ, Ha K, et al. Serum lipidomic analysis for the discovery of biomarkers for major depressive disorder in drug-free patients. Psychiatry Res. 2018;265:174–82. https://doi.org/10.1016/j.psychres.2018.04.029.

    Article  CAS  PubMed  Google Scholar 

  4. Fatai AA, Gamieldien J. A 35-gene signature discriminates between rapidly- and slowly-progressing glioblastoma multiforme and predicts survival in known subtypes of the cancer. BMC Cancer. 2018;18(1):1–13. https://doi.org/10.1186/s12885-018-4103-5.

    Article  CAS  Google Scholar 

  5. Usai MG, Goddard ME, Hayes BJ. LASSO with cross-validation for genomic selection. Genet Res. 2009;91(6):427–36. https://doi.org/10.1017/S0016672309990334.

    Article  CAS  Google Scholar 

  6. Geman D, d'Avignon C, Naiman DQ, Winslow RL. Classifying gene expression profiles from pairwise mRNA comparisons. Stat Appl Genet Mol Biol. 2004;3:Article19. https://doi.org/10.2202/1544-6115.1071

  7. Luo P, Yin P, Hua R, Tan Y, Li Z, Qiu G, et al. A Large-scale, multicenter serum metabolite biomarker identification study for the early detection of hepatocellular carcinoma. Hepatology. 2018;67(2):662–75. https://doi.org/10.1002/hep.29561.

    Article  CAS  PubMed  Google Scholar 

  8. Yang B, Li M, Tang W, Liu W, Zhang S, Chen L, et al. Dynamic network biomarker indicates pulmonary metastasis at the tipping point of hepatocellular carcinoma. Nat Commun. 2018;9(1):678. https://doi.org/10.1038/s41467-018-03024-2.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Zuo Y, Cui Y, Di Poto C, Varghese RS, Yu G, Li R, et al. INDEED: Integrated differential expression and differential network analysis of omic data for biomarker discovery. Methods. 2016;111:12–20. https://doi.org/10.1016/j.ymeth.2016.08.015.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Chen YL, Zhang Y, Wang J, Chen N, Fang W, Zhong J, et al. A 17 gene panel for non-small-cell lung cancer prognosis identified through integrative epigenomic-transcriptomic analyses of hypoxia-induced epithelial-mesenchymal transition. Mol Oncol. 2019;13(7):1490–502. https://doi.org/10.1002/1878-0261.12491.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Ein-Dor L, Zuk O, Domany E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA. 2006;103(15):5923–8. https://doi.org/10.1073/pnas.0601231103.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Ward PS, Thompson CB. Metabolic reprogramming: a cancer hallmark even warburg did not anticipate. Cancer Cell. 2012;21(3):297–308. https://doi.org/10.1016/j.ccr.2012.02.014.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Beloribi-Djefaflia S, Vasseur S, Guillaumond F. Lipid metabolic reprogramming in cancer cells. Oncogenesis. 2016;5: e189. https://doi.org/10.1038/oncsis.2015.49.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Lee JY, Styczynski MP. NS-kNN: a modified k-nearest neighbors approach for imputing metabolomics data. Metabolomics. 2018;14(12):153. https://doi.org/10.1007/s11306-018-1451-8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Moorthy K, Mohamad MS, Deris S. A review on missing value imputation algorithms for microarray gene expression data. Curr Bioinform. 2014;9:18–22. https://doi.org/10.2174/1574893608999140109120957.

    Article  CAS  Google Scholar 

  16. Gromski PS, Xu Y, Kotze HL, Correa E, Ellis DI, Armitage EG, et al. Influence of missing values substitutes on multivariate analysis of metabolomics data. Metabolites. 2014;4(2):433–52. https://doi.org/10.3390/metabo4020433.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17:520–5. https://doi.org/10.1093/bioinformatics/17.6.520.

    Article  CAS  PubMed  Google Scholar 

  18. Shah JS, Rai SN, DeFilippis AP, Hill BG, Bhatnagar A, Brock GN. Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies. BMC Bioinformatics. 2017;18(1):114. https://doi.org/10.1186/s12859-017-1547-6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Stekhoven DJ, Buhlmann P. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112–8. https://doi.org/10.1093/bioinformatics/btr597.

    Article  CAS  PubMed  Google Scholar 

  20. Nishanth KJ, Ravi V. Probabilistic neural network based categorical data imputation. Neurocomputing. 2016;218:17–25. https://doi.org/10.1016/j.neucom.2016.08.044.

    Article  Google Scholar 

  21. Gromski PS, Xu Y, Hollywood KA, Turner ML, Goodacre R. The influence of scaling metabolomics data on model classification accuracy. Metabolomics. 2014;11(3):684–95. https://doi.org/10.1007/s11306-014-0738-7.

    Article  CAS  Google Scholar 

  22. van den Berg RA, Hoefsloot HC, Westerhuis JA, Smilde AK, van der Werf MJ. Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics. 2006;7:142. https://doi.org/10.1186/1471-2164-7-142.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Keun HC, Ebbels TMD, Antti H, Bollard ME, Beckonert O, Holmes E, et al. Improved analysis of multivariate data by variable stability scaling: application to NMR-based metabolic profiling. Anal Chem Acta. 2003;490(1–2):265–76. https://doi.org/10.1016/S0003-2670(03)00094-1.

    Article  CAS  Google Scholar 

  24. Luo P, Yin P, Zhang W, Zhou L, Lu X, Lin X, et al. Optimization of large-scale pseudotargeted metabolomics method based on liquid chromatography-mass spectrometry. J Chromatogr A. 2016;1437:127–36. https://doi.org/10.1016/j.chroma.2016.01.078.

    Article  CAS  PubMed  Google Scholar 

  25. Zhao Y, Hao Z, Zhao C, Zhao J, Zhang J, Li Y, et al. A novel strategy for large-scale metabolomics study by calibrating gross and systematic errors in gas chromatography-mass spectrometry. Anal Chem. 2016;88(4):2234–42. https://doi.org/10.1021/acs.analchem.5b0391.

    Article  CAS  PubMed  Google Scholar 

  26. Thonusin C, IglayReger HB, Soni T, Rothberg AE, Burant CF, Evans CR. Evaluation of intensity drift correction strategies using MetaboDrift, a normalization tool for multi-batch metabolomics data. J Chromatogr A. 2017;1523:265–74. https://doi.org/10.1016/j.chroma.2017.09.023.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Ferreira AJ, Figueiredo MAT. Efficient feature selection filters for high-dimensional data. Pattern Recogn Lett. 2012;33(13):1794–804. https://doi.org/10.1016/j.patrec.2012.05.019.

    Article  Google Scholar 

  28. Liu R, Wang X, Aihara K, Chen L. Early diagnosis of complex diseases by molecular biomarkers, network biomarkers, and dynamical network biomarkers. Med Res Rev. 2014;34(3):455–78. https://doi.org/10.1002/med.21293.

    Article  PubMed  Google Scholar 

  29. Mathys H, Davila-Velderrain J, Peng Z, Gao F, Mohammadi S, Young JZ, et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature. 2019;570(7761):332–7. https://doi.org/10.1038/s41586-019-1195-2.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Mi X, Zou B, Zou F, Hu J. Permutation-based identification of important biomarkers for complex diseases via machine learning models. Nat Commun. 2021;12(1):3008. https://doi.org/10.1038/s41467-021-22756-2.

    Article  PubMed  PubMed Central  Google Scholar 

  31. Chandra B, Gupta M. An efficient statistical feature selection approach for classification of gene expression data. J Biomed Inform. 2011;44(4):529–35. https://doi.org/10.1016/j.jbi.2011.01.001.

    Article  CAS  PubMed  Google Scholar 

  32. Wang J, Zhou S, Yi Y, Kong J. An improved feature selection based on effective range for classification. ScientificWorldJournal. 2014;2014: 972125. https://doi.org/10.1155/2014/972125.

    Article  PubMed  PubMed Central  Google Scholar 

  33. Laing EE, Moller-Levet CS, Dijk DJ, Archer SN. Identifying and validating blood mRNA biomarkers for acute and chronic insufficient sleep in humans: a machine learning approach. Sleep. 2019;42(1). https://doi.org/10.1093/sleep/zsy186

  34. Li Y, Chen C-Y, Wasserman WW, editors. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters. International Conference on Research in Computational Molecular Biology; 2015; Cham: Springer International Publishing.

  35. Kohavi R, John GH. Wrappers for feature subset selection. Artif Intell. 1997;97:273–324. https://doi.org/10.1016/S0004-3702(97)00043-X.

    Article  Google Scholar 

  36. Lv J, Peng Q, Chen X, Sun Z. A multi-objective heuristic algorithm for gene expression microarray data classification. Expert Syst Appl. 2016;59:13–9. https://doi.org/10.1016/j.eswa.2016.04.020.

    Article  Google Scholar 

  37. Hu B, Dai Y, Su Y, Moore P, Zhang X, Mao C, et al. Feature selection for optimized high-dimensional biomedical data using an improved shuffled frog leaping algorithm. IEEE/ACM Trans Comput Biol Bioinform. 2018;15(6):1765–73. https://doi.org/10.1109/TCBB.2016.2602263.

    Article  PubMed  Google Scholar 

  38. Lu H, Chen J, Yan K, Jin Q, Xue Y, Gao Z. A hybrid feature selection algorithm for gene expression data classification. Neurocomputing. 2017;256:56–62. https://doi.org/10.1016/j.neucom.2016.07.080.

    Article  Google Scholar 

  39. Qiao Y, Xiong Y, Gao H, Zhu X, Chen P. Protein-protein interface hot spots prediction based on a hybrid feature selection strategy. BMC Bioinformatics. 2018;19(1):14. https://doi.org/10.1186/s12859-018-2009-5.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Shreem SS, Abdullah S, Nazri MZA. Hybrid feature selection algorithm using symmetrical uncertainty and a harmony search algorithm. Int J Syst Sci. 2014;47(6):1312–29. https://doi.org/10.1080/00207721.2014.924600.

    Article  Google Scholar 

  41. Civelek M, Lusis AJ. Systems genetics approaches to understand complex traits. Nat Rev Genet. 2014;15(1):34–48. https://doi.org/10.1038/nrg3575.

    Article  CAS  PubMed  Google Scholar 

  42. Chopra P, Lee J, Kang J, Lee S. Improving cancer classification accuracy using gene pairs. PLoS ONE. 2010;5(12): e14305. https://doi.org/10.1371/journal.pone.0014305.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Huang X, Zeng J, Zhou L, Hu C, Yin P, Lin X. A new strategy for analyzing time-series data using dynamic networks: identifying prospective biomarkers of hepatocellular carcinoma. Sci Rep. 2016;6:32448. https://doi.org/10.1038/srep32448.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Netzer M, Weinberger KM, Handler M, Seger M, Fang X, Kugler KG, et al. Profiling the human response to physical exercise: a computational strategy for the identification and kinetic analysis of metabolic biomarkers. J Clin Bioinform. 2011;1(1):34. https://doi.org/10.1186/2043-9113-1-34.

    Article  CAS  Google Scholar 

  45. Xing P, Chen Y, Gao J, Bai L, Yuan Z. A fast approach to detect gene-gene synergy. Sci Rep. 2017;7(1):16437. https://doi.org/10.1038/s41598-017-16748-w.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Chen Y, Cao D, Gao J, Yuan Z. Discovering pair-wise synergies in microarray data. Sci Rep. 2016;6:30672. https://doi.org/10.1038/srep30672.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Sreevani Murthy CA, Chanda B. Generation of compound features based on feature interaction for classification. Exp Syst Appl. 2018;108:61–73. https://doi.org/10.1016/j.eswa.2018.04.033.

    Article  Google Scholar 

  48. Murthy CA. Bridging feature selection and extraction: compound feature generation. IEEE Trans Knowl Data Eng. 2017;29(4):757–70. https://doi.org/10.1109/tkde.2016.2619712.

    Article  Google Scholar 

  49. Tan AC, Naiman DQ, Xu L, Winslow RL, Geman D. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics. 2005;21(20):3896–904. https://doi.org/10.1093/bioinformatics/bti631.

    Article  CAS  PubMed  Google Scholar 

  50. Lin X, Afsari B, Marchionni L, Cope L, Parmigiani G, Naiman D, et al. The ordering of expression among a few genes can provide simple cancer biomarkers and signal BRCA1 mutations. BMC Bioinformatics. 2009;10:256. https://doi.org/10.1186/1471-2105-10-256.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Magis AT, Price ND. The top-scoring ‘N’ algorithm: a generalized relative expression classification method from small numbers of biomolecules. BMC Bioinformatics. 2012;13:227. https://doi.org/10.1186/1471-2105-13-227.

    Article  PubMed  PubMed Central  Google Scholar 

  52. Kagaris D, Khamesipour A, Yiannoutsos CT. AUCTSP: an improved biomarker gene pair class predictor. BMC Bioinformatics. 2018;19(1):244. https://doi.org/10.1186/s12859-018-2231-1.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Khamesipour A, Kagaris D. Speeding up the discovery of combinations of differentially expressed genes for disease prediction and classification. Comput Methods Programs Biomed. 2019;170:69–80. https://doi.org/10.1016/j.cmpb.2019.01.004.

    Article  PubMed  Google Scholar 

  54. Wang H, Zhang H, Dai Z, Chen M, Yuan Z. TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection. BMC Med Genomics. 2013;6:S3. https://doi.org/10.1186/1755-8794-6-S1-S3.

    Article  PubMed  PubMed Central  Google Scholar 

  55. Huang X, Lin X, Zhou L, Su B. Analyzing omics data by pair-wise feature evaluation with horizontal and vertical comparisons. J Pharm Biomed Anal. 2018;157:20–6. https://doi.org/10.1016/j.jpba.2018.04.052.

    Article  CAS  PubMed  Google Scholar 

  56. Lin X, Zhang Y, Li C, Wang J, Luo P, Zhou H. A new data analysis method based on feature linear combination. J Biomed Inform. 2019;94: 103173. https://doi.org/10.1016/j.jbi.2019.103173.

    Article  PubMed  Google Scholar 

  57. Chen F, Xue J, Zhou L, Wu S, Chen Z. Identification of serum biomarkers of hepatocarcinoma through liquid chromatography/mass spectrometry-based metabonomic method. Anal Bioanal Chem. 2011;401(6):1899–904. https://doi.org/10.1007/s00216-011-5245-3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Andersen AH, Rayens WS, Liu Y, Smith CD. Partial least squares for discrimination in fMRI data. Magn Reson Imaging. 2012;30(3):446–52. https://doi.org/10.1016/j.mri.2011.11.001.

    Article  PubMed  PubMed Central  Google Scholar 

  59. Lin X, Huang X, Zhou L, Ren W, Zeng J, Yao W, et al. The robust classification model based on combinatorial features. IEEE/ACM Trans Comput Biol Bioinform. 2019;16(2):650–7. https://doi.org/10.1109/TCBB.2017.2779512.

    Article  PubMed  Google Scholar 

  60. Ochs MF, Farrar JE, Considine M, Wei Y, Meshinchi S, Arceci RJ. Outlier analysis and top scoring pair for integrated data analysis and biomarker discovery. IEEE/ACM Trans Comput Biol Bioinform. 2014;11(3):520–32. https://doi.org/10.1109/TCBB.2013.153.

    Article  PubMed  PubMed Central  Google Scholar 

  61. Hu JX, Thomas CE, Brunak S. Network biology concepts in complex disease comorbidities. Nat Rev Genet. 2016;17(10):615–29. https://doi.org/10.1038/nrg.2016.87.

    Article  CAS  PubMed  Google Scholar 

  62. Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011;12(1):56–68. https://doi.org/10.1038/nrg2918.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Jin G, Zhou X, Wang H, Zhao H, Cui K, Zhang XS, et al. The knowledge-integrated network biomarkers discovery for major adverse cardiac events. J Proteome Res. 2008;7:4013–21. https://doi.org/10.1021/pr8002886.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Miryala SK, Anbarasu A, Ramaiah S. Discerning molecular interactions: a comprehensive review on biomolecular interaction databases and network analysis tools. Gene. 2018;642:84–94. https://doi.org/10.1016/j.gene.2017.11.028.

    Article  CAS  PubMed  Google Scholar 

  65. Oughtred R, Stark C, Breitkreutz BJ, Rust J, Boucher L, Chang C, et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 2019;47(D1):D529–41. https://doi.org/10.1093/nar/gky1079.

    Article  CAS  PubMed  Google Scholar 

  66. Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49(D1):D605–12. https://doi.org/10.1093/nar/gkaa1074.

    Article  CAS  PubMed  Google Scholar 

  67. Mardinoglu A, Agren R, Kampf C, Asplund A, Uhlen M, Nielsen J. Genome-scale metabolic modelling of hepatocytes reveals serine deficiency in patients with non-alcoholic fatty liver disease. Nat Commun. 2014;5:3083. https://doi.org/10.1038/ncomms4083.

    Article  CAS  PubMed  Google Scholar 

  68. Jahagirdar S, Saccenti E. On the Use of Correlation and MI as a Measure of Metabolite-Metabolite Association for Network Differential Connectivity Analysis. Metabolites. 2020;10(4). https://doi.org/10.3390/metabo10040171

  69. Singh AJ, Ramsey SA, Filtz TM, Kioussi C. Differential gene regulatory networks in development and disease. Cell Mol Life Sci. 2018;75(6):1013–25. https://doi.org/10.1007/s00018-017-2679-6.

    Article  CAS  PubMed  Google Scholar 

  70. Chen L, Liu R, Liu ZP, Li M, Aihara K. Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers. Sci Rep. 2012;2:342. https://doi.org/10.1038/srep00342.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Liu X, Chang X, Leng S, Tang H, Aihara K, Chen L. Detection for disease tipping points by landscape dynamic network biomarkers. Natl Sci Rev. 2019;6(4):775–85. https://doi.org/10.1093/nsr/nwy162.

    Article  CAS  PubMed  Google Scholar 

  72. Li M, Zeng T, Liu R, Chen L. Detecting tissue-specific early warning signals for complex diseases based on dynamical network biomarkers: study of type 2 diabetes by cross-tissue analysis. Brief Bioinform. 2014;15(2):229–43. https://doi.org/10.1093/bib/bbt027.

    Article  CAS  PubMed  Google Scholar 

  73. Liu X, Liu ZP, Zhao XM, Chen L. Identifying disease genes and module biomarkers by differential interactions. J Am Med Inform Assoc. 2012;19(2):241–8. https://doi.org/10.1136/amiajnl-2011-000658.

    Article  PubMed  Google Scholar 

  74. Lui TW, Tsui NB, Chan LW, Wong CS, Siu PM, Yung BY. DECODE: an integrated differential co-expression and differential expression analysis of gene expression data. BMC Bioinformatics. 2015;16:182. https://doi.org/10.1186/s12859-015-0582-4.

    Article  PubMed  PubMed Central  Google Scholar 

  75. Krumsiek J, Suhre K, Illig T, Adamski J, Theis FJ. Gaussian graphical modeling reconstructs pathway reactions from high-throughput metabolomics data. BMC Syst Biol. 2011;5(1):21. https://doi.org/10.1186/1752-0509-5-21.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Zuo Y, Yu G, Tadesse MG, Ressom HW. Biological network inference using low order partial correlation. Methods. 2014;69(3):266–73. https://doi.org/10.1016/j.ymeth.2014.06.010.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  77. Ideker T, Krogan NJ. Differential network biology. Mol Syst Biol. 2012;8:565. https://doi.org/10.1038/msb.2011.99.

    Article  PubMed  PubMed Central  Google Scholar 

  78. Savino A, Provero P, Poli V. Differential co-expression analyses allow the identification of critical signalling pathways altered during tumour transformation and progression. Int J Mol Sci. 2020;21(24). https://doi.org/10.3390/ijms21249461

  79. Hsu CL, Juan HF, Huang HC. Functional analysis and characterization of differential coexpression networks. Sci Rep. 2015;5:13295. https://doi.org/10.1038/srep13295.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Siska C, Bowler R, Kechris K. The discordant method: a novel approach for differential correlation. Bioinformatics. 2016;32(5):690–6. https://doi.org/10.1093/bioinformatics/btv633.

    Article  CAS  PubMed  Google Scholar 

  81. Huang X, Lin X, Zeng J, Wang L, Yin P, Zhou L, et al. A computational method of defining potential biomarkers based on differential sub-networks. Sci Rep. 2017;7(1):14339. https://doi.org/10.1038/s41598-017-14682-5.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Su B, Luo P, Yang Z, Yu P, Li Z, Yin P, et al. A novel analysis method for biomarker identification based on horizontal relationship: identifying potential biomarkers from large-scale hepatocellular carcinoma metabolomics data. Anal Bioanal Chem. 2019;411(24):6377–86. https://doi.org/10.1007/s00216-019-02011-w.

    Article  CAS  PubMed  Google Scholar 

  83. Wang Q, Su B, Dong L, Jiang T, Tan Y, Lu X, et al. Liquid chromatography-mass spectrometry-based nontargeted metabolomics predicts prognosis of hepatocellular carcinoma after curative resection. J Proteome Res. 2020;19(8):3533–41. https://doi.org/10.1021/acs.jproteome.0c00344.

    Article  CAS  PubMed  Google Scholar 

  84. Fang C, Su B, Jiang T, Li C, Tan Y, Wang Q, et al. Prognosis prediction of hepatocellular carcinoma after surgical resection based on serum metabolic profiling from gas chromatography-mass spectrometry. Anal Bioanal Chem. 2021;413(12):3153–65. https://doi.org/10.1007/s00216-021-03281-z.

    Article  CAS  PubMed  Google Scholar 

  85. Wang YC, Chen BS. A network-based biomarker approach for molecular investigation and diagnosis of lung cancer. BMC Med Genomics. 2011;4(1):2. https://doi.org/10.1186/1755-8794-4-2.

    Article  PubMed  PubMed Central  Google Scholar 

  86. Allahyar A, Ubels J, de Ridder J. A data-driven interactome of synergistic genes improves network-based cancer outcome prediction. PLoS Comput Biol. 2019;15(2): e1006657. https://doi.org/10.1371/journal.pcbi.1006657.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Ruan P, Wang S. DiSNEP: a Disease-Specific gene Network Enhancement to improve Prioritizing candidate disease genes. Brief Bioinform. 2021;22(4). https://doi.org/10.1093/bib/bbaa241

  88. Koutrouli M, Karatzas E, Paez-Espino D, Pavlopoulos GA. A guide to conquer the biological network era using graph theory. Front Bioeng Biotechnol. 2020;8:34. https://doi.org/10.3389/fbioe.2020.00034.

    Article  PubMed  PubMed Central  Google Scholar 

  89. Wang C, Chen L, Yang Y, Zhang M, Wong G. Identification of bladder cancer prognostic biomarkers using an ageing gene-related competitive endogenous RNA network. Oncotarget. 2017;8:111742–53. https://doi.org/10.18632/oncotarget.22905.

    Article  PubMed  PubMed Central  Google Scholar 

  90. Bernier M, Croteau E, Castellano CA, Cunnane SC, Whittingstall K. Spatial distribution of resting-state BOLD regional homogeneity as a predictor of brain glucose uptake: a study in healthy aging. Neuroimage. 2017;150:14–22. https://doi.org/10.1016/j.neuroimage.2017.01.055.

    Article  CAS  PubMed  Google Scholar 

  91. Cai S, Huang K, Kang Y, Jiang Y, von Deneen KM, Huang L. Potential biomarkers for distinguishing people with Alzheimer’s disease from cognitively intact elderly based on the rich-club hierarchical structure of white matter networks. Neurosci Res. 2019;144:56–66. https://doi.org/10.1016/j.neures.2018.07.005.

    Article  PubMed  Google Scholar 

  92. Li S, Chen X, Liu X, Yu Y, Pan H, Haak R, et al. Complex integrated analysis of lncRNAs-miRNAs-mRNAs in oral squamous cell carcinoma. Oral Oncol. 2017;73:1–9. https://doi.org/10.1016/j.oraloncology.2017.07.026.

    Article  CAS  PubMed  Google Scholar 

  93. Henni K, Mezghani N, Gouin-Vallerand C. Unsupervised graph-based feature selection via subspace and pagerank centrality. Expert Syst Appl. 2018;114:46–53. https://doi.org/10.1016/j.eswa.2018.07.029.

    Article  Google Scholar 

  94. Ahmed H, Howton TC, Sun Y, Weinberger N, Belkhadir Y, Mukhtar MS. Network biology discovers pathogen contact points in host protein-protein interactomes. Nat Commun. 2018;9(1):2312. https://doi.org/10.1038/s41467-018-04632-8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Wei B, Liu J, Wei D, Gao C, Deng Y. Weighted k-shell decomposition for complex networks based on potential edge weights. Physica A. 2015;420:277–83. https://doi.org/10.1016/j.physa.2014.11.012.

    Article  Google Scholar 

  96. Xu S, Wang P, Zhang CX, Lu J. Spectral learning algorithm reveals propagation capability of complex networks. IEEE Trans Cybern. 2019;49(12):4253–61. https://doi.org/10.1109/TCYB.2018.2861568.

    Article  PubMed  Google Scholar 

  97. Di Nanni N, Gnocchi M, Moscatelli M, Milanesi L, Mosca E. Gene relevance based on multiple evidences in complex networks. Bioinformatics. 2020;36(3):865–71. https://doi.org/10.1093/bioinformatics/btz652.

    Article  CAS  PubMed  Google Scholar 

  98. Ning Z, Feng C, Song C, Liu W, Shang D, Li M, et al. Topologically inferring active miRNA-mediated subpathways toward precise cancer classification by directed random walk. Mol Oncol. 2019;13(10):2211–26. https://doi.org/10.1002/1878-0261.12563.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  99. Isik Z, Ercan ME. Integration of RNA-Seq and RPPA data for survival time prediction in cancer patients. Comput Biol Med. 2017;89:397–404. https://doi.org/10.1016/j.compbiomed.2017.08.028.

    Article  CAS  PubMed  Google Scholar 

  100. Wei PJ, Wu FX, Xia J, Su Y, Wang J, Zheng CH. Prioritizing cancer genes based on an improved random walk method. Front Genet. 2020;11:377. https://doi.org/10.3389/fgene.2020.00377.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Doungpan N, Engchuan W, Meechai A, Fong S, Chan JH. Gene-Network-Based Feature Set (GNFS) for expression-based cancer classification. Journal of Medical Imaging and Health Informatics. 2016;6(4):1093–101. https://doi.org/10.1166/jmihi.2016.1806.

    Article  Google Scholar 

  102. Doungpan N, Engchuan W, Chan JH, Meechai A. GSNFS: Gene subnetwork biomarker identification of lung cancer expression data. BMC Med Genomics. 2016;9(Suppl 3):70. https://doi.org/10.1186/s12920-016-0231-4.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  103. Ma X, Liu Z, Zhang Z, Huang X, Tang W. Multiple network algorithm for epigenetic modules via the integration of genome-wide DNA methylation and gene expression data. BMC Bioinformatics. 2017;18(1):72. https://doi.org/10.1186/s12859-017-1490-6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Liu ZP, Gao R. Detecting pathway biomarkers of diabetic progression with differential entropy. J Biomed Inform. 2018;82:143–53. https://doi.org/10.1016/j.jbi.2018.05.006.

  105. Al-Harazi O, Al Insaif S, Al-Ajlan MA, Kaya N, Dzimiri N, Colak D. Integrated genomic and network-based analyses of complex diseases and human disease network. J Genet Genomics. 2016;43(6):349–67. https://doi.org/10.1016/j.jgg.2015.11.002.

  106. Sajjadi SJ, Qian X, Zeng B, Adl AA. Network-based methods to identify highly discriminating subsets of biomarkers. IEEE/ACM Trans Comput Biol Bioinform. 2014;11(6):1029–37. https://doi.org/10.1109/TCBB.2014.2325014.

  107. Zhang X, Gao L, Liu ZP, Chen L. Identifying module biomarker in type 2 diabetes mellitus by discriminative area of functional activity. BMC Bioinformatics. 2015;16:92. https://doi.org/10.1186/s12859-015-0519-y.

  108. Kori M, Gov E, Arga KY. Novel genomic biomarker candidates for cervical cancer as identified by differential co-expression network analysis. OMICS: A Journal of Integrative Biology. 2019;23(5):261–73. https://doi.org/10.1089/omi.2019.0025.

  109. Monaco A, Pantaleo E, Amoroso N, Bellantuono L, Lombardi A, Tateo A, et al. Identifying potential gene biomarkers for Parkinson’s disease through an information entropy based approach. Phys Biol. 2020;18(1):016003. https://doi.org/10.1088/1478-3975/abc09a.

  110. Das J, Gayvert KM, Bunea F, Wegkamp MH, Yu H. ENCAPP: elastic-net-based prognosis prediction and biomarker discovery for human cancers. BMC Genomics. 2015;16:263. https://doi.org/10.1186/s12864-015-1465-9.

  111. Date Y, Kikuchi J. Application of a deep neural network to metabolomics studies and its performance in determining important variables. Anal Chem. 2018;90(3):1805–10. https://doi.org/10.1021/acs.analchem.7b03795.

  112. Danaee P, Ghaeini R, Hendrix DA. A deep learning approach for cancer detection and relevant gene identification. Biocomputing 2017. World Scientific; 2016. p. 219–29. https://doi.org/10.1142/9789813207813_0022.

  113. Schulte-Sasse R, Budach S, Hnisz D, Marsico A. Graph convolutional networks improve the prediction of cancer driver genes. In: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions. Cham: Springer International Publishing; 2019. https://doi.org/10.1007/978-3-030-30493-5_60.

  114. Liu J, Wang X, Cheng Y, Zhang L. Tumor gene expression data classification via sample expansion-based deep learning. Oncotarget. 2017;8:109646–60.

  115. Kong Y, Yu T. A graph-embedded deep feedforward network for disease outcome classification and feature selection using gene expression data. Bioinformatics. 2018;34(21):3727–37. https://doi.org/10.1093/bioinformatics/bty429.

  116. Meng Y, Jin M. HFS-SLPEE: A novel hierarchical feature selection and second learning probability error ensemble model for precision cancer diagnosis. Front Cell Dev Biol. 2021;9:696359. https://doi.org/10.3389/fcell.2021.696359.

  117. Shi Z, Wen B, Gao Q, Zhang B. Feature selection methods for protein biomarker discovery from proteomics or multiomics data. Mol Cell Proteomics. 2021;20:100083. https://doi.org/10.1016/j.mcpro.2021.100083.

  118. Kassaporn D, Thomas S, Jutarop P, Puangrat Y, Raynoo T, Anchalee T, et al. Discovery and qualification of serum protein biomarker candidates for cholangiocarcinoma diagnosis. J Proteome Res. 2019;18(9):3305–16. https://doi.org/10.1021/acs.jproteome.9b00242.

Funding

This study was supported by the Fundamental Research Funds for the Central Universities (DUT21YG115) and the National Natural Science Foundation of China (No. 21876169).

Author information

Corresponding author

Correspondence to Xiaohui Lin.

Ethics declarations

Ethics approval

Not applicable.

Source of biological material

Not applicable.

Statement on animal welfare

Not applicable.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Published in the topical collection celebrating ABC's 20th Anniversary. Chao Li and Zhenbo Gao contributed equally to this paper.

About this article

Cite this article

Li, C., Gao, Z., Su, B. et al. Data analysis methods for defining biomarkers from omics data. Anal Bioanal Chem 414, 235–250 (2022). https://doi.org/10.1007/s00216-021-03813-7

  • DOI: https://doi.org/10.1007/s00216-021-03813-7
