A Comparative Analysis of Feature Selection Methods for Biomarker Discovery in Study of Toxicant-Treated Atlantic Cod (Gadus Morhua) Liver

  • Xiaokang Zhang
  • Inge Jonassen
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1056)


Univariate and multivariate feature selection methods can be used for biomarker discovery in the analysis of toxicant exposure. Among the univariate methods, differential expression analysis (DEA) is often applied for its simplicity and interpretability. A characteristic of DEA methods is that they treat genes individually, disregarding the correlations that exist between them. Several multivariate feature selection methods have also been proposed for biomarker discovery. With so many biomarker discovery methods available, choosing the most suitable one for a specific dataset becomes a problem. In this paper, we present a framework for comparing candidate biomarker discovery methods: three methods that stem from different theories are compared by how stable they are and by how much they improve classification accuracy. The three methods considered are Significance Analysis of Microarrays (SAM), which identifies differentially expressed genes; minimum Redundancy Maximum Relevance (mRMR), which is based on information theory; and Characteristic Direction (GeoDE), which is inspired by a graphical perspective. When tested on gene expression data from two experiments exposing Atlantic cod to two different toxicants (MeHg and PCB 153), different methods stood out in different cases, so the choice of method should depend on the dataset under study and the research interest.
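
The stability criterion mentioned above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the Welch-style score is only a stand-in for a univariate method such as SAM, the mean pairwise Jaccard similarity is one assumed choice of stability measure, and all function names (`welch_score`, `select_top_k`, `selection_stability`, `make_toy_data`) and the synthetic data are hypothetical. Classification accuracy, the second criterion, is not covered here.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def welch_score(group_a, group_b):
    """Absolute Welch-style t statistic for one gene (a univariate score)."""
    se_a = stdev(group_a) ** 2 / len(group_a)
    se_b = stdev(group_b) ** 2 / len(group_b)
    return abs(mean(group_a) - mean(group_b)) / ((se_a + se_b) ** 0.5 + 1e-12)


def select_top_k(samples, labels, k):
    """Rank genes by welch_score and return the indices of the top k."""
    scores = []
    for gene in range(len(samples[0])):
        control = [row[gene] for row, y in zip(samples, labels) if y == 0]
        treated = [row[gene] for row, y in zip(samples, labels) if y == 1]
        scores.append(welch_score(control, treated))
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(ranked[:k])


def selection_stability(samples, labels, k, n_resamples=20, fraction=0.8, seed=0):
    """Mean pairwise Jaccard similarity of the gene sets chosen on subsamples."""
    rng = random.Random(seed)
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in (0, 1)}
    selections = []
    for _ in range(n_resamples):
        # Subsample each class separately so both groups stay represented.
        keep = []
        for members in by_class.values():
            keep += rng.sample(members, max(2, int(fraction * len(members))))
        selections.append(
            select_top_k([samples[i] for i in keep], [labels[i] for i in keep], k)
        )
    pairs = list(combinations(selections, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


def make_toy_data(n_per_class=20, n_genes=30, n_informative=3, shift=4.0, seed=1):
    """Synthetic data: the first n_informative genes are shifted in the treated group."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for gene in range(n_informative):
                    row[gene] += shift
            samples.append(row)
            labels.append(label)
    return samples, labels


# Toy usage: label 0 = control, label 1 = treated (synthetic, not cod data).
samples, labels = make_toy_data()
selected = select_top_k(samples, labels, k=3)
stability = selection_stability(samples, labels, k=3)
print("selected genes:", sorted(selected))
print("stability:", round(stability, 3))
```

A stability value near 1 means the same genes are chosen regardless of which samples are drawn, while a value near 0 means the selection depends heavily on the particular subsample. Swapping `select_top_k` for a multivariate selector would allow the same comparison across methods.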


Keywords: Feature selection · Stability · Classification · Biomarker discovery



We thank our colleagues in the Jonassen Group for helpful discussions, and the Computational Biology Unit at the University of Bergen, where the work was carried out. We also thank the Centre for Digital Life Norway (DLN) and the dCod 1.0 project, to which this work is related.


The dCod 1.0 project is funded under the Digital Life Norway initiative of the BIOTEK 2021 program of the Research Council of Norway (project no. 248840).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
