Abstract
Pathway information for a given microarray gene expression dataset can be collected from publicly available databases. Inferring the activity of a pathway is a crucial task in functional genomics. In general, all genes associated with a given pathway are treated equally when measuring goodness, although the contribution of each gene should be quantified differently. In the current study, we quantify the degree of relevance of each gene participating in a pathway by optimizing different goodness measures of pathway activity. Two popular goodness measures, namely the t-score and the z-score, are modified to measure the goodness of weighted gene vectors. In addition, a third goodness measure, based on the protein-protein interaction scores of pairs of genes participating in a pathway, is used as a further objective function. All these measures are designed to handle the weighted importance of individual genes. The search capability of multiobjective particle swarm optimization (PSO) is used to find appropriate relevance vectors for the genes. The proposed approach is applied to five real-life gene expression datasets, and its performance is compared with eight existing feature selection methods. The comparative results demonstrate the superiority of the proposed PSO-based technique. The performance of the proposed method is validated using a statistical significance test, and a biological significance test is further carried out to justify the biological relevance of the extracted pathway-based gene markers.
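The idea of scoring a weighted gene vector can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the activity of a pathway in each sample is taken as the weighted sum of (already normalized) gene expression values divided by the Euclidean norm of the weights, and goodness is taken as the absolute Welch t-statistic between two sample classes. The function names `pathway_activity` and `welch_t_score` and the toy data are hypothetical.

```python
import math
from statistics import mean, variance


def pathway_activity(expr, weights):
    """Weighted activity per sample (assumed form, not the paper's exact formula).

    expr    : list of genes, each a list of normalized expression values per sample
    weights : one relevance weight per gene
    """
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("at least one gene weight must be non-zero")
    n_samples = len(expr[0])
    return [
        sum(w * gene[j] for w, gene in zip(weights, expr)) / norm
        for j in range(n_samples)
    ]


def welch_t_score(activity, labels):
    """Absolute Welch t-statistic of the activity between classes 0 and 1."""
    g0 = [a for a, y in zip(activity, labels) if y == 0]
    g1 = [a for a, y in zip(activity, labels) if y == 1]
    # Sample variances (n - 1 denominator); assumes each class has >= 2 samples
    # and that the two classes are not both constant.
    se = math.sqrt(variance(g0) / len(g0) + variance(g1) / len(g1))
    return abs(mean(g0) - mean(g1)) / se


# Toy data: gene 0 separates the classes, gene 1 is unrelated to the labels.
expr = [
    [1.0, 1.2, -1.0, -1.2],
    [0.5, -0.5, 0.5, -0.5],
]
labels = [0, 0, 1, 1]

t_equal = welch_t_score(pathway_activity(expr, [1.0, 1.0]), labels)
t_focused = welch_t_score(pathway_activity(expr, [1.0, 0.0]), labels)
print(t_equal, t_focused)
```

On this toy data, the weight vector that ignores the uninformative gene yields the larger t-score. A goodness measure of this kind is what a PSO-style search over relevance vectors could maximize, alongside other objectives such as an interaction-based score.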
Acknowledgments
Pratik Dutta acknowledges the Visvesvaraya PhD Scheme for Electronics and IT, an initiative of the Ministry of Electronics and Information Technology (MeitY), Government of India, for fellowship support. Dr. Sriparna Saha gratefully acknowledges the Young Faculty Research Fellowship (YFRF) Award, supported by the Visvesvaraya PhD Scheme for Electronics and IT, Ministry of Electronics and Information Technology (MeitY), Government of India, and implemented by Digital India Corporation (formerly Media Lab Asia), for carrying out this research.
About this article
Cite this article
Dutta, P., Saha, S. & Naskar, S. A multi-objective based PSO approach for inferring pathway activity utilizing protein interactions. Multimed Tools Appl 80, 30283–30303 (2021). https://doi.org/10.1007/s11042-020-09269-8
DOI: https://doi.org/10.1007/s11042-020-09269-8