Pattern analysis of genetics and genomics: a survey of the state of the art

  • Jyotismita Chaki
  • Nilanjan Dey


The continual improvement and falling cost of sequencing a complete human genome have led to the rapid adoption of genetic and genomic information in both research institutions and clinics. Biologists are taking the first steps toward identifying the locations and functions of all the genes and regulatory sites in the genomes of many organisms. As researchers determine the nucleotide sequence of large stretches of the human genome, they are generating enormous volumes of sequence data. Investigating these data directly in the laboratory is expensive and difficult, which makes computational techniques essential. The field of pattern analysis, which aims to build computer algorithms that improve with experience, can enable computers to assist humans in analyzing large, complex genetic and genomic data sets. This survey gives an overview of pattern analysis techniques for the study of genome sequencing data sets, as well as proteomic, epigenetic and metabolomic data. These techniques comprise data pre-processing, feature extraction and selection, classification, and clustering. The aim of this survey is to present considerations and recurring challenges in applying pattern analysis methods, including discriminative and generative modeling approaches, and to discuss future research directions for these methods in the analysis of genomic and genetic data sets.
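The abstract names four stages: pre-processing, feature selection, classification and clustering. The sketch below chains one simple, commonly used instance of each on a toy gene-expression matrix. It assumes that samples are rows and genes are columns, and it uses per-gene z-score standardization, a two-class signal-to-noise filter score, a nearest-centroid classifier and k-means clustering. These particular methods, the function names and the data are illustrative assumptions, not the specific techniques reviewed in the survey. It uses plain Python 3.8+ (for `math.dist`) so that it has no dependencies.

```python
from statistics import mean, pstdev
import math


def fit_zscore(X):
    """Pre-processing: per-gene (column) mean and population std over training samples."""
    # A constant gene has std 0; fall back to 1.0 so we never divide by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*X)]


def apply_zscore(X, stats):
    """Standardize every sample with the training statistics."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def snr_score(X, y, gene):
    """Signal-to-noise filter score for one gene: (mu0 - mu1) / (sd0 + sd1)."""
    a = [row[gene] for row, lab in zip(X, y) if lab == 0]
    b = [row[gene] for row, lab in zip(X, y) if lab == 1]
    return (mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9)


def select_genes(X, y, k):
    """Feature selection: indices of the k genes with the largest |SNR|."""
    ranked = sorted(range(len(X[0])), key=lambda g: abs(snr_score(X, y, g)), reverse=True)
    return sorted(ranked[:k])


def project(X, genes):
    """Keep only the selected gene columns."""
    return [[row[g] for g in genes] for row in X]


def fit_centroids(X, y):
    """Classification: one mean expression profile (centroid) per class label."""
    return {lab: [mean(col) for col in zip(*[r for r, lbl in zip(X, y) if lbl == lab])]
            for lab in set(y)}


def predict(centroids, x):
    """Assign a sample to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def kmeans(X, k, iters=20):
    """Clustering: Lloyd's k-means, initialized with the first k samples."""
    centers = [list(p) for p in X[:k]]
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in X]
        for c in range(k):
            members = [p for p, a in zip(X, assign) if a == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [mean(col) for col in zip(*members)]
    return assign


# Hypothetical toy data: 6 samples x 4 genes; genes 0 and 1 separate the two classes.
X = [
    [8.0, 1.0, 5.0, 3.0],
    [2.0, 7.0, 5.2, 2.9],
    [9.0, 1.5, 4.8, 3.1],
    [1.5, 8.0, 5.1, 3.0],
    [8.5, 0.5, 4.9, 2.8],
    [2.5, 7.5, 5.0, 3.2],
]
y = [0, 1, 0, 1, 0, 1]

stats = fit_zscore(X)
Z = apply_zscore(X, stats)
genes = select_genes(Z, y, k=2)
P = project(Z, genes)
centroids = fit_centroids(P, y)

# A new sample is standardized with the *training* statistics before prediction.
new_sample = project(apply_zscore([[8.2, 1.2, 5.0, 3.0]], stats), genes)[0]
print("selected genes:", genes)
print("predicted class:", predict(centroids, new_sample))
print("k-means clusters:", kmeans(P, k=2))
```

On this toy matrix the filter keeps genes 0 and 1, the new sample is assigned to class 0, and k-means recovers the two groups. Cluster numbers from k-means are arbitrary identifiers, not class labels. The demo selects genes using all samples, so any accuracy measured on those same samples would be optimistic. In a real study, the selection step should be repeated inside each cross-validation fold.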


Keywords: Genomic, Genetic, Pattern analysis, Pre-processing, Feature selection, Classification, Clustering



  1. 1.
    Abeel T, Helleputte T, Van de Peer Y, Dupont P, Saeys Y (2009) Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26(3):392–398CrossRefGoogle Scholar
  2. 2.
    Ahmed AA, Vias M, Iyer NG, Caldas C, Brenton JD (2004) Microarray segmentation methods significantly influence data precision. Nucleic Acids Res 32(5):1–7CrossRefGoogle Scholar
  3. 3.
    Akgün M, Bayrak AO, Ozer B, Sağıroğlu MŞ (2015) Privacy preserving processing of genomic data: a survey. J Biomed Inform 56:103–111CrossRefGoogle Scholar
  4. 4.
    Alexa A, Rahnenführer J, Lengauer T (2006) Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22(13):1600–1607CrossRefGoogle Scholar
  5. 5.
    Alexe G, Alexe S, Hammer PL, Vizvari B (2006) Pattern-based feature selection in genomics and proteomics. Ann Oper Res 148(1):189–201zbMATHCrossRefGoogle Scholar
  6. 6.
    Alipanahi B, Delong A, Weirauch MT, Frey BJ (2015) Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning. Nat Biotechnol 33(8):831–838CrossRefGoogle Scholar
  7. 7.
    Allendorf FW, Hohenlohe PA, Luikart G (2010) Genomics and the future of conservation genetics. Nat Rev Genet 11(10):697–709CrossRefGoogle Scholar
  8. 8.
    Ambroise C, McLachlan GJ (2002) Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci 99(10):6562–6566zbMATHCrossRefGoogle Scholar
  9. 9.
    Angerer P, Haghverdi L, Büttner M, Theis FJ, Marr C, Buettner F (2015) Destiny: diffusion maps for large-scale single-cell data in R. Bioinformatics 32(8):1241–1243CrossRefGoogle Scholar
  10. 10.
    Arcuri A (2018) Evaluating search-based techniques with statistical tests. In ACM Proceedings of the 11th International Workshop on Search-Based Software Testing 21–21Google Scholar
  11. 11.
    Ardaneswari G, Bustamam A, Sarwinda D (2017) Implementation of plaid model biclustering method on microarray of carcinoma and adenoma tumor gene expression data. In Journal of Physics: Conference Series 893(1)Google Scholar
  12. 12.
    Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet 30(1):41CrossRefGoogle Scholar
  13. 13.
    Arsenio J, Kakaradov B, Metz PJ, Kim SH, Yeo GW, Chang JT (2014) Early specification of CD8+ T lymphocyte fates during adaptive immunity revealed by single-cell gene-expression analyses. Nat Immunol 15(4):365–372CrossRefGoogle Scholar
  14. 14.
    Aßhauer KP, Wemheuer B, Daniel R, Meinicke P (2015) Tax4Fun: predicting functional profiles from metagenomic 16S rRNA data. Bioinformatics 31(17):2882–2884CrossRefGoogle Scholar
  15. 15.
    Ayday E, Raisaro JL, Hengartner U, Molyneaux A, Hubaux JP (2014) Privacy-preserving processing of raw genomic data. In Data Privacy Management and Autonomous Spontaneous Security Springer (Berlin, Heidelberg) 133–147Google Scholar
  16. 16.
    Barros RC, Basgalupp MP, Freitas AA, De Carvalho AC (2014) Evolutionary design of decision-tree algorithms tailored to microarray gene expression data sets. IEEE Trans Evol Comput 18(6):873–892CrossRefGoogle Scholar
  17. 17.
    Bartenhagen C, Klein HU, Ruckert C, Jiang X, Dugas M (2010) Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data. BMC bioinformatics 11(1):1–11CrossRefGoogle Scholar
  18. 18.
    Ben-Dor A, Chor B, Karp R, Yakhini Z (2003) Discovering local structure in gene expression data: the order-preserving submatrix problem. J Comput Biol 10(3–4):373–384CrossRefGoogle Scholar
  19. 19.
    Best MG, Sol N, Kooi I, Tannous J, Westerman BA, Rustenburg F, Schellen P, Verschueren H, Post E, Koster J, Ylstra B, Ameziane N, Dorsman J, Smit EF, Verheul HM, Noske DP, Rejineveld JC, Nilsson JA, Wurdinger T (2015) RNA-Seq of tumor-educated platelets enables blood-based pan-cancer, multiclass, and molecular pathway cancer diagnostics. Cancer Cell 28(5):666–676CrossRefGoogle Scholar
  20. 20.
    Birnbaum K, Shasha DE, Wang JY, Jung JW, Lambert GM, Galbraith DW, Benfey PN (2003) A gene expression map of the Arabidopsis root. Science 302(5652):1956–1960CrossRefGoogle Scholar
  21. 21.
    Bolón-Canedo V, Sánchez-Marono N, Alonso-Betanzos A, Benítez JM, Herrera F (2014) A review of microarray datasets and applied feature selection methods. Inf Sci 282:111–135CrossRefGoogle Scholar
  22. 22.
    Botía JA et al (2017) An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks. BMC Syst Biol 11(1):47CrossRefGoogle Scholar
  23. 23.
    Brennecke P, Reyes A, Pinto S, Rattay K, Nguyen M, Küchler R, Huber W, Kyewski B, Steinmetz LM (2015) Single-cell transcriptome analysis reveals coordinated ectopic gene-expression patterns in medullary thymic epithelial cells. Nat Immunol 16(9):933–941CrossRefGoogle Scholar
  24. 24.
    Brozynska M, Furtado A, Henry RJ (2016) Genomics of crop wild relatives: expanding the gene pool for crop improvement. Plant Biotechnol J 14(4):1070–1085CrossRefGoogle Scholar
  25. 25.
    Bruneau M, Mottet T, Moulin S, Kerbiriou M, Chouly F, Chretien S, Guyeux C (2016) A clustering tool for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Models. arXiv preprint 1–17Google Scholar
  26. 26.
    Bumgarner R (2013) Overview of DNA microarrays: types, applications, and their future. Current protocols in molecular biology 101(1):1–11Google Scholar
  27. 27.
    Caldecott KW (2008) Single-strand break repair and genetic disease. Nat Rev Genet 9(8):619–631CrossRefGoogle Scholar
  28. 28.
    Campbell K, Ponting CP, Webber C (2015) Laplacian eigenmaps and principal curves for high resolution pseudotemporal ordering of single-cell RNA-seq profiles. bioRxiv Google Scholar
  29. 29.
    Castillo-Davis CI, Hartl DL (2003) GeneMerge—post-genomic analysis, data mining, and hypothesis testing. Bioinformatics 19(7):891–892CrossRefGoogle Scholar
  30. 30.
    Çetin GS, Chen H, Laine K, Lauter K, Rindal P, Xia Y (2017) Private queries on encrypted genomic data. BMC Med Genet 10(2):1–14Google Scholar
  31. 31.
    Chandra B, Gupta M (2011) Robust approach for estimating probabilities in Naïve–Bayes classifier for gene expression data. Expert Syst Appl 38(3):1293–1298CrossRefGoogle Scholar
  32. 32.
    Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Computers & Electrical Engineering 40(1):16–28CrossRefGoogle Scholar
  33. 33.
    Chavez-Alvarez R, Chavoya A, Mendez-Vazquez A (2014) Discovery of possible gene relationships through the application of self-organizing maps to DNA microarray databases. PLoS One 9(4):e93233CrossRefGoogle Scholar
  34. 34.
    Cheadle C, Vawter MP, Freed WJ, Becker KG (2003) Analysis of microarray data using Z score transformation. The Journal of molecular diagnostics 5(2):73–81CrossRefGoogle Scholar
  35. 35.
    Chen YJ, Kodell R, Sistare F, Thompson KL, Morris S, Chen JJ (2003) Normalization methods for analysis of microarray gene-expression data. J Biopharm Stat 13(1):57–74zbMATHCrossRefGoogle Scholar
  36. 36.
    Chen KH, Wang KJ, Tsai ML, Wang KM, Adrian AM, Cheng WC, Yang TS, Teng NC, Tan KP, Chang KS (2014) Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm. BMC bioinformatics 15(1):49CrossRefGoogle Scholar
  37. 37.
    Chen KH, Wang KJ, Wang KM, Angelia MA (2014) Applying particle swarm optimization-based decision tree classifier for cancer classification on gene expression data. Appl Soft Comput 24:773–780CrossRefGoogle Scholar
  38. 38.
    Chen Y, Li Y, Narayan R, Subramanian A, Xie X (2016) Gene expression inference with deep learning. Bioinformatics 32(12):1832–1839CrossRefGoogle Scholar
  39. 39.
    Chen Y, Zhang Z, Zheng J, Ma Y, Xue Y (2017) Gene selection for tumor classification using neighborhood rough sets and entropy measures. J Biomed Inform 67:59–68CrossRefGoogle Scholar
  40. 40.
    Chen X, Huang JZ, Wu Q, Yang M (2017) Subspace weighting co-clustering of gene expression data. IEEE/ACM transactions on computational biology and bioinformatics Google Scholar
  41. 41.
    Chinnaswamy A, Srinivasan R (2016) Hybrid feature selection using correlation coefficient and particle swarm optimization on microarray gene expression data. In Springer Innovations in Bio-Inspired Computing and Applications 229–239Google Scholar
  42. 42.
    Chinnaswamy A, Srinivasan R (2017) Performance analysis of classifiers on filter-based feature selection approaches on microarray data. In Bio-Inspired Computing for Information Retrieval Applications 41–70Google Scholar
  43. 43.
    Chou CC, Chen CH, Lee TT, Peck K (2004) Optimization of probe length and the number of probes per gene for optimal microarray analysis of gene expression. Nucleic Acids Res 32(12):1–8CrossRefGoogle Scholar
  44. 44.
    Chu Z, Cao B, Yu F (2018) Study on Ensemble based Clustering Algorithm for Gene Expression Data. In Journal of Physics: Conference Series 1069(1)Google Scholar
  45. 45.
    Cohen IR, Domany E, Quintana FJ, Hed G, Getz G (2018) US Patent Application No 10(/082):503Google Scholar
  46. 46.
    Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17(1):1–19CrossRefGoogle Scholar
  47. 47.
    Corus D, Dang DC, Eremeev AV, Lehre PK (2017) Level-based analysis of genetic algorithms and other search processes. IEEE Trans Evol ComputGoogle Scholar
  48. 48.
    Craddock TJ, Harvey JM, Nathanson L, Barnes ZM, Klimas NG, Fletcher MA, Broderick G (2015) Using gene expression signatures to identify novel treatment strategies in gulf war illness. BMC Med Genet 8(1):1–13Google Scholar
  49. 49.
    Cui P, Zhong T, Wang Z, Wang T, Zhao H, Liu C, Lu H (2018) Identification of human circadian genes based on time course gene expression profiles by using a deep learning method. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease 1864(6):2274–2283CrossRefGoogle Scholar
  50. 50.
    Dai J, Xu Q (2013) Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification. Appl Soft Comput 13(1):211–221CrossRefGoogle Scholar
  51. 51.
    Dai J, Xu Q (2013) Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification. Appl Soft Comput 13(1):211–221CrossRefGoogle Scholar
  52. 52.
    Dai JJ, Lieu L, Rocke D (2006) Dimension reduction for classification with gene expression microarray data. Statistical applications in genetics and molecular biology 5(1)Google Scholar
  53. 53.
    Damelin SB, Gu Y, Wunsch DC, Xu R (2015) Fuzzy adaptive resonance theory diffusion maps and their applications to clustering and biclustering. Mathematical Modelling of Natural Phenomena 10(3):206–211MathSciNetzbMATHCrossRefGoogle Scholar
  54. 54.
    Danaee P, Ghaeini R, Hendrix DA (2017) A deep learning approach for cancer detection and relevant gene identification. In PACIFIC SYMPOSIUM ON BIOCOMPUTING 219–229Google Scholar
  55. 55.
    Das K, Mishra D (2016) Hybridized univariate and multivariate filter based approaches for gene selection. Int J Pharm Bio Sci 7(3):1215–1226Google Scholar
  56. 56.
    Das S, Deb T, Dey N, Ashour AS, Bhattacharya DK, Tibarewala DN (2018) Optimal choice of k-mer in composition vector method for genome sequence comparison. Genomics 110(5):263–273CrossRefGoogle Scholar
  57. 57.
    DeLaughter DM, Bick AG, Wakimoto H, McKean D, Gorham JM, Kathiriya IS, Hinson JT, Gray J, Pu W, Bruneau BG, Seidman JG, Seidman CE (2016) Single-cell resolution of temporal gene expression during heart development. Dev Cell 39(4):480–490CrossRefGoogle Scholar
  58. 58.
    Dettling M, Bühlmann P (2003) Boosting for tumor classification with gene expression data. Bioinformatics 19(9):1061–1069CrossRefGoogle Scholar
  59. 59.
    Dettling M, Bühlmann P (2003) Boosting for tumor classification with gene expression data. Bioinformatics 19(9):1061–1069CrossRefGoogle Scholar
  60. 60.
    D'haeseleer P (2005) How does gene expression clustering work? Nat Biotechnol 23(12):1499–1501CrossRefGoogle Scholar
  61. 61.
    Dheda K, Huggett JF, Bustin SA, Johnson MA, Rook G, Zumla A (2004) Validation of housekeeping genes for normalizing RNA expression in real-time PCR. Biotechniques 37(1):112–119CrossRefGoogle Scholar
  62. 62.
    Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC bioinformatics 7(1):1–13CrossRefGoogle Scholar
  63. 63.
    Ding C, Peng H (2005) Minimum redundancy feature selection from microarray gene expression data. J Bioinforma Comput Biol 3(02):185–205CrossRefGoogle Scholar
  64. 64.
    Dopazo J, Erten C (2017) Graph-theoretical comparison of normal and tumor networks in identifying BRCA genes. BMC Syst Biol 11(1):1–17CrossRefGoogle Scholar
  65. 65.
    Edwards D (2003) Non-linear normalization and background correction in one-channel cDNA microarray studies. Bioinformatics 19(7):825–833CrossRefGoogle Scholar
  66. 66.
    El-Assaad W, El-Kouhen K, Mohammad AH, Yang J, Morita M, Gamache I, Mamer O, Avizonis D, Hermance N, Kersten S, Tremblay ML, Kelliher MA, Teodoro JG (2015) Deletion of the gene encoding G0/G1 switch protein 2 (G0s2) alleviates high-fat-diet-induced weight gain and insulin resistance, and promotes browning of white adipose tissue in mice. Diabetologia 58(1):149–157CrossRefGoogle Scholar
  67. 67.
    Eren AM, Morrison HG, Lescault PJ, Reveillaud J, Vineis JH, Sogin ML (2015) Minimum entropy decomposition: unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences. The ISME journal 9(4):968–979CrossRefGoogle Scholar
  68. 68.
    Fan R, Zhong M, Wang S, Zhang Y, Andrew A, Karagas M, Chen H, Amos CI, Xiong M, Moore JH (2011) Entropy-based information gain approaches to detect and to characterize gene-gene and gene-environment interactions/correlations of complex diseases. Genet Epidemiol 35(7):706–721CrossRefGoogle Scholar
  69. 69.
    Fang HR, Sakellaridi S, Saad Y (2009) Multilevel nonlinear dimensionality reduction for manifold learning. Technical report, Minnesota Supercomputer Institute, University of MinnesotaGoogle Scholar
  70. 70.
    Frandsen PB, Calcott B, Mayer C, Lanfear R (2015) Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates. BMC Evol Biol 15(1):13CrossRefGoogle Scholar
  71. 71.
    Franzén O, Hu J, Bao X, Itzkowitz SH, Peter I, Bashir A (2015) Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering. Microbiome 3(1):43CrossRefGoogle Scholar
  72. 72.
    Friedman N, Linial M, Nachman I, Pe'er D (2000) Using Bayesian networks to analyze expression data. J Comput Biol 7(3–4):601–620CrossRefGoogle Scholar
  73. 73.
    Fundel K, Haag J, Gebhard PM, Zimmer R, Aigner T (2008) Normalization strategies for mRNA expression data in cartilage research. Osteoarthr Cartil 16(8):947–955CrossRefGoogle Scholar
  74. 74.
    Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10):906–914CrossRefGoogle Scholar
  75. 75.
    Gamazon ER et al (2015) A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 47(9):1091CrossRefGoogle Scholar
  76. 76.
    Gao C, McDowell IC, Zhao S, Brown CD, Engelhardt BE (2016) Context specific and differential gene co-expression networks via Bayesian biclustering. PLoS Comput Biol 12(7):e1004791CrossRefGoogle Scholar
  77. 77.
    Gardner JW, Boilot P, Hines EL (2005) Enhancing electronic nose performance by sensor selection using a new integer-based genetic algorithm approach. Sensors Actuators B Chem 106(1):114–121CrossRefGoogle Scholar
  78. 78.
    Geiss GK, Bumgarner RE, An MC, Agy MB, van't Wout AB, Hammersmark E, Carter V, Upchurch D, Mullins J, Katze MG (2000) Large-scale monitoring of host cell gene expression during HIV-1 infection using cDNA microarrays. Virology 266(1): 8–16Google Scholar
  79. 79.
    Gerstung M, Pellagatti A, Malcovati L, Giagounidis A, Della Porta MG, Jädersten M, Dolatshad H, Verma A, Cross NCP, Vyas P, Hellström-Lindberg E, Cazzola M, Papaemmanuil E, Campbell PJ, Boultwood J, Killick S (2015) Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes. Nat Commun 6:5901CrossRefGoogle Scholar
  80. 80.
    Ghasemi R, Al Aziz MM, Mohammed N, Dehkordi MH, Jiang X (2017) Private and efficient query processing on outsourced genomic databases. IEEE journal of biomedical and health informatics 21(5):1466–1472CrossRefGoogle Scholar
  81. 81.
    Ghosh A, Barman S (2016) Application of Euclidean distance measurement and principal component analysis for gene identification. Gene 583(2):112–120CrossRefGoogle Scholar
  82. 82.
    Ginsburg GS, Willard HF (2009) Genomic and personalized medicine: foundations and applications. Transl Res 154(6):277–287CrossRefGoogle Scholar
  83. 83.
    Goodwin CR, Covington BC, Derewacz DK, McNees CR, Wikswo JP, McLean JA, Bachmann BO (2015) Structuring microbial metabolic responses to multiplexed stimuli via self-organizing metabolomics maps. Chem Biol 22(5):661–670CrossRefGoogle Scholar
  84. 84.
    Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adicoins X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29(7):644CrossRefGoogle Scholar
  85. 85.
    Guo G, Pinello L, Han X, Lai S, Shen L, Lin TW, Zou K, Orkin SH (2016) Serum-based culture conditions provoke gene expression variability in mouse embryonic stem cells as revealed by single-cell analysis. Cell Rep 14(4):956–965CrossRefGoogle Scholar
  86. 86.
    Gupta A, Wang H, Ganapathiraju M (2015) Learning structure in gene expression data using deep architectures, with an application to gene clustering. In IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 1328–1335Google Scholar
  87. 87.
    Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182zbMATHGoogle Scholar
  88. 88.
    Ha VS, Nguyen HN (2016) C-KPCA: custom kernel PCA for cancer classification. In Springer Machine Learning and Data Mining in Pattern Recognition 459–467Google Scholar
  89. 89.
    Haghverdi L, Buettner F, Theis FJ (2015) Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics 31(18):2989–2998CrossRefGoogle Scholar
  90. 90.
    Hamid JS, Hu P, Roslin NM, Ling V, Greenwood CM, Beyene J (2009) Data integration in genetics and genomics: methods and challenges. Human genomics and proteomics: HGP 2009(869093):1–13Google Scholar
  91. 91.
    Hartuv E, Schmitt AO, Lange J, Meier-Ewert S, Lehrach H, Shamir R (2000) An algorithm for clustering cDNA fingerprints. Genomics 66(3):249–256CrossRefGoogle Scholar
  92. 92.
    Hauskrecht M, Pelikan R, Valko M, Lyons-Weiler J (2007) Feature selection and dimensionality reduction in genomics and proteomics. In Fundamentals of data mining in genomics and proteomics Springer (Boston, MA) 149–172Google Scholar
  93. 93.
    He KY, Ge D, He MM (2017) Big data analytics for genomic medicine. Int J Mol Sci 18(2):1–18CrossRefGoogle Scholar
  94. 94.
    Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, Ye Z, Lee LK, Stuart RK, Ching CW, Ching KA, Antosiewicz-Bourget JE, Liu H, Zhang X, Green RD, Lobanencov VV, Stewart R, Thomson JA, Crawford GE, Kellis M, Ren B (2009) Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459(7243):108–112CrossRefGoogle Scholar
  95. 95.
    Hernandez JCH, Duval B, Hao JK (2007) A genetic embedded approach for gene selection and classification of microarray data. In European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, Springer, Berlin, Heidelberg 90–101Google Scholar
  96. 96.
    Herrero J, Díaz-Uriarte R, Dopazo J (2003) Gene expression data preprocessing. Bioinformatics 19(5):655–656CrossRefGoogle Scholar
  97. 97.
    Herrero J, Al-Shahrour F, Diaz-Uriarte R, Mateos A, Vaquerizas JM, Santoyo J, Dopazo J (2003) GEPAS: a web-based resource for microarray gene expression data analysis. Nucleic Acids Res 31(13):3461–3467CrossRefGoogle Scholar
  98. 98.
    Heydarian Z, Gruber M, Glick BR, Hegedus DD (2018) Gene Expression Patterns in Roots of Camelina sativa With Enhanced Salinity Tolerance Arising From Inoculation of Soil With Plant Growth Promoting Bacteria Producing 1-Aminocyclopropane-1-Carboxylate Deaminase or Expression the Corresponding acdS Gene. Frontiers in microbiology 9 Google Scholar
  99. 99.
    van Hijum SA, Baerends RJ, Zomer AL, Karsens HA, Martin-Requena V, Trelles O, Kok Jan, Kuipers OP (2008) Supervised Lowess normalization of comparative genome hybridization data–application to lactococcal strain comparisons. BMC bioinformatics 9(1): 1–10Google Scholar
  100. 100.
    Hira ZM, Gillies DF (2015) A review of feature selection and feature extraction methods applied on microarray data. Adv Bioinforma 2015(198363):1–13CrossRefGoogle Scholar
  101. 101.
    Huang DS, Zheng CH (2006) Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 22(15):1855–1862CrossRefGoogle Scholar
  102. 102.
    Inza I, Sierra B, Blanco R, Larrañaga P (2002) Gene selection by sequential search wrapper approaches in microarray cancer class prediction. Journal of Intelligent & Fuzzy Systems 12(1):25–33zbMATHGoogle Scholar
  103. 103.
    Jain I, Jain VK, Jain R (2018) Correlation feature selection based improved-binary particle swarm optimization for gene selection and cancer classification. Appl Soft Comput 62:203–215CrossRefGoogle Scholar
  104. 104.
    Jaskowiak PA, Campello RJ, Costa IG (2014, January) On the selection of appropriate distances for gene expression data clustering. BMC bioinformatics 15(2):1–17Google Scholar
  105. 105.
    Jiang D, Tang C, Zhang A (2004) Cluster analysis for gene expression data: a survey. IEEE Trans Knowl Data Eng 16(11):1370–1386CrossRefGoogle Scholar
  106. 106.
    Jin X, Xu A, Bie R, Guo P (2006) Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles. In International Workshop on Data Mining for Biomedical Applications Springer (Berlin, Heidelberg) 106–115Google Scholar
  107. 107.
    Johnson TA, Stedtfeld RD, Wang Q, Cole JR, Hashsham SA, Looft T, Zhu YG, Tiedje JM (2016) Clusters of antibiotic resistance genes enriched together stay together in swine agriculture. MBio 7(2):1–11CrossRefGoogle Scholar
  108. 108.
    Kamal MS, Parvin S, Ashour AS, Shi F, Dey N (2017) De-Bruijn graph with MapReduce framework towards metagenomic data classification. Int J Inf Technol 9(1):59–75Google Scholar
  109. 109.
    Kamal MS, Trivdedi, MC, Alam JB, Dey N, Ashour AS, Shi F, Tavares JMR (Preprint) Big DNA datasets analysis under push down automata. Journal of Intelligent & Fuzzy Systems: 1–11Google Scholar
  110. 110.
    Kar S, Sharma KD, Maitra M (2015) Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive K-nearest neighborhood technique. Expert Syst Appl 42(1):612–627CrossRefGoogle Scholar
  111. 111.
    Kasabov NK (2014) NeuCube: a spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data. Neural Netw 52:62–76CrossRefGoogle Scholar
  112. 112.
    Keller NP (2015) Translating biosynthetic gene clusters into fungal armor and weaponry. Nat Chem Biol 11(9):671CrossRefGoogle Scholar
  113. 113.
    Kelley DR, Snoek J, Rinn JL (2016) Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome ResGoogle Scholar
  114. 114.
    Khalid S, Khalil T, Nasreen S (2014) A survey of feature selection and feature extraction techniques in machine learning. In IEEE Science and Information Conference (SAI) 372–378Google Scholar
  115. 115.
    Kim D. H, (2015) Single-cell transcriptome analysis reveals dynamic changes in lncRNA expression during reprogramming. Cell Stem Cell 16(1): 88–101Google Scholar
  116. 116.
    Kooperberg C, Fazzio TG, Delrow JJ, Tsukiyama T (2002) Improved background correction for spotted DNA microarrays. J Comput Biol 9(1):55–66CrossRefGoogle Scholar
  117. 117.
    Kursa MB (2014) Robustness of random Forest-based gene selection methods. BMC bioinformatics 15(1):1–8CrossRefGoogle Scholar
  118. 118.
    Kuznetsova I, Lugmayr A, Holzinger A (2018) Visualisation Methods of Hierarchical Biological Data: A Survey and Review. International SERIES on Information Systems and Management in Creative eMedia (CreMedia) (2017/2), 32–39Google Scholar
  119. 119.
    Lamparter D, Marbach D, Rueedi R, Kutalik Z, Bergmann S (2016) Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLoS Comput Biol 12(1):e1004714CrossRefGoogle Scholar
  120. 120.
    Lan K, Wang DT, Fong S, Liu LS, Wong KK, Dey N (2018) A survey of data mining and deep learning in bioinformatics. J Med Syst 42(8):139CrossRefGoogle Scholar
  121. 121.
    Lancashire LJ, Rees RC, Ball GR (2008) Identification of gene transcript signatures predictive for estrogen receptor and lymph node status using a stepwise forward selection artificial neural network modelling approach. Artif Intell Med 43(2):99–111CrossRefGoogle Scholar
  122. 122.
    Landfors M, Philip P, Rydén P, Stenberg P (2011) Normalization of high dimensional genomics data where the distribution of the altered variables is skewed. PLoS One 6(11)Google Scholar
  123. 123.
    Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, Schaetzen V, Duque R, Bersini H, Nowe A (2012) A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 9(4):1106–1119CrossRefGoogle Scholar
  124. 124.
    Lazzeroni L, Owen A (2002) Plaid models for gene expression data. Stat Sin 12(1):61–86MathSciNetzbMATHGoogle Scholar
  125. 125.
    Lê Cao KA, Rohart F, McHugh L, Korn O, Wells CA (2014) YuGene: a simple approach to scale gene expression data derived from different platforms for integrated analyses. Genomics 103(4):239–251CrossRefGoogle Scholar
  126. 126.
    Leardi R, Nørgaard L (2004) Sequential application of backward interval partial least squares and genetic algorithms for the selection of relevant spectral regions. Journal of Chemometrics: A Journal of the Chemometrics Society 18(11):486–497CrossRefGoogle Scholar
  127. 127.
    Lee PS, Lee KH (2000) Genomic analysis. Curr Opin Biotechnol 11(2):171–175CrossRefGoogle Scholar
  128. 128.
    Lee Y, Lee CK (2003) Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics 19(9):1132–1139CrossRefGoogle Scholar
  129. 129.
    Lee G, Rodriguez C, Madabhushi A (2008) Investigating the efficacy of nonlinear dimensionality reduction schemes in classifying gene and protein expression studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 5(3):368–384CrossRefGoogle Scholar
  130. 130.
    Lee AB, Luca D, Klei L, Devlin B, Roeder K (2010) Discovering genetic ancestry using spectral graph theory. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology. Society 34(1):51–59Google Scholar
  131. 131.
    Leung YF, Cavalieri D (2003) Fundamentals of cDNA microarray data analysis. Trends Genet 19(11):649–659CrossRefGoogle Scholar
  132. 132.
    Li L, Weinberg CR, Darden TA, Pedersen LG (2001) Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics 17(12):1131–1142CrossRefGoogle Scholar
  133. 133.
    Li L, Darden TA, Weingberg CR, Levine AJ, Pedersen LG (2001) Gene assessment and sample classification for gene expression data using a genetic algorithm/k-nearest neighbor method. Comb Chem High Throughput Screen 4(8):727–739CrossRefGoogle Scholar
  134. 134.
    Li T, Zhang C, Ogihara M (2004) A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20(15):2429–2437CrossRefGoogle Scholar
  135. 135.
    Li Q, Fraley C, Bumgarner RE, Yeung KY, Raftery AE (2005) Donuts, scratches and blanks: robust model-based segmentation of microarray images. Bioinformatics 21(12):2875–2882CrossRefGoogle Scholar
  136. 136.
    Li MW, Han DF, Wang WL (2015) Vessel traffic flow forecasting by RSVR with chaotic cloud simulated annealing genetic algorithm and KPCA. Neurocomputing 157:243–255CrossRefGoogle Scholar
  137. 137.
    Li J, Malley JD, Andrew AS, Karagas MR, Moore JH (2016) Detecting gene-gene interactions using a permutation-based random forest method. BioData mining 9(1):14CrossRefGoogle Scholar
  138. 138.
    Liang H, Sun D, Ding Z, Ge M (2015) Protein function prediction using multi-label learning and ISOMAP embedding. In: Bio-inspired computing-theories and applications. Springer, Berlin, pp 249–259CrossRefGoogle Scholar
  139. 139.
    Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P (2015) The molecular signatures database hallmark gene set collection. Cell systems 1(6):417–425CrossRefGoogle Scholar
  140. 140.
    Liew AWC, Law NF, Yan H (2010) Missing value imputation for gene expression data: computational techniques to recover missing data from available information. Brief Bioinform 12(5):498–513CrossRefGoogle Scholar
  141. 141.
    Liu H, Li J, Wong L (2002) A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. Genome informatics 13:51–60Google Scholar
  142. 142.
    Liu B, Cui Q, Jiang T, Ma S (2004) A combinational feature selection and ensemble neural network method for classification of gene expression data. BMC bioinformatics 5(1):1–12CrossRefGoogle Scholar
  143. 143.
    Liu Z, Chen D, Bensmail H (2005) Gene expression data classification with kernel principal component analysis. Biomed Res Int 2005(2):155–159Google Scholar
  144. 144.
    Liu J, Pérez-Liébana D, Lucas SM (2017) Bandit-based random mutation hill-climbing. In IEEE Congress on Evolutionary Computation (CEC) 2145–2151Google Scholar
  145. 145.
    Loomba R, Schork N, Chen CH, Bettencourt R, Bhatt A, Ang B, Nguyen P, Hernandez C, Richards L, Salotti J, Lin S, Seki E, Nelson KE, Sirlin CB, Brenner D (2015) Heritability of hepatic fibrosis and steatosis based on a prospective twin study. Gastroenterology 149(7):1784–1793CrossRefGoogle Scholar
  146. 146.
    Lu H, Meng Y, Yan K, Xue Y, Gao Z (2017) Classifying Non-linear Gene Expression Data Using a Novel Hybrid Rotation Forest Method. In Springer International Conference on Intelligent Computing 732–743Google Scholar
  147. 147.
    Luo F, Tang K, Khan L (2003, March) Hierarchical clustering of gene expression data. In Proceedings. Third IEEE Symposium on Bioinformatics and. Bioengineering:328–335Google Scholar
  148. 148.
    Mallick P, Ghosh O, Seth P, Ghosh A (2019) Kohonen’s Self-organizing Map Optimizing Prediction of Gene Dependency for Cancer Mediating Biomarkers. In Springer Emerging Technologies in Data Mining and Information Security 863–870Google Scholar
  149. 149.
    Manikandan SP, Manimegalai R, Hariharan M (2016) Gene selection from microarray data using binary Grey Wolf algorithm for classifying acute leukemia. Current Signal Transduction Therapy 11(2):76–83CrossRefGoogle Scholar
  150. 150.
    Mann KM, Newberg JY, Black MA, Jones DJ, Amaya-Manzanares F, Guzman-Rojas L, Kodama T, Ward JM, Rust AG, Weyden L, Yew CCK, Waters JL, Leung ML, Rogers K, Rogers SM, McNoe LA, Selvanesan L, Navin N, Jenkins NA, Copeland NG, Mann MB (2016) Analyzing tumor heterogeneity and driver genes in single myeloid leukemia cells with SBCapSeq. Nat Biotechnol 34(9):962–972CrossRefGoogle Scholar
  151. 151.
    McCarthy MI (2010) Genomics, type 2 diabetes, and obesity. N Engl J Med 363(24):2339–2350CrossRefGoogle Scholar
  152. 152.
    McGee M, Chen Z (2006) Parameter estimation for the exponential-normal convolution model for background correction of affymetrix GeneChip data. Statistical applications in genetics and molecular biology 5(1)Google Scholar
  153. 153.
    McInerney JO, Smith T, Mahony S, Golden A (2017) Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models. Cancer Google Scholar
  154. 154.
    McLachlan GJ, Bean RW, Peel D (2002) A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18(3):413–422CrossRefGoogle Scholar
  155. 155.
    McPherson JD, Marra M, Hillier L, Waterston RH, Chinwalla A, Wallis J, Fulton R (2001) A physical map of the human genome. Nature 409(6822):934–942CrossRefGoogle Scholar
  156. 156.
    McSharry PE, Crampin EJ (2016) Identifying statistically significant patterns in gene expression data arXiv preprint arXiv:1606.02801Google Scholar
  157. 157.
    Medvedovic M, Sivaganesan S (2002) Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics 18(9):1194–1206CrossRefGoogle Scholar
  158. 158.
    Mehrotra P (2016) Biosensors and their applications–a review. Journal of oral biology and craniofacial research 6(2):153–159CrossRefGoogle Scholar
  159. 159.
    Melo ALDA, Soccol VT, Soccol CR (2016) Bacillus thuringiensis: mechanism of action, resistance, and new applications: a review. Crit Rev Biotechnol 36(2):317–326CrossRefGoogle Scholar
  160. 160.
    Meng J, Zhang J, Luan Y (2015) Gene selection integrated with biological knowledge for plant stress response using neighborhood system and rough set theory. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 12(2):433–444CrossRefGoogle Scholar
  161. 161.
    Min X, Wang H, Yang Z, Ge S, Zhang J, Shao N (2015) Relevant component locally linear embedding dimensionality reduction for gene expression data analysis. Metallurgical & Mining Industry 4:186–194Google Scholar
  162. 162.
    Moorthy K, Saberi Mohamad M, Deris S (2014) A review on missing value imputation algorithms for microarray gene expression data. Curr Bioinforma 9(1):18–22CrossRefGoogle Scholar
  163. 163.
    Murray SN, Walsh BP, Kelliher D, O'Sullivan DTJ (2014) Multi-variable optimization of thermal energy efficiency retrofitting of buildings using static modelling and genetic algorithms–a case study. Build Environ 75:98–107CrossRefGoogle Scholar
  164. 164.
    National Research Council. (1988). Mapping and sequencing the human genome. National Academies PressGoogle Scholar
  165. 165.
    Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J Comput Biol 8(1):37–52CrossRefGoogle Scholar
  166. 166.
    Nilsson J (2006) Nonlinear dimensionality reduction of gene expression data. Centre for Mathematical Sciences, Lund UniversityGoogle Scholar
  167. 167.
    Nimmy SF, Sarowar MG, Dey N, Ashour AS, Santosh KC (2018) Investigation of DNA discontinuity for detecting tuberculosis. Journal of Ambient Intelligence and Humanized Computing 1–15Google Scholar
  168. 168.
    Njeunje FON, Czaja W, Benedetto JJ (2014) Linear and Non-linear Dimension Reduction Applied to Gene Expression Data of Cancer Tissue SamplesGoogle Scholar
  169. 169.
    Oba S, Sato MA, Takemasa I, Monden M, Matsubara KI, Ishii S (2003) A Bayesian missing value estimation method for gene expression profile data. Bioinformatics 19(16):2088–2096CrossRefGoogle Scholar
  170. 170.
    Oghabian A, Kilpinen S, Hautaniemi S, Czeizler E (2014) Biclustering methods: biological relevance and application in gene expression analysis. PLoS One 9(3):e90801CrossRefGoogle Scholar
  171. 171.
    Ogutu JO, Schulz-Streeck T, Piepho HP (2012) Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions. BMC Proc 6(2):1–6Google Scholar
  172. 172.
    Orsenigo C, Vercellis C (2013) Dimensionality reduction via isomap with lock-step and elastic measures for time series gene expression classification. In European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics Springer (Berlin, Heidelberg) 92–103Google Scholar
  173. 173.
    Ott J, Wang J, Leal SM (2015) Genetic linkage analysis in the age of whole-genome sequencing. Nat Rev Genet 16(5):275–284CrossRefGoogle Scholar
  174. 174.
    Palmer OMP, Rogers G, Yende S, Angus DC, Clermont G, Langston MA (2018) Graph theoretical analysis of genome-scale data: examination of gene activation occurring in the setting of community-acquired pneumonia. Shock 50(1):53–59CrossRefGoogle Scholar
  175. 175.
    Pan M, Zhang J (2018) Quantile normalization for combining gene-expression datasets. Biotechnology & Biotechnological Equipment 32(3):751–758MathSciNetCrossRefGoogle Scholar
  176. 176.
    Paradis E, Gosselin T, Goudet J, Jombart T, Schliep K (2017) Linking genomics and population genetics with R. Mol Ecol Resour 17(1):54–66CrossRefGoogle Scholar
  177. 177.
    Parikshak NN, Swarup V, Belgard TG, Irimia M, Ramaswami G, Gandal MJ, Harti C, Leppa V, Ubieta LT, Huang J, Lowe JK, Blencowe BJ, Horvath S, Geschwind DH (2016) Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature 540(7633):423–427CrossRefGoogle Scholar
  178. 178.
    Parmigiani G, Garrett ES, Irizarry RA, Zeger SL (2003) The analysis of gene expression data: an overview of methods and software. In The analysis of gene expression data Springer (New York, NY) 1–45Google Scholar
  179. 179.
    Parry RM, Jones W, Stokes TH, Phan JH, Moffitt RA, Fang H, Shi L, Oberthuer A, Fischer M, Tong W, Wang MD (2010) K-nearest neighbor models for microarray gene expression analysis and clinical outcome prediction. The pharmacogenomics journal 10(4):292–309CrossRefGoogle Scholar
  180. 180.
    Perkins AD, Langston MA (2009) Threshold selection in gene co-expression networks using spectral graph theory techniques. In BMC bioinformatics 10 (11): S4Google Scholar
  181. 181.
    Petralia F, Wang P, Yang J, Tu Z (2015) Integrative random forest for gene regulatory network inference. Bioinformatics 31(12):197–205CrossRefGoogle Scholar
  182. 182.
    Pickett JA, Khan ZR (2016) Plant volatile-mediated signalling and its application in agriculture: successes and challenges. New Phytol 212(4):856–870CrossRefGoogle Scholar
  183. 183.
    Pillati M, Viroli C (2005) Locally linear embedding for nonlinear dimension reduction in classification problems: an application to gene expression data. Statistica 65(1):61–71MathSciNetzbMATHGoogle Scholar
  184. 184.
    Pillati M, Viroli C (2005) Supervised locally linear embedding for classification: an application to gene expression data analysis. In Proceedings of 29th Annual Conference of the German Classification Society 15–18Google Scholar
  185. 185.
    Prabhakaran S, Azizi E, Carr A, Pe’er D (2016) Dirichlet process mixture model for correcting technical variation in single-cell gene expression data. In International Conference on Machine Learning 1070–1079Google Scholar
  186. 186.
    Qiu X, Wu H, Hu R (2013) The impact of quantile and rank normalization procedures on the testing power of gene differential expression analysis. BMC bioinformatics 14(1):1–10CrossRefGoogle Scholar
  187. 187.
    Quang D, Xie X (2016) DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res 44(11):1–6CrossRefGoogle Scholar
  188. 188.
    Rajan K (2015) Materials informatics: the materials “gene” and big data. Annu Rev Mater Res 45:153–169CrossRefGoogle Scholar
  189. 189.
    Rajan K (2015) Materials informatics: the materials “gene” and big data. Annu Rev Mater Res 45:153–169CrossRefGoogle Scholar
  190. 190.
    Ramalho JS, Tolmachova T, Hume AN, McGuigan A, Gregory-Evans CY, Huxley C, Seabra MC (2001) Chromosomal mapping, gene structure and characterization of the human and murine RAB27B gene. BMC Genet 2(1)Google Scholar
  191. 191.
    Ray SS, Ganivada A, Pal SK (2016) A granular self-organizing map for clustering and gene selection in microarray data. IEEE transactions on neural networks and learning systems 27(9):1890–1906MathSciNetCrossRefGoogle Scholar
  192. 192.
    Reverter F, Vegas E, Oller JM (2014) Kernel-PCA data integration with enhanced interpretability. BMC Syst Biol 8(2):1–9Google Scholar
  193. 193.
    Ritchie ME, Silver J, Oshlack A, Holmes M, Diyagama D, Holloway A, Smyth GK (2007) A comparison of background correction methods for two-colour microarrays. Bioinformatics 23(20):2700–2707CrossRefGoogle Scholar
  194. 194.
    Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D (2015) Methods of integrating data to uncover genotype–phenotype interactions. Nat Rev Genet 16(2):85–97CrossRefGoogle Scholar
  195. 195.
    Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139–140CrossRefGoogle Scholar
  196. 196.
    Rocke DM, Durbin B (2003) Approximate variance-stabilizing transformations for gene-expression microarray data. Bioinformatics 19(8):966–972CrossRefGoogle Scholar
  197. 197.
    Rodríguez-Rodríguez J, Sevilla A, Martínez-Bazán C, Gordillo JM (2015) Generation of microbubbles with applications to industry and medicine. Annu Rev Fluid Mech 47:405–429MathSciNetCrossRefGoogle Scholar
  198. 198.
    Roffler GH, Schwartz MK, Pilgrim KL, Talbot SL, Sage GK, Adams LG, Luikart G (2016) Identification of landscape features influencing gene flow: how useful are habitat selection models? Evol Appl 9(6):805–817CrossRefGoogle Scholar
  199. 199.
    Romualdi C, Campanaro S, Campagna D, Celegato B, Cannata N, Toppo S, Valle G, Lanfranchi G (2003) Pattern recognition in gene expression profiling using DNA array: a comparative study of different statistical methods applied to cancer classification. Hum Mol Genet 12(8):823–836CrossRefGoogle Scholar
  200. 200.
    Ruiz R, Riquelme JC, Aguilar-Ruiz JS (2006) Incremental wrapper-based gene selection from microarray data for cancer classification. Pattern Recogn 39(12):2383–2392CrossRefGoogle Scholar
  201. 201.
    Rupp R, Mucha S, Larroque H, McEwan J, Conington J (2016) Genomic application in sheep and goat breeding. Animal Frontiers 6(1):39–44CrossRefGoogle Scholar
  202. 202.
    Ryman N (2006) Chifish: a computer program testing for genetic heterogeneity at multiple loci using chi-square and Fisher's exact test. Mol Ecol Notes 6(1):285–287CrossRefGoogle Scholar
  203. 203.
    Saelens W, Cannoodt R, Saeys Y (2018) A comprehensive evaluation of module detection methods for gene expression data. Nat Commun 9(1):1–12CrossRefGoogle Scholar
  204. 204.
    Saghir H, Megherbi DB (2013) An efficient comparative machine learning-based metagenomics binning technique via using Random forest. In IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA) 191–196Google Scholar
  205. 205.
    Salleh AHM, Mohamad MS, Deris S, Omatu S, Fdez-Riverola F, Corchado JM (2015) Gene knockout identification for metabolite production improvement using a hybrid of genetic ant colony optimization and flux balance analysis. Biotechnol Bioprocess Eng 20(4):685–693CrossRefGoogle Scholar
  206. 206.
    Saul LK, Weinberger KQ, Ham JH, Sha F, Lee DD (2006) Spectral methods for dimensionality reduction. Semisupervised learning:293–308Google Scholar
  207. 207.
    Schmitt P, Mandel J, Guedj M (2015) A comparison of six methods for missing data imputation. Journal of Biometrics & Biostatistics 6(1):1–6Google Scholar
  208. 208.
    Seno A, Kasai T, Ikeda M, Vaidyanath A, Masuda J, Mizutani A, Murakami H, Ishikawa T, Seno M (2016) Characterization of gene expression patterns among artificially developed cancer stem cells using spherical self-organizing map. Cancer informatics 15, CIN-S39839Google Scholar
  209. 209.
    Sewer A, Gubian S, Kogel U, Veljkovic E, Han W, Hengstermann A, Peitsch MC, Hoeng J (2014) Assessment of a novel multi-array normalization method based on spike-in control probes suitable for microRNA datasets with global decreases in expression. BMC research notes 7(1):1–18CrossRefGoogle Scholar
  210. 210.
    Shabani M, Borry P (2015) Challenges of web-based personal genomic data sharing. Life sciences, society and policy 11(1):1–13CrossRefGoogle Scholar
  211. 211.
    Shamir R, Sharan R (2002) Algorithmic approaches to clustering gene expression data. Current Topics in Computational Molecular Biology 269Google Scholar
  212. 212.
    Sharbaf FV, Mosafer S, Moattar MH (2016) A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization. Genomics 107(6):231–238CrossRefGoogle Scholar
  213. 213.
    Shehu A, De Jong KA (2014) Evolutionary search algorithms for protein modeling: from de novo structure prediction to comprehensive maps of functionally-relevant structures of protein chains and assemblies. In Proceedings of the ACM Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation 839–856Google Scholar
  214. 214.
    Sherlock G (2000) Analysis of large-scale gene expression data. Curr Opin Immunol 12(2):201–205CrossRefGoogle Scholar
  215. 215.
    Shimada K, Nakamura M, Ishida E, Higuchi T, Yamamoto H, Tsujikawa K, Konishi N (2008) Prostate cancer antigen-1 contributes to cell survival and invasion though discoidin receptor 1 in human prostate cancer. Cancer Sci 99(1):39–45Google Scholar
  216. 216.
    Shreem SS, Abdullah S, Nazri MZA (2014) Hybridising harmony search with a Markov blanket for gene selection problems. Inf Sci 258:108–121MathSciNetCrossRefGoogle Scholar
  217. 217.
    Simerska P, Moyle PM, Toth I (2011) Modern lipid-, carbohydrate-, and peptide-based delivery systems for peptide, vaccine, and gene products. Med Res Rev 31(4):520–547CrossRefGoogle Scholar
  218. 218.
    Simko I (2016) High-resolution DNA melting analysis in plant research. Trends Plant Sci 21(6):528–537CrossRefGoogle Scholar
  219. 219.
    Singh D, al e (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209CrossRefGoogle Scholar
  220. 220.
    Slonim DK (2002) From patterns to pathways: gene expression data analysis comes of age. Nat Genet 32:502–508CrossRefGoogle Scholar
  221. 221.
    Southern EM (1992) Genome mapping: cDNA approaches. Curr Opin Genet Dev 2(3):412–416CrossRefGoogle Scholar
  222. 222.
    Steiner L, Hopp L, Wirth H, Galle J, Binder H, Prohaska SJ, Rohlf T (2012) A global genome segmentation method for exploration of epigenetic patterns. PLoS One 7(10)Google Scholar
  223. 223.
    Sun S, Peng Q, Shakoor A (2014) A kernel-based multivariate feature selection method for microarray data classification. PloS one 9(7)Google Scholar
  224. 224.
    Tabakhi S, Najafi A, Ranjbar R, Moradi P (2015) Gene selection for microarray data classification using a novel ant colony optimization. Neurocomputing 168:1024–1036CrossRefGoogle Scholar
  225. 225.
    Tan AC, Gilbert D (2003) Ensemble machine learning on gene expression data for cancer classificationGoogle Scholar
  226. 226.
    Tang EK, Suganthan PN, Yao X (2006) Gene selection algorithms for microarray data based on least squares support vector machine. BMC bioinformatics 7(1):95CrossRefGoogle Scholar
  227. 227.
    Tang H, Jiang X, Wang X, Wang S, Sofia H, Fox D, Lauter K, Malin B, Telenti A, Xiong L, Ohno-Machado L (2016) Protecting genomic data analytics in the cloud: state of the art and opportunities. BMC Med Genet 9(1):1–9Google Scholar
  228. 228.
    Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci 99(10):6567–6572CrossRefGoogle Scholar
  229. 229.
    Tran LH, Tran LH (2017) Applications of (SPARSE)-PCA and LAPLACIAN EIGENMAPS to biological network inference problem using gene expression data. International Journal of Advances in Soft Computing & Its Applications 9(2):45–62Google Scholar
  230. 230.
    Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520–525CrossRefGoogle Scholar
  231. 231.
    Tuikkala J, Elo LL, Nevalainen OS, Aittokallio T (2008) Missing value imputation improves clustering and interpretation of gene expression microarray data. BMC bioinformatics 9(1):1–14CrossRefGoogle Scholar
  232. 232.
    Tutz G, Ramzan S (2015) Improved methods for the imputation of missing data by nearest neighbor methods. Computational Statistics & Data Analysis 90:84–99MathSciNetzbMATHCrossRefGoogle Scholar
  233. 233.
    Uğuz H (2011) A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowl-Based Syst 24(7):1024–1032CrossRefGoogle Scholar
  234. 234.
    van Dijk D, Nainys J, Sharma R, Kathail P, Carr AJ, Moon KR, Mazutis L, Wolf G, Krishnaswamy S, Pe'er D (2017) MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data. BioRxiv Google Scholar
  235. 235.
    Venna J, Peltonen J, Nybo K, Aidos H, Kaski S (2010) Information retrieval perspective to nonlinear dimensionality reduction for data visualization. J Mach Learn Res 11(Feb):451–490MathSciNetzbMATHGoogle Scholar
  236. 236.
    Vepakomma P, Elgammal A (2016) A fast algorithm for manifold learning by posing it as a symmetric diagonally dominant linear system. Appl Comput Harmon Anal 40(3):622–628MathSciNetzbMATHCrossRefGoogle Scholar
  237. 237.
    Vidaki A, Johansson C, Giangasparo F, Court DS (2017) Differentially methylated embryonal Fyn-associated substrate (EFS) gene as a blood-specific epigenetic marker and its potential application in forensic casework. Forensic Science International: Genetics 29:165–173CrossRefGoogle Scholar
  238. 238.
    Vohradsky J (2001) Neural network model of gene expression. FASEB J 15(3):846–854CrossRefGoogle Scholar
  239. 239.
    Wang H, van der Laan MJ (2011) Dimension reduction with gene expression data using targeted variable importance measurement. BMC bioinformatics 12(1):1–12CrossRefGoogle Scholar
  240. 240.
    Wang Z, Li G, Robinson RW, Huang X (2016) UniBic: sequential row-based biclustering algorithm for analysis of gene expression data. Sci Rep 6:1–10CrossRefGoogle Scholar
  241. 241.
    Wang A, An N, Yang J, Chen G, Li L, Alterovitz G (2017) Wrapper-based gene selection with Markov blanket. Comput Biol Med 81:11–23CrossRefGoogle Scholar
  242. 242.
    Westcott SL, Schloss PD (2015) De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units. PeerJ 3:e1487CrossRefGoogle Scholar
  243. 243.
    Willems E, Leyns L, Vandesompele J (2008) Standardization of real-time PCR gene expression data from independent biological replicates. Anal Biochem 379(1):127–129CrossRefGoogle Scholar
  244. 244.
    Wilson A, Fenton B, Malloch G, Boag B, Hubbard S, Begg G (2016) Urbanisation versus agriculture: a comparison of local genetic diversity and gene flow between wood mouse Apodemus sylvaticus populations in human-modified landscapes. Ecography 39(1):87–97CrossRefGoogle Scholar
  245. 245.
    Wong MH, Mutch DM, McNicholas PD (2017) Two-way learning with one-way supervision for gene expression data. BMC bioinformatics 18(1):150CrossRefGoogle Scholar
  246. 246.
    Xu Y, Olman V, Xu D (2002) Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees. Bioinformatics 18(4):536–545CrossRefGoogle Scholar
  247. 247.
    Xu R, Damelin S, Wunsch DC (2007) Applications of diffusion maps in gene expression data-based cancer diagnosis analysis. In IEEE 29th annual international conference of Engineering in medicine and biology society 4613–4616Google Scholar
  248. 248.
    Xu J, Mu H, Wang Y, Huang F (2018) Feature genes selection using supervised locally linear embedding and correlation coefficient for microarray classification. Computational and mathematical methods in medicine 2018(5490513):1–11Google Scholar
  249. 249.
    Xuan P, Guo MZ, Wang J, Wang CY, Liu XY, Liu Y (2011) Genetic algorithm-based efficient feature selection for classification of pre-miRNAs. Genet Mol Res 10(2):588–603CrossRefGoogle Scholar
  250. 250.
    Yang YH, Buckley MJ, Dudoit S, Speed TP (2002) Comparison of methods for image analysis on cDNA microarray data. J Comput Graph Stat 11(1):108–136MathSciNetCrossRefGoogle Scholar
  251. 251.
    Yang Y, Xie B, Yan J (2014) Application of next-generation sequencing technology in forensic science. Genomics, proteomics & bioinformatics 12(5):190–197CrossRefGoogle Scholar
  252. 252.
    Ye J, Li T, Xiong T, Janardan R (2004) Using uncorrelated discriminant analysis for tissue classification with gene expression data. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 1(4):181–190CrossRefGoogle Scholar
  253. 253.
    Yeung KY, Haynor DR, Ruzzo WL (2001) Validating clustering for gene expression data. Bioinformatics 17(4):309–318CrossRefGoogle Scholar
  254. 254.
    Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17(10):977–987CrossRefGoogle Scholar
  255. 255.
    Yu Z, Wong HS, Wang H (2007) Graph-based consensus clustering for class discovery from gene expression data. Bioinformatics 23(21):2888–2896CrossRefGoogle Scholar
  256. 256.
    Yuan B, Zhang C, Shao X (2015) A late acceptance hill-climbing algorithm for balancing two-sided assembly lines with multiple constraints. J Intell Manuf 26(1):159–168CrossRefGoogle Scholar
  257. 257.
    Zamani-Dahaj SA, Okasha M, Kosakowski J, Higgs PG (2016) Estimating the frequency of horizontal gene transfer using phylogenetic models of gene gain and loss. Mol Biol Evol 33(7):1843–1857CrossRefGoogle Scholar
  258. 258.
    Zeng T, Li R, Mukkamala R, Ye J, Ji S (2015) Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC bioinformatics 16(1):1–10CrossRefGoogle Scholar
  259. 259.
    Zhang S, Chen S, Li W, Guo X, Zhao P, Xu J, Chen Y, Pan Q, Liu X, Lu H, Wang Y, Pei D, Esteban MA (2011) Rescue of ATP7B function in hepatocyte-like cells from Wilson's disease induced pluripotent stem cells using gene therapy or the chaperone drug curcumin. Hum Mol Genet 20(16):3176–3187CrossRefGoogle Scholar
  260. 260.
    Zhang L, Qian L, Ding C, Zhou W, Li F (2015) Similarity-balanced discriminant neighbor embedding and its application to cancer classification based on gene expression data. Comput Biol Med 64:236–245CrossRefGoogle Scholar
  261. 261.
    Zhu Z, Ong YS, Dash M (2007) Markov blanket-embedded genetic algorithm for gene selection. Pattern Recogn 40(11):3236–3248zbMATHCrossRefGoogle Scholar
  262. 262.
    Zou Q, Zeng J, Cao L, Ji R (2016) A novel features ranking metric with application to scalable visual and bioinformatics data classification. Neurocomputing 173:346–354CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Information Technology & Engineering, Vellore Institute of Technology, Vellore, India
  2. Department of Information Technology, Techno India College of Technology, Kolkata, India
