Pattern analysis of genetics and genomics: a survey of the state-of-art

Multimedia Tools and Applications

Abstract

Continual improvements in sequencing a complete human genome, together with its falling cost, have led to the rapid adoption of genetic and genomic information at both research institutions and clinics. Biologists are taking the first steps toward knowing the locations and functions of all the genes and regulatory sites in the genomes of various organisms. As these researchers determine the nucleotide sequence of large stretches of the human genome, they are generating enormous volumes of sequence data. Direct laboratory investigation of these data is expensive and difficult, which makes computational techniques vital. The field of pattern analysis, which aims to build computer algorithms that improve with experience, has the potential to enable computers to assist humans in analyzing large, complex genetic and genomic data sets. This survey presents an overview of pattern analysis techniques for the study of genome sequencing data sets, as well as proteomic, epigenetic and metabolomic data. These techniques span data pre-processing, feature extraction and selection, classification and clustering. The aim is to present considerations and recurring challenges in applying pattern analysis methods, including discriminative and generative modeling approaches, and to discuss future research directions for these methods in the analysis of genomic and genetic data sets.


    Article  Google Scholar 

  151. McCarthy MI (2010) Genomics, type 2 diabetes, and obesity. N Engl J Med 363(24):2339–2350

    Article  Google Scholar 

  152. McGee M, Chen Z (2006) Parameter estimation for the exponential-normal convolution model for background correction of affymetrix GeneChip data. Statistical applications in genetics and molecular biology 5(1)

  153. McInerney JO, Smith T, Mahony S, Golden A (2017) Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models. Cancer

  154. McLachlan GJ, Bean RW, Peel D (2002) A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18(3):413–422

    Article  Google Scholar 

  155. McPherson JD, Marra M, Hillier L, Waterston RH, Chinwalla A, Wallis J, Fulton R (2001) A physical map of the human genome. Nature 409(6822):934–942

    Article  Google Scholar 

  156. McSharry PE, Crampin EJ (2016) Identifying statistically significant patterns in gene expression data arXiv preprint arXiv:1606.02801

  157. Medvedovic M, Sivaganesan S (2002) Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics 18(9):1194–1206

    Article  Google Scholar 

  158. Mehrotra P (2016) Biosensors and their applications–a review. Journal of oral biology and craniofacial research 6(2):153–159

    Article  Google Scholar 

  159. Melo ALDA, Soccol VT, Soccol CR (2016) Bacillus thuringiensis: mechanism of action, resistance, and new applications: a review. Crit Rev Biotechnol 36(2):317–326

    Article  Google Scholar 

  160. Meng J, Zhang J, Luan Y (2015) Gene selection integrated with biological knowledge for plant stress response using neighborhood system and rough set theory. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 12(2):433–444

    Article  Google Scholar 

  161. Min X, Wang H, Yang Z, Ge S, Zhang J, Shao N (2015) Relevant component locally linear embedding dimensionality reduction for gene expression data analysis. Metallurgical & Mining Industry 4:186–194

    Google Scholar 

  162. Moorthy K, Saberi Mohamad M, Deris S (2014) A review on missing value imputation algorithms for microarray gene expression data. Curr Bioinforma 9(1):18–22

    Article  Google Scholar 

  163. Murray SN, Walsh BP, Kelliher D, O'Sullivan DTJ (2014) Multi-variable optimization of thermal energy efficiency retrofitting of buildings using static modelling and genetic algorithms–a case study. Build Environ 75:98–107

    Article  Google Scholar 

  164. National Research Council. (1988). Mapping and sequencing the human genome. National Academies Press

  165. Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J Comput Biol 8(1):37–52

    Article  Google Scholar 

  166. Nilsson J (2006) Nonlinear dimensionality reduction of gene expression data. Centre for Mathematical Sciences, Lund University

  167. Nimmy SF, Sarowar MG, Dey N, Ashour AS, Santosh KC (2018) Investigation of DNA discontinuity for detecting tuberculosis. Journal of Ambient Intelligence and Humanized Computing 1–15

  168. Njeunje FON, Czaja W, Benedetto JJ (2014) Linear and Non-linear Dimension Reduction Applied to Gene Expression Data of Cancer Tissue Samples

  169. Oba S, Sato MA, Takemasa I, Monden M, Matsubara KI, Ishii S (2003) A Bayesian missing value estimation method for gene expression profile data. Bioinformatics 19(16):2088–2096

    Article  Google Scholar 

  170. Oghabian A, Kilpinen S, Hautaniemi S, Czeizler E (2014) Biclustering methods: biological relevance and application in gene expression analysis. PLoS One 9(3):e90801

    Article  Google Scholar 

  171. Ogutu JO, Schulz-Streeck T, Piepho HP (2012) Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions. BMC Proc 6(2):1–6

    Google Scholar 

  172. Orsenigo C, Vercellis C (2013) Dimensionality reduction via isomap with lock-step and elastic measures for time series gene expression classification. In European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics Springer (Berlin, Heidelberg) 92–103

  173. Ott J, Wang J, Leal SM (2015) Genetic linkage analysis in the age of whole-genome sequencing. Nat Rev Genet 16(5):275–284

    Article  Google Scholar 

  174. Palmer OMP, Rogers G, Yende S, Angus DC, Clermont G, Langston MA (2018) Graph theoretical analysis of genome-scale data: examination of gene activation occurring in the setting of community-acquired pneumonia. Shock 50(1):53–59

    Article  Google Scholar 

  175. Pan M, Zhang J (2018) Quantile normalization for combining gene-expression datasets. Biotechnology & Biotechnological Equipment 32(3):751–758

    Article  MathSciNet  Google Scholar 

  176. Paradis E, Gosselin T, Goudet J, Jombart T, Schliep K (2017) Linking genomics and population genetics with R. Mol Ecol Resour 17(1):54–66

    Article  Google Scholar 

  177. Parikshak NN, Swarup V, Belgard TG, Irimia M, Ramaswami G, Gandal MJ, Harti C, Leppa V, Ubieta LT, Huang J, Lowe JK, Blencowe BJ, Horvath S, Geschwind DH (2016) Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature 540(7633):423–427

    Article  Google Scholar 

  178. Parmigiani G, Garrett ES, Irizarry RA, Zeger SL (2003) The analysis of gene expression data: an overview of methods and software. In The analysis of gene expression data Springer (New York, NY) 1–45

  179. Parry RM, Jones W, Stokes TH, Phan JH, Moffitt RA, Fang H, Shi L, Oberthuer A, Fischer M, Tong W, Wang MD (2010) K-nearest neighbor models for microarray gene expression analysis and clinical outcome prediction. The pharmacogenomics journal 10(4):292–309

    Article  Google Scholar 

  180. Perkins AD, Langston MA (2009) Threshold selection in gene co-expression networks using spectral graph theory techniques. In BMC bioinformatics 10 (11): S4

  181. Petralia F, Wang P, Yang J, Tu Z (2015) Integrative random forest for gene regulatory network inference. Bioinformatics 31(12):197–205

    Article  Google Scholar 

  182. Pickett JA, Khan ZR (2016) Plant volatile-mediated signalling and its application in agriculture: successes and challenges. New Phytol 212(4):856–870

    Article  Google Scholar 

  183. Pillati M, Viroli C (2005) Locally linear embedding for nonlinear dimension reduction in classification problems: an application to gene expression data. Statistica 65(1):61–71

    MathSciNet  MATH  Google Scholar 

  184. Pillati M, Viroli C (2005) Supervised locally linear embedding for classification: an application to gene expression data analysis. In Proceedings of 29th Annual Conference of the German Classification Society 15–18

  185. Prabhakaran S, Azizi E, Carr A, Pe’er D (2016) Dirichlet process mixture model for correcting technical variation in single-cell gene expression data. In International Conference on Machine Learning 1070–1079

  186. Qiu X, Wu H, Hu R (2013) The impact of quantile and rank normalization procedures on the testing power of gene differential expression analysis. BMC bioinformatics 14(1):1–10

    Article  Google Scholar 

  187. Quang D, Xie X (2016) DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res 44(11):1–6

    Article  Google Scholar 

  188. Rajan K (2015) Materials informatics: the materials “gene” and big data. Annu Rev Mater Res 45:153–169

    Article  Google Scholar 

  189. Rajan K (2015) Materials informatics: the materials “gene” and big data. Annu Rev Mater Res 45:153–169

    Article  Google Scholar 

  190. Ramalho JS, Tolmachova T, Hume AN, McGuigan A, Gregory-Evans CY, Huxley C, Seabra MC (2001) Chromosomal mapping, gene structure and characterization of the human and murine RAB27B gene. BMC Genet 2(1)

  191. Ray SS, Ganivada A, Pal SK (2016) A granular self-organizing map for clustering and gene selection in microarray data. IEEE transactions on neural networks and learning systems 27(9):1890–1906

    Article  MathSciNet  Google Scholar 

  192. Reverter F, Vegas E, Oller JM (2014) Kernel-PCA data integration with enhanced interpretability. BMC Syst Biol 8(2):1–9

    Google Scholar 

  193. Ritchie ME, Silver J, Oshlack A, Holmes M, Diyagama D, Holloway A, Smyth GK (2007) A comparison of background correction methods for two-colour microarrays. Bioinformatics 23(20):2700–2707

    Article  Google Scholar 

  194. Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D (2015) Methods of integrating data to uncover genotype–phenotype interactions. Nat Rev Genet 16(2):85–97

    Article  Google Scholar 

  195. Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139–140

    Article  Google Scholar 

  196. Rocke DM, Durbin B (2003) Approximate variance-stabilizing transformations for gene-expression microarray data. Bioinformatics 19(8):966–972

    Article  Google Scholar 

  197. Rodríguez-Rodríguez J, Sevilla A, Martínez-Bazán C, Gordillo JM (2015) Generation of microbubbles with applications to industry and medicine. Annu Rev Fluid Mech 47:405–429

    Article  MathSciNet  Google Scholar 

  198. Roffler GH, Schwartz MK, Pilgrim KL, Talbot SL, Sage GK, Adams LG, Luikart G (2016) Identification of landscape features influencing gene flow: how useful are habitat selection models? Evol Appl 9(6):805–817

    Article  Google Scholar 

  199. Romualdi C, Campanaro S, Campagna D, Celegato B, Cannata N, Toppo S, Valle G, Lanfranchi G (2003) Pattern recognition in gene expression profiling using DNA array: a comparative study of different statistical methods applied to cancer classification. Hum Mol Genet 12(8):823–836

    Article  Google Scholar 

  200. Ruiz R, Riquelme JC, Aguilar-Ruiz JS (2006) Incremental wrapper-based gene selection from microarray data for cancer classification. Pattern Recogn 39(12):2383–2392

    Article  Google Scholar 

  201. Rupp R, Mucha S, Larroque H, McEwan J, Conington J (2016) Genomic application in sheep and goat breeding. Animal Frontiers 6(1):39–44

    Article  Google Scholar 

  202. Ryman N (2006) Chifish: a computer program testing for genetic heterogeneity at multiple loci using chi-square and Fisher's exact test. Mol Ecol Notes 6(1):285–287

    Article  Google Scholar 

  203. Saelens W, Cannoodt R, Saeys Y (2018) A comprehensive evaluation of module detection methods for gene expression data. Nat Commun 9(1):1–12

    Article  Google Scholar 

  204. Saghir H, Megherbi DB (2013) An efficient comparative machine learning-based metagenomics binning technique via using Random forest. In IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA) 191–196

  205. Salleh AHM, Mohamad MS, Deris S, Omatu S, Fdez-Riverola F, Corchado JM (2015) Gene knockout identification for metabolite production improvement using a hybrid of genetic ant colony optimization and flux balance analysis. Biotechnol Bioprocess Eng 20(4):685–693

    Article  Google Scholar 

  206. Saul LK, Weinberger KQ, Ham JH, Sha F, Lee DD (2006) Spectral methods for dimensionality reduction. Semisupervised learning:293–308

  207. Schmitt P, Mandel J, Guedj M (2015) A comparison of six methods for missing data imputation. Journal of Biometrics & Biostatistics 6(1):1–6

    Google Scholar 

  208. Seno A, Kasai T, Ikeda M, Vaidyanath A, Masuda J, Mizutani A, Murakami H, Ishikawa T, Seno M (2016) Characterization of gene expression patterns among artificially developed cancer stem cells using spherical self-organizing map. Cancer informatics 15, CIN-S39839

  209. Sewer A, Gubian S, Kogel U, Veljkovic E, Han W, Hengstermann A, Peitsch MC, Hoeng J (2014) Assessment of a novel multi-array normalization method based on spike-in control probes suitable for microRNA datasets with global decreases in expression. BMC research notes 7(1):1–18

    Article  Google Scholar 

  210. Shabani M, Borry P (2015) Challenges of web-based personal genomic data sharing. Life sciences, society and policy 11(1):1–13

    Article  Google Scholar 

  211. Shamir R, Sharan R (2002) Algorithmic approaches to clustering gene expression data. Current Topics in Computational Molecular Biology 269

  212. Sharbaf FV, Mosafer S, Moattar MH (2016) A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization. Genomics 107(6):231–238

    Article  Google Scholar 

  213. Shehu A, De Jong KA (2014) Evolutionary search algorithms for protein modeling: from de novo structure prediction to comprehensive maps of functionally-relevant structures of protein chains and assemblies. In Proceedings of the ACM Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation 839–856

  214. Sherlock G (2000) Analysis of large-scale gene expression data. Curr Opin Immunol 12(2):201–205

    Article  MathSciNet  Google Scholar 

  215. Shimada K, Nakamura M, Ishida E, Higuchi T, Yamamoto H, Tsujikawa K, Konishi N (2008) Prostate cancer antigen-1 contributes to cell survival and invasion though discoidin receptor 1 in human prostate cancer. Cancer Sci 99(1):39–45

    Google Scholar 

  216. Shreem SS, Abdullah S, Nazri MZA (2014) Hybridising harmony search with a Markov blanket for gene selection problems. Inf Sci 258:108–121

    Article  MathSciNet  Google Scholar 

  217. Simerska P, Moyle PM, Toth I (2011) Modern lipid-, carbohydrate-, and peptide-based delivery systems for peptide, vaccine, and gene products. Med Res Rev 31(4):520–547

    Article  Google Scholar 

  218. Simko I (2016) High-resolution DNA melting analysis in plant research. Trends Plant Sci 21(6):528–537

    Article  Google Scholar 

  219. Singh D, al e (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209

    Article  Google Scholar 

  220. Slonim DK (2002) From patterns to pathways: gene expression data analysis comes of age. Nat Genet 32:502–508

    Article  Google Scholar 

  221. Southern EM (1992) Genome mapping: cDNA approaches. Curr Opin Genet Dev 2(3):412–416

    Article  Google Scholar 

  222. Steiner L, Hopp L, Wirth H, Galle J, Binder H, Prohaska SJ, Rohlf T (2012) A global genome segmentation method for exploration of epigenetic patterns. PLoS One 7(10)

  223. Sun S, Peng Q, Shakoor A (2014) A kernel-based multivariate feature selection method for microarray data classification. PloS one 9(7)

  224. Tabakhi S, Najafi A, Ranjbar R, Moradi P (2015) Gene selection for microarray data classification using a novel ant colony optimization. Neurocomputing 168:1024–1036

    Article  Google Scholar 

  225. Tan AC, Gilbert D (2003) Ensemble machine learning on gene expression data for cancer classification

  226. Tang EK, Suganthan PN, Yao X (2006) Gene selection algorithms for microarray data based on least squares support vector machine. BMC bioinformatics 7(1):95

    Article  Google Scholar 

  227. Tang H, Jiang X, Wang X, Wang S, Sofia H, Fox D, Lauter K, Malin B, Telenti A, Xiong L, Ohno-Machado L (2016) Protecting genomic data analytics in the cloud: state of the art and opportunities. BMC Med Genet 9(1):1–9

    Google Scholar 

  228. Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci 99(10):6567–6572

    Article  Google Scholar 

  229. Tran LH, Tran LH (2017) Applications of (SPARSE)-PCA and LAPLACIAN EIGENMAPS to biological network inference problem using gene expression data. International Journal of Advances in Soft Computing & Its Applications 9(2):45–62

    Google Scholar 

  230. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520–525

    Article  Google Scholar 

  231. Tuikkala J, Elo LL, Nevalainen OS, Aittokallio T (2008) Missing value imputation improves clustering and interpretation of gene expression microarray data. BMC bioinformatics 9(1):1–14

    Article  Google Scholar 

  232. Tutz G, Ramzan S (2015) Improved methods for the imputation of missing data by nearest neighbor methods. Computational Statistics & Data Analysis 90:84–99

    Article  MathSciNet  MATH  Google Scholar 

  233. Uğuz H (2011) A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowl-Based Syst 24(7):1024–1032

    Article  Google Scholar 

  234. van Dijk D, Nainys J, Sharma R, Kathail P, Carr AJ, Moon KR, Mazutis L, Wolf G, Krishnaswamy S, Pe'er D (2017) MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data. BioRxiv

  235. Venna J, Peltonen J, Nybo K, Aidos H, Kaski S (2010) Information retrieval perspective to nonlinear dimensionality reduction for data visualization. J Mach Learn Res 11(Feb):451–490

    MathSciNet  MATH  Google Scholar 

  236. Vepakomma P, Elgammal A (2016) A fast algorithm for manifold learning by posing it as a symmetric diagonally dominant linear system. Appl Comput Harmon Anal 40(3):622–628

    Article  MathSciNet  MATH  Google Scholar 

  237. Vidaki A, Johansson C, Giangasparo F, Court DS (2017) Differentially methylated embryonal Fyn-associated substrate (EFS) gene as a blood-specific epigenetic marker and its potential application in forensic casework. Forensic Science International: Genetics 29:165–173

    Article  Google Scholar 

  238. Vohradsky J (2001) Neural network model of gene expression. FASEB J 15(3):846–854

    Article  Google Scholar 

  239. Wang H, van der Laan MJ (2011) Dimension reduction with gene expression data using targeted variable importance measurement. BMC bioinformatics 12(1):1–12

    Article  Google Scholar 

  240. Wang Z, Li G, Robinson RW, Huang X (2016) UniBic: sequential row-based biclustering algorithm for analysis of gene expression data. Sci Rep 6:1–10

    Article  Google Scholar 

  241. Wang A, An N, Yang J, Chen G, Li L, Alterovitz G (2017) Wrapper-based gene selection with Markov blanket. Comput Biol Med 81:11–23

    Article  Google Scholar 

  242. Westcott SL, Schloss PD (2015) De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units. PeerJ 3:e1487

    Article  Google Scholar 

  243. Willems E, Leyns L, Vandesompele J (2008) Standardization of real-time PCR gene expression data from independent biological replicates. Anal Biochem 379(1):127–129

    Article  Google Scholar 

  244. Wilson A, Fenton B, Malloch G, Boag B, Hubbard S, Begg G (2016) Urbanisation versus agriculture: a comparison of local genetic diversity and gene flow between wood mouse Apodemus sylvaticus populations in human-modified landscapes. Ecography 39(1):87–97

    Article  Google Scholar 

  245. Wong MH, Mutch DM, McNicholas PD (2017) Two-way learning with one-way supervision for gene expression data. BMC bioinformatics 18(1):150

    Article  Google Scholar 

  246. Xu Y, Olman V, Xu D (2002) Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees. Bioinformatics 18(4):536–545

    Article  Google Scholar 

  247. Xu R, Damelin S, Wunsch DC (2007) Applications of diffusion maps in gene expression data-based cancer diagnosis analysis. In IEEE 29th annual international conference of Engineering in medicine and biology society 4613–4616

  248. Xu J, Mu H, Wang Y, Huang F (2018) Feature genes selection using supervised locally linear embedding and correlation coefficient for microarray classification. Computational and mathematical methods in medicine 2018(5490513):1–11

  249. Xuan P, Guo MZ, Wang J, Wang CY, Liu XY, Liu Y (2011) Genetic algorithm-based efficient feature selection for classification of pre-miRNAs. Genet Mol Res 10(2):588–603

    Article  Google Scholar 

  250. Yang YH, Buckley MJ, Dudoit S, Speed TP (2002) Comparison of methods for image analysis on cDNA microarray data. J Comput Graph Stat 11(1):108–136

    Article  MathSciNet  Google Scholar 

  251. Yang Y, Xie B, Yan J (2014) Application of next-generation sequencing technology in forensic science. Genomics, proteomics & bioinformatics 12(5):190–197

    Article  Google Scholar 

  252. Ye J, Li T, Xiong T, Janardan R (2004) Using uncorrelated discriminant analysis for tissue classification with gene expression data. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 1(4):181–190

    Article  Google Scholar 

  253. Yeung KY, Haynor DR, Ruzzo WL (2001) Validating clustering for gene expression data. Bioinformatics 17(4):309–318

    Article  Google Scholar 

  254. Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17(10):977–987

    Article  Google Scholar 

  255. Yu Z, Wong HS, Wang H (2007) Graph-based consensus clustering for class discovery from gene expression data. Bioinformatics 23(21):2888–2896

    Article  Google Scholar 

  256. Yuan B, Zhang C, Shao X (2015) A late acceptance hill-climbing algorithm for balancing two-sided assembly lines with multiple constraints. J Intell Manuf 26(1):159–168

    Article  Google Scholar 

  257. Zamani-Dahaj SA, Okasha M, Kosakowski J, Higgs PG (2016) Estimating the frequency of horizontal gene transfer using phylogenetic models of gene gain and loss. Mol Biol Evol 33(7):1843–1857

    Article  Google Scholar 

  258. Zeng T, Li R, Mukkamala R, Ye J, Ji S (2015) Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC bioinformatics 16(1):1–10

    Article  Google Scholar 

  259. Zhang S, Chen S, Li W, Guo X, Zhao P, Xu J, Chen Y, Pan Q, Liu X, Lu H, Wang Y, Pei D, Esteban MA (2011) Rescue of ATP7B function in hepatocyte-like cells from Wilson's disease induced pluripotent stem cells using gene therapy or the chaperone drug curcumin. Hum Mol Genet 20(16):3176–3187

    Article  Google Scholar 

  260. Zhang L, Qian L, Ding C, Zhou W, Li F (2015) Similarity-balanced discriminant neighbor embedding and its application to cancer classification based on gene expression data. Comput Biol Med 64:236–245

    Article  Google Scholar 

  261. Zhu Z, Ong YS, Dash M (2007) Markov blanket-embedded genetic algorithm for gene selection. Pattern Recogn 40(11):3236–3248

    Article  MATH  Google Scholar 

  262. Zou Q, Zeng J, Cao L, Ji R (2016) A novel features ranking metric with application to scalable visual and bioinformatics data classification. Neurocomputing 173:346–354

    Article  Google Scholar 

Download references

Author information

Corresponding author

Correspondence to Jyotismita Chaki.

About this article

Cite this article

Chaki, J., Dey, N. Pattern analysis of genetics and genomics: a survey of the state-of-art. Multimed Tools Appl 79, 11163–11194 (2020). https://doi.org/10.1007/s11042-019-7181-8

  • DOI: https://doi.org/10.1007/s11042-019-7181-8
