A Survey of Computational Methods for Protein Function Prediction

Chapter in: Big Data Analytics in Genomics

Abstract

Rapid advances in high-throughput genome sequencing technologies have resulted in millions of protein-encoding gene sequences with no functional characterization. Automated protein function annotation or prediction is a prime problem for computational methods to tackle in the post-genomic era of big molecular data. While recent community-driven experiments demonstrate that the accuracy of function prediction methods has significantly improved, challenges remain. These challenges stem from the different sources of data exploited to predict function, as well as from different choices in representing and integrating heterogeneous data. Current methods predict function from a protein’s sequence, often in the context of evolutionary relationships; from a protein’s three-dimensional structure or specific patterns in the structure; from neighbors in a protein–protein interaction network; from microarray data; or from a combination of these different types of data. Here we review these methods and the state of protein function prediction, emphasizing recent algorithmic developments, remaining challenges, and prospects for future research.

References

  1. Abascal, F., Valencia, A.: Automatic annotation of protein function based on family identification. Proteins 53 (3), 683–692 (2003)

    Article  Google Scholar 

  2. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: SIGMOD Intl Conf on Management of Data, pp. 207–216. ACM (1993)

    Google Scholar 

  3. Albert, R.: Network inference, analysis, and modeling in systems biology. Plant Cell 19 (11), 3327–3338 (2007)

    Article  Google Scholar 

  4. Alberts, B., Johnson, A., Lewis, J., et al.: From RNA to protein. In: Molecular Biology of the Cell, 4 edn. New York: Garland Science (2002)

    Google Scholar 

  5. Alberts, B., Johnson, A., Lewis, J., et al.: Studying gene expression and function. In: Molecular Biology of the Cell, 4 edn. New York: Garland Science (2002)

    Google Scholar 

  6. Alexandrov, N.N.: SARFing the PDB. Protein Eng 9 (9), 727–732 (1996)

    Article  Google Scholar 

  7. Altman, D.G.: Practical Statistics for Medical Research. Chapman and Hall (1997)

    Google Scholar 

  8. Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25, 3389–3402 (1997)

    Article  Google Scholar 

  9. Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J., Chothia, C., Murzin, A.G.: Scop database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res 32 (Database issue), D226–D229 (2004)

    Article  Google Scholar 

  10. Apeltsin, L., Morris, J.H., Babbitt, P.C., Ferrin, T.E.: Improving the quality of protein similarity network clustering algorithms using the network edge weight distribution. Bioinformatics 27 (3), 326–333 (2011)

    Article  Google Scholar 

  11. Arnau, V., Mars, S., Marin, I.: Iterative cluster analysis of protein interaction data. Bioinformatics 21 (3), 364–378 (2005)

    Article  Google Scholar 

  12. Ashburner, M., Ball, C., Blake, K., et al.: The gene ontology consortium. Nature Genetics 25 (1), 25–29 (2000)

    Article  Google Scholar 

  13. Aung, Z., Tan, K.L.: Rapid 3D protein structure database searching using information retrieval techniques. Bioinformatics 20 (7), 1045–1052 (2004)

    Article  Google Scholar 

  14. Badea, L.: Functional discrimination of gene expression patterns in terms of the gene ontology. In: Pacific Symp Biocomput (PSB), pp. 565–576 (2003)

    Google Scholar 

  15. Bader, G.D., Betel, D., Hogue, W.V.: BIND: the biomolecular interaction network database. Nucleic Acids Res 31 (1), 248–250 (2003)

    Article  Google Scholar 

  16. Bailey, T.L., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Intl Conf Intell Sys Mol Biol (RECOMB), pp. 28–36 (1998)

    Google Scholar 

  17. Bailey, T.L., Gribskov, M.: Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14 (1), 48–54 (1998)

    Article  Google Scholar 

  18. Bairoch, A., BUcher, P., Hoffmann, K.: The PROSITE database, its status in 1997. Nucl. Acids Res. 25 (1), 217–221 (1997)

    Article  Google Scholar 

  19. Bar-Joseph, Z.: Analyzing time series gene expression data. Bioinformatics 20 (16), 2493–2503 (2004)

    Article  Google Scholar 

  20. Bar-Joseph, Z., Gitter, A., Simon, I.: Studying and modelling dynamic biological processes using time-series gene expression data. Nat Rev Genet 13 (8), 552–564 (2012)

    Article  Google Scholar 

  21. Barabasi, A.L., Oltvai, Z.N.: Network biology: understanding the cell’s functional organization. Nature Rev Genet 5 (2), 101–113 (2004)

    Article  Google Scholar 

  22. Barrett, et al.: NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res 41 (Database issue), D991–D995 (2013)

    Google Scholar 

  23. Bder, G., Hogue, C.: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinf 4 (1), 2 (2003)

    Article  Google Scholar 

  24. Bellaachia, A., Portnov, D., Chen, Y., Elkahloun, A.G.: E-CAST: a data mining algorithm for gene expression data. In: Workshop on Data Mining in Bioinformatics (BIOKDD), pp. 49–54 (2002)

    Google Scholar 

  25. Ben-Dor, A., Shamir, R., Yakhini, Z.: Clustering gene expression patterns. J Comput Biol 6 (3–4), 281–297 (1999)

    Article  Google Scholar 

  26. Ben-Hur, A., Brutlag, D.: Remote homology detection: a motif based approach. Bioinformatics 19 (Suppl 1), i26–i33 (2003)

    Article  Google Scholar 

  27. Ben-Hur, A., Brutlag, D.: Sequence motifs: Highly predictive features of protein function. In: I. Guyon, S. Gunn, M. Nikravesh, L. Zadeh (eds.) Feature extraction and foundations and applications. Springer Verlag (2005)

    Google Scholar 

  28. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N.,, Bourne, P.E.: The protein data bank. Nucl. Acids Res. 28 (1), 235–242 (2000)

    Article  Google Scholar 

  29. Bilu, Y., Linial, M.P.: Functional consequences in metabolic pathways from phylogenetic profiles. In: Intl Workshop on Algorithms in Bioinformatics (WABI), pp. 263–276 (2002)

    Google Scholar 

  30. Blatt, M., Wiseman, S., Domany, E.: Superparamagnetic clustering of data. FEBS Lett 76, 3251–3254 (1996)

    Google Scholar 

  31. Blei, D.: Probabilistic topic models. Communications of the ACM 55 (4), 77–84 (2012)

    Article  MathSciNet  Google Scholar 

  32. Blei, D.M.: Latent Dirichlet Allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

    MATH  Google Scholar 

  33. Blekas, K., Fotiadis, D.I., Likas, A.: Motif-based protein sequence classification using neural networks. J Comput Biol 12 (1), 64–82 (2005)

    Article  MATH  Google Scholar 

  34. Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O’Donovan, C., Phan, I., Pilbout, S., Schneider, M.: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31 (1), 365–370 (2003)

    Article  Google Scholar 

  35. Bork, P., Koonin, E.V.: Protein sequence motifs. Curr Opin Struct Biol 6 (3), 366–376 (1996)

    Article  Google Scholar 

  36. Braberg, H., Webb, B.M., Tjioe, E., Pieper, U., Sali, A., Madhusudhan, M.S.: SALIGN: a web server for alignment of multiple protein sequences and structures. Bioinformatics 15 (28), 2071–2073 (2012)

    Google Scholar 

  37. Breitkreutz, B., Stark, C., Tyers, M.: The GRID: The general repository for interaction datasets. Genome Biol 4 (3), R3 (2003)

    Google Scholar 

  38. Brenner, S.E.: Errors in genome annotation. Trends Genet 15 (4), 132–133 (1999)

    Article  Google Scholar 

  39. Brenner, S.E., Levitt, M.: Expectations from structural genomics. Protein Sci. 9 (1), 197–200 (2000)

    Article  Google Scholar 

  40. Brown, K.R., Jurisica, I.: Online predicted human interaction database. Bioinformatics 21 (9), 2076–2082 (2005)

    Article  Google Scholar 

  41. Brown, M.P., et al.: Knowledge based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 97 (1), 262–267 (2000)

    Article  Google Scholar 

  42. Brun, C., Chevenet, F., Martin, D., Wojcik, J., Guénoche, A., Jacq, B.: Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network. Genome Biol 5 (1), R6 (2003)

    Article  Google Scholar 

  43. Bryan, K., Cunningham, P., Bolshakova, N.: Biclustering of expression data using simulated annealing. In: IEEE Symp Computer-based Medical Systems (CBMS), pp. 383–388 (2005)

    Google Scholar 

  44. Bucak, S., Jin, R., Jain, A.: Multi-label multiple kernel learning by stochastic approximation: Application to visual object recognition. In: Advances Neural Inform Processing Systems (NIPS), pp. 1145–1154 (2010)

    Google Scholar 

  45. Budowski-Tal, I.,, Nov, Y., Kolodny, R.: Fragbag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc. Natl. Acad. Sci. USA 107, 3481–3486 (2010)

    Article  Google Scholar 

  46. Butte, A.J., Bao, L., Reis, B.Y., Watkins, T.W., Kohane, I.S.: Comparing the similarity of time-series gene expression using signal processing metrics. J Biomed Bioinf 34 (6), 396–405 (2001)

    Article  Google Scholar 

  47. Cai, C.Z., Han, L.Y., Ji, Z.L., Chen, X., Chen, Y.Z.: SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res 31 (13) (2003)

    Google Scholar 

  48. Cai, Y.D., Doig, A.J.: Prediction of saccharomyces cerevisiae protein functional class from functional domain composition. Bioinformatics 20 (8), 1292–1300 (2004)

    Article  Google Scholar 

  49. Califano, A.: SPLASH: structural pattern localization analysis by sequential histograms. Bioinformatics 16 (4), 341–357 (2000)

    Article  Google Scholar 

  50. Cao, R., Cheng, J.: Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks. Methods 93, 84–99 (2016)

    Article  Google Scholar 

  51. Carpentier, M., Brouillet, S., Pothier, J.: YAKUSA: a fast structural database scanning method. Proteins: Struct. Funct. Bioinf. 61 (1), 137–151 (2005)

    Article  Google Scholar 

  52. Carugo, O.: Rapid methds for comparing protein structures and scanning structure databases. Current Bioinformatics 1, 75–83 (2006)

    Article  Google Scholar 

  53. Carugo, O., Pongor, S.: Protein fold similarity estimated by a probabilistic approach based on c(alpha)-c(alpha) distance comparison. J Mol Biol 315 (4), 887–898 (2002)

    Article  Google Scholar 

  54. Chakrabarti, S., Venkatramanan, K., Sowdhamini, R.: SMoS: a database of structural motifs of protein superfamilies. Protein Eng 16 (11), 791–793 (2003)

    Article  Google Scholar 

  55. Chatr-Aryamontri, A., et al.: The BioGRID interaction database: 2015 update. Nucleic Acids Res 43 (Database Issue), D470–D478 (2015)

    Article  Google Scholar 

  56. Chen, C., Chung, W., Su, C.: Exploiting homogeneity in protein sequence clusters for construction of protein family hierarchies. Pattern Recognition 39 (12), 2356–2369 (2006)

    Article  Google Scholar 

  57. Chen, L., Xuan, J., Riggins, R.B., Wang, Y., Clarke, R.: Identifying protein interaction subnetworks by a bagging markov random field-based method. Nucleic Acd Res 41 (2), e42 (2013)

    Article  Google Scholar 

  58. Chen, Y.J., Kodell, R., Sistare, F., Thompson, K.L., Moris, S., Chen, J.J.: Studying and modelling dynamic biological processes using time-series gene expression data. J Biopharm Stat 13 (1), 57–74 (2003)

    Article  Google Scholar 

  59. Chen, Y.J., Mamidipalli, S., Huan, T.: HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics 10 (Suppl 1), S16 (2009)

    Article  Google Scholar 

  60. Cheng, B.Y., Carbonell, J.G., Klein-Seetharaman, J.: Protein classification based on text document classification techniques. Proteins 58 (4), 955–970 (2005)

    Article  Google Scholar 

  61. Cheng, F., et al.: Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol 8 (5), e1002,503 (2012)

    Article  Google Scholar 

  62. Cheng, Y., Church, G.M.: Biclustering of expression data. In: Intl Conf Intell Sys Mol Biol (RECOMB), pp. 93–103 (2000)

    Google Scholar 

  63. Chitale, M., Hawkins, T., Park, C., Kihara, D.: ESG: extended similarity group method for automated protein function prediction. Bioinformatics 25 (14), 1739–1745 (2009)

    Article  Google Scholar 

  64. Cho, Y., Zhang, A.: Predicting protein function by frequent functional association pattern mining in protein interaction networks. IEEE Trans Info Technol Biomed 14 (1), 30–36 (2009)

    MathSciNet  Google Scholar 

  65. Chua, H.N., Sung, W.K., Wong, L.: Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics 22 (13), 1623–1630 (2006)

    Article  Google Scholar 

  66. Clark, W.T., Radivojac, P.: Analysis of protein function and its prediction from amino acid sequence. Proteins: Struct Funct Bioinf 79 (7), 2086–2096 (2011)

    Article  Google Scholar 

  67. Cong, L., Ran, F.A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P.D., Wu, X., Jiang, W., Marrafini, L.A., Zhang, F.: Multiplex genome engineering using CRISPR/Cas systems. Science 339 (6121), 819–823 (2013)

    Article  Google Scholar 

  68. Consortium, T.U.: Ongoing and future developments at the universal protein resource. Nucleic Acids Res 39 (Database issue), D214–D219 (2011)

    Article  Google Scholar 

  69. Cowley, M.J., Pinese, M., Kassahn, K.S., Waddell, N., Pearson, J.V., Grimmond, S.M., Biankin, A.V., Hautaniemi, S., Wu, J.: PINA v2.0: mining interactome modules. Nucleic Acids Res 40 (Database issue), D862–D865 (2012)

    Google Scholar 

  70. Cozzetto, D., Buchan, D.W.A., Jones, D.T.: Protein function prediction by massive integration of evolutionary analyses and multiple data sources. BMC Bioinf 14 (Suppl 1), S1 (2013)

    Article  Google Scholar 

  71. Dandekar, T., Snel, B., Huynen, M., Bork, P.: Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 23 (9), 324–328 (1998)

    Article  Google Scholar 

  72. Das, R., Kalita, J., Bhattacharyya, D.K.: A new approach for clustering gene expression time series data. Intl J Bioinform Res Appl 5 (3), 310–328 (2009)

    Article  Google Scholar 

  73. Date, S.V., Marcotte, E.M.: Protein function prediction using the Protein Link EXplorer (PLEX). Bioinformatics 21 (10), 2558–2559 (2005)

    Article  Google Scholar 

  74. Déjean, S., Martin, P.G.P., Besse, P.: Clustering time-series gene expression data using smoothing spline derivatives. EURASIP J Bioinf Sys Biol 2007 (1), 70,561 (2007)

    Google Scholar 

  75. Deng, M., Sun, T., Chen, T.: Assessment of the reliability of protein-protein interactions and protein function prediction. In: Pacific Symp Biocomput (PSB), vol. 8, pp. 140–151 (2003)

    MATH  Google Scholar 

  76. Deng, M., Tu, Z., Sun, F., Chen, T.: Mapping gene ontology to proteins based on protein-protein interaction data. Bioinformatics 20 (6), 895–902 (2004)

    Article  Google Scholar 

  77. Deng, M., Zhang, K., Mehta, S., Chen, T., Sun, F.: Prediction of protein function using protein-protein interaction data. J Comput Biol 10 (6), 947–960 (2003)

    Article  Google Scholar 

  78. Deng, X., Ali, H.H.: A hidden markov model for gene function prediction from sequential expression data. In: IEEE Comput Sys Bioinf Conf (CSB), pp. 670–671 (2004)

    Google Scholar 

  79. Devos, D., Valencia, A.: Practical limits of function prediction. Proteins: Struct Funct Bioinf 41 (1), 98–107 (2000)

    Article  Google Scholar 

  80. Doerks, T., Bairoch, A., Bork, P.: Protein annotation: detective work for function prediction. Trends Genet 14 (6), 248–250 (1998)

    Article  Google Scholar 

  81. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2 edn. Wiley-Interscience (2000)

    Google Scholar 

  82. Dwight, S.S., et al.: Saccharomyces genome database (SGD) provides secondary gene annotation using the gene ontology (GO). Nucleic Acids Res 30 (1), 69–72 (2002)

    Article  Google Scholar 

  83. Eddy, S.R.: Profile hidden Markov models. Bioinformatics 14 (9), 755–763 (1998)

    Article  Google Scholar 

  84. Edgar, R., Domrachev, M., Lash, A.E.: Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30 (1), 207–210 (2003)

    Article  Google Scholar 

  85. Eisen, J.A.: Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res 8 (3), 163–167 (1998)

    Article  Google Scholar 

  86. Eisner, R.,, Poulin, B., Szafron, D., Lu, P., Greiner, R.: Improving protein function prediction using the hierarchical structure of the gene ontology. In: IEEE Comput Intell Bioinf Comput Biol (CIBCB), pp. 1–8 (2005)

    Google Scholar 

  87. Emig, D., Ivliev, A., Pustovalova, O., Lancashire, L., Bureeva, S., Nikolsky, Y., Bessarabova, M.: Drug target prediction and repositioning using an integrated network-based approach. PLoS One 8 (4), e60,618 (2013)

    Article  Google Scholar 

  88. Enault, F., Suhre, K., Abergel, C., Poirot, O., Claverie, J.: Annotation of bacterial genomes using improved phylogenomic profiles. Bioinformatics 19 (Suppl 1), i105–i107 (2003)

    Article  Google Scholar 

  89. Enault, F., Suhre, K., Abergel, C., Poirot, O., Claverie, J.: Phydbac (phylogenomic display of bacterial genes): An interactive resource for the annotation of bacterial genomes. Nucleic Acids Res 31 (13), 3720–3722 (2003)

    Article  Google Scholar 

  90. Enault, F., Suhre, K., Abergel, C., Poirot, O., Claverie, J.: Phydbac2: improved inference of gene function using interactive phylogenomic profile and chromosomal location analysis. Nucleic Acids Res 32 (Web Server Issue), W336–W339 (2004)

    Google Scholar 

  91. Enault, F., Suhre, K., Claverie, J.: Phydbac “gene function predictor”: a gene annotation tool based on genomic context analysis. BMC Bioinf 6 (247) (2005)

    Google Scholar 

  92. Engelhardt, B.E., Jordan, M.I., Muratore, K.E., Brenner, S.E.: Protein molecular function prediction by bayesian phylogenomics. PLoS Comput Biol 1 (5), e45 (2005)

    Article  Google Scholar 

  93. Enright, A.J., Ouzounis, C.A.: Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol 2 (9), RESEARCH0034 (2001)

    Google Scholar 

  94. Enright, A.J., Van Dongen, S., Ouzounis, C.A.: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30 (7), 1575–1584 (2002)

    Article  Google Scholar 

  95. Erickson, H.P.: Cooperativity in protein-protein association: the structure and stability of the actin filament. J Mol Biol 206 (3), 465–474 (1989)

    Article  Google Scholar 

  96. Ernst, J., Nau, G.J., Bar-Joseph, Z.: Clustering short time series gene expression data. Bioinformatics 21 (Suppl 1), i159–i168 (2005)

    Article  Google Scholar 

  97. Eskin, E., Agichtein, E.: Combining text mining and sequence analysis to discover protein functional regions. In: Pac. Symp. Biocomputing, pp. 288–299 (2004)

    Google Scholar 

  98. Falda, M., et al.: Argot2: a large scale function prediction tool relying on semantic similarity of weighted gene ontology terms. BMC Bioinf 28 (Suppl 4), S14 (2012)

    Article  Google Scholar 

  99. Fayech, S., Essoussi, N., Limam, M.: Partitioning clustering algorithms for protein sequence data sets. BioData Mining 2 (1), 3 (2009)

    Article  Google Scholar 

  100. Felsenstein, J.: PHYLIP - phylogeny inference package (version 3.2). Cladistics 5, 164–166 (1989)

    Google Scholar 

  101. Ferrer, L., Dale, J.M., Karp, P.D.: A systematic study of genome context methods: calibration, normalization and combination. BMC Genomics 11 (1), 1–24 (2010)

    Article  Google Scholar 

  102. Fetrow, J.S., Siew, N., Di Gennaro, J.A., Martinez-Yamout, M., Dyson, H.J., Skolnick, J.: Genomic-scale comparison of sequence- and structure-based methods of function prediction: Does structure provide additional insight? Protein Science: A Publication of the Protein Society 10 (5), 1005–1014 (2001)

    Article  Google Scholar 

  103. Forslund, K., Sonnhammer, E.L.: Predicting protein function from doma in content. Bioinformatics 24 (15), 1681–1687 (2008)

    Article  Google Scholar 

  104. French, L.: Fast protein superfamily classification using principal component null space analysis. appendix a: A survey on remote homology detection and protein superfamily classification. Master’s thesis, University of Windsor, Ontario, Canada (2005)

    Google Scholar 

  105. Funk, C.S., Kahanda, I., Ben-Hur, A., Verspoor, K.M.: Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct. J Biomed Semantics 18 (6), 9 (2015)

    Article  Google Scholar 

  106. Gascuel, O.: BIONJ: an improved version of the nj algorithm based on a simple model of sequence data. Mol Biol Evol 14 (7), 685–695 (1997)

    Article  Google Scholar 

  107. Gether, U.: Uncovering molecular mechanisms involved in activation of g protein-coupled receptors. Endocr Rev 21 (1), 90–113 (2000)

    Article  Google Scholar 

  108. Gibrat, J.F., Madej, T., Bryant, S.H.: Surprising similarities in structure comparison. Curr. Opinion Struct. Biol. 6 (3), 377–385 (1996)

    Article  Google Scholar 

  109. Gillis, J., Pavlidis, P.: The role of indirect connections in gene networks in predicting function. Bioinformatics 27 (13), 1860–1866 (2011)

    Article  Google Scholar 

  110. Gligorijevic, V., Przulj, N.: Methods for biological data integration: perspectives and challenges. Roy Soc Interface 12 (112), 20150,571 (2015)

    Article  Google Scholar 

  111. Godzik, A., Skolnick, J.: Flexible algorithm for direct multiple alignment of protein structures and sequences. Comput Appl Biosci 10 (6), 587–596 (1994)

    Google Scholar 

  112. Goh, C., Bogan, A.A., Joachimiak, M., Walther, D., Cohen, F.E.: Co-evolution of proteins with their interaction partners. J Mol Biol 299 (2), 283–293 (2000)

    Article  Google Scholar 

  113. Goldberg, D.S., Roth, F.P.: Assessing experimentally derived interactions in a small world. Proc Natl Acad Sci USA 100 (8), 4372–4376 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  114. Goll, J., Rajagopala, S.V., Shiau, S.C., Wu, H., Lamb, B.T., Uetz, P.: MPIDB: the microbial protein interaction database. Bioinformatics 24 (15), 1743–1744 (2008)

    Article  Google Scholar 

  115. Gomez, S.M., Noble, W.S., Rzhetsky, A.: Learning to predict protein-protein interactions from protein sequences. Bioinformatics 19 (15), 1875–1881 (2003)

    Article  Google Scholar 

  116. Gong, Q., Ning, W., Tian, W.: GoFDR: A sequence alignment based method for predicting protein functions. Methods S1046–2023 (15), 30,048–7 (2015)

    Google Scholar 

  117. Guan, Y., Myers, C.L., Hess, D.C., Barutcuoglu, Z., Caudy, A.A., Troyanskaya, O.G.: Predicting gene function in a hierarchical context with an ensemble of classifiers. Genome Biol 9 (Suppl 1), S3 (2008)

    Article  Google Scholar 

  118. Gui, J., Li, H.: Mixture functional discriminant analysis for gene function classification based on time course gene expression data. In: Joint Statistical Meeting: Biometrics Section (2003)

    Google Scholar 

  119. Gúldener, U., Muensterkoetter, M., Oesterheld, M., Pagel, P., Ruepp, A., Mewes, H.W., Stúmpflen, V.: MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Res 34 (Database issue), D436–D441 (2006)

    Article  Google Scholar 

  120. Guo, X., Gao, L., Wei, C., Yang, X., Zhao, Y., Dong, A.: A computational method based on the integration of heterogeneous networks for predicting disease-gene associations. PLoS One 6 (e24171) (2011)

    Google Scholar 

  121. Guruprasad, K., Prasad, M.S., Kumar, G.R.: Database of structural motifs in proteins. Bioinformatics 16 (4), 372–375 (2000)

    Article  Google Scholar 

  122. Guthke, R., Schmidt-Heck, W., Hahn, D., Pfaff, M.: Gene expression data mining for functional genomics. In: European Symp Intelligent Techniques, pp. 170–1777 (2000)

    Google Scholar 

  123. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach Learn 46 (1–3), 389–422 (2002)

    Article  MATH  Google Scholar 

  124. Hamp, T., et al.: Homology-based inference sets the bar high for protein function prediction. BMC Bioinf 14 (Suppl 1), S7 (2013)

    Article  Google Scholar 

  125. Han, L.Y., Zheng, C.J., Lin, H.H., Cui, J., Li, H., Zhang, H.L., Tang, Z.Q., Chen, Y.Z.: Prediction of functional class of novel plant proteins by a statistical learning method. New Phytol 168 (1), 109–121 (2005)

    Article  Google Scholar 

  126. Hanisch, D., Zien, A., Zimmer, R., Lengauer, T.: Co-clustering of biological networks and gene expression data. Bioinformatics 18 (Suppl 1), S145–S154 (2002)

    Article  Google Scholar 

  127. Hartigan, J.A.: Direct clustering of a data matrix. J Amer Stat Assoc 67 (337), 123–129 (1972)

    Article  Google Scholar 

  128. Hartuv, E., Shamir, R.: A clustering algorithm based on graph connectivity. Information Processing Letters 76 (4–6), 175–181 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  129. Hawkins, T., Chitale, M., Luban, S., Kihara, D.: PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins: Struct Funct Bioinf 74 (3), 566–582 (2009)

    Article  Google Scholar 

  130. Hawkins, T., Luban, S., Kihara, D.: Enhanced automated function prediction using distantly related sequences and contextual association by PFP. Protein Sci 15 (6), 1550–1556 (2006)

    Article  Google Scholar 

  131. Hayete, B., Bienkowska, J.R.: GOTrees: Predicting go associations from protein domain composition using decision trees. In: Pacific Symp Biocomput (PSB), pp. 140–151 (2005)

    Google Scholar 

  132. Heard, N., Holmes, C.C., Stephens, D.A., Hand, D.J., Dimopoulos, G.: Bayesian coclustering of anopheles gene expression time series: Study of immune defense response to multiple experimental challenges. Proc Natl Acad Sci USA 102 (47), 16,939–16,944 (2005)

    Article  Google Scholar 

  133. Hegyi, H., Gerstein, M.: The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J Mol Biol 288 (1), 147–164 (1999)

    Article  Google Scholar 

  134. Hinson, J.T., Chopra, A., Nafissi, N., Polacheck, W.J., Benson, C.C., Swist, S., Gorham, J., Yang, L., Schafer, S., Sheng, C.C., Haghighi, A., Homsy, J., Hubner, N., Church, G., Cook, S.A., Linke, W.A., Chen, C.S., Seidman, J.G., Seidman, C.E.: Heart disease. titin mutations in iPS cells define sarcomere insufficiency as a cause of dilated cardiomyopathy. Science 349 (6251), 892–986 (2015)

    Google Scholar 

  135. Hishigaki, H., Nakai, K., Ono, T., Tanigami, A., Takagi, T.: Assessment of prediction accuracy of protein function from protein-protein interaction data. Yeast 18 (6), 523–531 (2001)

    Article  Google Scholar 

  136. Holm, L., Sander, C.: Protein structure comparison by alignment of distance matrices. jmb 233 (1), 123–138 (1993)

    Google Scholar 

  137. Hou, J., Chi, X.: Predicting protein functions from PPI networks using functional aggregation. Mathematical Biosciences 240 (1), 63–69 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  138. Hou, J., S.-R., J., Zhang, C., Kim, S.: Global mapping of the protein structure space and application in structure-based inference of protein function. Proc. Natl. Acad. Sci. USA 102, 3651–3656 (2005)

    Article  Google Scholar 

  139. Hou, Y., Hsu, W., Lee, M.L., Bystroff, C.: Efficient remote homology detection using local structure. Bioinformatics 19 (17), 2294–2301 (2003)

    Article  Google Scholar 

  140. Hsu, P.D., Lander, E.S., Zhang, F.: Development and applications of CRISPR-Cas9 for genome engineering. Cell 157 (6), 1262–1278 (2014)

    Article  Google Scholar 

  141. Huang, J.Y., Brutlag, D.L.: The EMOTIF database. Nucleic Acids Res 29 (1), 202–204 (2001)

    Article  Google Scholar 

  142. Huang, Y., Yeh, H., Soo, V.: Inferring drug-disease associations from integration of chemical, genomic and phenotype data using network propagation. BMC Med Genomics 6 (3), S4 (2013)

    Article  Google Scholar 

  143. Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P., Bairoch, A.: Recent improvements to the PROSITE database. Nucl. Acids Res. 32 (1), D134–D137 (2003)

    Google Scholar 

  144. Hulo, N., et al.: The PROSITE database. Nucleic Acids Res 34 (Database issue), D227–D230 (2006)

    Article  Google Scholar 

  145. Humphrey, W., Dalke, A., Schulten, K.: VMD - Visual Molecular Dynamics. J. Mol. Graph. Model. 14 (1), 33–38 (1996). http://www.ks.uiuc.edu/Research/vmd/

    Article  Google Scholar 

  146. Hunter, S., et al.: InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 40 (Database issue), 306–312 (2012)

    Article  Google Scholar 

  147. Huynen, M., Snel, B., Lathe, W., Bork, P.: Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res 10 (8), 1204–1210 (2000)

    Article  Google Scholar 

  148. Hvidsten, T., Komorowski, J., Sandvik, A., Laegreid, A.: Predicting gene function from gene expressions and ontologies. In: Pacific Symp Biocomput (PSB), pp. 299–310 (2001)

    Google Scholar 

  149. Iakoucheva, L.M., Dunker, A.K.: Order, disorder, and flexibility: Prediction from protein sequence. Structure 11 (11), 1316–1317 (2003)

    Article  Google Scholar 

  150. Jaakkola, T., Diekhans, M., Haussler, D.: Using the fisher kernel method to detect remote protein homologies. In: T. Lengauer, R. Schneider, P. Bork, D. Brutlag, J. Glasgow, H.W. Mewes, R. Zimmer (eds.) Int Conf Intell Sys Mol Biol (ISMB), pp. 149–159. AAAI Press, Menlo Park, CA (1999)

    Google Scholar 

  151. Jaakkola, T., Diekhans, M., Haussler, D.: A discriminative framework for detecting remote protein homologies. J Comput Biol 7 (1–2), 95–114 (2000)

    Article  Google Scholar 

  152. Jaimovich, A., Elidan, G., Margalit, H., Friedman, N.: Towards an integrated protein-protein interaction network: A relational markov network approach. J Comput Biol 13 (2), 145–164 (2006)

  153. Jensen, L., et al.: Prediction of human protein function from post-translational modifications and localization features. J Mol Biol 319 (5), 1257–1265 (2002)

  154. Jensen, L.J., Gupta, R., Staerfeldt, H., Brunak, S.: Prediction of human protein function according to gene ontology categories. Bioinformatics 19 (5), 635–642 (2003)

  155. Jiang, D., Pei, J., Ramanathan, M., Tang, C., Zhang, A.: Mining coherent gene clusters from gene-sample-time microarray data. In: ACM Intl Conf Knowledge Discovery Data Mining (SIGKDD), pp. 430–439 (2004)

  156. Jiang, J.Q.: Learning protein functions from bi-relational graph of proteins and function annotations. In: Algorithms in Bioinformatics, Lecture Notes in Computer Science, vol. 6833, pp. 128–138. Springer Verlag (2011)

  157. Jiang, X., Nariai, N., Steffen, M., Kasif, S., Kolaczyk, E.: Integration of relational and hierarchical network information for protein function prediction. BMC Bioinf 9, 350 (2008)

  158. Jiang, X., et al.: An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Quantitative Methods arXiv pp. 1–70 (2016)

  159. Joshi, T., Xu, D.: Quantitative assessment of relationship between sequence similarity and function similarity. BMC Genomics 8 (1), 1–10 (2007)

  160. Kabsch, W.: A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallogr. Sect. A 34, 827–828 (1978)

  161. Kalathur, R.K., Pinto, J.P., Hernández-Prieto, M.A., Machado, R.S., Almeida, D., Chaurasia, G., Futschik, M.E.: UniHI 7: an enhanced database for retrieval and interactive analysis of human molecular interaction networks. Nucleic Acids Res 42 (Database issue), D408–D414 (2014)

  162. Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., Hattori, M.: The KEGG resource for deciphering the genome. Nucleic Acids Res 32 (Database Issue), D277–D280 (2004)

  163. Karaoz, U., Murali, T.M., Letovsky, S., Zheng, Y., Ding, C., Cantor, C.R., Kasif, S.: Whole-genome annotation by using evidence integration in functional-linkage networks. Proc Natl Acad Sci USA 101 (9), 2888–2893 (2004)

  164. Karplus, K., Barrett, C., Hughey, R.: Hidden Markov models for detecting remote protein homologies. Bioinformatics 14 (10), 846–856 (1998)

  165. Keasar, C., Kolodny, R.: Using protein fragments for searching and data-mining protein databases. In: AAAI Workshop, pp. 1–6 (2013)

  166. Keck, H., Wetter, T.: Functional classification of proteins using a nearest neighbor algorithm. In Silico Biology 3 (3), 265–275 (2003)

  167. Kelley, L.A., Sternberg, M.J.: Protein structure prediction on the web: a case study using the Phyre server. Nat Protocols 4 (3), 363–371 (2009)

  168. Keseler, I.M., Collado-Vides, J., Gama-Castro, S., Ingraham, J., Paley, S., Paulsen, I.T., Peralta-Gil, M., Karp, P.D.: EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res 33 (Database Issue), D334–D337 (2005)

  169. Keshava, P., et al.: Human protein reference database–2009 update. Nucleic Acids Res 37 (Database issue), D767–D772 (2009)

  170. Khan, I., Wei, Q., Chapman, S., Dukka, B.K., Kihara, D.: The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches. GigaScience 4, 43 (2015)

  171. King, A., Przulj, N., Jurisica, I.: Protein complex prediction via cost-based clustering. Bioinformatics 20 (17), 3013–3020 (2004)

  172. King, R.D., Karwath, A., Clare, A., Dehaspe, L.: Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining. Yeast 17 (4), 283–293 (2000)

  173. King, R.D., Karwath, A., Clare, A., Dehaspe, L.: The utility of different representations of protein sequence for predicting functional class. Bioinformatics 17 (5), 445–454 (2001)

  174. Kirilova, S., Carugo, O.: Progress in the PRIDE technique for rapidly comparing protein three-dimensional structures. BMC Research Notes 1, 44 (2008)

  175. Krissinel, E., Henrick, K.: Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D Biol Crystallogr 60 (12.1), 2256–2268 (2004)

  176. Kleywegt, G.J.: Use of noncrystallographic symmetry in protein structure refinement. Acta Crystallogr D. 52 (Pt. 4), 842–857 (1996)

  177. Koehl, P.: Protein structure similarities. Curr. Opinion Struct. Biol. 11, 348–353 (2001)

  178. Kolesnikov, N., et al.: ArrayExpress update: simplifying data submissions. Nucleic Acids Res 43 (Database issue), D1113–D1116 (2015)

  179. Kolesov, G., Mewes, H.W., Frishman, D.: Snapping up functionally related genes based on context information: a colinearity-free approach. J Mol Biol 311 (4), 639–656 (2001)

  180. Kolesov, G., Mewes, H.W., Frishman, D.: Snapper: gene order predicts gene function. Bioinformatics 18 (7), 1017–1019 (2002)

  181. Kolodny, R., Koehl, P., Guibas, L., Levitt, M.: Small libraries of protein fragments model native protein structures accurately. J. Mol. Biol. 323, 297–307 (2002)

  182. Kolodny, R., Koehl, P., Levitt, M.: Comprehensive evaluation of protein structure alignment methods: Scoring by geometric measures. J. Mol. Biol. 346, 1173–1188 (2005)

  183. Kondor, R.I., Lafferty, J.: Diffusion kernels on graphs and other discrete structures. In: Int Conf Mach Learn (ICML), pp. 315–322 (2002)

  184. Koonin, E.V., Galperin, M.Y.: Sequence - evolution - function: Computational approaches in comparative genomics. In: Evolutionary Concept in Genetics and Genomics, 1 edn., chap. 2 Kluwer Academic, Boston, MA (2003)

  185. Korbel, J.O., Jensen, L.J., von Mering, C., Bork, P.: Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nature Biotechnol 22 (7), 911–917 (2004)

  186. Koskinen, P., Törönen, P., Nokso-Koivisto, J., Holm, L.: PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment. Bioinformatics 31 (10), 1544–1552 (2015)

  187. Kourmpetis, Y.A., van Dijk, A.D., Bink, M.C., van Ham, R.C., ter Braak, C.J.: Bayesian Markov random field analysis for protein function prediction based on network data. PLoS One 5 (2), e9293 (2010)

  188. Kourmpetis, Y.A., van Dijk, A.D., ter Braak, C.J.: Gene ontology consistent protein function prediction: the falcon algorithm applied to six eukaryotic genomes. Algorithms Mol Biol 8 (1), 10 (2013)

  189. Kuang, R., Ie, E., Wang, K., Wang, K., Siddiqi, M., Freund, Y., Leslie, C.: Profile-based string kernels for remote homology detection and motif extraction. J Bioinf Comput Biol 3 (3), 527–550 (2005)

  190. Kuncheva, L.I., Bezdek, J.C., Duin, R.P.W.: Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recognition 34 (2), 299–314 (2001)

  191. Kunik, V., Solan, Z., Edelman, S., Ruppin, E., Horn, D.: Motif extraction and protein classification. In: Pacific Symp Biocomput (PSB), pp. 80–85 (2005)

  192. Kuramochi, M., Karypis, G.: Gene classification using expression profiles. In: IEEE Symp Bioinf Bioeng (BIBE), pp. 191–200 (2001)

  193. Laegreid, A., Hvidsten, T.R., Midelfart, H., Komorowski, J., Sandvik, A.K.: Predicting gene ontology biological process from temporal gene expression patterns. Genome Res 13 (5), 965–979 (2003)

  194. Lan, L., et al.: MS-kNN: Protein function prediction by integrating multiple data sources. BMC Bioinform 14 (Suppl 1), S8 (2013)

  195. Lanckriet, G.R.G., De Bie, T., Cristianini, N., Jordan, M.I., Noble, W.S.: A statistical framework for genomic data fusion. Bioinformatics 20 (16), 2626–2635 (2004)

  196. Lanckriet, G.R.G., Deng, M., Cristianini, N., Jordan, M.I., Noble, W.S.: Kernel-based data fusion and its application to protein function prediction in yeast. In: Pacific Symp Biocomput (PSB), pp. 300–311 (2004)

  197. Lavezzo, E., Falda, M., Fontana, P., Bianco, L., Toppo, S.: Enhancing protein function prediction with taxonomic constraints - the Argot2.5 web server. Methods 93, 15–23 (2016)

  198. Lee, D., Redfern, O., Orengo, C.: Predicting protein function from sequence and structure. Nat. Rev. Mol. Cell Biol. 8, 995–1005 (2007)

  199. Lee, J., Gross, S.P., Lee, J.: Improved network community structure improves function prediction. Scientific Reports 3, 2197 (2013)

  200. Lee, J., Lee, I., Lee, J.: Unbiased global optimization of Lennard-Jones clusters for n ≤ 201 using the conformational space annealing method. Phys Rev Lett 91 (8), 080201 (2003)

  201. Lee, J., Scheraga, H.A., Rackovsky, S.: New optimization method for conformational energy calculations on polypeptides: conformational space annealing. J Comput Chem 18 (9), 1222–1232 (1997)

  202. Legrain, P., Wojcik, J., Gauthier, J.M.: Protein–protein interaction maps: a lead towards cellular functions. Trends Genet 17 (6), 346–352 (2001)

  203. Leslie, C.S., Eskin, E., Cohen, A., Weston, J., Noble, W.S.: Mismatch string kernels for discriminative protein classification. Bioinformatics 20 (4), 467–476 (2003)

  204. Letovsky, S., Kasif, S.: Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics 19 (Suppl 1), i197–i204 (2003)

  205. Letsche, T.A., Berry, M.W.: Large-scale information retrieval with latent semantic indexing. Inf Sci 100 (1–4), 105–137 (1997)

  206. Levitt, M., Gerstein, M.: A unified statistical framework for sequence comparison and structure comparison. Proc. Natl. Acad. Sci. USA 95 (11), 5913–5920 (1998)

  207. Levy, E., Ouzounis, C.A., Gilks, W.R., Audit, B.: Probabilistic annotation of protein sequences based on functional classifications. BMC Bioinf 6, 302 (2005)

  208. Li, H., Liang, S.: Local network topology in human protein interaction data predicts functional association. PLoS One 4 (7), e6410 (2009)

  209. Li, H., Tong, P., Gallegos, J., Dimmer, E., Cai, G., Molldrem, J.J., Liang, S.: PAND: A distribution to identify functional linkage from networks with preferential attachment property. PLoS One 10 (7), e0127968 (2015)

  210. Li, H.L., Fujimoto, N., Sasakawa, N., Shirai, S., Ohkame, T., Sakuma, T., Tanaka, M., Amano, N., Watanabe, A., Sakurai, H., Yamamoto, T., Yamanaka, S., Hotta, A.: Precise correction of the dystrophin gene in Duchenne muscular dystrophy patient induced pluripotent stem cells by TALEN and CRISPR-Cas9. Stem Cell Reports 4 (1), 143–154 (2015)

  211. Li, L., Stoeckert, C.J., Roos, D.S.: OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res 13 (9), 2178–2189 (2003)

  212. Li, Y., Chen, L.: Big biological data: Challenges and opportunities. Genomics, Proteomics, and Bioinformatics 12 (5), 187–189 (2014)

  213. Liao, L., Noble, W.S.: Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J. Comp. Biol. 10 (6), 857–868 (2002)

  214. Liberles, D.A., Thorn, A., von Heijne, G., Elofsson, A.: The use of phylogenetic profiles for gene predictions. Current Genomics 3 (3), 131–137 (2002)

  215. Lingling, A., Doerge, R.W.: Dynamic clustering of gene expression. ISRN Bioinformatics 2012 (537217), 1–12 (2012)

  216. Lisewski, A.M., Lichtarge, O.: Rapid detection of similarity in protein structure and function through contact metric distances. Nucl. Acids Res. 34 (22), e152 (2006)

  217. Liu, A.H., Califano, A.: Functional classification of proteins by pattern discovery and top-down clustering of primary sequences. IBM Systems J 40 (2), 379–393 (2001)

  218. Liu, B., Wang, X., Chen, Q., Dong, Q., Lan, X.: Using amino acid physicochemical distance transformation for fast protein remote homology detection. PLoS One 7 (9), e46633 (2012)

  219. Liu, B., Wang, X., Lin, L., Dong, Q., Wang, X.: A discriminative method for protein remote homology detection and fold recognition combining top-n-grams and latent semantic analysis. BMC Bioinf 9 (510) (2008)

  220. Liu, B., et al.: Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics 30 (4), 472–479 (2014)

  221. Liu, J., Wang, W., Yang, J.: Gene ontology friendly biclustering of expression profiles. In: IEEE Comput Sys Bioinf Conf (CSB), pp. 436–447 (2004)

  222. Liu, Q., Chen, Y.P., Li, J.: k-partite cliques of protein interactions: A novel subgraph topology for functional coherence analysis on PPI networks. J Theoretical Biol 340 (7), 146–154 (2014)

  223. Lobley, A., Swindells, M.B., Orengo, C.A., Jones, D.T.: Inferring function using patterns of native disorder in proteins. PLoS Comput Biol 3 (8), e162 (2007)

  224. Lobley, A.E.: Human protein function prediction: application of machine learning for integration of heterogeneous data sources. Ph.D. thesis, University College London (2010)

  225. Lobley, A.E., Nugent, T., Orengo, C.A., Jones, D.T.: FFPred: an integrated feature-based function prediction server for vertebrate proteomes. Nucleic Acids Res 36 (Web server issue), W297–W302 (2008)

  226. Ma, Q., Chirn, G.W., Cai, R., Szustakowski, J., Nirmala, N.C.: Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks. BMC Bioinf 6 (1), 242 (2005)

  227. Ma, X., Chen, T., Sun, F.: Integrative approaches for predicting protein function and prioritizing genes for complex phenotypes using protein interaction networks. Briefings in Bioinformatics 15 (5), 685–698 (2013)

  228. Maciag, K., et al.: Systems-level analyses identify extensive coupling among gene expression machines. Mol Syst Biol 2 (1), 0003 (2006)

  229. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: A survey. IEEE Trans Comput Biol Bioinf 1 (1), 24–45 (2004)

  230. Marchler-Bauer, A., et al.: CDD: a conserved domain database for protein classification. Nucleic Acids Res 33 (Database issue), D192–D196 (2005)

  231. Frasca, M., Bertoni, A., Valentini, G.: UNIPred: Unbalance-aware network integration and prediction of protein functions. J Comput Biol 22 (12), 1057–1074 (2015)

  232. Marcotte, C.J.V., Marcotte, E.M.: Predicting functional linkages from gene fusions with confidence. Applied Bioinf 1 (2), 93–100 (2002)

  233. Marcotte, E.M., Pellegrini, M., Ng, H., Rice, D.W., Yeates, T.O., Eisenberg, D.: Detecting protein function and protein-protein interactions from genome sequences. Science 285 (5428), 751–753 (1999)

  234. Marti-Renom, M.A., Capriotti, E., Shindyalov, I.N., Bourne, P.E.: Structure comparison and alignment. In: J. Gu, P.E. Bourne (eds.) Structural Bioinformatics, 2 edn., chap. 16 John Wiley & Sons (2009)

  235. Martin, A.C.: The ups and downs of protein topology; rapid comparison of protein structure. Protein Eng. 13 (12), 829–837 (2000)

  236. Martin, D.M., Berriman, M., Barton, G.J.: GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinf 5 (178) (2004)

  237. Mateos, A., Dopazo, J., Jansen, R., Tu, Y., Gerstein, M., Stolovitzky, G.: Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons. Genome Res 12 (11), 1703–1715 (2002)

  238. McDowall, M.D., Scott, M.S., Barton, G.J.: PIPs: human protein-protein interaction prediction database. Nucleic Acids Res 37 (Database issue), D651–D656 (2009)

  239. Mi, H., Muruganujan, A., Casagrande, J.T., Thomas, P.D.: Large-scale gene function analysis with the PANTHER classification system. Nat Protoc 8 (8), 1551–1566 (2013)

  240. Mi, H., et al.: The PANTHER database of protein families and subfamilies and functions and pathways. Nucleic Acids Res 33 (Database issue), D284–D288 (2005)

  241. Midelfart, H., Laegreid, A., Komorowski, J.: Classification of gene expression data in an ontology. In: Medical Data Analysis, Lecture Notes in Computer Science, vol. 2199, pp. 186–194. Springer (2001)

  242. Miele, V., Penel, S., Daubin, V., Picard, F., Kahn, D., Duret, L.: High-quality sequence clustering guided by network topology and multiple alignment likelihood. Bioinformatics 28 (8), 1078–1085 (2012)

  243. Möller-Levet, C.S., Cho, K., Yin, H., Wolkenhauer, O.: Clustering of gene expression time-series data. Tech. rep., University of Rostock, Germany (2003)

  244. Möller-Levet, C.S., Klawonn, F., Cho, K.: Clustering of unevenly sampled gene expression time-series data. Fuzzy Sets Syst 152 (1), 49–66 (2005)

  245. Molloy, K., Min, J.V., Barbara, D., Shehu, A.: Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space. BMC Bioinf 15 (Suppl 8), S4 (2014)

  246. Monti, S., Tamayo, P., Mesirov, J., Golub, T.: Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52 (1), 91–118 (2003)

  247. Moosavi, S., Rahgozar, M., Rahimi, A.: Protein function prediction using neighbor relativity in protein-protein interaction network. Comput Biol Chem 43, 11–16 (2013)

  248. Mostafavi, S., Morris, Q.: Fast integration of heterogeneous data sources for predicting gene function with limited annotation. Bioinformatics 26 (14), 1759–1765 (2010)

  249. Muda, H.M., Saad, P., Othman, R.M.: Remote protein homology detection and fold recognition using two-layer support vector machine classifiers. Comput Biol Med 41 (8), 687–699 (2011)

  250. Mukherjee, S.: Classifying microarray data using support vector machines. In: D.P. Berrar, W. Dubitzky, M. Granzow (eds.) A Practical Approach to Microarray Data Analysis, chap. 9 Kluwer Academic Publishers (2003)

  251. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995)

  252. Nabieva, E., Jim, K., Agarwal, A., Chazelle, B., Singh, M.: Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics 21 (Suppl 1), i302–i310 (2005)

  253. Nair, R., Carter, P., Rost, B.: NLSdb: database of nuclear localization signals. Nucleic Acids Res 31 (1), 397–399 (2003)

  254. Najmanovich, R.J., Torrance, W., Thornton, J.M.: Prediction of protein function from structure: Insights from methods for the detection of local structural similarities. BioTechniques 38 (6), 847–851 (2005)

  255. Nariai, N., Kolaczyk, E.D., Kasif, S.: Probabilistic protein function prediction from heterogeneous genome-wide data. PLoS One 2 (3), e337 (2007)

  256. Narra, K., Liao, L.: Use of extended phylogenetic profiles with E-values and support vector machines for protein family classification. Intl J Computer Info Sci 6 (1) (2005)

  257. Nepusz, T., Sasidharan, R., Paccanaro, A.: SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale. BMC Bioinf 11 (1), 120 (2010)

  258. Ng, S., Tan, S., Sundararajan, V.: On combining multiple microarray studies for improved functional classification by whole-dataset feature selection. Genome Informatics 14, 44–53 (2003)

  259. Ng, S., Zhu, Z., Ong, Y.: Whole-genome functional classification of genes by latent semantic analysis on microarray data. In: Asia-Pacific Conf on Bioinformatics, pp. 123–129 (2004)

  260. Ni, Q., Wang, Z., Han, Q., Li, G.: Using logistic regression method to predict protein function from protein-protein interaction data. In: IEEE Intl Conf Bioinf Biomed Eng (ICBBE), pp. 1–4 (2009)

  261. Obozinski, G., Lanckriet, G., Grant, C., Jordan, M., Noble, W.S.: Consistent probabilistic output for protein function prediction. Genome Biol 9 (Suppl 1), S6 (2008)

  262. Ofer, D., Linial, M.: ProFET: Feature engineering captures high-level protein functions. Bioinformatics 31 (21), 3429–3436 (2015)

  263. Oliver, S.: Guilt-by-association goes global. Nature 403 (6770), 601–603 (2000)

  264. Oliver, S.G.: From DNA sequence to biological function. Nature 379 (6566), 597–600 (1996)

  265. Orchard, S., et al.: The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res 42 (Database issue), D358–D363 (2014)

  266. Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., Thornton, J.M.: CATH database: A hierarchic classification of protein domain structures. Structure 5 (8), 1093–1108 (1997)

  267. Orengo, C.A., Taylor, W.R.: SSAP: sequential structure alignment program for protein structure comparison. Methods Enzymol 266, 617–635 (1996)

  268. Ortiz, A.R., Strauss, C.E., Olmea, O.: MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci 11 (11), 2606–2621 (2002)

  269. Osadchy, M., Kolodny, R.: Maps of protein structure space reveal a fundamental relationship between protein structure and function. Proc. Natl. Acad. Sci. USA 108, 12301–12306 (2011)

  270. Ouali, M., King, R.D.: Cascaded multiple classifiers for secondary structure prediction. Protein Science 9 (6), 1162–1176 (2000)

  271. Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G.D., Maltsev, N.: Use of contiguity on the chromosome to predict functional coupling. In Silico Biol 1 (2), 93–108 (1999)

  272. Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G.D., Maltsev, N.: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96 (6), 2896–2901 (1999)

  273. Pagel, P., et al.: The MIPS mammalian protein-protein interaction database. Bioinformatics 21 (6), 832–834 (2005)

  274. Pasquier, C., Promponas, V., Hamodrakas, S.J.: PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide application. Proteins 44 (3), 361–369 (2000)

  275. Pavlidis, P., Cai, J., Weston, J., Noble, W.S.: Learning gene functional classifications from multiple data types. J Comput Biol 9 (2), 401–411 (2002)

  276. Pazos, F., Valencia, A.: Similarity of phylogenetic trees as indicator of protein-protein interaction. Protein Eng 14 (9), 609–614 (2001)

  277. Pearl, F.M., Bennett, C.F., Bray, J.E., et al.: The CATH database: an extended protein family resource for structural and functional genomics. Nucl. Acids Res. 31, 452–455 (2003)

  278. Pearson, W.R., Lipman, D.J.: Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85 (8), 2444–2448 (1988)

  279. Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D., Yeates, T.O.: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 96 (8), 4285–4288 (1999)

  280. Pereira-Leal, J.B., Enright, A.J., Ouzounis, C.A.: Detection of functional modules from protein interaction networks. Proteins: Struct Funct Bioinf 54 (1), 49–57 (2004)

  281. Pérez, A.J., Rodriguez, A., Trelles, O., Thode, G.: A computational strategy for protein function assignment which addresses the multidomain problem. Comp Funct Genomics 3 (5), 423–440 (2002)

  282. Perutz, M.F., Rossmann, M.G., Cullis, A.F., Muirhead, H., Will, G., North, A.C.T.: Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5 angstrom resolution. Nature 185, 416–422 (1960)

  283. Piovesan, D., Giollo, M., Ferrari, C., Tosatto, S.C.E.: Protein function prediction using guilty by association from interaction networks. Amino Acids 47 (12), 2583–2592 (2015)

  284. Prieto, C., De Las Rivas, J.: APID: Agile Protein Interaction DataAnalyzer. Nucleic Acids Res 34 (Web Server issue), W298–W302 (2006)

  285. Qian, B., Goldstein, R.A.: Detecting distant homologs using phylogenetic tree-based HMMs. Proteins 52 (3), 446–453 (2003)

  286. Qin, W., Dion, S.L., Kutny, P.M., Zhang, Y., Cheng, A.W., Jillete, N.L., Malhotra, A., Geurts, A.M., Chen, Y.G., Wang, J.: Efficient CRISPR/Cas9-Mediated genome editing in mice by zygote electroporation of nuclease. Genetics 200 (2), 423–430 (2015)

  287. Radivojac, P., et al.: A large-scale evaluation of computational protein function prediction methods. Nat Methods 10 (3), 221–227 (2013)

  288. Rangwala, H., Karypis, G.: Profile-based direct kernels for remote homology detection and fold recognition. Bioinformatics 21 (23), 4239–4247 (2005)

  289. Rappoport, N., Karsenty, S., Stern, A., Linial, N., Linial, M.P.: ProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree. Nucleic Acids Res 40 (Database Issue), D313–D320 (2012)

  290. Rawlings, N.D., Barrett, A.J.: MEROPS: the peptidase database. Nucleic Acids Res 27 (1), 325–331 (1999)

  291. Raychaudhuri, S., Chang, J., Sutphin, P., Altman, R.: Associating genes with gene ontology codes using a maximum entropy analysis of biomedical literature. Genome Research 12 (1), 203–214 (2002)

  292. Re, M., Valentini, G.: Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction. J Mach Learn Res 8, 98–111 (2010)

  293. Remmert, M., Biegert, A., Hauser, A., Söding, J.: HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9 (2), 173–175 (2011)

  294. Renner, A., Aszodi, A.: High-throughput functional annotation of novel gene products using document clustering. In: Proc. Symp. Biocomputing (PSB), pp. 54–68 (2000)

  295. Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J Mach Learn Res 5, 101–141 (2004)

  296. Riley, M.: Systems for categorizing functions of gene products. Curr Opin Struct Biol 8 (3), 388–392 (1998)

  297. Roch, K.G.L., et al.: Discovery of gene function by expression profiling of the malaria parasite life cycle. Science 301 (5639), 1503–1508 (2003)

  298. Rogen, P., Fain, B.: Automatic classification of protein structure by using Gauss integrals. Proc. Natl. Acad. Sci. USA 100 (1), 119–124 (2003)

  299. Rost, B.: Enzyme function less conserved than anticipated. J Mol Biol 318, 595–608 (2002)

  300. Ruepp, A., et al.: The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res 32 (18), 5539–5545 (2004)

  301. Saini, A., Hou, J.: Progressive clustering based method for protein function prediction. Bulletin Math Biol 75 (2), 331–350 (2013)

  302. Samanta, M.P., Liang, S.: Predicting protein functions from redundancies in large-scale protein interaction networks. Proc Natl Acad Sci USA 100 (22), 12579–12583 (2003)

  303. Sander, J.D., Joung, J.K.: CRISPR-Cas systems for editing, regulating and targeting genomes. Nature Biotechnology 32 (4), 347–355 (2014)

  304. Sarac, O.S., Atalay, V., Cetin-Atalay, R.: GOPred: GO molecular function prediction by combined classifiers. PLoS One 5 (8), e12382 (2010)

  305. Sasson, O., Linial, N., Linial, M.P.: The metric space of proteins-comparative study of clustering algorithms. Bioinformatics 18 (Suppl 1), S14–S21 (2002)

  306. Sboner, A., Mu, X.J., Greenbaum, D., Auerbach, R.K., Gerstein, M.B.: The real cost of sequencing: higher than you think! Genome Biol 12 (8), 125–134 (2011)

  307. Schietgat, L., Vens, C., Struyf, J., Blockeel, H., Kocev, D., Dzeroski, S.: Predicting gene function using hierarchical multi-label decision tree ensembles. BMC Bioinf 11 (1), 2 (2010)

  308. Schnoes, A.M., Brown, S.D., Dodevski, I., Babbitt, P.C.: Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol 5 (12), e1000605 (2009)

  309. Schnoes, A.M., Ream, D.C., Thorman, A.W., Babbitt, P.C., Friedberg, I.: Biases in the experimental annotations of protein function and their effect on our understanding of protein function space. PLoS Comput Biol 9 (5), e1003063 (2013)

  310. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press (2002)

  311. Schug, J.: Predicting gene ontology functions from ProDom and CDD protein domains. Genome Res 12 (4), 648–655 (2002)

  312. Schwikowski, B., Uetz, P., Fields, S.: A network of protein-protein interactions in yeast. Nat Biotechnol 18 (12), 1257–1261 (2000)

  313. Serres, M.H., Riley, M.: MultiFun, a multifunctional classification scheme for Escherichia coli K-12 gene products. Microb Comp Genomics 5 (4), 205–222 (2000)

  314. Servant, F., Bru, C., Carrere, S., et al.: ProDom: Automated clustering of homologous domains. Briefings in Bioinformatics 3 (3), 246–251 (2002)

  315. Sharan, R., Ulitsky, I., Shamir, R.: Network-based prediction of protein function. Mol Sys Biol 3 (1), 88 (2007)

  316. Sherlock, G., et al.: The Stanford Microarray Database. Nucleic Acids Res 29 (1), 152–155 (2001)

  317. Shi, X., et al.: BMRF-Net: a software tool for identification of protein interaction subnetworks by a bagging Markov random field-based method. Bioinformatics 31 (14), 2412–2414 (2015)

  318. Shiga, M., Takigawa, I., Mamitsuka, H.: Annotating gene function by combining expression data with a modular gene network. Bioinformatics 23 (13), i468–i478 (2007)

  319. Shindyalov, I.N., Bourne, P.E.: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11 (9), 739–747 (1998)

  320. Sierk, M.L., Pearson, W.R.: Sensitivity and selectivity in protein structure comparison. Protein Sci. 13 (3), 773–785 (2004)

  321. Sjölander, K.: Phylogenomic inference of protein molecular function: advances and challenges. Bioinformatics 20 (2), 170–179 (2004)

  322. Sliwoski, G., Kothiwale, S., Meiler, J., Lowe, E.W.: Computational methods in drug discovery. Pharmacol Rev 66 (1), 334–395 (2014)

  323. Söding, J.: Protein homology detection by HMM-HMM comparison. Bioinformatics 21 (7), 951–960 (2005)

  324. Sokolov, A., Ben-Hur, A.: Hierarchical classification of gene ontology terms using the GOstruct method. J Bioinform Comput Biol 8 (2), 357–376 (2010)

  325. Song, J., Singh, M.: How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics 25 (23), 3143–3150 (2009)

  326. Sonnenburg, S., Rätsch, G., Schäfer, C., Schölkopf, B.: Large scale multiple kernel learning. J Mach Learn Res 7, 1531–1565 (2006)

  327. Sonnhammer, E.L., Eddy, S.R., Birney, E., Bateman, A., Durbin, R.: Pfam: Multiple sequence alignments and HMM-profiles of protein domains. Nucl. Acids Res. 26 (1), 320–322 (1998)

  328. Sonnhammer, E.L., Eddy, S.R., Durbin, R.: Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins: Struct. Funct. Bioinf. 28 (3), 405–420 (1997)

  329. Sonnhammer, E.L., Eddy, S.R., Durbin, R.: Pfam: A comprehensive database of protein domain families based on seed alignments. Proteins 28 (3), 405–420 (1997)

  330. Spirin, V., Mirny, L.A.: Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci USA 100 (21), 12123–12128 (2003)

  331. Stark, A., Sunyaev, S., Russell, R.B.: A model for statistical significance of local similarities in structure. J. Mol. Biol. 326 (5), 1307–1316 (2003)

  332. Subbiah, S., Laurents, D.V., Levitt, M.: Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Curr Biol 3 (3), 141–148 (1993)

  333. Swift, S., Tucker, A., Vinciotti, V., Martin, N., Orengo, C., Liu, X., Kellam, P.: Consensus clustering and functional interpretation of gene-expression data. Genome Biol 5 (11), R94 (2004)

  334. Szklarczyk, D., et al.: STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43 (Database issue), D447–D452 (2015)

  335. Tan, P., Kumar, V., Srivastava, J.: Selecting the right objective measure for association analysis. Information Systems 29, 293–313 (2004)

  336. Tanay, A., Sharan, R., Kupiec, M., Shamir, R.: Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. Proc Natl Acad Sci USA 101 (9), 2981–2986 (2004)

  337. Tang, L., Chen, J., Ye, J.: On multiple kernel learning with multiple labels. In: Intl Joint Conf Artif Intell (IJCAI), pp. 1255–1260 (2009)

  338. Tang, M., et al.: Graphical models for protein function and structure prediction. In: M. Elloumi, A.Y. Zomaya (eds.) Biological Knowledge Discovery Handbook: Preprocessing, Mining, and Postprocessing of Biological Data, Wiley Series on Bioinformatics: Computational Techniques and Engineering, chap. 9, pp. 191–222. Wiley (2013)

  339. Tarcea, V.G., et al.: Michigan molecular interactions r2: from interacting proteins to pathways. Nucleic Acids Res 37 (Database issue), D642–D646 (2009)

  340. Tatusov, R.L., Fedorova, N.D., Jackson, J.D., et al.: The COG database: an updated version includes eukaryotes. BMC Bioinf 4, 41 (2003)

  341. Tchagang, A.B., et al.: Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm. BMC Bioinf 13 (54), 2105–2154 (2012)

  342. Tetko, I., Facius, A., Ruepp, A., Mewes, H.W.: Super paramagnetic clustering of protein sequences. BMC Bioinf 6 (1), 82 (2005)

  343. Thode, G., Garcia-Ranea, J.A., Jimenez, J.: Search for ancient patterns in protein sequences. J Mol Evol 42 (2), 224–233 (1996)

  344. Thomas, T.: Multidomain proteins. eLS pp. 1–8 (2014)

  345. Thoren, A.: The PhylProm database - extending the use of phylogenetic profiles and their applications for membrane proteins. Master’s thesis, Stockholm University, Sweden (2000)

  346. Tordai, H., Nagy, A., Farkas, K., Bányai, L., Patthy, L.: Modules, multidomain proteins and organismic complexity. FEBS J 272 (19), 5064–5078 (2005)

  347. Tornow, S., Mewes, H.W.: Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res 31 (21), 6283–6289 (2003)

  348. Troyanskaya, O.G., Dolinski, K., Owen, A.B., Altman, R.B., Botstein, D.: A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proc Natl Acad Sci USA 100 (4), 8348–8353 (2003)

  349. Tsai, C.J., Nussinov, R.: Hydrophobic folding units at protein-protein interfaces: implications to protein folding and to protein-protein association. Protein Sci 6 (7), 1426–1437 (1996)

  350. Uchiyama, I.: Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes. Nucleic Acids Res 34 (2), 647–658 (2006)

  351. Valastyan, J.S., Lindquist, S.: Mechanisms of protein-folding diseases at a glance. Disease Models and Mechanisms 7 (1), 9–14 (2014)

  352. Valentini, G.: True path hierarchical ensembles for genome-wide gene function prediction. IEEE Trans Comput Biol Bioinform 8 (3), 832–847 (2011)

  353. van Noort, V., Snel, B., Huynen, M.A.: Predicting gene function by conserved co-expression. Trends Genet 19 (5), 238–242 (2003)

  354. Vanunu, O., Magger, O., Ruppin, E., Shlomi, T., Sharan, R.: Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol 6 (1), e1000641 (2010)

  355. Vazquez, A., Flammini, A., Maritan, A., Vespignani, A.: Global protein function prediction from protein-protein interaction networks. Nature Biotechnol 21 (6), 697–700 (2003)

  356. Veretnik, S., Gu, J., Wodak, S.: Identifying structural domains in proteins. In: J. Gu, P. Bourne (eds.) Structural Bioinformatics, 2nd edn., chap. 20, pp. 487–515. John Wiley & Sons (2009)

  357. Verleyen, W., Ballouz, S., Gillis, J.: Measuring the wisdom of the crowds in network-based gene function inference. Bioinformatics 31 (5), 745–752 (2015)

  358. Vert, J.: A tree kernel to analyze phylogenetic profiles. Bioinformatics 18 (Suppl 1), S276–S284 (2002)

  359. Vlahovicek, K., Murvai, J., Barta, E., Pongor, S.: The SBASE protein domain library and release 9.0: an online resource for protein domain identification. Nucleic Acids Res 30 (1), 273–275 (2002)

  360. Vlahovicek, K., Pintar, A., Parthasarathi, L., Carugo, O., Pongor, S.: CX, DPX and PRIDE: WWW servers for the analysis and comparison of protein 3D structures. Nucleic Acids Res 33 (Web Server issue), W252–W254 (2005)

  361. Vogel, C., Bashton, M., Kerrison, N.D., Chothia, C., Teichmann, S.A.: Structure, function and evolution of multidomain proteins. Curr Opin Struct Biol 14 (2), 208–216 (2004)

  362. Walker, M.G., Volkmuth, W., Sprinzak, E., Hodgson, D., Klingler, T.: Prediction of gene function by genome-scale expression analysis: prostate cancer-associated genes. Genome Res 9 (12), 1198–1203 (1999)

  363. Wang, D., Hou, J.: Explore the hidden treasure in protein-protein interaction networks - an iterative model for predicting protein functions. J Bioinf and Comput Biol 13 (1550026), 22 (2015)

  364. Wang, M., Shang, X., Xie, D., Li, Z.: Mining frequent dense subgraphs based on extending vertices from unbalanced PPI networks. In: IEEE Intl Conf Bioinf Biomed Eng (ICBBE), pp. 1–7 (2009)

  365. Wang, X., Schroeder, D., Dobbs, D., Honavar, V.: Automated data-driven discovery of motif-based protein function classifiers. Inf Sci 155 (1–2), 1–18 (2003)

  366. Wang, Z., Cao, R., Cheng, J.: Three-level prediction of protein function by combining profile-sequence search, profile-profile search, and domain co-occurrence networks. BMC Bioinf 14 (3), S3 (2013)

  367. Wass, M.N., Barton, G., Sternberg, M.J.E.: Combfunc: predicting protein function using heterogeneous data sources. Nucleic Acids Res 40 (Web server issue), W466–W470 (2012)

  368. Wass, M.N., Sternberg, M.J.: ConFunc-functional annotation in the twilight zone. Bioinformatics 24 (6), 798–806 (2007)

  369. Whisstock, J.C., Lesk, A.M.: Prediction of protein function from protein sequence and structure. Q Rev Biophys 36 (3), 307–340 (2003)

  370. Wohlers, I., Andonov, R., Klau, G.W.: Algorithm engineering for optimal alignment of protein structure distance matrices. Optimization Letters (2011). DOI 10.1007/s11590-011-0313-3. URL https://hal.inria.fr/inria-00586067

  371. Wohlers, I., Le Boudic-Jamin, M., Djidjev, H., Klau, G.W., Andonov, R.: Exact Protein Structure Classification Using the Maximum Contact Map Overlap Metric. In: 1st International Conference on Algorithms for Computational Biology, AlCoB 2014, pp. 262–273. Tarragona, Spain (2014). DOI 10.1007/978-3-319-07953-0_21. URL https://hal.inria.fr/hal-01093803

  372. Wohlers, I., Malod-Dognin, N., Andonov, R., Klau, G.W.: CSA: Comprehensive comparison of pairwise protein structure alignments. Nucleic Acids Research pp. 303–309 (2012). URL https://hal.inria.fr/hal-00667920. Preprint, submitted to Nucleic Acids Research

  373. Wu, C., Berry, M., Shivakumar, S., McLarty, J.: Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition. Mach Learn 21 (1), 177–193 (1992)

  374. Wu, C., Ermongkonchai, A., Chang, T.C.: Protein classification using a neural network protein database (NNPDB) system. In: Anal Neural Net Appl Conf, pp. 29–41 (1991)

  375. Wu, C., Whitson, G., McLarty, J., Ermongkonchai, A., Chang, T.C.: Protein classification artificial neural system. Protein Sci 1 (5), 667–677 (1995)

  376. Wu, C.H., Whitson, G.M., Montllor, G.J.: PROCANS: a protein classification system using a neural network. Neural Networks 2, 91–96 (1990)

  377. Wu, J., Kasif, S., DeLisi, C.: Identification of functional links between genes using phylogenetic profiles. Bioinformatics 19 (12), 1524–1530 (2003)

  378. Wu, L.F., Hughes, T.R., Davierwala, A.P., Robinson, M.D., Stoughton, R., Altschuler, S.J.: Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat Genet 31 (3), 255–265 (2002)

  379. Xenarios, I., Rice, D.W., Salwinski, L., Baron, M.K., Marcotte, E.M., Eisenberg, D.: DIP: the Database of Interacting Proteins. Nucleic Acids Res 28 (1), 289–291 (2000)

  380. Xie, H., Wasserman, A., Levine, Z., Novik, A., Grebinskiy, V., Shoshan, A., Mintz, L.: Large-scale protein annotation through gene ontology. Genome Res 12 (5), 785–794 (2002)

  381. Yahalom, R., Reshef, D., Wiener, A., Frankel, S., Kalisman, N., Lerner, B., Keasar, C.: Structure-based identification of catalytic residues. Proteins 79 (6), 1952–1963 (2011)

  382. Yan, Y., Moult, J.: Protein family clustering for structural genomics. J Mol Biol 353 (3), 744–759 (2005)

  383. Yanai, I., Derti, A., DeLisi, C.: Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes. Proc Natl Acad Sci USA 98 (14), 7940–7945 (2001)

  384. Yang, J., Wang, H., Wang, W., Yu, P.: Enhanced biclustering on expression data. In: IEEE Symp Bioinf Bioeng (BIBE), pp. 321–327 (2003)

  385. Yona, G., Linial, N., Linial, M.P.: ProtoMap: automatic classification of protein sequences and hierarchy of protein families. Nucleic Acids Res 28 (1), 49–55 (2000)

  386. Yu, G., Rangwala, H., Domeniconi, C., Zhang, G., Yu, Z.: Protein function prediction using multi-label ensemble classification. IEEE/ACM Trans Comput Biol Bioinform 10 (4), 1045–1057 (2013)

  387. Zemla, A.: LGA: a method for finding 3D similarities in protein structures. Nucl. Acids Res. 31 (13), 3370–3374 (2003)

  388. Zhang, W., et al.: The functional landscape of mouse gene expression. J Biol 3 (5), 21 (2004)

  389. Zhang, X., Dai, D.: A framework for incorporating functional interrelationships into protein function prediction algorithms. IEEE/ACM Trans Comput Biol Bioinform 9 (3), 740–753 (2012)

  390. Zhang, Y., Skolnick, J.: TM-align: a protein structure alignment algorithm based on the TM-score. Nucl. Acids Res. 33 (7), 2302–2309 (2005)

  391. Zhang, Z.H., Hwee, K.L., Mihalek, I.: Reduced representation of protein structure: implications on efficiency and scope of detection of structural similarity. BMC Bioinformatics 11, 155 (2010)

  392. Zheng, Y., Roberts, R.J., Kasif, S.: Genomic functional annotation using co-evolution profiles of gene clusters. Genome Biol 3 (11), research0060.1–0060.9 (2002)

  393. Zhou, D., Bousquet, O., Lal, T., Weston, J., Schölkopf, B.: Learning with local and global consistency. In: Advances Neural Inform Processing Systems (NIPS), pp. 321–328 (2004)

  394. Zhou, X., Kao, M.C., Wong, W.: Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci USA 99 (20), 12783–12788 (2002)

  395. Zhou, Y., Young, J.A., Santrosyan, A., Chen, K., Yan, S.F., Winzeler, E.A.: In silico gene function prediction using ontology-based pattern identification. Bioinformatics 21 (7), 1237–1245 (2005)

  396. Zhu, J., Zhang, M.Q.: SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 15 (7), 607–611 (1999)

  397. Zitnik, M., Zupan, B.: Data fusion by matrix factorization. IEEE Trans Pattern Anal Mach Intell 37 (1), 41–53 (2015)

Acknowledgements

Funding for this work is provided in part by NSF-IIS1144106.

Author information

Corresponding author

Correspondence to Amarda Shehu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Shehu, A., Barbará, D., Molloy, K. (2016). A Survey of Computational Methods for Protein Function Prediction. In: Wong, KC. (eds) Big Data Analytics in Genomics. Springer, Cham. https://doi.org/10.1007/978-3-319-41279-5_7

  • DOI: https://doi.org/10.1007/978-3-319-41279-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41278-8

  • Online ISBN: 978-3-319-41279-5

  • eBook Packages: Computer Science, Computer Science (R0)
