MMRF for Proteome Annotation Applied to Human Protein Disease Prediction

  • Beatriz García-Jiménez
  • Agapito Ledezma
  • Araceli Sanchis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6489)


Knowing the biological processes in which each gene and protein participates is essential for designing disease treatments. However, these annotations are still unknown for many genes and proteins. Since obtaining annotations from in vivo experiments is costly, computational predictors are needed for different kinds of annotation, such as metabolic pathway, interaction network, protein family, tissue and disease. Biological data have an intrinsic relational structure: genes and proteins can be grouped by many criteria. This makes it difficult to find good hypotheses when an attribute-value representation is used. Hence, we propose the generic Modular Multi-Relational Framework (MMRF) to predict different kinds of gene and protein annotation using Relational Data Mining (RDM). Applying MMRF to annotate human proteins with diseases verifies that group knowledge (mainly protein-protein interaction pairs) improves the prediction, in particular doubling the area under the precision-recall curve.
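The reported gain is measured as the area under the precision-recall curve (AUPRC). The abstract does not state which AUPRC estimator was used, so the following is only a minimal sketch assuming the common step-wise estimator (average precision) over a ranked list of protein-disease predictions. The function name `average_precision` and the tie-breaking rule are illustrative assumptions, not the authors' code.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Illustrative sketch, not the MMRF implementation.
    labels: 1 for a true protein-disease annotation, 0 otherwise.
    scores: predicted confidence for each protein-disease pair.
    Assumption: ties in score are broken by input order (stable sort).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")

    # Rank predictions from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision at the rank where recall increases by 1 / n_pos.
            total += hits / rank
    return total / n_pos
```

For example, `average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])` gives 5/6 (about 0.833). Because a random ranking has an expected AUPRC close to the fraction of positive examples, this metric is informative for annotation tasks where positive labels are typically rare.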


Keywords: Relational Data Mining; Human Disease Annotation; Multi-Class Relational Decision Tree; First-Order Logic; Structured Data


Supplementary material

Electronic Supplementary Material: 978-3-642-21295-6_11_MOESMa_ESM.pdf (PDF, 627 KB)


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Beatriz García-Jiménez (1)
  • Agapito Ledezma (1)
  • Araceli Sanchis (1)

  1. Universidad Carlos III de Madrid, Leganés, Spain
