Random Perturbations of Term Weighted Gene Ontology Annotations for Discovering Gene Unknown Functionalities

  • Conference paper
Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2014)

Abstract

Computational analyses for biomedical knowledge discovery benefit greatly from descriptions of gene and protein functional features expressed through controlled terminologies and ontologies, i.e., from their controlled annotations. In recent years, several databases of such annotations have become available; yet these annotations are incomplete, and only some of them represent highly reliable human-curated information. To predict and discover unknown or missing annotations, existing approaches use unsupervised learning algorithms. We propose a new learning method that allows supervised algorithms to be applied to unsupervised problems, achieving much better annotation predictions. This method, which extends our preceding work with data weighting techniques, is based on generating artificial labeled training sets through random perturbations of the original data. We tested it on nine Gene Ontology annotation datasets; the results show that our approach is effective in predicting novel annotations and outperforms state-of-the-art unsupervised methods.
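As a rough illustration of the idea of building labeled training sets from random perturbations, the following minimal Python sketch may help. It is not the authors' implementation. It assumes that annotations are stored as a binary gene-by-term matrix, that a perturbation randomly removes known annotations with a fixed probability, and that the label for a gene is its original annotation to one chosen term. The names `perturb_annotations`, `build_training_set`, `drop_prob` and `term_index` are hypothetical, and the term weighting step mentioned in the abstract is not modeled.

```python
import random


def perturb_annotations(matrix, drop_prob, seed=0):
    """Return a copy of a binary gene-by-term matrix in which each known
    annotation (a 1) is removed with probability drop_prob.
    Assumption: perturbation only deletes annotations, never adds them."""
    rng = random.Random(seed)
    return [
        [0 if value == 1 and rng.random() < drop_prob else value for value in row]
        for row in matrix
    ]


def build_training_set(matrix, term_index, drop_prob, seed=0):
    """Build one artificial labeled training set for a single term.
    Features: the perturbed annotation profile of each gene.
    Labels: the gene's annotation to term_index in the ORIGINAL matrix."""
    perturbed = perturb_annotations(matrix, drop_prob, seed)
    labels = [row[term_index] for row in matrix]
    return perturbed, labels


# Toy data (made up): 3 genes x 3 ontology terms.
annotations = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
features, labels = build_training_set(annotations, term_index=2, drop_prob=0.3)
```

Under these assumptions, `features` and `labels` could be passed to any supervised classifier; a model trained this way would then be applied to the unperturbed matrix, and positive predictions for currently missing annotations would be read as candidate new annotations.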

This research is part of the “GenData 2020” project funded by the Italian MIUR.

Notes

  1. http://www.cs.waikato.ac.nz/ml/weka/

Author information

Corresponding author

Correspondence to Giacomo Domeniconi.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Domeniconi, G., Masseroli, M., Moro, G., Pinoli, P. (2015). Random Perturbations of Term Weighted Gene Ontology Annotations for Discovering Gene Unknown Functionalities. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2014. Communications in Computer and Information Science, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-319-25840-9_12

  • DOI: https://doi.org/10.1007/978-3-319-25840-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25839-3

  • Online ISBN: 978-3-319-25840-9

  • eBook Packages: Computer Science (R0)
