Abstract
Computational analyses for biomedical knowledge discovery benefit greatly from descriptions of gene and protein functional features expressed through controlled terminologies and ontologies, i.e., from controlled annotations. In recent years, several databases of such annotations have become available; yet these annotations are incomplete, and only some of them represent highly reliable, human-curated information. To predict unknown or missing annotations, existing approaches use unsupervised learning algorithms. We propose a new learning method that allows supervised algorithms to be applied to unsupervised problems, achieving much better annotation predictions. This method, which extends our preceding work with data weighting techniques, is based on generating artificial labeled training sets through random perturbations of the original data. We tested it on nine Gene Ontology annotation datasets; the results show that our approach is effective at predicting novel annotations and outperforms state-of-the-art unsupervised methods.
This research is part of the “GenData 2020” project funded by the Italian MIUR.
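The idea stated in the abstract, generating an artificial labeled training set by randomly perturbing the original annotations, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the binary gene-by-term matrix layout, the perturbation that only removes existing annotations, the `drop_prob` parameter, and the function names `perturb` and `make_training_set`. The sketch covers only the training-set construction; the supervised classifier and the data weighting step are left out.

```python
import random


def perturb(matrix, drop_prob, rng):
    """Return a copy of a binary gene-by-term matrix in which each
    existing annotation (1) is removed (set to 0) with probability
    drop_prob. Assumed perturbation scheme, for illustration only."""
    return [
        [0 if value == 1 and rng.random() < drop_prob else value for value in row]
        for row in matrix
    ]


def make_training_set(matrix, term, drop_prob=0.1, seed=0):
    """Build an artificial labeled training set for one target term.

    Features are the rows of the perturbed matrix; labels are the
    values of the target term in the ORIGINAL matrix, so a supervised
    learner can be trained to recover annotations that were removed.
    """
    rng = random.Random(seed)
    features = perturb(matrix, drop_prob, rng)
    labels = [row[term] for row in matrix]
    return features, labels


# Hypothetical toy data: 3 genes (rows) x 3 ontology terms (columns).
annotations = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]

X, y = make_training_set(annotations, term=2, drop_prob=0.5, seed=1)
print(X)
print(y)
```

Under these assumptions, a classifier trained on `(X, y)` would then be applied to the unperturbed matrix, and genes predicted positive for a term they do not yet have would be candidate new annotations.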
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Domeniconi, G., Masseroli, M., Moro, G., Pinoli, P. (2015). Random Perturbations of Term Weighted Gene Ontology Annotations for Discovering Gene Unknown Functionalities. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2014. Communications in Computer and Information Science, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-319-25840-9_12
DOI: https://doi.org/10.1007/978-3-319-25840-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25839-3
Online ISBN: 978-3-319-25840-9
eBook Packages: Computer Science, Computer Science (R0)