Abstract
The Gene Ontology (GO) is a controlled vocabulary of concepts (called GO Terms) organized into three main ontologies. Each GO Term describes a biological concept and is associated with one or more gene products through a process known as annotation. Annotations may be derived using different methods, and an Evidence Code (EC) records how each one was derived. The importance and specificity of both GO terms and annotations are often measured by their Information Content (IC). Mining annotations and annotated data may extract knowledge that is meaningful from a biological standpoint. For instance, analyzing these annotated data with association rules provides evidence for the co-occurrence of annotations. Nevertheless, classical association rule algorithms consider neither the source of an annotation nor its importance, which leads to the generation of candidate rules with low IC. This paper presents a methodology for extracting Weighted Association Rules from GO, implemented in a tool named GO-WAR (Gene Ontology-based Weighted Association Rules). GO-WAR extracts association rules with a high level of IC from a dataset of annotated data without loss of Support and Confidence. A case study applying GO-WAR to a publicly available GO annotation dataset demonstrates that our method outperforms current state-of-the-art approaches.
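The core idea above, weighting rules by IC so that frequent but uninformative term combinations are pruned, can be illustrated with a minimal sketch. It assumes an annotation-based IC, IC(t) = -ln p(t), where p(t) is the fraction of gene products annotated with term t, and it defines weighted support as the mean IC of a rule's terms multiplied by its plain support. This weighting, the toy dataset, and all function names are hypothetical illustrations, not GO-WAR's published definitions; evidence codes and propagation to ancestor terms are ignored.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy dataset: gene product -> set of GO term IDs annotating it.
ANNOTATIONS = {
    "P1": {"GO:A", "GO:B", "GO:C"},
    "P2": {"GO:A", "GO:B"},
    "P3": {"GO:A", "GO:C"},
    "P4": {"GO:B", "GO:D"},
}


def information_content(annotations):
    """Annotation-based IC(t) = -ln p(t), p(t) = fraction of products annotated
    with t. Simplification (assumption): no propagation to ancestor terms."""
    n = len(annotations)
    counts = Counter(t for terms in annotations.values() for t in terms)
    return {t: -math.log(c / n) for t, c in counts.items()}


def support(itemset, annotations):
    """Fraction of gene products whose annotations contain every term in itemset."""
    hits = sum(1 for terms in annotations.values() if itemset <= terms)
    return hits / len(annotations)


def weighted_support(itemset, annotations, ic):
    """Assumed weighting: mean IC of the itemset's terms times its plain support."""
    mean_ic = sum(ic[t] for t in itemset) / len(itemset)
    return mean_ic * support(itemset, annotations)


def rules(annotations, min_wsup, min_conf):
    """Brute-force enumeration of single-term rules X -> Y over all term pairs."""
    ic = information_content(annotations)
    terms = sorted(ic)
    found = []
    for a, b in combinations(terms, 2):
        pair = frozenset({a, b})
        wsup = weighted_support(pair, annotations, ic)
        if wsup < min_wsup:
            continue  # prune low-IC or infrequent pairs before computing confidence
        for x, y in ((a, b), (b, a)):
            conf = support(pair, annotations) / support(frozenset({x}), annotations)
            if conf >= min_conf:
                found.append((x, y, wsup, conf))
    return found


if __name__ == "__main__":
    for x, y, wsup, conf in rules(ANNOTATIONS, min_wsup=0.2, min_conf=0.6):
        print(f"{x} -> {y}  weighted support={wsup:.3f}  confidence={conf:.2f}")
```

With these thresholds, GO:A -> GO:B is discarded even though its plain support is 0.5, because both terms are common and therefore carry little IC, while GO:D -> GO:B survives with plain support of only 0.25 because GO:D is rare and informative. A real implementation would replace the brute-force pair loop with a frequent-itemset miner such as Apriori or FP-Growth.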
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Agapito, G., Cannataro, M., Guzzi, P.H., Milano, M. (2015). GO-WAR: A Tool for Mining Weighted Association Rules from Gene Ontology Annotations. In: Di Serio, C., Liò, P., Nonis, A., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2014. Lecture Notes in Computer Science, vol 8623. Springer, Cham. https://doi.org/10.1007/978-3-319-24462-4_1
DOI: https://doi.org/10.1007/978-3-319-24462-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24461-7
Online ISBN: 978-3-319-24462-4
eBook Packages: Computer Science, Computer Science (R0)