
Finding Cross Genome Patterns in Annotation Graphs

  • Joseph Benik
  • Caren Chang
  • Louiqa Raschid
  • Maria-Esther Vidal
  • Guillermo Palma
  • Andreas Thor
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7348)

Abstract

Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences, where concepts such as genes and proteins are annotated with controlled vocabulary terms from ontologies. Scientists are interested in analyzing or mining these annotations, in conjunction with the literature, to discover patterns. Annotated datasets also let scientists explore annotations shared across genomes, supporting cross-genome discovery. We present PAnG (Patterns in Annotation Graphs), a tool based on two complementary methods: graph summarization and dense subgraph identification. Each element of a graph summary corresponds to a pattern, and a visualization of the summary can help explain the underlying knowledge. We present and analyze two distance metrics for identifying related concepts in ontologies. Finally, we present preliminary results on groups of Arabidopsis and C. elegans genes that illustrate the potential benefits of cross-genome pattern discovery.
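To make the notion of a "dense subgraph" in an annotation graph more concrete, here is a minimal sketch that assumes genes and ontology terms are nodes and each annotation is an undirected edge. It uses Charikar's greedy peeling heuristic, a standard 2-approximation for the maximum-density subgraph with density |E|/|V|. This is a swapped-in textbook technique, not necessarily the algorithm PAnG implements, and the function name `densest_subgraph_greedy` and the `gene_*`/`term_*` labels are hypothetical placeholders rather than real annotations.

```python
from collections import defaultdict


def densest_subgraph_greedy(edges):
    """Greedy peeling (Charikar): repeatedly delete a minimum-degree vertex
    and keep the intermediate vertex set with the highest |E| / |V|.
    Returns (best_node_set, best_density)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; duplicate edges collapse in the sets
            adj[u].add(v)
            adj[v].add(u)

    nodes = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density = num_edges / len(nodes) if nodes else 0.0
    best_nodes = set(nodes)

    while nodes:
        # Pick a minimum-degree vertex; tie-break on the label for determinism.
        u = min(nodes, key=lambda x: (len(adj[x]), str(x)))
        for w in adj[u]:
            adj[w].discard(u)
        num_edges -= len(adj[u])
        del adj[u]
        nodes.remove(u)

        if nodes:
            density = num_edges / len(nodes)
            if density > best_density:
                best_density = density
                best_nodes = set(nodes)

    return best_nodes, best_density


# Hypothetical toy annotation graph: three genes share two terms,
# while one unrelated gene carries a single, different term.
annotations = [
    ("gene_A", "term_X"), ("gene_A", "term_Y"),
    ("gene_B", "term_X"), ("gene_B", "term_Y"),
    ("gene_C", "term_X"), ("gene_C", "term_Y"),
    ("gene_D", "term_Z"),
]

best, density = densest_subgraph_greedy(annotations)
print(sorted(best), density)
# → ['gene_A', 'gene_B', 'gene_C', 'term_X', 'term_Y'] 1.2
```

In this toy graph, peeling first removes the isolated gene_D/term_Z pair, which leaves the gene-term biclique (6 edges over 5 nodes). That is the kind of densely co-annotated group a pattern-finding tool might surface. Greedy peeling is chosen here only because it is short and well known; exact max-density subgraph methods based on flows also exist.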

Keywords

Link Prediction, Distance Metrics, Gene CRY2, Annotation Graph, Graph Summary



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Joseph Benik, Caren Chang, Louiqa Raschid: University of Maryland, USA
  • Maria-Esther Vidal, Guillermo Palma: Universidad Simón Bolívar, Venezuela
  • Andreas Thor: University of Leipzig, Germany
