Exploiting Ontology Structure and Patterns of Annotation to Mine Significant Associations between Pairs of Controlled Vocabulary Terms

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5109)

Abstract

There is significant knowledge captured through annotations on the life sciences Web. In past research, we developed a methodology that uses the support and confidence metrics from association rule mining to mine the association bridge (of termlinks) between pairs of controlled vocabulary (CV) terms across two ontologies. That (naive) approach did not exploit two sources of knowledge: the implicit knowledge captured via the hierarchical is-a structure of ontologies, and the patterns of annotation in datasets that may impact the distribution of parent/child or sibling CV terms. In this research, we consider this knowledge. We aggregate termlinks over the sibling CV terms that share a parent CV term and use them as additional evidence to boost the support and confidence scores of the parent CV term's associations. A weight factor (α) reflects the contribution from the child CV terms; its value can be varied to reflect the variance of confidence values among the sibling CV terms of a parent CV term. We illustrate the benefits of exploiting this knowledge through experimental evaluation.
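
The abstract does not give the exact formulas, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes that a termlink is a pair of CV terms (one from each ontology), that plain support and confidence are computed as in standard association rule mining, and that the boost adds the children's termlink counts to the parent's counts, scaled by α. The function names, the `children` mapping, and the toy term names are all hypothetical.

```python
def support_confidence(termlinks, a, b):
    """Plain support and confidence for the association a -> b.

    termlinks: list of (term_from_ontology_1, term_from_ontology_2) pairs.
    """
    total = len(termlinks)
    pair_count = sum(1 for x, y in termlinks if x == a and y == b)
    a_count = sum(1 for x, _ in termlinks if x == a)
    support = pair_count / total if total else 0.0
    confidence = pair_count / a_count if a_count else 0.0
    return support, confidence


def boosted_support_confidence(termlinks, children, parent, b, alpha):
    """Support and confidence for parent -> b, boosted by child evidence.

    ASSUMPTION (not from the paper): termlinks of the parent's children
    count toward the parent with weight alpha, in both the numerator
    (links to b) and the denominator (all links of the term).
    children: dict mapping a parent term to an iterable of its child terms.
    """
    total = len(termlinks)
    kids = set(children.get(parent, ()))

    pair_own = sum(1 for x, y in termlinks if x == parent and y == b)
    pair_kids = sum(1 for x, y in termlinks if x in kids and y == b)
    ante_own = sum(1 for x, _ in termlinks if x == parent)
    ante_kids = sum(1 for x, _ in termlinks if x in kids)

    pair = pair_own + alpha * pair_kids
    ante = ante_own + alpha * ante_kids
    support = pair / total if total else 0.0
    confidence = pair / ante if ante else 0.0
    return support, confidence


# Toy data: P is a parent term with children C1 and C2; Q is unrelated.
termlinks = [
    ("P", "D"), ("P", "E"),
    ("C1", "D"), ("C1", "D"), ("C1", "E"),
    ("C2", "D"),
    ("Q", "E"), ("Q", "D"),
]
children = {"P": ["C1", "C2"]}

plain = support_confidence(termlinks, "P", "D")
boosted = boosted_support_confidence(termlinks, children, "P", "D", alpha=0.5)
print(plain, boosted)
```

With α = 0 the boosted scores reduce to the plain ones, and larger values of α let the children's termlinks weigh more heavily in the parent's scores. That is the knob the abstract describes, under the assumptions stated above.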

Editor information

Amos Bairoch, Sarah Cohen-Boulakia, Christine Froidevaux

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, WJ., Raschid, L., Sayyadi, H., Srinivasan, P. (2008). Exploiting Ontology Structure and Patterns of Annotation to Mine Significant Associations between Pairs of Controlled Vocabulary Terms. In: Bairoch, A., Cohen-Boulakia, S., Froidevaux, C. (eds) Data Integration in the Life Sciences. DILS 2008. Lecture Notes in Computer Science, vol 5109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69828-9_6

  • DOI: https://doi.org/10.1007/978-3-540-69828-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69827-2

  • Online ISBN: 978-3-540-69828-9

  • eBook Packages: Computer Science, Computer Science (R0)
