Exploiting Ontology Structure and Patterns of Annotation to Mine Significant Associations between Pairs of Controlled Vocabulary Terms

Lee, Woei-Jyh; Raschid, Louiqa; Sayyadi, Hassan; Srinivasan, Padmini

doi:10.1007/978-3-540-69828-9_6

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5109)

Abstract

There is significant knowledge captured through annotations on the life sciences Web. In past research, we developed a methodology of support and confidence metrics from association rule mining, to mine the association bridge (of termlinks) between pairs of controlled vocabulary (CV) terms across two ontologies. Our (naive) approach did not exploit the following: implicit knowledge captured via the hierarchical is-a structure of ontologies, and patterns of annotation in datasets that may impact the distribution of parent/child or sibling CV terms. In this research, we consider this knowledge. We aggregate termlinks over the siblings of a parent CV term and use them as additional evidence to boost support and confidence scores in the associations of the parent CV term. A weight factor (α) reflects the contribution from the child CV terms; its value can be varied to reflect a variance of confidence values among the sibling CV terms of some parent CV term. We illustrate the benefits of exploiting this knowledge through experimental evaluation.
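
The weighting idea in the abstract can be pictured with a small sketch. This is not the paper's formula: the abstract does not define how termlinks are aggregated, so the code below assumes one plausible reading, in which termlinks are (term, term) pairs across two ontologies and the termlink counts of a parent's child CV terms are added to the parent's own counts, scaled by α. The function names, the `children` map, and the toy data are all hypothetical.

```python
def support_confidence(termlinks, a, b):
    """Plain association-rule style metrics for the pair (a, b).

    termlinks is a list of (term_from_ontology_1, term_from_ontology_2) pairs.
    """
    total = len(termlinks)
    count_ab = sum(1 for x, y in termlinks if x == a and y == b)
    count_a = sum(1 for x, _ in termlinks if x == a)
    support = count_ab / total if total else 0.0
    confidence = count_ab / count_a if count_a else 0.0
    return support, confidence


def boosted_support_confidence(termlinks, children, a, b, alpha):
    """Assumed boosting rule (not taken from the paper): add the
    alpha-weighted termlink counts of a's child terms to a's own counts."""
    kids = children.get(a, set())
    total = len(termlinks)
    own_ab = sum(1 for x, y in termlinks if x == a and y == b)
    own_a = sum(1 for x, _ in termlinks if x == a)
    kid_ab = sum(1 for x, y in termlinks if x in kids and y == b)
    kid_a = sum(1 for x, _ in termlinks if x in kids)
    numerator = own_ab + alpha * kid_ab
    denominator = own_a + alpha * kid_a
    support = numerator / total if total else 0.0
    confidence = numerator / denominator if denominator else 0.0
    return support, confidence


# Toy data: P is a parent CV term with children C1 and C2 (is-a hierarchy).
termlinks = [("P", "X"), ("P", "Y"), ("C1", "X"),
             ("C2", "X"), ("C2", "Y"), ("Q", "X")]
children = {"P": {"C1", "C2"}}

plain = support_confidence(termlinks, "P", "X")
boosted = boosted_support_confidence(termlinks, children, "P", "X", alpha=0.5)
print("plain:", plain)
print("boosted:", boosted)
```

With α = 0 the boosted scores reduce to the plain ones, and larger α lets the evidence from child terms count for more. How α should be chosen for a given spread of confidence values among siblings is a question for the paper itself, which this sketch does not address.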

Author information

Authors and Affiliations

University of Maryland, College Park, MD 20742, USA
Woei-Jyh Lee, Louiqa Raschid & Hassan Sayyadi
The University of Iowa, Iowa City, IA 52242, USA
Padmini Srinivasan

Editor information

Amos Bairoch, Sarah Cohen-Boulakia, Christine Froidevaux

About this paper

Cite this paper

Lee, WJ., Raschid, L., Sayyadi, H., Srinivasan, P. (2008). Exploiting Ontology Structure and Patterns of Annotation to Mine Significant Associations between Pairs of Controlled Vocabulary Terms. In: Bairoch, A., Cohen-Boulakia, S., Froidevaux, C. (eds) Data Integration in the Life Sciences. DILS 2008. Lecture Notes in Computer Science, vol 5109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69828-9_6

DOI: https://doi.org/10.1007/978-3-540-69828-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69827-2
Online ISBN: 978-3-540-69828-9
eBook Packages: Computer Science, Computer Science (R0)