Weighted Pseudo-distances for Categorization in Semantic Hierarchies
Ontologies, taxonomies, and other semantic hierarchies are increasingly necessary for organizing large quantities of data. We continue our development of knowledge discovery techniques based on combinatorial algorithms rooted in order theory. Specifically, we aim to supplement the pseudo-distances we previously developed as structural measures of vertical height in poset-based ontologies with quantitative measures of vertical distance that draw on additional statistical information. In this way, we seek to weight different portions of the underlying ontology according to this external information source. We also wish to address the deficiencies of existing measures of this kind, in particular Resnik’s measure of semantic similarity in lexical databases such as WordNet. We begin by recalling and developing some basic concepts for ordered data objects, including our pseudo-distances and the operation of probability distributions as weights on posets. We then discuss and critique Resnik’s measure before introducing our own sense of link weights and weighted normalized pseudo-distances among comparable nodes.
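For orientation, Resnik’s measure as it is commonly stated scores two nodes by the information content, -log p(c), of their most informative common ancestor c, where p(c) is the probability of encountering c or anything below it. The following is a minimal sketch of that baseline only, not of the weighted pseudo-distances proposed in this paper. The taxonomy `PARENTS`, the counts `COUNTS`, and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy taxonomy: each node maps to its list of parents (a DAG).
PARENTS = {
    "entity": [],
    "animal": ["entity"],
    "plant": ["entity"],
    "dog": ["animal"],
    "cat": ["animal"],
}

# Hypothetical corpus counts observed directly at each node.
COUNTS = {"entity": 0, "animal": 2, "plant": 4, "dog": 1, "cat": 1}


def ancestors(node):
    """Return the node together with every node above it in the hierarchy."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def probability(node):
    """p(c): share of all counts that fall at c or at any node below c."""
    total = sum(COUNTS.values())
    below = sum(n for c, n in COUNTS.items() if node in ancestors(c))
    return below / total


def information_content(node):
    """IC(c) = -log2 p(c); the root has p = 1 and therefore IC = 0."""
    return -math.log2(probability(node))


def resnik_similarity(a, b):
    """Largest information content among the common ancestors of a and b."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(c) for c in common)
```

In this toy example, `resnik_similarity("dog", "cat")` is determined by the shared ancestor `animal`, while `resnik_similarity("dog", "plant")` falls back to the root and is zero. Note that the score depends only on the shared ancestor and not on how far each node sits below it.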
Keywords: Gene Ontology, Directed Acyclic Graph, Semantic Similarity, Information Gain, Link Weight
- 2. Bodenreider, O., Mitchell, J.A., McCray, A.T.: Evaluation of the UMLS As a Terminology and Knowledge Resource for Biomedical Informatics. In: AMIA 2002 Annual Symposium, pp. 61–65 (2002)
- 3. Davis, A.R.: Types and Constraints for Lexical Semantics and Linking. Cambridge UP (2000)
- 5. Gene Ontology Consortium: Gene Ontology: Tool For the Unification of Biology. Nature Genetics 25(1), 25–29 (2000)
- 8. Joslyn, C., Oliveira, J., Scherrer, C.: Order Theoretical Knowledge Discovery: A White Paper, LAUR = 04-5812 (2004), ftp://ftp.c3.lanl.gov/pub/users/joslyn/white.pdf
- 9. Joslyn, C., Cohn, J.D., Verspoor, K.M., Mniszewski, S.M.: Automating Ontological Function Annotation: Towards a Common Methodological Framework. Submitted to 2005 Bio-Ontologies Meeting, ISMB 2005 (2005)
- 12. Knoblock, T.B., Rehof, J.: Type Elaboration and Subtype Completion for Java Bytecode. In: Proc. 27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (2000)
- 15. Resnik, P.: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In: Int. Joint Conf. on Artificial Intelligence, pp. 448–452. Morgan Kaufmann, San Francisco (1995)
- 17. Verspoor, K., Cohn, J., Joslyn, C., Mniszewski, S.M., Rechtsteiner, A., Rocha, L.M., Simas, T.: Protein Annotation as Term Categorization in the Gene Ontology Using Word Proximity Networks. BMC Bioinformatics 6(suppl. 1) (2004)