Combining Web-Based Searching with Latent Semantic Analysis to Discover Similarity Between Phrases

  • Sean M. Falconer
  • Dmitri Maslov
  • Margaret-Anne Storey
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4275)


Determining semantic similarity between words, concepts, and phrases is important in many areas of Artificial Intelligence, including information retrieval, data mining, and natural language processing. Existing approaches have primarily focused on noun-to-noun synonym comparison. We propose a new technique for comparing general expressions that combines web searching with Latent Semantic Analysis. This technique is more general than previous approaches because it can match multi-word expressions, abbreviations, and alphanumeric phrases. Consequently, it can be applied to more complex comparison problems such as ontology alignment.
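The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each phrase has already been submitted to a search engine and that the returned text has been concatenated into one document per phrase; the `snippets` dictionary below is hypothetical toy data standing in for those results. Latent Semantic Analysis is approximated here by taking the top-k eigenpairs of the document Gram matrix (equivalent to the right singular vectors of the term-document matrix), computed with plain power iteration so that the example needs only the standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw term counts."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    row_of = {term: i for i, term in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for col, doc in enumerate(docs):
        for term, count in Counter(tokenize(doc)).items():
            matrix[row_of[term]][col] = float(count)
    return matrix


def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _component in range(k):
        vec = [1.0 + 0.1 * i for i in range(n)]  # fixed, non-uniform start
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _step in range(iters):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:  # nothing left to extract
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue for the unit vector `vec`.
        value = sum(vec[i] * work[i][j] * vec[j]
                    for i in range(n) for j in range(n))
        pairs.append((value, vec))
        # Deflate: remove the component just found.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return pairs


def lsa_document_vectors(docs, k=2):
    """Coordinates of each document in a k-dimensional latent space."""
    a = term_document_matrix(docs)
    n = len(docs)
    # Gram matrix A^T A; its eigenvalues are the squared singular values of A.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(val, 0.0)) * vec[j] for val, vec in pairs]
            for j in range(n)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical search-result text for each phrase (toy data, not real results).
snippets = {
    "NYC": "new york city borough manhattan subway city",
    "New York City": "manhattan subway new york borough city",
    "SVD": "matrix factorization singular value linear algebra",
}
phrases = list(snippets)
vectors = lsa_document_vectors([snippets[p] for p in phrases], k=2)
sim_related = cosine(vectors[0], vectors[1])    # abbreviation vs. expansion
sim_unrelated = cosine(vectors[0], vectors[2])  # unrelated phrases
```

In this toy setup the abbreviation and its expansion share almost all of their context words, so they land on the same latent dimension, while the unrelated phrase lands on a different one. A realistic system would need a real search backend, stop-word handling, term weighting, and a library SVD routine in place of power iteration.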


Singular Value Decomposition; Semantic Similarity; Product Rule; Malignant Peripheral Nerve Sheath Tumor; Pointwise Mutual Information
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sean M. Falconer (1)
  • Dmitri Maslov (2)
  • Margaret-Anne Storey (1)
  1. University of Victoria, Victoria, Canada
  2. University of Waterloo, Waterloo, Canada
