Relevance-Ranked Domain-Specific Synonym Discovery

  • Andrew Yates
  • Nazli Goharian
  • Ophir Frieder
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)


Interest in domain-specific search is growing rapidly, creating a need for domain-specific synonym discovery. The best-performing methods for this task rely on query logs and are thus difficult to use in many circumstances. We propose a method for domain-specific synonym discovery that requires only a domain-specific corpus. Our method substantially outperforms previously proposed methods in realistic evaluations. Because identifying pairs of synonyms among a large number of terms is difficult, methods have traditionally been evaluated by their ability to choose a target term’s synonym from a small set of candidate terms. We generalize this evaluation by measuring methods’ performance when they must choose a target term’s synonym from progressively larger sets of candidate terms. We approach synonym discovery as a ranking problem and evaluate the methods’ ability to rank a target term’s candidate synonyms. Our results illustrate that while our proposed method substantially outperforms existing methods, synonym discovery is still a difficult task to automate and is best coupled with a human moderator.
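The evaluation idea in the abstract (pick a target term’s synonym out of progressively larger candidate sets, treated as a ranking problem) can be sketched as follows. This is an illustrative sketch only, not the authors’ method or evaluation code: the Jaccard context-overlap scorer, the toy `CONTEXTS` data, the uniform random sampling of distractors, and the names `toy_score`, `synonym_rank`, and `evaluate` are all hypothetical. Any similarity measure can be passed in as the scoring function.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy data: each term maps to context words that a corpus-based
# method might extract. These values are invented for illustration.
CONTEXTS: Dict[str, Set[str]] = {
    "heart attack": {"chest", "pain", "hospital", "cardiac"},
    "myocardial infarction": {"chest", "cardiac", "hospital", "ecg"},
    "headache": {"pain", "head", "aspirin"},
    "cephalalgia": {"head", "neurology"},
    "migraine": {"pain", "head", "aspirin", "light"},
    "rash": {"skin", "itch"},
}
VOCAB: List[str] = list(CONTEXTS)
# Hypothetical gold (target, synonym) pairs.
PAIRS: List[Tuple[str, str]] = [
    ("heart attack", "myocardial infarction"),
    ("headache", "cephalalgia"),
]

Scorer = Callable[[str, str], float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Context-overlap similarity; a stand-in for any real scoring method."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def toy_score(target: str, candidate: str) -> float:
    return jaccard(CONTEXTS[target], CONTEXTS[candidate])


def synonym_rank(target: str, synonym: str, distractors: List[str], score: Scorer) -> int:
    """1-based rank of the true synonym among all candidates.

    Only distractors scoring strictly higher push the synonym down,
    so ties are resolved in the synonym's favour.
    """
    s = score(target, synonym)
    return 1 + sum(1 for d in distractors if score(target, d) > s)


def evaluate(pairs: List[Tuple[str, str]], vocab: List[str], score: Scorer,
             n_candidates: int, seed: int = 0) -> Tuple[float, float]:
    """Return (rank-1 accuracy, mean reciprocal rank) when each synonym must be
    found among n_candidates terms: the synonym plus n_candidates - 1
    distractors sampled uniformly from the rest of the vocabulary."""
    rng = random.Random(seed)
    hits, rr_total = 0, 0.0
    for target, synonym in pairs:
        pool = [t for t in vocab if t not in (target, synonym)]
        # Raises ValueError if the vocabulary has too few terms.
        distractors = rng.sample(pool, n_candidates - 1)
        r = synonym_rank(target, synonym, distractors, score)
        hits += (r == 1)
        rr_total += 1.0 / r
    return hits / len(pairs), rr_total / len(pairs)


# Enlarging the candidate set makes the task harder.
print(evaluate(PAIRS, VOCAB, toy_score, n_candidates=1))  # (1.0, 1.0)
print(evaluate(PAIRS, VOCAB, toy_score, n_candidates=5))  # (0.5, 0.75)
```

Reporting mean reciprocal rank alongside rank-1 accuracy reflects the ranking view: in the toy data "migraine" outscores "cephalalgia" for "headache", so the true synonym lands at rank 2. That counts as a miss for accuracy but would still surface in a short list shown to a human moderator.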


Keywords: synonym discovery; thesaurus construction; domain-specific search





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Andrew Yates (1)
  • Nazli Goharian (1)
  • Ophir Frieder (1)
  1. Information Retrieval Lab, Georgetown University, USA
