Context-Sensitive Ranking Using Cross-Domain Knowledge for Chemical Digital Libraries

  • Benjamin Köhncke
  • Wolf-Tilo Balke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8092)

Abstract

Today, entity-centric searches are a common task in information gathering. However, given the huge amount of available information, the entity alone is often not sufficient for finding suitable results. Users usually search for entities in a specific search context, which is important for their relevance assessment. For digital library providers it is therefore essential to also consider this search context to allow for high-quality retrieval. In this paper we present an approach enabling context searches for chemical entities. Chemical entities play a major role in many specific domains, ranging from biomedicine and biology to materials science. Since most domain-specific documents lack suitable context annotations, we present a similarity measure using cross-domain knowledge gathered from Wikipedia. We show that structure-based similarity measures are not suitable for chemical context searches and introduce a similarity measure combining entity and context similarity. Our experiments show that our measure outperforms structure-based similarity measures for chemical entities. We compare against two baseline approaches: a Boolean retrieval model and a model using statistical query expansion for the context term. We compare the measures by computing mean average precision (MAP) over a set of queries with manual relevance assessments from domain experts. We achieve a total increase in MAP of 30 percentage points (from 31% to 61%). Furthermore, we present a personalized retrieval system that leads to a further increase of around 10%.
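For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how mean average precision (MAP) is commonly computed from ranked result lists and binary relevance judgments. It is not the authors' evaluation code; the function names and the document identifiers (`d1`, `d2`, ...) are hypothetical and chosen only for illustration.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def average_precision(ranking: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank where a relevant document appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: List[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical queries: (ranked result list, expert-judged relevant ids).
example_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1
]
map_score = mean_average_precision(example_runs)
```

Under this definition, the reported improvement from 31% to 61% corresponds to MAP values of 0.31 and 0.61 averaged over all evaluation queries.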

Keywords

Chemical Digital Libraries, Personalization, Context Search

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Benjamin Köhncke (1)
  • Wolf-Tilo Balke (2)
  1. L3S Research Center, Hannover, Germany
  2. TU Braunschweig, Germany