International Conference on Database and Expert Systems Applications

DEXA 2015: Database and Expert Systems Applications, pp. 76-93

From General to Specialized Domain: Analyzing Three Crucial Problems of Biomedical Entity Disambiguation

  • Stefan Zwicklbauer
  • Christin Seifert
  • Michael Granitzer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9261)

Abstract

Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their corresponding entities in a knowledge base. Most disambiguation systems focus on general-purpose knowledge bases like DBpedia but leave open the question of how their results generalize to more specialized domains. This question is particularly important in the context of Linked Open Data, which forms an enormous resource for disambiguation. We implement a ranking-based (Learning To Rank) disambiguation system and provide a systematic evaluation of biomedical entity disambiguation with respect to three crucial and well-known properties of specialized disambiguation systems. These are (i) entity context, i.e., the way entities are described, (ii) user data, i.e., the quantity and quality of externally disambiguated entities, and (iii) the quantity and heterogeneity of entities to disambiguate, i.e., the number and size of different domains in a knowledge base. Our results show that (i) the choice of entity context that attains the best disambiguation results strongly depends on the amount of available user data, (ii) disambiguation results with large-scale and heterogeneous knowledge bases strongly depend on the entity context, (iii) disambiguation results are robust against a moderate amount of noise in user data, and (iv) some results can be significantly improved with a federated disambiguation approach that uses different entity contexts. Our results indicate that disambiguation systems must be carefully adapted when expanding their knowledge bases with special-domain entities.
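As a rough illustration of what ranking-based disambiguation means, the following minimal sketch scores each candidate entity for a mention and returns the candidates in ranked order. It is not the system evaluated in the paper: the `Candidate` class, the two features (word overlap with the entity description as a stand-in for entity context, and a prior as a stand-in for user data), the entity identifiers, and the hand-set weights are all assumptions made for illustration. A Learning To Rank system would learn the weights from training data instead of fixing them by hand.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A knowledge-base entity that a surface form could refer to (hypothetical)."""
    entity_id: str
    description: str  # stand-in for the entity context
    prior: float      # stand-in for evidence derived from user data


def context_overlap(mention_context: str, description: str) -> float:
    """Jaccard overlap between the mention's context words and the description."""
    a = set(mention_context.lower().split())
    b = set(description.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(mention_context, candidates, weights=(1.0, 0.5)):
    """Return entity ids sorted by a weighted feature sum, best first.

    The weights are fixed by hand here (an assumption); a Learning To Rank
    model would estimate them from labeled examples.
    """
    w_ctx, w_prior = weights
    scored = [
        (w_ctx * context_overlap(mention_context, c.description) + w_prior * c.prior,
         c.entity_id)
        for c in candidates
    ]
    return [entity_id for _, entity_id in sorted(scored, reverse=True)]


# Toy example: disambiguating the term "cold" in a clinical sentence.
candidates = [
    Candidate("low_temperature", "low temperature condition", 0.4),
    Candidate("common_cold",
              "viral infection of the upper respiratory tract causing cough and fever",
              0.6),
]
ranking = rank("patient reports a cold with fever and cough", candidates)
print(ranking)
```

In this toy setting the description-based feature and the prior agree, so the choice of weights hardly matters. The findings listed in the abstract concern cases where such signals differ in quantity and quality.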

Keywords

Entity disambiguation, Learning to rank, Linked data, Semantic web

Notes

Acknowledgments

The presented work was developed within the EEXCESS project funded by the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement number 600601.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Stefan Zwicklbauer (1)
  • Christin Seifert (1)
  • Michael Granitzer (1)
  1. University of Passau, Passau, Germany
