Named entity evolution recognition on the Blogosphere

Abstract

Advancements in technology and culture lead to changes in our language. These changes create a gap between the language known by users and the language stored in digital archives. This gap affects users' ability first to find content and then to interpret it. In previous work, we introduced our approach to named entity evolution recognition (NEER) in newspaper collections. Lately, increasing efforts in Web preservation have led to greater availability of Web archives covering longer time spans. However, language on the Web is more dynamic than in traditional media, and many of the basic assumptions from the newspaper domain do not hold for Web data. In this paper, we discuss the limitations of the existing methodology for NEER. We address these limitations by adapting an existing NEER method to work on noisy data such as the Web, and the Blogosphere in particular. We develop novel filters that reduce the noise and make use of Semantic Web resources to obtain more information about terms. Our evaluation shows the potential of the proposed approach.

Keywords

Named entity evolution, Blogs, Semantic Web, DBpedia

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. L3S Research Center, Hannover, Germany
  2. Språkbanken, Department of Swedish, University of Gothenburg, Gothenburg, Sweden
