Abstract
Working with Web archives raises a number of issues caused by their temporal characteristics. Depending on the age of the content, additional knowledge might be needed to find and understand older texts. Facts about entities, in particular, are subject to change, and name changes are the most severe of these for information retrieval. To find entities that have changed their name over time, search engines need to be aware of this evolution. We tackle this problem by analyzing how entity evolutions are mentioned in the text of Wikipedia articles, regardless of structural elements. We gathered statistics and automatically extracted minimum excerpts covering name changes by incorporating lists dedicated to that subject. In future work, these excerpts will be used to discover patterns and detect changes in other sources. In this work, we investigate whether Wikipedia is a suitable source for extracting the required knowledge.
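The abstract does not spell out how a minimum excerpt is computed. As a rough illustration only, one plausible reading is the shortest run of consecutive sentences in an article that mentions both the former and the succeeding name, where the name pair is taken from a dedicated list of name changes. The function names, the naive sentence splitter, and the case-insensitive substring matching below are all assumptions made for this sketch, not the authors' method.

```python
import re
from typing import List, Optional, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    A real pipeline would more likely rely on an NLP toolkit."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def minimum_excerpt(text: str, old_name: str, new_name: str) -> Optional[str]:
    """Return the shortest run of consecutive sentences that mentions both
    names, or None if the two names never co-occur in the text."""
    sentences = split_sentences(text)
    old_l, new_l = old_name.lower(), new_name.lower()
    best: Optional[Tuple[int, int]] = None  # (start, end) sentence indices
    last_old: Optional[int] = None          # latest sentence mentioning old_name
    last_new: Optional[int] = None          # latest sentence mentioning new_name
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        if old_l in lowered:
            last_old = i
        if new_l in lowered:
            last_new = i
        if last_old is not None and last_new is not None:
            start = min(last_old, last_new)
            # Keep the window only if it is strictly shorter than the best so far.
            if best is None or i - start < best[1] - best[0]:
                best = (start, i)
    if best is None:
        return None
    return " ".join(sentences[best[0]:best[1] + 1])
```

With a made-up article text and a made-up name pair, `minimum_excerpt(text, "Acme Corp", "Zenith Ltd")` would return only the sentence or sentences that connect the two names. Plain substring matching is a deliberate simplification: it will misfire when one name is contained in the other.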
This work is partly funded by the European Research Council under ALEXANDRIA (ERC 339233).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Holzmann, H., Risse, T. (2014). Insights into Entity Name Evolution on Wikipedia. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8787. Springer, Cham. https://doi.org/10.1007/978-3-319-11746-1_4
DOI: https://doi.org/10.1007/978-3-319-11746-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11745-4
Online ISBN: 978-3-319-11746-1
eBook Packages: Computer Science, Computer Science (R0)