Abstract

As archives contain documents that span a long period of time, the language used to create these documents and the language used to query the archive can differ. This difference is due to the evolution of both terminology and semantics, and it causes a significant number of relevant documents to be omitted from search results. A static solution is query expansion based on explicit knowledge banks such as thesauri or ontologies. However, as we become able to archive resources with increasingly varied terminology, it will be infeasible to rely on explicit knowledge alone for this purpose: few or no thesauri cover highly domain-specific terminology or the slang used in blogs and similar sources. In this Ph.D. thesis we focus on detecting terminology evolution automatically and in a completely unsupervised manner, as described in this technical paper.
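To make the static approach concrete, here is a minimal sketch of thesaurus-based query expansion. It assumes a tiny hand-built mapping and naive substring matching; the names `THESAURUS`, `expand_query`, and `matches`, as well as the toy documents, are hypothetical and not part of this work. It also shows the limitation described above: a term absent from the thesaurus is passed through unexpanded.

```python
from typing import Dict, List, Set

# Hypothetical hand-built thesaurus (illustrative only): maps a query term
# to equivalent terms used in other periods.
THESAURUS: Dict[str, Set[str]] = {
    "saint petersburg": {"leningrad", "petrograd"},
    "leningrad": {"saint petersburg", "petrograd"},
    "petrograd": {"saint petersburg", "leningrad"},
}


def expand_query(terms: List[str], thesaurus: Dict[str, Set[str]]) -> List[str]:
    """Return each lower-cased term followed by its thesaurus equivalents.

    Duplicates are dropped while preserving first-seen order. Terms missing
    from the thesaurus are returned unexpanded, which is the gap a static
    knowledge bank leaves for slang or domain-specific vocabulary.
    """
    expanded: List[str] = []
    seen: Set[str] = set()
    for term in terms:
        key = term.lower()
        # Keep the user's own term first, then its equivalents in sorted order.
        candidates = [key] + sorted(thesaurus.get(key, set()))
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


def matches(document: str, query_terms: List[str]) -> bool:
    """Naive boolean OR retrieval: does any query term occur in the text?"""
    text = document.lower()
    return any(term in text for term in query_terms)


if __name__ == "__main__":
    # Toy archive (hypothetical): one document uses older terminology.
    archive = [
        "Report on the port of Leningrad.",
        "Travel guide to Saint Petersburg.",
    ]
    query = ["Saint Petersburg"]
    plain_hits = [d for d in archive if matches(d, [q.lower() for q in query])]
    expanded_hits = [d for d in archive if matches(d, expand_query(query, THESAURUS))]
    print(len(plain_hits), len(expanded_hits))
```

Running it prints `1 2`: the unexpanded query finds only the document that uses current terminology, while the expanded query also retrieves the older one. The hand-built dictionary is the part that does not scale to slang or highly domain-specific terms.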

Keywords

Automatic Detection, Query Expansion, Word Sense, Computational Linguistics, Cluster Evolution

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Nina Tahmasebi, L3S Research Center, Hannover, Germany