Information Retrieval, Volume 7, Issue 3–4, pp 347–368

Simple Semantics in Topic Detection and Tracking

  • Juha Makkonen
  • Helena Ahonen-Myka
  • Marko Salmenkivi
Article

Abstract

Topic Detection and Tracking (TDT) is a research initiative that aims to develop techniques for organizing news documents in terms of news events. We propose a method that incorporates simple semantics into TDT by splitting the term space into groups of terms that share the same type of meaning. Each such group can be associated with an external ontology, which is used to determine the similarity of two terms in that group. We extract proper names, locations, temporal expressions and normal terms into distinct sub-vectors of the document representation. The similarity of two documents is measured by comparing their corresponding sub-vectors one pair at a time. We use a simple perceptron to optimize the relative emphasis of each semantic class in the tracking and detection decisions. The results suggest that the spatial and the temporal similarity measures need to be improved. In particular, the vagueness of spatial and temporal terms needs to be addressed.
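
The class-wise comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract describes ontology-based term similarity within each class, whereas this sketch substitutes plain cosine similarity for every class. The class names, the function names and the learning-rate and epoch values are all hypothetical. The sketch shows one similarity score per semantic class and a simple perceptron that learns how much weight each class gets in a yes/no "same topic" decision.

```python
import math

# Hypothetical labels for the four semantic classes named in the abstract.
CLASSES = ("names", "locations", "temporals", "terms")


def cosine(a, b):
    """Cosine similarity of two sparse term-weight dicts (assumed measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_similarities(doc_a, doc_b):
    """Compare corresponding sub-vectors, one semantic class at a time."""
    return [cosine(doc_a.get(c, {}), doc_b.get(c, {})) for c in CLASSES]


def predict(weights, bias, sims):
    """Return 1 (same topic) if the weighted class similarities exceed 0."""
    return 1 if sum(w * s for w, s in zip(weights, sims)) + bias > 0 else 0


def train_perceptron(examples, epochs=20, lr=0.1):
    """Learn one weight per semantic class with the perceptron update rule.

    `examples` is a list of (similarity_vector, label) pairs, label in {0, 1}.
    """
    weights = [0.0] * len(CLASSES)
    bias = 0.0
    for _ in range(epochs):
        for sims, label in examples:
            error = label - predict(weights, bias, sims)
            if error:
                weights = [w + lr * error * s for w, s in zip(weights, sims)]
                bias += lr * error
    return weights, bias
```

In use, each document would be a dict mapping a class name to its sub-vector, for example `{"names": {"clinton": 1.0}, "locations": {"helsinki": 1.0}, "temporals": {}, "terms": {"summit": 2.0}}`. The learned weights then indicate how strongly each class influences the decision.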

Keywords: topic detection and tracking, retrieval model, information extraction, temporal expression, geographical ontology

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Juha Makkonen (1)
  • Helena Ahonen-Myka (1)
  • Marko Salmenkivi (1)
  1. Department of Computer Science, University of Helsinki, Finland
