Automatic Topic Detection with an Incremental Clustering Algorithm

  • Xiaoming Zhang
  • Zhoujun Li
Conference paper

DOI: 10.1007/978-3-642-16515-3_43

Volume 6318 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Zhang X., Li Z. (2010) Automatic Topic Detection with an Incremental Clustering Algorithm. In: Wang F.L., Gong Z., Luo X., Lei J. (eds) Web Information Systems and Mining. WISM 2010. Lecture Notes in Computer Science, vol 6318. Springer, Berlin, Heidelberg

Abstract

Most existing topic detection approaches are neither accurate nor efficient enough. In this paper, we propose a new topic detection method (TPIC) based on an incremental clustering algorithm. It employs a self-refinement process of discriminative feature identification and a term reweighting algorithm to accurately cluster documents that discuss the same topic. For efficiency, the “aging” nature of topics is used to precluster stories. The Bayesian Information Criterion (BIC) is used to automatically estimate the true number of topics. Experimental results on the Linguistic Data Consortium (LDC) TDT4 dataset show that the proposed method improves both efficiency and accuracy compared to other methods.
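
The abstract does not spell out the details of TPIC, so the following is only a rough illustration of the general idea of single-pass incremental clustering with an age window, not the authors' algorithm. It assumes plain term-count vectors, cosine similarity, and a fixed similarity threshold; the names `incremental_cluster`, `threshold`, and `window` are hypothetical, and the sketch omits the term reweighting, feature refinement, and BIC steps entirely.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def incremental_cluster(stories, threshold=0.3, window=7):
    """Assign each story to a cluster in a single chronological pass.

    stories: iterable of (day, text) pairs sorted by day.
    threshold, window: illustrative parameters (assumptions, not from the paper).
    Returns a list of clusters; each is a dict with the summed term counts
    ('centroid'), the day of its latest story ('last_day'), and the indices
    of its stories ('members').
    """
    clusters = []
    for idx, (day, text) in enumerate(stories):
        vec = Counter(text.lower().split())
        best, best_sim = None, threshold
        for c in clusters:
            # "Aging": skip clusters that have had no story within the window.
            if day - c["last_day"] > window:
                continue
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            # No sufficiently similar recent cluster: start a new topic.
            clusters.append({"centroid": Counter(vec), "last_day": day, "members": [idx]})
        else:
            # Fold the story into the closest recent cluster.
            best["centroid"].update(vec)
            best["last_day"] = day
            best["members"].append(idx)
    return clusters
```

In this sketch the age window serves only as a cheap filter that limits how many clusters each new story is compared against, which is one plausible reading of how topic aging could help efficiency.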

Keywords

TDT, Topic Detection, incremental clustering, term reweighting

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Xiaoming Zhang (1)
  • Zhoujun Li (1)
  1. School of Computer, Beihang University, Beijing, China