A On-Line News Documents Clustering Method

  • Conference paper
Active Media Technology (AMT 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7669)

Abstract

To improve the efficiency and accuracy of on-line news event detection (ONED), we build the vector space model of each news document from only those words whose term frequency (TF) exceeds a threshold, and we propose a two-stage clustering method for ONED. In the first stage, similar documents collected within a given period of time are clustered into micro-clusters. In the second stage, the micro-clusters are compared with previous event clusters. The experimental results show that the proposed method has a lower computational load, a higher processing rate, and a smaller loss of accuracy.
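The two stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measure, the cluster representation, or the threshold values, so everything concrete here is an assumption made for illustration: cosine similarity, single-pass assignment, summed term-frequency vectors as centroids, whitespace tokenisation, and the hypothetical names `tf_vector`, `cosine`, `merge`, `assign`, `two_stage`, `min_tf`, and `sim_threshold`. It is not the authors' implementation.

```python
from collections import Counter
from math import sqrt


def tf_vector(text, min_tf=1):
    """Keep only the words whose term frequency is greater than min_tf (assumed threshold)."""
    counts = Counter(text.lower().split())  # whitespace tokenisation is an assumption
    return {w: c for w, c in counts.items() if c > min_tf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts (assumed measure)."""
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge(a, b):
    """Sum two sparse vectors; the sum serves as an unnormalised centroid."""
    out = Counter(a)
    out.update(b)  # Counter.update adds counts rather than replacing them
    return dict(out)


def assign(vectors, clusters, sim_threshold):
    """Single-pass step: add each vector to its most similar cluster, or start a new one."""
    for vec in vectors:
        best, best_sim = None, 0.0
        for i, centroid in enumerate(clusters):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= sim_threshold:
            clusters[best] = merge(clusters[best], vec)
        else:
            clusters.append(vec)
    return clusters


def two_stage(batch, event_clusters, min_tf=1, sim_threshold=0.5):
    """Process one time window of documents and update the event clusters in place."""
    # Stage 1: cluster the documents of the current time window into micro-clusters.
    micro = assign([tf_vector(d, min_tf) for d in batch], [], sim_threshold)
    # Stage 2: compare each micro-cluster with the previously detected event clusters.
    return assign(micro, event_clusters, sim_threshold)


events = two_stage(
    [
        "quake quake city city rescue",
        "quake quake city city damage damage",
        "election election vote vote",
    ],
    [],
)
events = two_stage(["quake quake city city aftershock"], events)
print(len(events))
```

In this sketch, stage 2 performs one comparison per micro-cluster rather than one per document, which is one plausible reading of where the reduced computation load reported in the abstract comes from.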

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, H., Li, Gh., Xu, Xw. (2012). A On-Line News Documents Clustering Method. In: Huang, R., Ghorbani, A.A., Pasi, G., Yamaguchi, T., Yen, N.Y., Jin, B. (eds) Active Media Technology. AMT 2012. Lecture Notes in Computer Science, vol 7669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35236-2_9

  • DOI: https://doi.org/10.1007/978-3-642-35236-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35235-5

  • Online ISBN: 978-3-642-35236-2

  • eBook Packages: Computer Science, Computer Science (R0)
