Applied Intelligence

Volume 30, Issue 2, pp 121–141

Personalized e-news monitoring agent system for tracking user-interested Chinese news events

Article

Abstract

Many paper-based newspapers have been converted to digital format and published on the Internet, and digital newspapers are becoming a popular electronic medium for conveying information immediately. Google developed a news service, Google news alert, built on the Google news aggregator, that tracks news events of interest to users through keyword matching. Because the service monitors and tracks news events with keyword matching alone, it retrieves many irrelevant news events and sends them to users. In other words, the current service cannot monitor news events by specific news topic: although its recall rate is high, its precision rate is low when tracking user-interested news events. This study therefore presents a novel personalized e-news monitoring agent system that uses a topic-tracking-based approach to remedy the flaw of the keyword-based approach when tracking user-interested news events on the Google News site. The proposed scheme considers both the similarities and the semantic relationships among news topics when tracking news events. To further improve accuracy in tracking user-interested Chinese news events, the Chinese word segmentation system ECScanner (An Extension Chinese Lexicon Scanner), which includes a new-word extension mechanism, is proposed for the Chinese word segmentation process. Experimental results demonstrated that the proposed topic-based scheme is superior to the keyword-based approach used by Google news alert in terms of precision rate, while retaining a high recall rate when tracking user-interested news events. Experimental results also confirmed that, compared with the conventional Chinese word segmentation system CKIP (Chinese Knowledge Information Processing), the proposed ECScanner with its novel extension mechanism for new words improves the accuracy rate in tracking user-interested news events.
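The contrast between high recall and low precision can be illustrated with a small sketch. This is not the system described in the article: the toy articles, the keyword, the topic profile, and the 0.4 threshold are all hypothetical, and plain bag-of-words cosine similarity stands in for the topic similarity and semantic relationships that the proposed scheme uses. The tokens are assumed to be already segmented.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical, pre-segmented stand-ins for news articles.
articles = {
    "a1": "apple releases new phone with faster chip".split(),
    "a2": "apple phone sales rise after new chip launch".split(),
    "a3": "apple harvest festival draws record crowd".split(),
    "a4": "farmers report strong apple harvest this autumn".split(),
}
relevant = {"a1", "a2"}  # assumed: the user follows the phone-launch topic

# Keyword-based tracking: any article containing the keyword is retrieved.
keyword = "apple"
keyword_hits = {aid for aid, toks in articles.items() if keyword in toks}

# Topic-based tracking (simplified): compare each article to a topic profile.
topic_profile = Counter("apple phone chip launch".split())
THRESHOLD = 0.4  # hypothetical cut-off chosen for this toy data
topic_hits = {
    aid
    for aid, toks in articles.items()
    if cosine(Counter(toks), topic_profile) >= THRESHOLD
}

keyword_pr = precision_recall(keyword_hits, relevant)
topic_pr = precision_recall(topic_hits, relevant)
print("keyword:", keyword_pr)
print("topic:  ", topic_pr)
```

On this toy data the keyword match retrieves every article that mentions the keyword, so recall stays at 1.0 while precision drops to 0.5; the topic profile filters out the two unrelated articles without losing the relevant ones.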

Keywords

News events; News events monitoring agent system; Information retrieval; Intelligent agent

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Graduate Institute of Library, Information and Archival Studies, National Chengchi University, Taipei, Taiwan
  2. Graduate Institute of Learning Technology, National Hualien University of Education, Hualien, Taiwan
