Clipping and Analyzing News Using Machine Learning Techniques

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 2226)

Abstract

Generating press clippings for companies manually requires a considerable amount of resources. We describe a system that monitors online newspapers and discussion boards automatically. The system extracts, classifies, and analyzes messages and generates press clippings automatically, taking the specific needs of client companies into account. Key components of the system are a spider, an information extraction engine, a text classifier based on the Support Vector Machine that categorizes messages by subject, and a second classifier that analyzes which emotional state the author of a newsgroup posting was likely to be in. By analyzing large numbers of messages, the system can summarize the main issues being reported for given business sectors, and can summarize the emotional attitude of customers and shareholders towards companies.
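The abstract names a Support Vector Machine text classifier that assigns messages to subjects, followed by a step that summarizes the main issues across many messages. The block below is a minimal pure-Python sketch of that idea, not the authors' implementation. It assumes TF-IDF bag-of-words features, a one-vs-rest linear SVM trained with the Pegasos stochastic sub-gradient method (a swapped-in training method; the abstract does not say how the SVM was trained), and a tiny labelled corpus invented for illustration. All names (`TfIdfEncoder`, `train_pegasos`, `OneVsRestSVM`, `summarize`, `TRAIN`) are hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


class TfIdfEncoder:
    """Maps a text to a sparse, L2-normalized TF-IDF vector ({term: weight})."""

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))
        n = len(docs)
        # Smoothed inverse document frequency: rarer terms get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        return self

    def transform(self, doc):
        # Terms never seen during fit() are ignored.
        tf = Counter(t for t in tokenize(doc) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}


def dot(w, x):
    """Inner product of a weight dict with a sparse feature dict."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_pegasos(xs, ys, lam=0.01, epochs=20, seed=0):
    """Binary linear SVM (hinge loss + L2 penalty, no bias term) trained with
    the Pegasos stochastic sub-gradient method. ys must be +1 or -1."""
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = ys[i] * dot(w, xs[i])  # margin under the current weights
            # Sub-gradient step on the L2 regularizer shrinks every weight.
            w = {k: (1.0 - eta * lam) * v for k, v in w.items()}
            if margin < 1.0:
                # Hinge loss is active: move w towards y_i * x_i.
                for k, v in xs[i].items():
                    w[k] = w.get(k, 0.0) + eta * ys[i] * v
    return w


class OneVsRestSVM:
    """One binary SVM per subject; predicts the subject with the highest score."""

    def fit(self, docs, labels):
        self.encoder = TfIdfEncoder().fit(docs)
        xs = [self.encoder.transform(d) for d in docs]
        self.labels = sorted(set(labels))
        self.weights = {
            c: train_pegasos(xs, [1 if y == c else -1 for y in labels])
            for c in self.labels
        }
        return self

    def predict(self, doc):
        x = self.encoder.transform(doc)
        return max(self.labels, key=lambda c: dot(self.weights[c], x))


def summarize(predicted_labels):
    """Share of messages per label, most frequent first."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return [(label, n / total) for label, n in counts.most_common()]


# Hypothetical toy training data invented for illustration; not from the paper.
TRAIN = [
    ("Shares fall after profit warning", "finance"),
    ("Quarterly earnings beat forecast, stock rises", "finance"),
    ("Dividend and share price outlook", "finance"),
    ("New smartphone model launched with faster chip", "products"),
    ("Product recall for faulty battery in phone", "products"),
    ("Launch of new laptop and tablet range", "products"),
]
DOCS = [d for d, _ in TRAIN]
LABELS = [y for _, y in TRAIN]

if __name__ == "__main__":
    model = OneVsRestSVM().fit(DOCS, LABELS)
    incoming = ["Stock price and earnings", "New phone launched", "Earnings warning"]
    print(summarize(model.predict(m) for m in incoming))
```

The same `summarize` helper could aggregate the labels of an emotion classifier per company; the abstract does not specify how that second classifier works, so it is not modeled here.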

Keywords

  • Support Vector Machine
  • Hidden Markov Model
  • Information Extraction
  • Machine Learning Technique
  • News Story


Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gründel, H., Naphtali, T., Wiech, C., Gluba, JM., Rohdenburg, M., Scheffer, T. (2001). Clipping and Analyzing News Using Machine Learning Techniques. In: Jantke, K.P., Shinohara, A. (eds) Discovery Science. DS 2001. Lecture Notes in Computer Science, vol 2226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45650-3_11

  • DOI: https://doi.org/10.1007/3-540-45650-3_11

  • Chapter length: 13 pages

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42956-2

  • Online ISBN: 978-3-540-45650-6

  • eBook Packages: Springer Book Archive
