Clipping and Analyzing News Using Machine Learning Techniques

  • Conference paper
Discovery Science (DS 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2226)

Abstract

Generating press clippings for companies manually requires a considerable amount of resources. We describe a system that monitors online newspapers and discussion boards automatically. The system extracts, classifies, and analyzes messages and generates press clippings automatically, taking the specific needs of client companies into account. Key components of the system are a spider, an information extraction engine, a text classifier based on the Support Vector Machine that categorizes messages by subject, and a second classifier that estimates which emotional state the author of a newsgroup posting was likely to be in. By analyzing a large number of messages, the system can summarize the main issues that are being reported on for given business sectors, and can summarize the emotional attitude of customers and shareholders towards companies.
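
The subject classifier mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no details of features or training, so the code below assumes bag-of-words counts and a binary linear SVM trained by Pegasos-style subgradient descent on the hinge loss. The function names (`featurize`, `train_linear_svm`, `predict`), the hyperparameters, and the toy training messages are all hypothetical and invented for illustration.

```python
from collections import Counter

def featurize(text):
    """Bag-of-words term counts as a sparse dict (an assumed feature choice)."""
    return Counter(text.lower().split())

def train_linear_svm(examples, epochs=50, lam=0.01):
    """Train a binary linear SVM (no bias term) on the hinge loss.

    Uses Pegasos-style subgradient steps with step size 1 / (lam * t),
    cycling through the examples in a fixed order.
    examples: list of (text, label) pairs with label in {-1, +1}.
    Returns a sparse weight dict mapping token -> weight.
    """
    data = [(featurize(text), label) for text, label in examples]
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is computed with the weights before this step's update.
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            # Shrink all weights (gradient of the L2 regulariser).
            scale = 1.0 - eta * lam
            for f in w:
                w[f] *= scale
            # Hinge-loss subgradient: only margin violators move the weights.
            if margin < 1:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def predict(w, text):
    """Return +1 if the linear score is positive, otherwise -1."""
    score = sum(w.get(f, 0.0) * v for f, v in featurize(text).items())
    return 1 if score > 0 else -1

# Toy data, invented for illustration: +1 = finance, -1 = sports.
TRAIN = [
    ("shares profit quarter", 1),
    ("profit shares dividend", 1),
    ("match goal team", -1),
    ("team goal league", -1),
]

if __name__ == "__main__":
    weights = train_linear_svm(TRAIN)
    print(predict(weights, "profit shares rise"))
    print(predict(weights, "team scored a goal"))
```

Categorizing by more than two subjects could be handled by training one such classifier per subject (one-vs-rest) and choosing the subject with the highest score; whether the described system does this is not stated in the abstract.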

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gründel, H., Naphtali, T., Wiech, C., Gluba, JM., Rohdenburg, M., Scheffer, T. (2001). Clipping and Analyzing News Using Machine Learning Techniques. In: Jantke, K.P., Shinohara, A. (eds) Discovery Science. DS 2001. Lecture Notes in Computer Science, vol 2226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45650-3_11

  • DOI: https://doi.org/10.1007/3-540-45650-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42956-2

  • Online ISBN: 978-3-540-45650-6

  • eBook Packages: Springer Book Archive
