Clipping and Analyzing News Using Machine Learning Techniques

Gründel, Hans; Naphtali, Tino; Wiech, Christian; Gluba, Jan-Marian; Rohdenburg, Maiken; Scheffer, Tobias

doi:10.1007/3-540-45650-3_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2226)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Generating press clippings for companies manually requires a considerable amount of resources. We describe a system that monitors online newspapers and discussion boards automatically. The system extracts, classifies and analyzes messages and generates press clippings automatically, taking the specific needs of client companies into account. Key components of the system are a spider, an information extraction engine, a text classifier based on the Support Vector Machine that categorizes messages by subject, and a second classifier that analyzes which emotional state the author of a newsgroup posting was likely to be in. By analyzing a large number of messages, the system can summarize the main issues that are being reported on for given business sectors, and can summarize the emotional attitude of customers and shareholders towards companies.
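
The abstract names a Support Vector Machine text classifier that categorizes messages by subject but gives no implementation details. The following is only a minimal sketch of that general technique, not the authors' system: it assumes bag-of-words features, a two-class problem, and a linear SVM trained by plain subgradient descent on the regularized hinge loss. The toy messages, the function names (featurize, train_linear_svm, decision, predict) and all hyperparameters are hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: +1 = business/finance message, -1 = sports message.
TOY_MESSAGES = [
    ("shares fell after earnings report", +1),
    ("team won the final match", -1),
    ("stock price rises on strong earnings", +1),
    ("coach praises team after match", -1),
]


def featurize(text):
    """Return bag-of-words term counts for a lowercased message."""
    return Counter(text.lower().split())


def train_linear_svm(examples, epochs=50, lam=0.01, lr=0.1):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.

    examples: list of (text, label) pairs with label in {+1, -1}.
    Returns (weights, bias), where weights maps each term to a float.
    """
    weights = {}
    bias = 0.0
    data = [(featurize(text), label) for text, label in examples]
    for _ in range(epochs):
        for feats, label in data:
            score = sum(weights.get(t, 0.0) * c for t, c in feats.items()) + bias
            # Regularization step: shrink every weight slightly.
            for term in weights:
                weights[term] *= 1.0 - lr * lam
            # Hinge-loss step: update only when the margin is violated.
            if label * score < 1.0:
                for term, count in feats.items():
                    weights[term] = weights.get(term, 0.0) + lr * label * count
                bias += lr * label
    return weights, bias


def decision(model, text):
    """Signed score of a message; the sign gives the predicted class."""
    weights, bias = model
    return sum(weights.get(t, 0.0) * c for t, c in featurize(text).items()) + bias


def predict(model, text):
    """Return +1 (finance) or -1 (sports) for a message."""
    return 1 if decision(model, text) >= 0 else -1


model = train_linear_svm(TOY_MESSAGES)
print(predict(model, "earnings boost stock price"))
print(predict(model, "team loses match"))
```

A production classifier of this kind would more likely use weighted features such as TF-IDF, one binary classifier per subject category, and an established SVM solver; the hand-written trainer here only keeps the sketch self-contained.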

Author information

Authors and Affiliations

SemanticEdge, Kaiserin-Augusta-Allee 10-11, 10553, Berlin, Germany
Hans Gründel, Tino Naphtali, Christian Wiech, Jan-Marian Gluba, Maiken Rohdenburg & Tobias Scheffer

Editor information

Editors and Affiliations

DFKI GmbH Saarbrücken, 66123, Saarbrücken, Germany
Klaus P. Jantke
Department of Informatics, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, 812-8581, Fukuoka, Japan
Ayumi Shinohara

About this paper

Cite this paper

Gründel, H., Naphtali, T., Wiech, C., Gluba, JM., Rohdenburg, M., Scheffer, T. (2001). Clipping and Analyzing News Using Machine Learning Techniques. In: Jantke, K.P., Shinohara, A. (eds) Discovery Science. DS 2001. Lecture Notes in Computer Science, vol 2226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45650-3_11

DOI: https://doi.org/10.1007/3-540-45650-3_11
Published: 20 December 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42956-2
Online ISBN: 978-3-540-45650-6
eBook Packages: Springer Book Archive
