Abstract
Generating press clippings for companies manually requires a considerable amount of resources. We describe a system that monitors online newspapers and discussion boards automatically. The system extracts, classifies and analyzes messages and generates press clippings automatically, taking the specific needs of client companies into account. Key components of the system are a spider, an information extraction engine, a text classifier based on the Support Vector Machine that categorizes messages by subject, and a second classifier that analyzes which emotional state the author of a newsgroup posting was likely to be in. By analyzing a large number of messages, the system can summarize the main issues that are being reported on for given business sectors, and can summarize the emotional attitude of customers and shareholders towards companies.
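To make the subject-categorization component more concrete, the following is a minimal sketch of a linear Support Vector Machine over bag-of-words features, trained by plain sub-gradient descent on the hinge loss. It is an illustration only and not the system's actual implementation: the two categories (finance versus products), the stop-word list, the toy messages, the hyperparameters, and all function names are assumptions introduced here.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "and", "a", "on", "about", "after", "has", "are"}

def featurize(text):
    """Map a message to L2-normalised bag-of-words term counts."""
    tokens = [t for t in text.lower().split()
              if t.isalpha() and t not in STOP_WORDS]
    counts = Counter(tokens)
    norm = sum(v * v for v in counts.values()) ** 0.5 or 1.0
    return {t: v / norm for t, v in counts.items()}

def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())

def train_linear_svm(examples, epochs=200, lam=0.01, lr=0.1):
    """Approximately minimise lam/2 * ||w||^2 + hinge loss.

    examples: list of (feature dict, label) pairs with label in {-1, +1}.
    Returns the weight dict w and the bias b.
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in examples:
            margin = y * (dot(w, x) + b)
            # Gradient step on the L2 regulariser: shrink all weights.
            for t in w:
                w[t] *= 1.0 - lr * lam
            # Sub-gradient step on the hinge loss if the margin is violated.
            if margin < 1.0:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
                b += lr * y
    return w, b

def predict(model, text):
    """Return +1 (finance) or -1 (products) for a raw message."""
    w, b = model
    return 1 if dot(w, featurize(text)) + b >= 0 else -1

# Toy training messages (invented for illustration): +1 finance, -1 products.
TRAIN = [
    ("shares fell after the quarterly earnings report", 1),
    ("the stock price rose on strong profit", 1),
    ("investors expect higher dividend and earnings", 1),
    ("the new phone has a better camera and battery", -1),
    ("customers complain about the broken battery", -1),
    ("the laptop screen and keyboard feel cheap", -1),
]

model = train_linear_svm([(featurize(text), label) for text, label in TRAIN])
print(predict(model, "earnings and profit lifted the stock"))
print(predict(model, "the camera and screen are broken"))
```

A multi-category setting, such as one category per business sector, could be handled by training one such binary classifier per category. The emotion classifier mentioned above could in principle follow the same pattern with different labels, although the abstract does not say how it is built.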
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gründel, H., Naphtali, T., Wiech, C., Gluba, JM., Rohdenburg, M., Scheffer, T. (2001). Clipping and Analyzing News Using Machine Learning Techniques. In: Jantke, K.P., Shinohara, A. (eds) Discovery Science. DS 2001. Lecture Notes in Computer Science, vol 2226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45650-3_11
DOI: https://doi.org/10.1007/3-540-45650-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42956-2
Online ISBN: 978-3-540-45650-6
eBook Packages: Springer Book Archive