Abstract
We present Ord i Dag, a new service that displays today’s most important keywords, extracted fully automatically from Norwegian online newspapers. By describing the complete process, we provide a fully disclosed method for media monitoring and news summarization. For keyword extraction, a reference corpus serves as background knowledge about average language use and is contrasted with the current day’s word frequencies. Having detected the most prominent keywords of a day, we introduce several intuitive ways of grouping and displaying them. The paper concludes with a discussion of possible applications.
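The abstract does not name the statistic used to contrast the day's frequencies with the reference corpus. The sketch below makes one assumption: it uses Dunning's log-likelihood ratio (G²) on a 2×2 contingency table. This is a common choice for comparing a sample with a reference corpus, but it is not necessarily the measure Ord i Dag uses. The function names `g2` and `daily_keywords` and the parameters `top_n` and `min_count` are hypothetical. Tokenization and any language-specific preprocessing are assumed to have already happened.

```python
import math
from collections import Counter


def g2(k_day, n_day, k_ref, n_ref):
    """Log-likelihood ratio (G^2) for one word.

    k_day: occurrences of the word among n_day tokens from today's news.
    k_ref: occurrences of the word among n_ref tokens in the reference corpus.
    """
    n = n_day + n_ref
    # 2x2 table: rows = (this word, all other words), columns = (today, reference)
    observed = [[k_day, k_ref], [n_day - k_day, n_ref - k_ref]]
    rows = [k_day + k_ref, n - k_day - k_ref]
    cols = [n_day, n_ref]
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with zero observations contribute 0 to the sum
                e = rows[i] * cols[j] / n  # expected count if both samples share one rate
                total += o * math.log(o / e)
    return 2.0 * total


def daily_keywords(today_tokens, ref_counts, top_n=10, min_count=3):
    """Rank today's words by G^2, keeping only words that are over-represented today."""
    day_counts = Counter(today_tokens)
    n_day = sum(day_counts.values())
    n_ref = sum(ref_counts.values())
    scored = []
    for word, k_day in day_counts.items():
        if k_day < min_count:  # hypothetical frequency cut-off against noise
            continue
        k_ref = ref_counts.get(word, 0)
        # G^2 is two-sided, so drop words that are rarer today than usual
        if k_day / n_day <= k_ref / n_ref:
            continue
        scored.append((g2(k_day, n_day, k_ref, n_ref), word))
    # Highest score first; ties broken alphabetically
    scored.sort(key=lambda sw: (-sw[0], sw[1]))
    return [word for _, word in scored[:top_n]]


# Toy example: function words keep their usual share, while "valg" (election) spikes today
ref_counts = Counter({"og": 5000, "i": 4000, "det": 3000, "valg": 10, "regjeringen": 50})
today = ["og"] * 50 + ["i"] * 40 + ["det"] * 30 + ["valg"] * 20 + ["regjeringen"] * 5
print(daily_keywords(today, ref_counts))  # ['valg', 'regjeringen']
```

Because the table uses both the word's counts and the counts of all other tokens, a word that is absent from the reference corpus still receives a finite score. Frequent function words are filtered out because they are not over-represented relative to the background.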
At present, the service is available for Norwegian and German. Since only shallow language-specific processing is needed, it can easily be set up for other languages.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Eiken, U.C., Liseth, A.T., Witschel, H.F., Richter, M., Biemann, C. (2006). Ord i Dag: Mining Norwegian Daily Newswire. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_51
DOI: https://doi.org/10.1007/11816508_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science, Computer Science (R0)