Ord i Dag: Mining Norwegian Daily Newswire

Eiken, Unni Cathrine; Liseth, Anja Therese; Witschel, Hans Friedrich; Richter, Matthias; Biemann, Chris

doi:10.1007/11816508_51

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4139)

Included in the following conference series:

International Conference on Natural Language Processing (in Finland)

Abstract

We present Ord i Dag, a new service that displays the most important keywords of the day, extracted fully automatically from Norwegian online newspapers. By describing the complete process, we provide a fully disclosed method for media monitoring and news summarization. For keyword extraction, a reference corpus serves as a model of average language use and is contrasted with the current day’s word frequencies. Having detected the most prominent keywords of a day, we introduce several ways of grouping and displaying them intuitively. We conclude with a discussion of possible applications.

So far, the service is available for Norwegian and German. Because only shallow language-specific processing is needed, it can easily be set up for other languages.
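
The keyword extraction step described in the abstract, contrasting the current day's word frequencies with a reference corpus, might be sketched as follows. The abstract does not say which statistic the service uses, so this sketch assumes the two-term log-likelihood keyness score, a common simplification of Dunning's log-likelihood ratio. The function names `log_likelihood` and `top_keywords`, the whitespace tokenization, and the toy token lists are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood keyness score (an assumed measure).

    a: count of the word in today's text, b: count in the reference corpus,
    c: total tokens today, d: total tokens in the reference corpus.
    """
    # Expected counts if the word were equally frequent in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def top_keywords(day_tokens, ref_tokens, n=10):
    """Return up to n (word, score) pairs that are overrepresented today."""
    day = Counter(day_tokens)
    ref = Counter(ref_tokens)
    day_total = sum(day.values())
    ref_total = sum(ref.values())
    if day_total == 0 or ref_total == 0:
        return []
    scored = []
    for word, a in day.items():
        b = ref.get(word, 0)
        # Keep only words whose relative frequency is higher today
        # than in the reference corpus.
        if a / day_total > b / ref_total:
            scored.append((word, log_likelihood(a, b, day_total, ref_total)))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


# Toy data, invented for illustration only.
day_tokens = "valg valg valg regjering og og i".split()
ref_tokens = "og og og og i i i er er det valg".split()
for word, score in top_keywords(day_tokens, ref_tokens):
    print(f"{word}\t{score:.2f}")
```

In this toy run, function words such as "og" and "i" are filtered out because they are no more frequent today than in the reference corpus, while "valg" and "regjering" remain as candidates. A real setup would need a much larger reference corpus and some language-specific preprocessing before counting.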

Author information

Authors and Affiliations

AKSIS, University of Bergen, Allégaten 27, 5007, Bergen, Norway
Unni Cathrine Eiken & Anja Therese Liseth
NLP Department, University of Leipzig, Augustusplatz 10/11, 04109, Leipzig, Germany
Hans Friedrich Witschel, Matthias Richter & Chris Biemann

Editor information

Editors and Affiliations

Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Joukahaisenkatu 3-5 B, FIN-20520, Turku, Finland
Tapio Salakoski
Turku Centre for Computer Science (TUCS) and Department of IT, University of Turku, Lemminkäisenkatu 14 A, 20520, Turku, Finland
Filip Ginter & Sampo Pyysalo
Department of Information Technology, University of Turku, Lemminkäisenkatu 14–18 A, FIN-20520, Turku, Finland
Tapio Pahikkala

About this paper

Cite this paper

Eiken, U.C., Liseth, A.T., Witschel, H.F., Richter, M., Biemann, C. (2006). Ord i Dag: Mining Norwegian Daily Newswire. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_51

DOI: https://doi.org/10.1007/11816508_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science, Computer Science (R0)
