Ord i Dag: Mining Norwegian Daily Newswire

  • Conference paper
Advances in Natural Language Processing (FinTAL 2006)

Abstract

We present Ord i Dag, a new service that displays the most important keywords of the day. The keywords are extracted fully automatically from Norwegian online newspapers. By describing the complete process, we provide a fully disclosed method for media monitoring and news summarization. For keyword extraction, a reference corpus serves as a background model of average language use, which is contrasted with the current day’s word frequencies. Having detected the most prominent keywords of a day, we introduce several ways of grouping and displaying them intuitively. We conclude with a discussion of possible applications.

So far, the service is available for Norwegian and German. Since only shallow language-specific processing is needed, it can easily be set up for other languages.
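
The contrast between a day's word frequencies and a reference corpus can be illustrated with a small sketch. The abstract does not name the significance measure, so the log-likelihood ratio (G²) over a 2×2 contingency table is an assumption here, as are the function names `g2` and `daily_keywords` and the toy counts; this is not the authors' implementation.

```python
import math
from collections import Counter


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: occurrences of the word in today's text
    k12: all other tokens in today's text
    k21: occurrences of the word in the reference corpus
    k22: all other tokens in the reference corpus
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def daily_keywords(day_tokens, ref_counts, ref_total, top_n=10):
    """Rank today's words by how strongly they exceed reference usage.

    ref_counts maps word -> count in the reference corpus and
    ref_total is the total number of reference tokens (both hypothetical inputs).
    """
    day_counts = Counter(day_tokens)
    day_total = sum(day_counts.values())
    scored = []
    for word, a in day_counts.items():
        b = ref_counts.get(word, 0)
        # Keep only words that are relatively MORE frequent today.
        if a / day_total <= b / ref_total:
            continue
        scored.append((g2(a, day_total - a, b, ref_total - b), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


if __name__ == "__main__":
    # Toy data: "valg" (election) is rare in the reference but frequent today.
    today = ["valg"] * 5 + ["og"] * 5 + ["i"] * 5
    reference = {"og": 500, "i": 500, "valg": 1}
    print(daily_keywords(today, reference, ref_total=10000, top_n=3))
```

In this sketch, function words such as "og" still receive a score on a tiny sample; a real setup would presumably add a frequency threshold or a stop list, which is again an assumption rather than something stated in the abstract.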

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Eiken, U.C., Liseth, A.T., Witschel, H.F., Richter, M., Biemann, C. (2006). Ord i Dag: Mining Norwegian Daily Newswire. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_51

  • DOI: https://doi.org/10.1007/11816508_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37334-6

  • Online ISBN: 978-3-540-37336-0

  • eBook Packages: Computer Science (R0)