Ord i Dag: Mining Norwegian Daily Newswire

  • Unni Cathrine Eiken
  • Anja Therese Liseth
  • Hans Friedrich Witschel
  • Matthias Richter
  • Chris Biemann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)


We present Ord i Dag, a new service that displays the most important keywords of the current day. The keywords are extracted fully automatically from Norwegian online newspapers. We describe the complete process and thereby provide a fully disclosed method for media monitoring and news summarization. For keyword extraction, a reference corpus serves as a model of average language use, against which the current day's word frequencies are contrasted. Having detected the most prominent keywords of a day, we introduce several intuitive ways of grouping and displaying them. We conclude with a discussion of possible applications.

So far, the service is available for Norwegian and German. Because only shallow language-specific processing is needed, it can easily be set up for other languages.
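
The contrast between a reference corpus and one day's word frequencies can be illustrated with a minimal sketch. The abstract does not state which significance measure the service uses, so the example assumes a log-likelihood keyness score in the spirit of Dunning's statistic (the two-term form often used for corpus comparison). The function names `log_likelihood` and `top_keywords`, the whitespace-style token lists, and the cutoff `n` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def log_likelihood(day_count, day_total, ref_count, ref_total):
    """Two-term log-likelihood score for one word across two corpora.

    Assumption: this measure is an illustrative choice, not necessarily
    the one used by the service described in the abstract.
    """
    # Relative frequency of the word if both corpora behaved identically.
    combined_rate = (day_count + ref_count) / (day_total + ref_total)
    expected_day = day_total * combined_rate
    expected_ref = ref_total * combined_rate
    score = 0.0
    if day_count > 0:
        score += day_count * math.log(day_count / expected_day)
    if ref_count > 0:
        score += ref_count * math.log(ref_count / expected_ref)
    return 2.0 * score


def top_keywords(day_tokens, ref_tokens, n=10):
    """Rank words that are more frequent today than in the reference corpus."""
    if not day_tokens or not ref_tokens:
        return []
    day_counts = Counter(day_tokens)
    ref_counts = Counter(ref_tokens)
    day_total = len(day_tokens)
    ref_total = len(ref_tokens)
    scored = []
    for word, count in day_counts.items():
        ref_count = ref_counts.get(word, 0)
        # Skip words that are not overrepresented in today's text.
        if count / day_total <= ref_count / ref_total:
            continue
        scored.append(
            (word, log_likelihood(count, day_total, ref_count, ref_total))
        )
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Toy data: "valg" is rare in the reference corpus but frequent today.
    day = ["valg"] * 5 + ["og"] * 5
    ref = ["og"] * 50 + ["er"] * 49 + ["valg"]
    for word, score in top_keywords(day, ref):
        print(word, round(score, 2))
```

In this toy input, a function word such as "og" has the same relative frequency in both corpora and is filtered out, while "valg" is strongly overrepresented in the day's text and is ranked as a keyword. A real setup would add the shallow language-specific processing mentioned above (for example tokenization and normalization) before counting.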


Keywords (added by machine, not by the authors): Reduction Rule, Keyword Extraction, Reference Corpus, Association Graph, News Topic





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Unni Cathrine Eiken (1)
  • Anja Therese Liseth (1)
  • Hans Friedrich Witschel (2)
  • Matthias Richter (2)
  • Chris Biemann (2)

  1. AKSIS, University of Bergen, Bergen, Norway
  2. NLP Department, University of Leipzig, Leipzig, Germany
