Automatic Keyword Extraction for News Finder

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3094)

Abstract

Newspapers are one of the most challenging domains for information retrieval systems: new articles appear every day, written in different languages and with multimedia content, and news repositories may be updated in a matter of hours, so information extraction is crucial for producing the metadata that describes the news. Further approaches to "smart retrieval" have to cope with multimedia and multilingual features and must also achieve very high precision in order to reach a high degree of user satisfaction with the retrieved documents. This paper describes the automatic keyword extraction (AKE) process for news characterization, which uses several linguistic techniques to improve the current state of text-based information retrieval. The first prototype implemented with a focus on the AKE process (www.omnipaper.org) is described, and some relevant performance characteristics are reported. Finally, some conclusions and comments are given regarding the role of linguistic engineering in the web era.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martínez-Fernández, J.L., García-Serrano, A., Martínez, P., Villena, J. (2004). Automatic Keyword Extraction for News Finder. In: Nürnberger, A., Detyniecki, M. (eds) Adaptive Multimedia Retrieval. AMR 2003. Lecture Notes in Computer Science, vol 3094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25981-7_7

  • DOI: https://doi.org/10.1007/978-3-540-25981-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22163-0

  • Online ISBN: 978-3-540-25981-7

  • eBook Packages: Springer Book Archive
