Automatic Keyword Extraction for News Finder

Martínez-Fernández, José Luis; García-Serrano, Ana; Martínez, Paloma; Villena, Julio

doi:10.1007/978-3-540-25981-7_7

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3094)

Abstract

Newspapers are one of the most challenging domains for information retrieval systems: new articles appear every day, written in different languages and with multimedia content, and news repositories may be updated in a matter of hours, so information extraction is crucial for producing the metadata of the news. Further "smart retrieval" approaches have to cope with multimedia and multilingual features and must achieve very high precision in order to reach a high degree of user satisfaction with the retrieved documents. This paper describes the automatic keyword extraction (AKE) process for news characterization, which uses several linguistic techniques to improve on the current state of text-based information retrieval. The first prototype implemented with a focus on the AKE process (www.omnipaper.org) is described, and some relevant performance features are included. Finally, some conclusions and comments are given regarding the role of linguistic engineering in the web era.
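
The abstract does not specify which linguistic techniques the AKE process applies, so the following is only a minimal sketch of what keyword extraction for a news article can look like, not the authors' method. It assumes a simple pipeline of tokenization, stopword removal and raw term-frequency ranking; the function name `extract_keywords`, the tiny `STOPWORDS` set and the sample article are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would
# use a full list for each supported language).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for",
             "is", "are", "with"}


def extract_keywords(text, top_n=5):
    """Rank the non-stopword terms of `text` by raw frequency."""
    # Lowercase and keep alphabetic runs (including common Spanish letters).
    tokens = re.findall(r"[a-záéíóúñü]+", text.lower())
    # Drop stopwords and very short tokens before counting.
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


if __name__ == "__main__":
    article = ("The central bank raised interest rates. "
               "Analysts expect the bank to keep rates high for the year.")
    print(extract_keywords(article, top_n=2))  # ['bank', 'rates']
```

A system of the kind the abstract describes would presumably replace the raw frequency count with richer linguistic processing (for example stemming or multilingual resources), but that detail is not given here.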

Author information

Authors and Affiliations

Computer Science Department, Universidad Carlos III de Madrid, Avda. Universidad 30, 28911, Leganés, Madrid, Spain
José Luis Martínez-Fernández & Paloma Martínez
Computer Science Department, Technical University of Madrid, Campus de Montegancedo s/n, Boadilla del Monte, 28660, Spain
Ana García-Serrano
Department of Telematic Engineering, Universidad Carlos III de Madrid, Avda. Universidad 30, 28911, Leganés, Madrid, Spain
Julio Villena

Editor information

Editors and Affiliations

Institute for Knowledge and Language Engineering, Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, D-39106, Magdeburg, Germany
Andreas Nürnberger
Laboratoire d’Informatique de Paris 6,
Marcin Detyniecki

About this paper

Cite this paper

Martínez-Fernández, J.L., García-Serrano, A., Martínez, P., Villena, J. (2004). Automatic Keyword Extraction for News Finder. In: Nürnberger, A., Detyniecki, M. (eds) Adaptive Multimedia Retrieval. AMR 2003. Lecture Notes in Computer Science, vol 3094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25981-7_7

DOI: https://doi.org/10.1007/978-3-540-25981-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22163-0
Online ISBN: 978-3-540-25981-7
eBook Packages: Springer Book Archive
