Abstract
Newspapers are one of the most challenging domains for information retrieval systems: new articles appear every day, written in different languages and with multimedia content, and news repositories may be updated in a matter of hours, so information extraction is crucial to the metadata content of the news. Further approaches to "smart retrieval" have to cope with multimedia and multilingual features and must achieve very good precision in order to reach a high degree of user satisfaction with the retrieved documents. This paper describes the automatic keyword extraction (AKE) process for news characterization, which uses several linguistic techniques to improve the current state of text-based information retrieval. The first prototype implemented with a focus on the AKE process (www.omnipaper.org) is described, and some relevant performance features are included. Finally, some conclusions and comments are given regarding the role of linguistic engineering in the web era.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Martínez-Fernández, J.L., García-Serrano, A., Martínez, P., Villena, J. (2004). Automatic Keyword Extraction for News Finder. In: Nürnberger, A., Detyniecki, M. (eds) Adaptive Multimedia Retrieval. AMR 2003. Lecture Notes in Computer Science, vol 3094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25981-7_7
DOI: https://doi.org/10.1007/978-3-540-25981-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22163-0
Online ISBN: 978-3-540-25981-7
eBook Packages: Springer Book Archive