BUAP-UPV TPIRS: A System for Document Indexing Reduction at WebCLEF

  • David Pinto
  • Héctor Jiménez-Salazar
  • Paolo Rosso
  • Emilio Sanchis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4022)

Abstract

In this paper we present the results of the BUAP and UPV universities at WebCLEF, a task of CLEF 2005. Specifically, we evaluate our information retrieval system on the bilingual “English to Spanish” task. Our system uses a term reduction process based on the Transition Point technique. Our results show that it is possible to reduce the number of terms to index and that doing so improves the performance of our system. We evaluated different percentages of reduction over a subset of EuroGOV in order to determine the best one. After reducing the corpus by 82.55%, we obtained a Mean Reciprocal Rank of 0.0844, compared with 0.0465 for the same evaluation with full documents.
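
As a rough illustration of the two ideas named in the abstract, the sketch below computes a transition point for a token list, keeps only the terms whose frequency lies near it, and computes a Mean Reciprocal Rank. It is not the authors' implementation. The abstract does not state the transition point formula; the one used here, TP = (sqrt(8 * I1 + 1) - 1) / 2 with I1 the number of terms occurring exactly once, is the form commonly given in the literature and is an assumption. The neighbourhood `width` parameter and all function names are hypothetical.

```python
import math
from collections import Counter


def transition_point(freqs):
    """Transition point of a frequency table (term -> count).

    Assumed formula: TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the
    number of terms that occur exactly once.
    """
    i1 = sum(1 for count in freqs.values() if count == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def reduce_terms(tokens, width=0.4):
    """Keep terms whose frequency lies within +/- width * TP of TP.

    `width` is a hypothetical tuning knob: a smaller value keeps fewer
    terms, which corresponds to a larger reduction of the index.
    """
    freqs = Counter(tokens)
    tp = transition_point(freqs)
    low, high = tp * (1 - width), tp * (1 + width)
    return {term for term, count in freqs.items() if low <= count <= high}


def mean_reciprocal_rank(ranks):
    """Mean Reciprocal Rank over a list of queries.

    Each entry is the 1-based rank of the first correct result for a
    query, or None when no correct result was retrieved (contributes 0).
    """
    if not ranks:
        return 0.0
    return sum(1.0 / rank for rank in ranks if rank) / len(ranks)


if __name__ == "__main__":
    tokens = "a a a b b c d e".split()
    print(reduce_terms(tokens))
    print(mean_reciprocal_rank([1, 2, None, 4]))
```

Under this reading, varying `width` would be one way to try different reduction percentages and compare the resulting Mean Reciprocal Rank values, as the abstract describes.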

Keywords

Information Retrieval; Transition Point; Vector Space Model; Information Retrieval System; Boolean Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • David Pinto (1, 2)
  • Héctor Jiménez-Salazar (2)
  • Paolo Rosso (1)
  • Emilio Sanchis (1)
  1. Department of Information Systems and Computation, UPV, Valencia, Spain
  2. Faculty of Computer Science, BUAP, Ciudad Universitaria, Puebla, Mexico