Language Resources and Evaluation, Volume 44, Issue 1–2, pp 159–180

An efficient any language approach for the integration of phrases in document retrieval

Abstract

In this paper, we address the problem of exploiting text phrases in a multilingual context. We propose a technique for benefiting from multi-word units in ad hoc document retrieval, whatever the language of the document collection. We present principles for optimizing the performance improvement obtained through this approach. The work is validated through retrieval experiments conducted on Chinese, Japanese, Korean and English.

Keywords

Multiword expressions, Document retrieval, Endogenous resources

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  1. Department of Computer Science, University of Caen, Caen, France
  2. Department of Computer Science, University of Helsinki, Helsinki, Finland
