Expressiveness and Performance of Full-Text Search Languages

  • Chavdar Botev
  • Sihem Amer-Yahia
  • Jayavel Shanmugasundaram
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

We study the expressiveness and performance of full-text search languages. Our motivation is to provide a formal basis for comparing full-text search languages and to develop a model for full-text search that can be tightly integrated with structured search. We design a model based on the positions of tokens (words) in the input text, and develop a full-text calculus (FTC) and a full-text algebra (FTA) with equivalent expressive power; this suggests a notion of completeness for full-text search languages. We show that existing full-text languages are incomplete and identify a practical subset of the FTC and FTA that is more powerful than existing languages, but which can still be evaluated efficiently.
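
As a rough illustration of what a position-based model looks like, each token can be mapped to the positions at which it occurs, and a predicate such as a distance constraint then becomes a condition over those positions. This is a minimal sketch, not the paper's FTC or FTA: the function names `tokenize`, `build_positions` and `within_distance`, and the whitespace tokenization rule, are assumptions made for this example.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, lowercase.

    A token's position is its index in the returned list (assumed convention).
    """
    tokens = []
    for word in text.split():
        cleaned = word.strip(PUNCTUATION).lower()
        if cleaned:
            tokens.append(cleaned)
    return tokens


def build_positions(text):
    """Map each token to the ascending list of positions where it occurs."""
    positions = defaultdict(list)
    for pos, token in enumerate(tokenize(text)):
        positions[token].append(pos)
    return positions


def within_distance(positions, a, b, max_gap):
    """True if some occurrence of `a` and some occurrence of `b`
    are at most `max_gap` positions apart."""
    return any(
        abs(p - q) <= max_gap
        for p in positions.get(a, [])
        for q in positions.get(b, [])
    )


# Hypothetical usage: a conjunction of a distance predicate and a negated one.
doc = "Efficient evaluation of full-text search languages and structured search."
index = build_positions(doc)
matches = within_distance(index, "structured", "search", 1) and not within_distance(
    index, "efficient", "search", 2
)
print(matches)
```

Because the predicates operate only on positions, they can be combined with Boolean connectives, and the same position lists could in principle be attached to structured data as well.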

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Chavdar Botev (1)
  • Sihem Amer-Yahia (2)
  • Jayavel Shanmugasundaram (1)
  1. Cornell University, Ithaca, USA
  2. AT&T Labs–Research, Florham Park, USA
