Efficient Phrase Querying with Common Phrase Index

  • Matthew Chang
  • Chung Keung Poon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3936)


In this paper, we propose a common phrase index, an efficient index structure that supports phrase queries over very large text databases. The structure extends previous index structures for phrases and achieves better query efficiency at negligible extra storage cost. In our experimental evaluation, the common phrase index improves query time by 5% overall and by 20% for large queries (queries of long phrases) relative to an auxiliary nextword index, while requiring only 1% extra storage. Compared with an inverted index, the improvement is 40% overall and 72% for large queries.
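The abstract measures the common phrase index against two baseline structures: an inverted index and an auxiliary nextword index. The sketch below is a toy, in-memory illustration of how phrase queries are answered with a positional inverted index (one postings list per query word) and with a nextword-style index (one postings list per adjacent word pair). It is not the common phrase index, whose design is not described here. All function names are hypothetical, there is no compression or on-disk layout, and the toy nextword index stores every word pair, whereas an auxiliary variant would store only a restricted subset.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Positional inverted index: word -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def build_nextword_index(docs):
    """Toy nextword index: (word, next_word) -> {doc_id: [positions of word]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for pos in range(len(words) - 1):
            index[(words[pos], words[pos + 1])][doc_id].append(pos)
    return index


def _aligned_docs(postings):
    """Doc ids where some start p has p + i in postings[i] for every i."""
    result = set()
    for doc_id, starts in postings[0].items():
        candidates = set(starts)
        for offset, plist in enumerate(postings[1:], start=1):
            # .get avoids creating empty entries in the defaultdict
            positions = set(plist.get(doc_id, ()))
            candidates = {p for p in candidates if p + offset in positions}
            if not candidates:
                break
        if candidates:
            result.add(doc_id)
    return result


def phrase_query_inverted(index, phrase):
    """One postings list per query word, aligned by word offset."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()
    return _aligned_docs([index[t] for t in terms])


def phrase_query_nextword(index, phrase):
    """One postings list per adjacent word pair (overlapping pairs for simplicity)."""
    terms = phrase.lower().split()
    if len(terms) < 2:
        raise ValueError("a nextword lookup needs at least two words")
    pairs = list(zip(terms, terms[1:]))
    if any(pair not in index for pair in pairs):
        return set()
    return _aligned_docs([index[pair] for pair in pairs])


docs = ["the who sang my generation", "who is the band", "the band the who"]
inverted = build_positional_index(docs)
nextword = build_nextword_index(docs)
print(phrase_query_inverted(inverted, "the who"))  # {0, 2}
print(phrase_query_nextword(nextword, "the who"))  # {0, 2}
```

Both lookups return the same documents. The difference lies in which postings lists are read. For a phrase containing a common word such as "the", the inverted index must scan that word's long list, whereas the pair ("the", "who") has a much shorter list. That gap is what nextword-style structures exploit.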







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Matthew Chang (1)
  • Chung Keung Poon (1)
  1. Dept. of Computer Science, City University of Hong Kong, China
