Fast Intersection Algorithms for Sorted Sequences

  • Ricardo Baeza-Yates
  • Alejandro Salinger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6060)

Abstract

This paper presents and analyzes a simple intersection algorithm for sorted sequences that is fast on average. It is related to the multiple searching problem and to merging. We present the worst-case and average-case analyses, showing that in the worst case the complexity adapts to the size of the smaller list. In the average case, it performs fewer comparisons than the total number of elements in both inputs, n and m, when n = αm (α > 1), achieving O(m log(n/m)) complexity. The algorithm is motivated by its application to fast query processing in Web search engines, where large intersections, or differences, must be computed quickly. In this setting we show experimentally that the algorithm is faster than previous solutions.
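
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way an intersection related to both multiple searching and merging might look. It assumes a divide-and-conquer scheme: take the median of the shorter range, binary-search it in the longer range, and recurse on the two halves. The function names (intersect_sorted, _intersect) are hypothetical, and the sketch assumes each input is sorted and free of duplicates.

```python
from bisect import bisect_left


def intersect_sorted(a, b):
    """Return the sorted intersection of two sorted, duplicate-free lists.

    Hypothetical helper: an assumed median-plus-binary-search scheme,
    not code taken from the paper.
    """
    out = []
    _intersect(a, 0, len(a), b, 0, len(b), out)
    return out


def _intersect(a, alo, ahi, b, blo, bhi, out):
    # Nothing can match if either half-open range [lo, hi) is empty.
    if alo >= ahi or blo >= bhi:
        return
    # Always take the pivot from the shorter range, so the binary
    # search runs over the longer one.
    if ahi - alo > bhi - blo:
        a, alo, ahi, b, blo, bhi = b, blo, bhi, a, alo, ahi
    mid = (alo + ahi) // 2
    pivot = a[mid]
    # Leftmost position in b[blo:bhi] where pivot could be inserted.
    pos = bisect_left(b, pivot, blo, bhi)
    # Elements smaller than the pivot can only match to the left of pos.
    _intersect(a, alo, mid, b, blo, pos, out)
    found = pos < bhi and b[pos] == pivot
    if found:
        out.append(pivot)
    # Elements larger than the pivot can only match to the right.
    _intersect(a, mid + 1, ahi, b, pos + 1 if found else pos, bhi, out)
```

For example, intersect_sorted([1, 3, 5, 7], [2, 3, 4, 5, 6, 7, 8, 9]) returns [3, 5, 7]. Because the left half is processed before the pivot and the right half after it, the result comes out in sorted order without a final sort.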

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ricardo Baeza-Yates (1, 2)
  • Alejandro Salinger (3)
  1. Yahoo! Research, Barcelona, Spain
  2. Yahoo! Research, Santiago, Chile
  3. Dept. of Computer Science, Univ. of Waterloo, Canada
