Efficient Lazy Algorithms for Minimal-Interval Semantics

  • Paolo Boldi
  • Sebastiano Vigna
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4209)


Minimal-interval semantics [3] associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimal-interval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents; thus, efficiently computing the antichains obtained by operations such as logical conjunction and disjunction is a basic issue. In this paper we provide the first algorithms for computing such operators that are linear in the number of intervals and logarithmic in the number of input antichains. The space used is linear in the number of antichains. Moreover, the algorithms are lazy: they do not assume random access to the input antichains. These properties make our algorithms feasible for use in large-scale web search engines.
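To make the notion of witnesses concrete, the following is a minimal sketch of a lazy conjunction over plain term positions. It is not the authors' algorithm, which handles general interval antichains; it is the classic heap-based sweep for the simplest case, where each input is a sorted list of single positions. The function name `and_witnesses` and the input layout are assumptions made for illustration, and the sketch further assumes that no position occurs in more than one list. Each input is read only through a forward iterator, and each step costs time logarithmic in the number of inputs.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def and_witnesses(position_lists: List[Iterable[int]]) -> Iterator[Tuple[int, int]]:
    """Lazily yield the minimal intervals containing one position from every list.

    Illustrative sketch only. Assumes each list is sorted in increasing order
    and that no position appears in more than one list.
    """
    iters = [iter(p) for p in position_lists]
    heap = []   # entries are (current position, index of the list it came from)
    hi = None   # largest current position over all lists
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is None:
            return  # an empty input means the conjunction has no witnesses
        heap.append((first, idx))
        hi = first if hi is None else max(hi, first)
    if not heap:
        return
    heapq.heapify(heap)

    while True:
        lo, idx = heap[0]            # smallest current position
        nxt = next(iters[idx], None)
        if nxt is None:
            # The list holding lo is exhausted: [lo, hi] is the last witness.
            yield (lo, hi)
            return
        if nxt > hi:
            # No later position of this list falls inside [lo, hi],
            # so the interval cannot be shrunk from the left.
            yield (lo, hi)
            hi = nxt
        heapq.heapreplace(heap, (nxt, idx))


# Hypothetical positions of two terms in one document.
print(list(and_witnesses([[1, 8], [3, 5, 12]])))
```

The yielded intervals have strictly increasing left and right endpoints, so none contains another: they form an antichain, as the definition of witnesses requires. Memory use is one heap entry per input list.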


Keywords: Priority Queue; Minimal Interval; Priority Order; Block Operator; Empty Interval




  1. Birkhoff, G.: Lattice Theory, 3rd (new) edn. AMS Colloquium Publications, vol. XXV. American Mathematical Society (1970)
  2. Clarke, C.L.A., Cormack, G.V.: Shortest-substring retrieval and ranking. ACM Trans. Inf. Syst. 18(1), 44–78 (2000)
  3. Clarke, C.L.A., Cormack, G.V., Burkowski, F.J.: An algebra for structured text search and a framework for its implementation. Comput. J. 38(1), 43–56 (1995)
  4. Crampton, J., Loizou, G.: The completion of a poset in a lattice of antichains. International Mathematical Journal 1(3), 223–238 (2001)
  5. Gonnet, G.H.: PAT 3.1: An efficient text searching system. User’s manual. Technical report, Center for the New Oxford English Dictionary. University of Waterloo, Waterloo, Canada (1987)
  6. Jaakkola, J., Kilpeläinen, P.: Nested text-region algebra. Technical Report C-1999-2, Department of Computer Science, University of Helsinki (1999)
  7. Jourdan, G.-V., Rampon, J.-X., Jard, C.: Computing on-line the lattice of maximal antichains of posets. Order 11(3), 197–210 (1994)
  8. Navarro, G., Baeza-Yates, R.: A class of linear algorithms to process sets of segments. In: Proc. CLEI 1996, vol. 2, pp. 671–682 (1996)
  9. Nievergelt, J., Preparata, F.P.: Plane-sweep algorithms for intersecting geometric figures. Comm. ACM 25(10), 739–747 (1982)
  10. The Lemur Project. Indri, http://www.lemurproject.org/indri/
  11. Sadakane, K., Imai, H.: Fast algorithms for k-word proximity search. IEICE Trans. Fundamentals E84-A(9) (September 2001)
  12. Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edn. Morgan Kaufmann Publishers, Los Altos (1999)
  13. Young-Lai, M., Tompa, F.W.: One-pass evaluation of region algebra expressions. Inf. Syst. 28(3), 159–168 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Paolo Boldi (1)
  • Sebastiano Vigna (1)
  1. Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano
