Theory of Computing Systems

(2010) 46: 104

Maximal Intersection Queries in Randomized Input Models

  • Benjamin Hoffmann
  • Mikhail Lifshits
  • Yury Lifshits
  • Dirk Nowotka
Article

DOI: 10.1007/s00224-008-9154-6

Cite this article as:
Hoffmann, B., Lifshits, M., Lifshits, Y. et al. Theory Comput Syst (2010) 46: 104. doi:10.1007/s00224-008-9154-6

Abstract

Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family that has a maximal intersection with the query set? Time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and the distribution of online advertisements. In general, maximal intersection queries are computationally expensive. We investigate two well-motivated distributions over all families of sets and propose an algorithm for each of them. We show that, with very high probability, an almost optimal solution is found in time logarithmic in the size of the family. Moreover, we point out a threshold phenomenon in the probabilities of intersecting sets in each of our two input models, which leads to the efficient algorithms mentioned above.
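To make the query itself concrete, here is a minimal Python sketch of two standard baselines for the maximal intersection problem. These are not the authors' algorithms, which exploit the randomized input models to reach logarithmic query time. The first is a linear scan over the family. The second counts overlaps through an inverted index built in a preprocessing step. All function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

# Hypothetical helper names; these are illustrative baselines, not the paper's method.

def max_intersection_bruteforce(
    family: Sequence[FrozenSet[Hashable]], query: FrozenSet[Hashable]
) -> Tuple[int, int]:
    """Linear scan: return (index, size) of a member maximizing |member & query|.

    Returns (-1, 0) if no member shares any element with the query.
    """
    best_idx, best_size = -1, 0
    for i, member in enumerate(family):
        size = len(member & query)
        if size > best_size:  # strict '>' keeps the first maximizer
            best_idx, best_size = i, size
    return best_idx, best_size


def build_inverted_index(
    family: Sequence[FrozenSet[Hashable]],
) -> Dict[Hashable, List[int]]:
    """Preprocessing: map each element to the indices of the members containing it."""
    index: Dict[Hashable, List[int]] = {}
    for i, member in enumerate(family):
        for x in member:
            index.setdefault(x, []).append(i)
    return index


def max_intersection_indexed(
    index: Dict[Hashable, List[int]], query: FrozenSet[Hashable]
) -> Tuple[int, int]:
    """Count, for each member, how many query elements it contains.

    Work is proportional to the total length of the posting lists touched,
    which can still be linear in the family size in the worst case.
    Returns (-1, 0) if no member intersects the query.
    """
    counts: Counter = Counter()
    for x in query:
        for i in index.get(x, ()):
            counts[i] += 1
    if not counts:
        return -1, 0
    i, c = counts.most_common(1)[0]
    return i, c
```

For example, with `family = [frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({2, 3, 4, 5})]` and `query = frozenset({2, 4, 5})`, both functions return `(2, 3)`. When several members tie for the maximum, the two baselines may report different indices of equal intersection size.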

Keywords

Nearest neighbor problem, Randomized input models, Zipf’s law, Maximal intersection problem, Algorithms for large data sets

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Benjamin Hoffmann (1)
  • Mikhail Lifshits (2)
  • Yury Lifshits (3)
  • Dirk Nowotka (1)
  1. Universität Stuttgart, Stuttgart, Germany
  2. St. Petersburg State University, St. Petersburg, Russia
  3. California Institute of Technology, Pasadena, USA
