COCA Filters: Co-occurrence Aware Bloom Filters

  • Kamran Tirdad
  • Pedram Ghodsnia
  • J. Ian Munro
  • Alejandro López-Ortiz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7024)

Abstract

We propose an indexing data structure based on a novel variation of Bloom filters. Signature files have been proposed in the past as a method for indexing large text databases, but they suffer from a high false positive error. In this paper we introduce COCA filters, a new type of Bloom filter that exploits the co-occurrence probability of words in documents to reduce the false positive error. We show experimentally that this technique can reduce the false positive error by a factor of up to 21.6 for the same index size. Furthermore, COCA filters can replace Bloom filters wherever the co-occurrence of any two members of the universe is identifiable.
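For readers unfamiliar with the baseline, here is a minimal sketch of a standard Bloom filter used as a per-document word signature. It is not the COCA construction, which the abstract does not specify. The class name, the `build_signature` helper, the filter size, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter over strings (baseline, not the COCA variant)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word: str):
        # Derive k bit positions from two halves of one SHA-256 digest
        # (double hashing). This is an illustrative choice, not the
        # hash functions used in the paper.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word: str) -> bool:
        # True means "possibly present" (false positives can occur);
        # False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(word)
        )


def build_signature(words, num_bits=256, num_hashes=3):
    """Hypothetical helper: build one filter per document from its words."""
    bf = BloomFilter(num_bits, num_hashes)
    for w in words:
        bf.add(w)
    return bf


doc = build_signature(["bloom", "filter", "index"])
print(doc.might_contain("bloom"))   # an inserted word is always reported
print(doc.might_contain("zebra"))   # usually False, but may be a false positive
```

A query for a word that was never inserted can still return `True` when all of its bit positions happen to be set by other words. That is the false positive error the abstract refers to, and it grows as more words share a fixed-size filter.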

Keywords

Information Retrieval, Bloom Filters, Signature Files, Locality Sensitive Hash Functions

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Kamran Tirdad
    • 1
  • Pedram Ghodsnia
    • 1
  • J. Ian Munro
    • 1
  • Alejandro López-Ortiz
    • 1
  1. Cheriton School of Computer Science, University of Waterloo, Canada
