A Comparison of Top-k Temporal Keyword Querying over Versioned Text Collections

  • Wenyu Huo
  • Vassilis J. Tsotras
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7447)


As the web evolves over time, the number of versioned text collections increases rapidly. Most web search engines answer a query by ranking all known documents as of the (current) time the query is posed. Some applications, however (for example, customer behavior analysis or crime investigation), need to query these sources efficiently as of some past time, that is, to retrieve the results as if the user were posing the query at a past time instant, thus accessing only data known as of that time. Ranking and searching over versioned documents must consider not only keyword constraints but also the time dimension, most commonly a time point or time range of interest. In this paper, we deal with top-k query evaluation with both keyword and temporal constraints over versioned textual documents. In addition to considering previous solutions, we propose two novel data organization and indexing solutions: the first partitions data along ranking positions, while the second maintains the full ranking order through the use of a multiversion ordered list. We present an experimental comparison for both time-point and time-interval constraints. For time-interval constraints, different query definitions, such as aggregation functions and consistent top-k queries, are evaluated. Experimental evaluations on large real-world datasets demonstrate the advantages of the newly proposed data organization and indexing approaches.
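To make the time-point query concrete, the following is a minimal sketch of what "top-k as of time t" means, written as a brute-force scan. It is not one of the indexing solutions compared in the paper. The `Posting` record, the half-open validity interval, the additive scoring, and the toy `index` contents are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    """One version of a document in a term's inverted list (hypothetical layout)."""
    doc_id: str
    score: float  # assumed per-term relevance score of this version
    start: int    # version is valid from this time (inclusive)
    end: int      # ... up to this time (exclusive)


def top_k_at(index, keywords, t, k):
    """Return the k best (doc_id, total_score) pairs as of time t.

    Brute force: scan every posting of every query term, keep only the
    versions valid at t, and sum the per-term scores for each document.
    """
    totals = defaultdict(float)
    for term in keywords:
        for p in index.get(term, []):
            if p.start <= t < p.end:
                totals[p.doc_id] += p.score
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Toy versioned inverted index (illustrative data only).
index = {
    "temporal": [
        Posting("d1", 0.75, 0, 5),
        Posting("d1", 0.25, 5, 10),
        Posting("d2", 0.5, 0, 10),
    ],
    "index": [
        Posting("d1", 0.25, 0, 10),
        Posting("d3", 1.0, 4, 10),
    ],
}

print(top_k_at(index, ["temporal", "index"], t=3, k=2))
print(top_k_at(index, ["temporal", "index"], t=7, k=1))
```

In this toy data the ranking changes between t=3 and t=7 because d1 is replaced by a lower-scoring version and d3 only exists from time 4 onward. A scan like this touches every version of every posting, which is the cost that specialized data organizations aim to avoid.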


Temporal Constraint, Query Time, Inverted List, Data Page, Internet Archive
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Wenyu Huo (1)
  • Vassilis J. Tsotras (1)
  1. Department of Computer Science and Engineering, University of California, Riverside, USA
