Capacity-Constrained Query Formulation

  • Matthias Hagen
  • Benno Maria Stein
Conference paper

DOI: 10.1007/978-3-642-15464-5_38

Volume 6273 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Hagen M., Stein B.M. (2010) Capacity-Constrained Query Formulation. In: Lalmas M., Jose J., Rauber A., Sebastiani F., Frommholz I. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2010. Lecture Notes in Computer Science, vol 6273. Springer, Berlin, Heidelberg

Abstract

Given a set of keyphrases, we analyze how Web queries with these phrases can be formed that, taken together, return a specified number of hits. The use case of this problem is a plagiarism detection system that searches the Web for potentially plagiarized passages in a given suspicious document. For the query formulation problem we develop a heuristic search strategy based on co-occurrence probabilities. Compared to the maximal termset strategy [3], which can be considered the most sensible non-heuristic baseline, our expected savings are on average 50% when queries for 9 or 10 phrases are to be constructed.
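
To make the problem setting concrete, the following is a minimal sketch of how co-occurrence statistics might guide query construction. It is not the strategy developed in the paper: the estimation model, the greedy loop, the per-query `capacity` bound, and all names (`cooccurrence_estimate`, `greedy_queries`, `hits`, `pair_hits`) are assumptions made purely for illustration.

```python
# Illustrative sketch only; the model and all names are hypothetical,
# not taken from the paper.

def cooccurrence_estimate(query, hits, pair_hits):
    """Estimate the hit count of the AND-query over the phrases in `query`.

    Assumed model: start from the hit count of the first phrase and, for
    each further phrase p, multiply by the smallest conditional probability
    P(p | q) = pair_hits[{q, p}] / hits[q] over the earlier phrases q.
    All single-phrase hit counts are assumed to be positive.
    """
    estimate = float(hits[query[0]])
    for i in range(1, len(query)):
        p = query[i]
        estimate *= min(pair_hits[frozenset((q, p))] / hits[q] for q in query[:i])
    return estimate


def greedy_queries(phrases, hits, pair_hits, capacity):
    """Greedily build queries whose estimated hit count is at most `capacity`.

    Each query starts from a phrase not yet covered and is extended with the
    phrase that lowers the estimate the most, until the estimate fits the
    capacity or no phrases are left to add.
    """
    uncovered = list(phrases)
    queries = []
    while uncovered:
        query = [uncovered.pop(0)]
        candidates = [p for p in phrases if p not in query]
        while candidates and cooccurrence_estimate(query, hits, pair_hits) > capacity:
            best = min(
                candidates,
                key=lambda p: cooccurrence_estimate(query + [p], hits, pair_hits),
            )
            query.append(best)
            candidates.remove(best)
            if best in uncovered:
                uncovered.remove(best)
        queries.append(query)
    return queries


# Made-up hit counts for three placeholder phrases.
example_hits = {"a": 1000, "b": 500, "c": 200}
example_pair_hits = {
    frozenset(("a", "b")): 100,
    frozenset(("a", "c")): 50,
    frozenset(("b", "c")): 20,
}
example_queries = greedy_queries(["a", "b", "c"], example_hits, example_pair_hits, capacity=60)
```

In this sketch every phrase ends up in at least one query, and a query stops growing as soon as its estimated hit count fits the assumed capacity. A real system would obtain the single and pairwise hit counts from a search engine or an index rather than from hand-written dictionaries.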

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Matthias Hagen
    • 1
  • Benno Maria Stein
    • 1
  1. Faculty of Media, Bauhaus University, Weimar, Germany