Capacity-Constrained Query Formulation
- Cite this paper as:
- Hagen M., Stein B.M. (2010) Capacity-Constrained Query Formulation. In: Lalmas M., Jose J., Rauber A., Sebastiani F., Frommholz I. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2010. Lecture Notes in Computer Science, vol 6273. Springer, Berlin, Heidelberg
Given a set of keyphrases, we analyze how to form Web queries from these phrases that, taken together, return a specified number of hits. The motivating use case is a plagiarism detection system that searches the Web for potentially plagiarized passages in a given suspicious document. For the query formulation problem, we develop a heuristic search strategy based on co-occurrence probabilities. Compared to the maximal termset strategy, which can be considered the most sensible non-heuristic baseline, our expected savings are 50% on average when queries for 9 or 10 phrases are to be constructed.
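
To make the idea of steering hit counts with co-occurrence information concrete, the following is a minimal illustrative sketch, not the procedure from the paper. It assumes that single-phrase and pairwise hit counts are already known, estimates the hit count of a conjunctive query from them, and greedily adds phrases until the estimate drops to a given capacity. The names `estimate_hits`, `greedy_query`, `hits`, `pair_hits`, and `capacity`, as well as the estimation rule itself, are hypothetical choices made for this example.

```python
def estimate_hits(query, hits, pair_hits):
    """Estimate the hit count of a conjunctive query.

    Assumption (not from the paper): each added phrase p shrinks the
    result set by the smallest conditional co-occurrence probability
    P(p | q) ~ pair_hits[{p, q}] / hits[q] over phrases q already used.
    """
    phrases = sorted(query, key=lambda p: hits[p])
    estimate = float(hits[phrases[0]])
    for i in range(1, len(phrases)):
        p = phrases[i]
        cond = min(pair_hits[frozenset((p, q))] / hits[q] for q in phrases[:i])
        estimate *= cond
    return estimate


def greedy_query(phrases, hits, pair_hits, capacity):
    """Add phrases one at a time until the estimate is at most `capacity`."""
    remaining = sorted(phrases, key=lambda p: hits[p])
    query = [remaining.pop(0)]  # start with the rarest phrase
    while remaining and estimate_hits(query, hits, pair_hits) > capacity:
        # Choose the phrase whose addition yields the lowest estimate.
        best = min(
            remaining,
            key=lambda p: estimate_hits(query + [p], hits, pair_hits),
        )
        query.append(best)
        remaining.remove(best)
    return query


# Toy counts, invented purely for illustration.
hits = {"a": 1000, "b": 500, "c": 200}
pair_hits = {
    frozenset(("a", "b")): 100,
    frozenset(("a", "c")): 50,
    frozenset(("b", "c")): 40,
}
query = greedy_query(["a", "b", "c"], hits, pair_hits, capacity=45)
```

With these toy counts, the rarest phrase alone is estimated well above the capacity, so the sketch keeps adding phrases until the estimate fits. A real system would obtain the counts from a search engine and would need to handle zero counts and estimation error, which this sketch ignores.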