Optimization of Bounded Continuous Search Queries Based on Ranking Distributions
A common search problem on the World Wide Web is finding information when it is not known when sources of information will appear or how long they will remain available, e.g., sales offers for products or news reports. Continuous queries are a means of monitoring the Web over a specific period of time. The main problems in optimizing such queries are providing high-quality, up-to-date results and controlling the amount of information returned by a continuous query engine. In this paper we present a new method for realizing such search queries. It is based on extracting the distribution of ranking values and on a new strategy for selecting relevant data objects from a stream of documents. The new method provides results of significantly higher quality if ranking distributions can be modeled by Gaussian distributions, which is usually the case when a larger number of information sources on the Web and higher-quality candidates are considered.
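The abstract does not spell out the selection strategy, so the following is only a rough illustration of how a Gaussian model of ranking values could drive selection from a bounded document stream. It assumes a known query horizon (number of documents in the period), a result budget `k`, and a warm-up phase whose scores are used only to fit the Gaussian. After warm-up, it applies a fixed threshold chosen so that about `k` of the remaining documents are expected to exceed it. The function name `select_top_k_from_stream` and the parameters `k`, `horizon`, and `warmup` are hypothetical. This fixed-quantile rule is a stand-in and not the method proposed in the paper.

```python
from statistics import NormalDist, mean, stdev


def select_top_k_from_stream(scores, k, horizon, warmup):
    """Select at most k items from a stream of ranking scores.

    Hypothetical sketch, not the paper's method: the first `warmup` scores are
    only observed and used to fit a Gaussian N(mu, sigma^2). Afterwards a fixed
    threshold is chosen so that, under that model, about k of the remaining
    (horizon - warmup) items are expected to exceed it.
    Returns the stream indices of the selected items.
    """
    if warmup < 2:
        raise ValueError("need at least two warm-up scores to estimate sigma")
    remaining = horizon - warmup
    if not 0 < k < remaining:
        raise ValueError("k must be positive and smaller than horizon - warmup")

    selected = []
    observed = []
    threshold = None
    for i, score in enumerate(scores):
        if i >= horizon or len(selected) == k:
            break  # query period is over or the result budget is used up
        if i < warmup:
            observed.append(score)
            if i == warmup - 1:
                # Fit the Gaussian and take the upper (k / remaining) quantile.
                model = NormalDist(mean(observed), stdev(observed))
                threshold = model.inv_cdf(1 - k / remaining)
            continue
        if score > threshold:
            selected.append(i)
    return selected


# Usage example: warm-up on 5 scores, then pick at most 2 of the next 10.
stream = [0, 1, 2, 3, 4, 1, 5, 2, 3.5, 0, 6, 9, 1, 1, 1]
print(select_top_k_from_stream(stream, k=2, horizon=15, warmup=5))
```

A fixed threshold keeps the sketch short. A more faithful system would likely update the distribution estimate as documents arrive and treat the choice as an optimal-stopping problem rather than a single quantile cut.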