Optimization of Bounded Continuous Search Queries Based on Ranking Distributions

  • Dirk Kukulenz
  • Nils Hoeller
  • Sven Groppe
  • Volker Linnemann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4831)

Abstract

A common search problem on the World Wide Web is finding information when it is not known when the sources of that information will appear or how long they will remain available, for example sales offers for products or news reports. Continuous queries are a means of monitoring the Web over a specific period of time. The main problems in optimizing such queries are providing high-quality, up-to-date results and controlling the amount of information returned by a continuous query engine. In this paper we present a new method for realizing such search queries, based on extracting the distribution of ranking values and on a new strategy for selecting relevant data objects in a stream of documents. The new method provides results of significantly higher quality if the ranking distributions can be modeled by Gaussian distributions. This is usually the case if a larger number of information sources on the Web and higher-quality candidates are considered.
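
The abstract does not spell out the selection strategy, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that ranking scores arrive one at a time, that the expected stream length is known, and that scores are roughly Gaussian. After a short observation phase it re-estimates the mean and standard deviation at every step and accepts a document when its score exceeds the quantile above which the remaining budget is expected to fall. The names `gaussian_threshold`, `select_from_stream`, `training_size`, `expected_total`, and `budget` are hypothetical and chosen for illustration.

```python
from statistics import NormalDist, fmean, stdev
from typing import Iterable, List, Sequence, Tuple


def gaussian_threshold(sample_scores: Sequence[float], remaining: int, budget: int) -> float:
    """Score above which about `budget` of `remaining` documents should fall,
    assuming (hypothetically) that scores follow a Gaussian distribution."""
    if budget >= remaining:
        return float("-inf")  # enough budget left to accept everything
    mu = fmean(sample_scores)
    sigma = stdev(sample_scores)
    if sigma == 0.0:
        return mu  # degenerate sample: accept only scores strictly above it
    return NormalDist(mu, sigma).inv_cdf(1.0 - budget / remaining)


def select_from_stream(
    scores: Iterable[float],
    training_size: int,
    expected_total: int,
    budget: int,
) -> List[Tuple[int, float]]:
    """Return at most `budget` (index, score) pairs chosen online from `scores`."""
    if training_size < 2:
        raise ValueError("training_size must be at least 2 to estimate a spread")
    seen: List[float] = []
    selected: List[Tuple[int, float]] = []
    for index, score in enumerate(scores):
        if index < training_size:
            seen.append(score)  # observation phase: learn the distribution only
            continue
        left = budget - len(selected)
        if left <= 0:
            break  # the bound on returned results has been reached
        remaining = max(expected_total - index, 1)
        if score > gaussian_threshold(seen, remaining, left):
            selected.append((index, score))
        seen.append(score)  # keep refining the estimate as the stream proceeds
    return selected


if __name__ == "__main__":
    stream = [0.1, 0.2, 0.3, 0.4, 0.5, 0.9, 0.2, 0.95, 0.1, 0.99]
    print(select_from_stream(stream, training_size=5, expected_total=10, budget=2))
```

An online rule of this kind can commit to a good document early and then miss a better one later in the stream (here the final score of 0.99 is never examined once the budget is spent). That trade-off between freshness and quality is the one the abstract names as the optimization problem.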

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Dirk Kukulenz
    • 1
  • Nils Hoeller
    • 1
  • Sven Groppe
    • 1
  • Volker Linnemann
    • 1
  1. Luebeck University, Institute of Information Systems, Ratzeburger Allee 160, 23538 Luebeck, Germany
