Efficient Deep Web Crawling Using Reinforcement Learning

  • Lu Jiang
  • Zhaohui Wu
  • Qian Feng
  • Jun Liu
  • Qinghua Zheng
Conference paper

DOI: 10.1007/978-3-642-13657-3_46

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6118)
Cite this paper as:
Jiang L., Wu Z., Feng Q., Liu J., Zheng Q. (2010) Efficient Deep Web Crawling Using Reinforcement Learning. In: Zaki M.J., Yu J.X., Ravindran B., Pudi V. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. Lecture Notes in Computer Science, vol 6118. Springer, Berlin, Heidelberg

Abstract

The deep web is the hidden part of the Web that standard Web crawlers cannot reach. Obtaining deep web content is challenging, and its absence has been acknowledged as a significant gap in search engine coverage. To this end, the paper proposes a novel deep web crawling framework based on reinforcement learning, in which the crawler is regarded as an agent and the deep web database as the environment. The agent perceives its current state and selects an action (a query) to submit to the environment according to its Q-value. The framework not only enables crawlers to learn a promising crawling strategy from their own experience, but also allows them to exploit diverse features of query keywords. Experimental results show that the method outperforms state-of-the-art methods in crawling capability and overcomes the assumption of full-text search implied by existing methods.
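
The agent/environment loop described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the toy `DATABASE`, the functions `submit_query` and `crawl`, and the scoring rule are all hypothetical. In particular, the Q-value is replaced by a crude stand-in (how often a keyword appears in records harvested so far) rather than a learned Q-function.

```python
from collections import Counter

# Hypothetical toy "deep web database": record id -> keywords in that record.
DATABASE = {
    0: {"python", "crawler", "web"},
    1: {"python", "learning"},
    2: {"web", "search", "engine"},
    3: {"learning", "agent", "reward"},
    4: {"agent", "crawler"},
}


def submit_query(keyword):
    """Environment: return the ids of all records matching the keyword."""
    return {doc_id for doc_id, words in DATABASE.items() if keyword in words}


def crawl(seed_keyword, max_queries=10):
    """Agent: repeatedly pick the candidate query with the highest score."""
    acquired = set()             # state: records harvested so far
    issued = []                  # actions (queries) already submitted
    candidates = {seed_keyword}  # actions still available
    while candidates and len(issued) < max_queries:
        # Stand-in for a Q-value (an assumption, not the paper's estimate):
        # number of harvested records that mention each keyword.
        counts = Counter(w for d in acquired for w in DATABASE[d])
        # Greedy selection; sorting makes tie-breaking deterministic.
        action = max(sorted(candidates), key=lambda w: counts[w])
        candidates.discard(action)
        issued.append(action)
        result = submit_query(action)
        reward = len(result - acquired)  # new records gained by this query
        acquired |= result
        # Keywords seen in returned records become new candidate queries.
        for d in result:
            candidates |= DATABASE[d] - set(issued)
    return issued, acquired


issued, acquired = crawl("python")
print(issued)
print(sorted(acquired))
```

A learning agent would use the observed `reward` to update its value estimates after each query; here it is computed only to show where that signal would come from.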

Keywords

Hidden Web, Deep Web Crawling, Reinforcement Learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Lu Jiang
    • 1
  • Zhaohui Wu
    • 1
  • Qian Feng
    • 1
  • Jun Liu
    • 1
  • Qinghua Zheng
    • 1
  1. MOE KLINNS Lab and SKLMS Lab, Xi’an Jiaotong University, Xi’an, P.R. China
