Focused Crawling Using Temporal Difference-Learning

Grigoriadis, Alexandros; Paliouras, Georgios

doi:10.1007/978-3-540-24674-9_16

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3025)

Abstract

This paper addresses the problem of constructing an intelligent Focused Crawler, i.e. a system that retrieves documents on a specific topic from the Web. Such a crawler needs a component that assigns visiting priorities to links by estimating the probability that each link will eventually lead to a relevant page. Reinforcement Learning was chosen because it fits this task well: it provides a way to reward the intermediate states on the path to the goal. Initial results show that a crawler trained with Reinforcement Learning retrieves relevant documents after a small number of steps.
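The idea of ranking links by their estimated future payoff can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation, which is not visible in this preview. The toy graph, the page names, the reward of 1 for a relevant page, and the `gamma` and `alpha` settings are all hypothetical. A tabular TD(0) update learns a value for each page, and a priority queue then visits the highest-valued links first. A real crawler cannot look up the value of a page it has not fetched, so the table here stands in for a learned function over link or page features.

```python
import heapq

# Hypothetical toy web graph: page -> outgoing links.
GRAPH = {
    "home": ["news", "shop"],
    "news": ["sports"],
    "shop": ["laptops"],
    "sports": [],
    "laptops": [],
}
RELEVANT = {"laptops"}  # assumed on-topic pages (reward 1 when visited)


def td0_values(graph, relevant, gamma=0.5, alpha=0.5, sweeps=50):
    """Estimate V(page) with tabular TD(0) by sweeping over every link."""
    values = {page: 0.0 for page in graph}
    for _ in range(sweeps):
        for page, links in graph.items():
            reward = 1.0 if page in relevant else 0.0
            # A page without outgoing links is treated as terminal.
            successors = links or [None]
            for nxt in successors:
                future = values[nxt] if nxt is not None else 0.0
                # TD(0): move V(page) toward reward + gamma * V(next).
                target = reward + gamma * future
                values[page] += alpha * (target - values[page])
    return values


def crawl(graph, values, start, max_pages=10):
    """Visit pages in order of decreasing estimated value."""
    frontier = [(-values[start], start)]  # min-heap, so negate the value
    seen = {start}
    order = []
    while frontier and len(order) < max_pages:
        _, page = heapq.heappop(frontier)
        order.append(page)
        for link in graph[page]:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-values[link], link))
    return order


values = td0_values(GRAPH, RELEVANT)
order = crawl(GRAPH, values, "home")
```

In this toy setting the page `shop` is not relevant itself, but it receives a positive value because it leads to `laptops`, so the crawl visits it before `news`. This is the sense in which intermediate states on the path to the goal are rewarded.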

Author information

Authors and Affiliations

Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research “Demokritos”, 153 10, Ag. Paraskevi, Athens, Greece
Alexandros Grigoriadis & Georgios Paliouras
Language Technology Group, Human Communication Research Centre, University of Edinburgh, Edinburgh, UK
Alexandros Grigoriadis

Editor information

Editors and Affiliations

Info and Communication Systems Eng, Aegean University, 83200, Karlovassi, Samos, Greece
George A. Vouros
Department of Informatics, University of Piraeus, Piraeus, Greece
Themistoklis Panayiotopoulos

About this paper

Cite this paper

Grigoriadis, A., Paliouras, G. (2004). Focused Crawling Using Temporal Difference-Learning. In: Vouros, G.A., Panayiotopoulos, T. (eds) Methods and Applications of Artificial Intelligence. SETN 2004. Lecture Notes in Computer Science, vol 3025. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24674-9_16

DOI: https://doi.org/10.1007/978-3-540-24674-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21937-8
Online ISBN: 978-3-540-24674-9
eBook Packages: Springer Book Archive