PDist-RIA Crawler: A Peer-to-Peer Distributed Crawler for Rich Internet Applications

  • Seyed M. Mirtaheri
  • Gregor V. Bochmann
  • Guy-Vincent Jourdan
  • Iosif Viorel Onut
Conference paper

DOI: 10.1007/978-3-319-11746-1_26

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8787)
Cite this paper as:
Mirtaheri S.M., Bochmann G.V., Jourdan G.-V., Onut I.V. (2014) PDist-RIA Crawler: A Peer-to-Peer Distributed Crawler for Rich Internet Applications. In: Benatallah B., Bestavros A., Manolopoulos Y., Vakali A., Zhang Y. (eds) Web Information Systems Engineering – WISE 2014. Lecture Notes in Computer Science, vol 8787. Springer, Cham

Abstract

Crawling Rich Internet Applications (RIAs) is important for ensuring their security and accessibility, and for indexing them for search. To crawl a RIA, the crawler has to reach every application state and execute every application event. On a large RIA, this operation takes a long time. The previously published GDist-RIA Crawler proposes a distributed architecture that parallelizes the task of crawling RIAs and runs the crawl over multiple computers to reduce crawl time. In GDist-RIA Crawler, a centralized unit calculates the next task to execute, and tasks are dispatched to worker nodes for execution. This architecture is not scalable because the centralized unit is bound to become a bottleneck as the number of nodes increases. This paper extends GDist-RIA Crawler and proposes a fully peer-to-peer, scalable architecture for crawling RIAs, called PDist-RIA Crawler. PDist-RIA does not have the same scalability limitations, while matching the performance of GDist-RIA. We describe a prototype showing the scalability and performance of the proposed solution.
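
As a rough illustration of how work can be assigned without a central coordinator, the sketch below lets every node compute the owner of an application state locally from a stable hash. This is a generic hash-partitioning technique shown under stated assumptions; it is not taken from the paper and may differ from how PDist-RIA actually distributes work. The names `owner_of` and `partition_states` and the `state-N` identifiers are hypothetical.

```python
import hashlib


def owner_of(state_id: str, num_nodes: int) -> int:
    """Return the index of the node assumed to own this state.

    Every peer can evaluate this locally, so no central unit is
    needed to decide who handles a state (illustrative assumption,
    not the paper's algorithm).
    """
    digest = hashlib.sha256(state_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo
    # the number of nodes to obtain a stable node index.
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition_states(state_ids, num_nodes):
    """Group state identifiers by their owning node index."""
    buckets = {node: [] for node in range(num_nodes)}
    for state_id in state_ids:
        buckets[owner_of(state_id, num_nodes)].append(state_id)
    return buckets


if __name__ == "__main__":
    # Hypothetical state identifiers; a real crawler would derive them
    # from the application's DOM or a hash of it.
    states = [f"state-{i}" for i in range(12)]
    for node, owned in partition_states(states, 3).items():
        print(node, owned)
```

Because the mapping depends only on the state identifier and the node count, any peer that discovers a state can tell which peer is responsible for it without asking a coordinator, which is the kind of bottleneck the abstract attributes to the centralized design.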

Keywords

  • Web Crawling
  • Rich Internet Application
  • Peer-to-Peer Algorithm
  • Crawling Strategies

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Seyed M. Mirtaheri (1)
  • Gregor V. Bochmann (1)
  • Guy-Vincent Jourdan (1)
  • Iosif Viorel Onut (2)

  1. School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
  2. Security AppScan® Enterprise, IBM, Ottawa, Canada
