GDist-RIA Crawler: A Greedy Distributed Crawler for Rich Internet Applications

  • Seyed M. Mirtaheri
  • Gregor von Bochmann
  • Guy-Vincent Jourdan
  • Iosif Viorel Onut
Conference paper

DOI: 10.1007/978-3-319-09581-3_14

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8593)
Cite this paper as:
Mirtaheri S.M., von Bochmann G., Jourdan GV., Onut I.V. (2014) GDist-RIA Crawler: A Greedy Distributed Crawler for Rich Internet Applications. In: Noubir G., Raynal M. (eds) Networked Systems. Lecture Notes in Computer Science, vol 8593. Springer, Cham

Abstract

Crawling web applications is important for indexing, accessibility and security assessment. Crawling traditional web applications is an old problem, for which good and efficient solutions are known. Crawling Rich Internet Applications (RIAs) quickly and efficiently, however, is an open problem. Technologies such as AJAX and partial Document Object Model (DOM) updates make crawling a RIA more time-consuming for the web crawler. One way to reduce the time to crawl a RIA is to crawl it in parallel with multiple computers. The previously published Dist-RIA Crawler presents a distributed breadth-first search algorithm to crawl RIAs. This paper expands the Dist-RIA Crawler in two ways. First, it introduces an adaptive load-balancing algorithm that enables the crawler to learn about the speed of the nodes and adapt to changes, and thus to better utilize the resources. Second, it presents a distributed greedy algorithm to crawl a RIA in parallel, called GDist-RIA Crawler. The GDist-RIA Crawler uses a server-client architecture in which the server dispatches crawling jobs to the crawling clients. This paper illustrates a prototype implementation of the GDist-RIA Crawler, explains some of the techniques used to implement the prototype and inspects empirical performance measurements.
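
The abstract does not specify how the load balancer measures node speed or chooses a client, so the following is only an illustrative sketch of the general idea of a server that learns client speeds and dispatches jobs accordingly. It is not the paper's algorithm. The names `NodeStats`, `AdaptiveDispatcher`, `dispatch` and `report_done`, the exponential moving average, and the initial guess of one second per job are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Server-side record for one crawling client (hypothetical structure)."""
    avg_job_seconds: float = 1.0  # assumed initial guess before any measurement
    pending_jobs: int = 0


class AdaptiveDispatcher:
    """Sends each job to the client expected to finish it soonest."""

    def __init__(self, node_ids, smoothing=0.5):
        self.smoothing = smoothing  # weight given to the newest measurement
        self.nodes = {node_id: NodeStats() for node_id in node_ids}

    def dispatch(self):
        """Pick the node with the smallest estimated completion time."""
        def estimated_finish(node_id):
            stats = self.nodes[node_id]
            return (stats.pending_jobs + 1) * stats.avg_job_seconds

        chosen = min(self.nodes, key=estimated_finish)
        self.nodes[chosen].pending_jobs += 1
        return chosen

    def report_done(self, node_id, elapsed_seconds):
        """Fold a measured job time into the node's running speed estimate."""
        stats = self.nodes[node_id]
        stats.pending_jobs -= 1
        # Exponential moving average: recent jobs count more, so the
        # estimate follows a node that speeds up or slows down over time.
        stats.avg_job_seconds = (
            self.smoothing * elapsed_seconds
            + (1 - self.smoothing) * stats.avg_job_seconds
        )
```

In this sketch a client that reports short job times receives more of the following jobs, until its growing queue makes a slower client the better choice. The moving average is one simple way to let the estimate adapt to changes; other estimators would fit the same structure.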

Keywords

Web crawling, Rich internet application, Greedy algorithm, Load-balancing

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Seyed M. Mirtaheri (1)
  • Gregor von Bochmann (1)
  • Guy-Vincent Jourdan (1)
  • Iosif Viorel Onut (2)
  1. School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
  2. Security AppScan® Enterprise, IBM, Ottawa, Canada
