Indexing Rich Internet Applications Using Components-Based Crawling

  • Ali Moosavi
  • Salman Hooshmand
  • Sara Baghbanzadeh
  • Guy-Vincent Jourdan
  • Gregor V. Bochmann
  • Iosif Viorel Onut
Conference paper

DOI: 10.1007/978-3-319-08245-5_12

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8541)
Cite this paper as:
Moosavi A., Hooshmand S., Baghbanzadeh S., Jourdan GV., Bochmann G.V., Onut I.V. (2014) Indexing Rich Internet Applications Using Components-Based Crawling. In: Casteleyn S., Rossi G., Winckler M. (eds) Web Engineering. ICWE 2014. Lecture Notes in Computer Science, vol 8541. Springer, Cham

Abstract

Automatic crawling of Rich Internet Applications (RIAs) is a challenge because client-side code modifies the client dynamically, fetching server-side data asynchronously. Most existing solutions model RIAs as state machines, with DOMs as states and JavaScript event executions as transitions. This approach fails when used with “real-life”, complex RIAs, because the produced model is far too large to be practical. In this paper, we propose a new method to crawl AJAX-based RIAs efficiently by detecting “components”, which are areas of the DOM that are independent of each other, and by crawling each component separately. This leads to a dramatic reduction of the state space required for the model, without loss of content coverage. Our method requires neither prior knowledge of the RIA nor predefined components. Instead, we infer the components by observing the behavior of the RIA during crawling. Our experimental results show that our method can quickly and completely index industrial RIAs that are simply out of reach for traditional methods.
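
The state-space reduction claimed in the abstract can be illustrated with a small counting sketch. This is not the authors' algorithm; it only assumes, for illustration, that a page consists of fully independent components and that a DOM-level model treats every combination of component states as a distinct state. The function names and the component sizes below are hypothetical.

```python
from math import prod


def dom_level_states(component_sizes):
    """Count states when each combination of component states is its own DOM state.

    Assumption: components are fully independent, so every combination is reachable.
    """
    return prod(component_sizes)


def component_level_states(component_sizes):
    """Count states when each independent component is modeled separately."""
    return sum(component_sizes)


# Hypothetical page with three independent widgets having 4, 3, and 5 states.
sizes = [4, 3, 5]
print(dom_level_states(sizes))        # 60
print(component_level_states(sizes))  # 12
```

Under these assumptions the DOM-level count grows as a product of the component sizes while the component-level count grows as their sum, which is one way to read the "dramatic reduction" described above.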

Keywords

Rich Internet Applications; Web Crawling; Web Application Modeling

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ali Moosavi (1)
  • Salman Hooshmand (1)
  • Sara Baghbanzadeh (1)
  • Guy-Vincent Jourdan (1, 2)
  • Gregor V. Bochmann (1, 2)
  • Iosif Viorel Onut (3, 4)

  1. EECS, University of Ottawa, Canada
  2. Fellow of IBM Canada CAS Research, Canada
  3. Research and Development, IBM® Security AppScan® Enterprise, Canada
  4. IBM Canada Software Lab., Canada
