A Strategy for Efficient Crawling of Rich Internet Applications

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 6757)

Abstract

New web application development technologies such as Ajax, Flex, or Silverlight result in so-called Rich Internet Applications (RIAs), which provide enhanced responsiveness but introduce crawling challenges that traditional crawlers cannot address. This paper describes a novel crawling technique for RIAs. The technique first generates an optimal crawling strategy for an anticipated model of the crawled RIA, aiming to discover new states as quickly as possible. As the strategy is executed, if the discovered portion of the actual model of the application deviates from the anticipated model, the anticipated model and the strategy are updated to conform to the actual model. We compare the performance of our technique with that of a number of existing techniques, as well as with depth-first and breadth-first crawling, on a set of Ajax test applications. The results show that our technique performs better, often with a faster rate of state discovery.
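To make the idea of "discovering new states as quickly as possible" concrete, the following is a minimal sketch, not the strategy proposed in the paper. It assumes a hypothetical toy application (`TOY_APP`) modeled as a deterministic state machine, and uses a simple greedy rule: always walk to the nearest known state that still has an unexecuted event. The names `TOY_APP`, `path_to_unexplored`, and `greedy_crawl` are illustrative assumptions. The sketch records how many event executions were needed before each state was first seen, which is one way to measure a rate of state discovery.

```python
from collections import deque

# Hypothetical toy RIA: each state maps event names to the resulting state.
# This model is an assumption made for illustration only.
TOY_APP = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"a": "s1", "b": "s3"},
    "s2": {"a": "s3", "b": "s2"},
    "s3": {"a": "s0", "b": "s3"},
}


def path_to_unexplored(known, pending, start):
    """Breadth-first search over already-discovered transitions.

    Returns the shortest event sequence from `start` to a state that still
    has unexecuted events, or None if no such state is reachable.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if pending[state]:
            return path
        for event, nxt in known[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [event]))
    return None


def greedy_crawl(app, initial):
    """Crawl `app` greedily; return (model, discovery log, events executed)."""
    known = {initial: {}}                    # transitions discovered so far
    pending = {initial: set(app[initial])}   # events not yet executed per state
    current = initial
    executed = 0
    discovery = [(0, initial)]               # (events executed so far, new state)
    while True:
        path = path_to_unexplored(known, pending, current)
        if path is None:
            # A real crawler might reset (reload the URL) here; the sketch stops.
            break
        for event in path:                   # replay known transitions
            current = known[current][event]
            executed += 1
        event = min(pending[current])        # deterministic choice for the demo
        pending[current].remove(event)
        nxt = app[current][event]            # "execute" the event on the toy app
        executed += 1
        known[current][event] = nxt
        if nxt not in known:                 # a previously unseen state
            known[nxt] = {}
            pending[nxt] = set(app[nxt])
            discovery.append((executed, nxt))
        current = nxt
    return known, discovery, executed


if __name__ == "__main__":
    model, discovery, executed = greedy_crawl(TOY_APP, "s0")
    for cost, state in discovery:
        print(f"{state} first seen after {cost} event executions")
    print(f"full model recovered after {executed} event executions")
```

The discovery log can be compared across strategies (for example, against depth-first or breadth-first event ordering) to see which one reaches new states with fewer event executions. The sketch omits the anticipated-model component described in the abstract.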

Keywords

  • Rich Internet Applications
  • Crawling
  • Web Application Modeling

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Benjamin, K., von Bochmann, G., Dincturk, M.E., Jourdan, GV., Onut, I.V. (2011). A Strategy for Efficient Crawling of Rich Internet Applications. In: Auer, S., Díaz, O., Papadopoulos, G.A. (eds) Web Engineering. ICWE 2011. Lecture Notes in Computer Science, vol 6757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22233-7_6

  • DOI: https://doi.org/10.1007/978-3-642-22233-7_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22232-0

  • Online ISBN: 978-3-642-22233-7

  • eBook Packages: Computer Science, Computer Science (R0)