Indexing Rich Internet Applications Using Components-Based Crawling

  • Ali Moosavi
  • Salman Hooshmand
  • Sara Baghbanzadeh
  • Guy-Vincent Jourdan
  • Gregor V. Bochmann
  • Iosif Viorel Onut
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8541)

Abstract

Automatic crawling of Rich Internet Applications (RIAs) is a challenge because client-side code modifies the client dynamically, fetching server-side data asynchronously. Most existing solutions model RIAs as state machines with DOMs as states and JavaScript event executions as transitions. This approach fails when used with “real-life”, complex RIAs, because the size of the produced model is much too large to be practical. In this paper, we propose a new method to crawl AJAX-based RIAs in an efficient manner by detecting “components”, which are areas of the DOM that are independent from each other, and by crawling each component separately. This leads to a dramatic reduction of the required state space for the model, without loss of content coverage. Our method requires neither prior knowledge of the RIA nor a predefined definition of components. Instead, we infer the components by observing the behavior of the RIA during crawling. Our experimental results show that our method can quickly and completely index industrial RIAs that are simply out of reach for traditional methods.
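
The state-space argument in the abstract can be illustrated with a small counting sketch. It is not the authors' algorithm: it assumes a hypothetical page made of three fully independent widgets (the names `menu`, `tabs`, `news` and their local states are invented for illustration) and only compares how many states a whole-DOM model would hold against a per-component model.

```python
from itertools import product

def global_states(components):
    """Whole-DOM model: one state per combination of local component states."""
    return list(product(*components.values()))

def component_states(components):
    """Per-component model: each (component, local state) pair is listed once."""
    return [(name, s) for name, states in components.items() for s in states]

# Hypothetical page with three independent widgets (illustrative assumption).
widgets = {
    "menu": ["collapsed", "expanded"],
    "tabs": ["tab1", "tab2", "tab3"],
    "news": ["closed", "item1", "item2", "item3"],
}

whole_dom = global_states(widgets)
per_component = component_states(widgets)

print(len(whole_dom))      # product of the sizes: 2 * 3 * 4 = 24
print(len(per_component))  # sum of the sizes: 2 + 3 + 4 = 9

# Every local state seen in the whole-DOM model also appears in the
# per-component model, so no content is lost in this toy setting.
seen = {(name, s) for combo in whole_dom for name, s in zip(widgets, combo)}
print(seen == set(per_component))  # True
```

Under this independence assumption the whole-DOM model grows with the product of the component sizes, while the per-component model grows with their sum, which is the kind of reduction the abstract refers to.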

Keywords

Rich Internet Applications, Web Crawling, Web Application Modeling

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ali Moosavi (1)
  • Salman Hooshmand (1)
  • Sara Baghbanzadeh (1)
  • Guy-Vincent Jourdan (1, 2)
  • Gregor V. Bochmann (1, 2)
  • Iosif Viorel Onut (3, 4)

  1. EECS, University of Ottawa, Canada
  2. Fellow of IBM Canada CAS Research, Canada
  3. Research and Development, IBM® Security AppScan® Enterprise, Canada
  4. IBM Canada Software Lab., Canada
