Harvesting Multiple Resources for Software as a Service Offers: A Big Data Study

  • Asma Musabah Alkalbani
  • Ahmed Mohamed Ghamry
  • Farookh Khadeer Hussain
  • Omar Khadeer Hussain
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9947)


Currently, the World Wide Web (WWW) is the primary source of information on cloud services, including offers and providers. Cloud applications (Software as a Service, SaaS), such as Google Apps, are among the most popular and commonly used types of cloud services. Access to a large amount of information on SaaS offers is critical for potential cloud clients to select and purchase an appropriate service. Web harvesting has become a primary tool for discovering knowledge from Web sources. This paper describes the design and development of a Web scraper to collect information on SaaS offers from target digital cloud services advertisement portals, namely, and The collected data were used to establish two datasets: a SaaS provider’s dataset and a SaaS reviews/feedback dataset. Further, we applied sentiment analysis to the reviews dataset to establish a third dataset, called the SaaS sentiment polarity dataset. The significance of this study is that it is the first work to focus on Web harvesting for the cloud computing domain, and it also establishes the first SaaS services datasets. Furthermore, we present statistical data that can help determine the current status of SaaS services and the number of services offered on the Web. In our conclusion, we provide further insight into improving Web scraping for SaaS service information. Our datasets are available online through


Software as a Service; Service offer; Web harvesting; SaaS dataset



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Decision Support and e-Service Intelligence Lab, School of Software, Center of Quantum Computation and Intelligent Systems, University of Technology, Sydney, Australia
  2. School of Business, University of New South Wales Canberra (UNSW Canberra), Campbell, Australia
  3. Australian Defence Force Academy, Canberra, Australia
