A Review of Integration of Data Warehousing and WWW in the Last Decade

  • Conference paper
Proceedings of Third International Conference on Computing, Communications, and Cyber-Security

Abstract

The data warehouse (DW) is a powerful technology for storing and analysing huge volumes of historical data in support of business intelligence. The World Wide Web, or simply the Web, has revolutionized the way information is authored, shared, searched and accessed. In the past few decades, a significant amount of research has been done in both the DW and Web domains. The integration of data warehousing and the Web has opened up a variety of new opportunities, as well as challenges, for researchers and industry. The main motivation for this systematic review of research works integrating the DW and the Web in the last decade is to lay the groundwork for further research in this field. A total of 27 relevant research works were identified for the review. An in-depth analysis was performed to determine the problems addressed, the most relevant research categories, the tools or techniques applied and the application domains of these works. The analysis yielded seven categories and four sub-categories of research integrating the DW and the Web. We also identified several open research issues: future work should focus on generalized solutions for handling semantic heterogeneity, change propagation and quality analysis of the Web sources identified for the DW.

Author information

Correspondence to Priyanka Bhutani.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bhutani, P., Saha, A., Gosain, A. (2023). A Review of Integration of Data Warehousing and WWW in the Last Decade. In: Singh, P.K., Wierzchoń, S.T., Tanwar, S., Rodrigues, J.J.P.C., Ganzha, M. (eds) Proceedings of Third International Conference on Computing, Communications, and Cyber-Security. Lecture Notes in Networks and Systems, vol 421. Springer, Singapore. https://doi.org/10.1007/978-981-19-1142-2_58

  • DOI: https://doi.org/10.1007/978-981-19-1142-2_58

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-1141-5

  • Online ISBN: 978-981-19-1142-2

  • eBook Packages: Engineering; Engineering (R0)
