Abstract
The data warehouse (DW) is a powerful technology for storing and analysing huge volumes of historical data in support of business intelligence. The World Wide Web, or simply the Web, has revolutionized the way information is authored, shared, searched and accessed. In the past few decades, a significant amount of research has been done in both the DW and Web domains. The integration of data warehousing and the Web has created a variety of new opportunities, as well as new challenges, for researchers and industry. The main motivation for this systematic review of research integrating the DW and the Web over the last decade is to lay the groundwork for further advances in this field. A total of 27 relevant research works were identified. An in-depth analysis was performed to determine the problems addressed, the most relevant research categories, the tools and techniques applied, and the application domains of these works. Our results yielded seven categories and four sub-categories of research integrating the DW and the Web. We also found several open research issues: future work should focus on generalized solutions for handling semantic heterogeneity, change propagation, and quality analysis of the Web sources identified for the DW.
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhutani, P., Saha, A., Gosain, A. (2023). A Review of Integration of Data Warehousing and WWW in the Last Decade. In: Singh, P.K., Wierzchoń, S.T., Tanwar, S., Rodrigues, J.J.P.C., Ganzha, M. (eds) Proceedings of Third International Conference on Computing, Communications, and Cyber-Security. Lecture Notes in Networks and Systems, vol 421. Springer, Singapore. https://doi.org/10.1007/978-981-19-1142-2_58
DOI: https://doi.org/10.1007/978-981-19-1142-2_58
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-1141-5
Online ISBN: 978-981-19-1142-2
eBook Packages: Engineering, Engineering (R0)