Active XML-based Web data integration

Information Systems Frontiers

Abstract

Today, the Web is the largest source of information worldwide. Decision-making applications such as Data Warehousing (DW) and Business Intelligence (BI) are increasingly moving onto the Web, especially into the cloud. Integrating data into DW/BI applications is a critical and time-consuming task. To support better decisions in DW/BI applications, next-generation data integration imposes requirements on data integration systems beyond those of traditional data integration. In this paper, we propose a generic, metadata-based, service-oriented, and event-driven approach for integrating Web data in a timely and autonomous manner. Besides handling data heterogeneity, distribution, and interoperability, our approach satisfies near real-time requirements and realizes active data integration. To this end, we design and develop a framework that uses Web standards (e.g., XML and Web services) to tackle data heterogeneity, distribution, and interoperability issues. Moreover, our framework uses Active XML (AXML) to warehouse passive data as well as services to integrate active and dynamic data on the fly. AXML embedded services and change detection services ensure near real-time data integration. Furthermore, the idea of integrating Web data actively and autonomously revolves around mining events logged by the data integration environment. Therefore, we propose an incremental XML-based algorithm for mining association rules from logged events. Then, we define active rules dynamically upon mined data to automate and reactivate integration tasks. Finally, as a proof of concept, we implement a framework prototype as a Web application using open-source tools.
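The step from logged events to active rules can be illustrated with a minimal sketch. It is not the incremental XML-based algorithm proposed in the paper, which is not reproduced on this page; it only shows the general idea of keeping running counts so that each new batch of logged events updates the mined rules without rescanning earlier logs. The class `IncrementalPairMiner`, the event names, the `ACTIONS` set, and the thresholds are all hypothetical, and the sketch mines only pairwise rules from plain Python sets rather than XML logs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical: which logged events count as integration tasks (actions).
ACTIONS = {"refresh_warehouse", "notify_user"}


class IncrementalPairMiner:
    """Keeps running counts so new log batches update rules without a rescan."""

    def __init__(self):
        self.n_transactions = 0
        self.item_counts = Counter()
        self.pair_counts = Counter()

    def update(self, transactions):
        """Fold a new batch of logged event sets into the running counts."""
        for events in transactions:
            items = sorted(set(events))
            self.n_transactions += 1
            self.item_counts.update(items)
            self.pair_counts.update(combinations(items, 2))

    def rules(self, min_support=0.5, min_confidence=0.8):
        """Return sorted (antecedent, consequent, support, confidence) tuples."""
        found = []
        for (a, b), count in self.pair_counts.items():
            support = count / self.n_transactions
            if support < min_support:
                continue
            for lhs, rhs in ((a, b), (b, a)):
                confidence = count / self.item_counts[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, rhs, support, confidence))
        return sorted(found)


def to_active_rules(mined):
    """Keep rules of the form event -> action as hypothetical on/do pairs."""
    return [
        {"on": lhs, "do": rhs}
        for lhs, rhs, _, _ in mined
        if lhs not in ACTIONS and rhs in ACTIONS
    ]


miner = IncrementalPairMiner()
miner.update([
    {"source_changed", "refresh_warehouse"},
    {"source_changed", "refresh_warehouse", "notify_user"},
])
# A later batch only updates the counters; earlier logs are not reread.
miner.update([
    {"source_changed", "refresh_warehouse"},
    {"notify_user"},
])
active_rules = to_active_rules(miner.rules())
print(active_rules)
```

With these made-up logs, the only rule that survives both thresholds and has an event as antecedent and an action as consequent is the one that triggers `refresh_warehouse` on `source_changed`.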

Notes

  1. http://www.informatica.com

  2. http://www.programmableweb.com/apis/directory/

  3. http://www.w3.org/TR/xquery/

  4. http://eric.univ-lyon2.fr/~rsalem/axdi/

  5. http://kettle.pentaho.com/

  6. http://www.modis.ispras.ru/sedna/

  7. http://www.topologi.com/diffx/

Acknowledgments

The authors thank the anonymous reviewers of this paper for their thoughtful comments, which greatly helped to improve the present work.

Author information

Corresponding author

Correspondence to Rashed Salem.

Appendix

Fig. 22 Sample AXML document before invoking embedded AXML services (1 of 2)

Fig. 23 Sample AXML document before invoking embedded AXML services (2 of 2)

Fig. 24 Sample AXML document after invoking embedded AXML services
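Since the figures themselves are not reproduced here, the difference between an AXML document before and after invoking its embedded services can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' documents or framework: the namespace URI, the `sc` element with its `service` attribute, the `param` child, and the `get_temperature` function standing in for a remote Web service are all hypothetical. For simplicity the sketch replaces each call element with its result; whether the paper's framework keeps the call alongside the returned data is not shown on this page.

```python
import xml.etree.ElementTree as ET

AXML_NS = "urn:example:axml"  # hypothetical namespace, not the real AXML one

# "Before" document: static data plus one embedded service call.
DOC = f"""
<city xmlns:axml="{AXML_NS}">
  <name>Lyon</name>
  <axml:sc service="get_temperature"><param>Lyon</param></axml:sc>
</city>
"""


def get_temperature(city):
    """Stand-in for a remote Web service; returns a fixed, made-up value."""
    temp = ET.Element("temperature", {"unit": "C", "city": city})
    temp.text = "21"
    return temp


# Hypothetical registry mapping service names to local callables.
SERVICES = {"get_temperature": get_temperature}


def materialize(root):
    """Replace every embedded service call with the result of invoking it."""
    sc_tag = f"{{{AXML_NS}}}sc"
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == sc_tag:
                args = [p.text for p in child.findall("param")]
                result = SERVICES[child.get("service")](*args)
                result.tail = child.tail  # preserve surrounding whitespace
                parent[index] = result
    return root


root = materialize(ET.fromstring(DOC))
print(ET.tostring(root, encoding="unicode"))
```

After `materialize` runs, the document holds only passive data: the `name` element is untouched and the call element has been replaced by a `temperature` element.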

About this article

Cite this article

Salem, R., Boussaïd, O. & Darmont, J. Active XML-based Web data integration. Inf Syst Front 15, 371–398 (2013). https://doi.org/10.1007/s10796-012-9405-6

  • DOI: https://doi.org/10.1007/s10796-012-9405-6
