Abstract
Documents are a valuable source for decisional analyses: they help decision makers better understand the evolution of their business activities, and therefore deserve to be warehoused for decision purposes within organizations. Generally, these documents exist in XML format and are described by multiple structures. In this paper, we present a semi-automatic approach to building an XML Document Warehouse. The approach consists of two methods: unification of XML document structures, and multidimensional modeling. More specifically, this paper focuses on the experimentation and evaluation of the proposed approach for warehousing document-centric XML documents.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ben Messaoud, I., Feki, J., Zurfluh, G. (2015). A Semi-automatic Approach to Build XML Document Warehouse. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2014. Communications in Computer and Information Science, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-319-25840-9_22
DOI: https://doi.org/10.1007/978-3-319-25840-9_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25839-3
Online ISBN: 978-3-319-25840-9
eBook Packages: Computer Science (R0)