Abstract
XML is a rather verbose representation of semistructured data and may require large amounts of storage space. We propose several summarized representations of XML data that both provide succinct information and can be queried directly. These representations are based on association rules extracted from XML datasets.
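As a rough illustration of what extracting association rules from XML data can look like, the following Python sketch treats each record element as a transaction of (tag, text) items and keeps the single-item rules that meet minimum support and confidence thresholds. It is not the authors' algorithm: the sample document, the element names, the thresholds, and the helper names `transactions` and `mine_rules` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import permutations

# Hypothetical sample document; element names and values are illustrative only.
XML = """
<articles>
  <article><author>Smith</author><venue>VLDB</venue><year>2001</year></article>
  <article><author>Smith</author><venue>VLDB</venue><year>2002</year></article>
  <article><author>Smith</author><venue>EDBT</venue><year>2002</year></article>
  <article><author>Jones</author><venue>EDBT</venue><year>2002</year></article>
</articles>
"""


def transactions(xml_text, record_tag):
    """Turn each record element into a set of (tag, text) items."""
    root = ET.fromstring(xml_text)
    return [
        frozenset((child.tag, (child.text or "").strip()) for child in rec)
        for rec in root.iter(record_tag)
    ]


def mine_rules(txns, min_support=0.5, min_confidence=0.7):
    """Return single-item rules (body, head, support, confidence)."""
    n = len(txns)
    # How many records contain each item, and each ordered pair of items.
    item_count = Counter(item for t in txns for item in t)
    pair_count = Counter(
        pair for t in txns for pair in permutations(sorted(t), 2)
    )
    rules = []
    for (body, head), count in pair_count.items():
        support = count / n                    # fraction of records with both items
        confidence = count / item_count[body]  # fraction of body records that also have head
        if support >= min_support and confidence >= min_confidence:
            rules.append((body, head, support, confidence))
    return sorted(rules)


RULES = mine_rules(transactions(XML, "article"))
for body, head, support, confidence in RULES:
    print(f"{body} -> {head}  support={support:.2f}  confidence={confidence:.2f}")
```

On this toy input, two rules survive the thresholds, for example "venue = VLDB implies author = Smith". A small set of such rules is far more compact than the document itself and can answer some queries approximately, which is the intuition behind using rules as a summary.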
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baralis, E., Garza, P., Quintarelli, E., Tanca, L. (2004). Summarizing XML Data by Means of Association Rules. In: Lindner, W., Mesiti, M., Türker, C., Tzitzikas, Y., Vakali, A.I. (eds) Current Trends in Database Technology - EDBT 2004 Workshops. EDBT 2004. Lecture Notes in Computer Science, vol 3268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30192-9_25
DOI: https://doi.org/10.1007/978-3-540-30192-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23305-3
Online ISBN: 978-3-540-30192-9
eBook Packages: Computer Science, Computer Science (R0)