Abstract
Mining knowledge from structured data has been extensively addressed in the past few years. However, most proposed approaches deal only with flat structures. With the growing popularity of the Web, the number of semi-structured documents available is rapidly increasing. The structure of these objects is irregular, and it is reasonable to assume that a query on document structure is almost as important as a query on the data itself. Moreover, the data being manipulated is not static, since it is constantly being updated. Maintaining the discovered sub-structures therefore becomes as much of a priority as finding them, because every update may invalidate sub-structures that were previously found. In this paper we propose a system called A.U.S.M.S. (Automatic Update Schema Mining System), which retrieves data, identifies frequent sub-structures, and keeps the extracted knowledge up to date as the sources evolve.
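To make the two tasks named in the abstract concrete (finding frequent sub-structures, then keeping them valid after updates), here is a minimal illustrative sketch. It is not the A.U.S.M.S. algorithm, which the abstract does not describe. It assumes that objects are nested Python dicts and lists, that a "sub-structure" is simply a root-to-node path of labels, and that support is the fraction of objects containing that path. The names `label_paths` and `PathSupportIndex` are hypothetical.

```python
from collections import Counter


def label_paths(obj, prefix=()):
    """Return the set of root-to-node label paths found in a nested object.

    Assumption: dict keys are the structural labels; lists are treated as
    repeated children of the same parent and add no label of their own.
    """
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = prefix + (key,)
            paths.add(path)
            paths |= label_paths(value, path)
    elif isinstance(obj, list):
        for item in obj:
            paths |= label_paths(item, prefix)
    return paths


class PathSupportIndex:
    """Keeps per-path support counts so an update touches only one object."""

    def __init__(self):
        self.paths_by_id = {}     # object id -> set of label paths
        self.support = Counter()  # label path -> number of objects containing it

    def remove(self, obj_id):
        # Subtract the contribution of the stored version, if there is one.
        old_paths = self.paths_by_id.pop(obj_id, set())
        self.support.subtract(old_paths)

    def upsert(self, obj_id, obj):
        # Replace any previous version, then add the new version's paths.
        self.remove(obj_id)
        paths = label_paths(obj)
        self.paths_by_id[obj_id] = paths
        self.support.update(paths)

    def frequent(self, min_support):
        """Return paths present in at least `min_support` (0..1) of the objects."""
        total = len(self.paths_by_id)
        if total == 0:
            return set()
        return {
            path
            for path, count in self.support.items()
            if count > 0 and count / total >= min_support
        }


index = PathSupportIndex()
index.upsert("a", {"person": {"name": "x", "address": {"city": "y"}}})
index.upsert("b", {"person": {"name": "z"}})
before = index.frequent(1.0)

# A source evolves: object "b" loses its name and gains an address.
index.upsert("b", {"person": {"address": {"city": "q"}}})
after = index.frequent(1.0)
```

In this sketch the path `("person", "name")` is frequent before the update and no longer frequent afterwards, which is the kind of invalidation the abstract refers to. Only the counts for the modified object are adjusted, so the collection is not rescanned.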
© 2003 Springer-Verlag Berlin Heidelberg
Laur, PA., Teisseire, M., Poncelet, P. (2003). AUSMS: An Environment for Frequent Sub-structures Extraction in a Semi-structured Object Collection. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_5
DOI: https://doi.org/10.1007/978-3-540-45227-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40806-2
Online ISBN: 978-3-540-45227-0