AUSMS: An Environment for Frequent Sub-structures Extraction in a Semi-structured Object Collection

Laur, Pierre-Alain; Teisseire, Maguelonne; Poncelet, Pascal

doi:10.1007/978-3-540-45227-0_5

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2736)

Abstract

Mining knowledge from structured data has been extensively addressed in the past few years. However, most proposed approaches deal only with flat structures. With the growing popularity of the Web, the number of available semi-structured documents is rapidly increasing. The structure of these objects is irregular, and it is reasonable to assume that a query on document structure is almost as important as a query on the data itself. Moreover, the data being manipulated is not static: it is constantly updated. Maintaining the discovered sub-structures therefore becomes as much of a priority as finding them, because every time the data is updated, previously found sub-structures may become invalid. In this paper we propose a system, called A.U.S.M.S. (Automatic Update Schema Mining System), which retrieves data, identifies frequent sub-structures, and keeps the extracted knowledge up to date as the sources evolve.
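To make the notions of "frequent sub-structure" and "invalid after an update" concrete, here is a minimal sketch. It is not the A.U.S.M.S. algorithm, which the abstract does not describe. It assumes a deliberately simplified setting: each semi-structured object is a nested Python dict, a sub-structure is a root-to-node path of labels, and a path is frequent when it occurs in at least a fraction `min_support` of the objects. The names `label_paths`, `PathIndex`, and `min_support` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def label_paths(obj, prefix=()):
    """Yield every root-to-node label path of a nested dict or list.

    Assumption: labels are dict keys; scalar leaves contribute no path.
    """
    if isinstance(obj, dict):
        for key, child in obj.items():
            path = prefix + (key,)
            yield path
            yield from label_paths(child, path)
    elif isinstance(obj, list):
        for child in obj:
            yield from label_paths(child, prefix)


class PathIndex:
    """Keeps per-path object counts so frequent paths can be
    re-derived after each insertion or deletion without rescanning
    the whole collection."""

    def __init__(self):
        self.counts = Counter()
        self.size = 0

    def add(self, obj):
        # Count each distinct path once per object.
        self.counts.update(set(label_paths(obj)))
        self.size += 1

    def remove(self, obj):
        self.counts.subtract(set(label_paths(obj)))
        self.size -= 1

    def frequent(self, min_support):
        """Return paths present in at least min_support * size objects."""
        threshold = min_support * self.size
        return {p for p, c in self.counts.items() if c > 0 and c >= threshold}


docs = [
    {"person": {"name": "a", "address": {"city": "x"}}},
    {"person": {"name": "b"}},
    {"person": {"name": "c", "address": {"city": "y"}}},
]

index = PathIndex()
for doc in docs:
    index.add(doc)
print(sorted(index.frequent(0.6)))

# A source update: one more object without an address.
index.add({"person": {"name": "d"}})
print(sorted(index.frequent(0.6)))
```

In this toy collection, the path `("person", "address")` appears in two of three objects and is frequent at a support of 0.6. After the fourth object is added it appears in only two of four, so it drops out of the frequent set. This is the kind of invalidation the abstract refers to when it says that found sub-structures must be maintained as the data changes.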

Author information

Authors and Affiliations

LIRMM, 161 rue Ada, 34392 cedex 5, Montpellier, France
Pierre-Alain Laur & Maguelonne Teisseire
EMA/LGI2P, Ecole des Mines d’Alès Site EERIE, Parc Scientifique Georges Besse, 30035 cedex 1, Nîmes, France
Pascal Poncelet

Editor information

Editors and Affiliations

Gerstner Laboratory, Czech Technical University in Prague, Technická 2, 166 27, Prague 6, Czech Republic
Vladimír Mařík
Johannes Kepler University Linz, Altenberger Str. 69, 4040, Linz, Austria
Werner Retschitzegger
Faculty of Electrical Engineering, The Gerstner Laboratory, Czech Technical University in Prague, Technická 2, 166 27, Prague 6, Czech Republic
Olga Štěpánková

About this paper

Cite this paper

Laur, PA., Teisseire, M., Poncelet, P. (2003). AUSMS: An Environment for Frequent Sub-structures Extraction in a Semi-structured Object Collection. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_5

DOI: https://doi.org/10.1007/978-3-540-45227-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40806-2
Online ISBN: 978-3-540-45227-0
eBook Packages: Springer Book Archive
