AUSMS: An Environment for Frequent Sub-structures Extraction in a Semi-structured Object Collection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2736)

Abstract

Mining knowledge from structured data has been extensively addressed in the past few years. However, most proposed approaches focus on flat structures. With the growing popularity of the Web, the number of available semi-structured documents is rapidly increasing. The structure of these objects is irregular, and it is reasonable to assume that querying the structure of documents is almost as important as querying their data. Moreover, the data being manipulated is not static, since it is constantly updated. Maintaining the discovered sub-structures therefore becomes as much a priority as finding them, because every update to the data may invalidate previously found sub-structures. In this paper, we propose a system called A.U.S.M.S. (Automatic Update Schema Mining System), which retrieves data, identifies frequent sub-structures, and keeps the extracted knowledge up to date as the sources evolve.
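The abstract does not describe the algorithm used by AUSMS. As a rough illustration of the two tasks it names (finding frequent sub-structures and keeping them valid as the data changes), here is a minimal Python sketch under simplifying assumptions: each semi-structured object is a nested dict, a "sub-structure" is reduced to a root-to-node label path rather than a general sub-tree, and the support of a path is the fraction of objects that contain it. The names `label_paths`, `FrequentPathMiner` and `min_support` are hypothetical and are not part of AUSMS.

```python
from collections import Counter
from typing import Any, Dict, Set, Tuple

# Assumption: a sub-structure is simplified to a root-to-node label path.
Path = Tuple[str, ...]


def label_paths(obj: Dict[str, Any], prefix: Path = ()) -> Set[Path]:
    """Return every root-to-node label path in a nested-dict object."""
    paths: Set[Path] = set()
    for label, child in obj.items():
        path = prefix + (label,)
        paths.add(path)
        if isinstance(child, dict):
            paths |= label_paths(child, path)
        elif isinstance(child, list):
            # Only dict items in a list carry further structure.
            for item in child:
                if isinstance(item, dict):
                    paths |= label_paths(item, path)
    return paths


class FrequentPathMiner:
    """Hypothetical miner that keeps path supports current under updates."""

    def __init__(self, min_support: float) -> None:
        self.min_support = min_support  # fraction of objects, e.g. 0.5
        self.counts: Counter = Counter()  # path -> number of objects containing it
        self.objects: Dict[str, Set[Path]] = {}  # object id -> its paths

    def add(self, obj_id: str, obj: Dict[str, Any]) -> None:
        """Insert an object, or replace it if the id is already known."""
        if obj_id in self.objects:
            self.remove(obj_id)
        paths = label_paths(obj)
        self.objects[obj_id] = paths
        self.counts.update(paths)  # each path counted once per object

    def remove(self, obj_id: str) -> None:
        """Delete an object; raises KeyError if the id is unknown."""
        paths = self.objects.pop(obj_id)
        self.counts.subtract(paths)
        for p in paths:
            if self.counts[p] <= 0:
                del self.counts[p]  # forget paths no object contains any more

    def frequent(self) -> Set[Path]:
        """Paths whose support meets the minimum support threshold."""
        n = len(self.objects)
        if n == 0:
            return set()
        return {p for p, c in self.counts.items() if c / n >= self.min_support}
```

The sketch keeps a count for every path, not only the frequent ones, so an insertion, deletion or modification updates the result without rescanning the whole collection. The price is memory proportional to the number of distinct paths; incremental mining methods generally try to avoid storing all of these counts.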

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Laur, PA., Teisseire, M., Poncelet, P. (2003). AUSMS: An Environment for Frequent Sub-structures Extraction in a Semi-structured Object Collection. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_5

  • DOI: https://doi.org/10.1007/978-3-540-45227-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40806-2

  • Online ISBN: 978-3-540-45227-0

  • eBook Packages: Springer Book Archive
