A General Architecture for Finding Structural Regularities on the Web

  • P. A. Laur
  • F. Masseglia
  • P. Poncelet
  • M. Teisseire
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1904)

Abstract

With the growing popularity of the World Wide Web, the number of semistructured documents produced in all types of organizations increases at a rapid rate. However, the provided information cannot be queried or manipulated in a general way since, although there is some structure in the information, it is too irregular to be modeled using a relational or an object-oriented approach. Nevertheless, some semistructured objects, for the same type of information, have a very similar structure. In this paper we address the problem of finding such regularities and we propose a general architecture based on a very efficient data mining technique.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • P. A. Laur (1)
  • F. Masseglia (1, 2)
  • P. Poncelet (1)
  • M. Teisseire (1)

  1. LIRMM UMR CNRS 5506, Montpellier Cedex 5, France
  2. Laboratoire PRiSM, Université de Versailles, Versailles Cedex, France
