AIMSA 2000: Artificial Intelligence: Methodology, Systems, and Applications, pp. 179-188
A General Architecture for Finding Structural Regularities on the Web
Abstract
With the growing popularity of the World Wide Web, the number of semistructured documents produced in all types of organizations is increasing rapidly. However, the information they provide cannot be queried or manipulated in a general way: although the information has some structure, that structure is too irregular to be modeled using a relational or an object-oriented approach. Nevertheless, some semistructured objects describing the same type of information have very similar structures. In this paper we address the problem of finding such regularities and propose a general architecture based on a very efficient data mining technique.
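The abstract does not spell out the mining technique, so the following is only an illustrative sketch of what "finding structural regularities" could mean, not the authors' architecture. It assumes semistructured objects are represented as nested Python dicts and lists, treats each root-to-node sequence of labels as a candidate structure, and keeps the label paths shared by at least a minimum number of objects. The names `label_paths`, `frequent_paths`, `min_count`, and the sample `people` data are hypothetical.

```python
from collections import Counter


def label_paths(obj, prefix=()):
    """Yield every root-to-node label path found in a nested dict/list object."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = prefix + (key,)
            yield path
            yield from label_paths(value, path)
    elif isinstance(obj, list):
        # List items share the label of their parent, so the prefix is unchanged.
        for item in obj:
            yield from label_paths(item, prefix)


def frequent_paths(objects, min_count):
    """Return the label paths that occur in at least `min_count` objects."""
    counts = Counter()
    for obj in objects:
        # Use a set so each path is counted at most once per object.
        counts.update(set(label_paths(obj)))
    return {path: n for path, n in counts.items() if n >= min_count}


# Hypothetical sample: three records of the same kind with irregular structure.
people = [
    {"name": "A", "address": {"city": "X", "zip": "1"}},
    {"name": "B", "address": {"city": "Y"}, "phone": "2"},
    {"name": "C", "address": "free text"},
]

for path, n in sorted(frequent_paths(people, min_count=2).items()):
    print("/".join(path), n)
```

With `min_count=2`, the sketch keeps `name`, `address`, and `address/city`, and discards `address/zip` and `phone`, which each appear in only one record. Counting flat label paths is a deliberate simplification; mining nested or ordered substructures, as sequential-pattern approaches do, would need a richer candidate representation.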