A Conceptual Model for the Web
Most documents available over the web conform to the HTML specification and are hierarchically structured by nature. Existing graph-based and tree-based data models for the web represent this hierarchical structure only at a very low level. In this paper, we introduce a conceptual model for the web that represents the complex hierarchical structure within web documents at a high level, close to how humans conceptualize and visualize the documents. We also describe how to convert HTML documents based on this conceptual model. Using the conceptual model and the conversion method, we can capture the essence (i.e., the semistructure) of HTML documents in a natural and simple way.
Keywords: Conceptual Model, Input Type, Semistructured Data, List Object, Radio Button