
World Wide Web, Volume 4, Issue 1–2, pp 49–77

A Conceptual Model and Rule-Based Query Language for HTML

  • Mengchi Liu
  • Tok Wang Ling
Article

Abstract

Most documents available over the Web conform to the HTML specification, and such documents are hierarchically structured by nature. Existing data models for the Web either fail to capture the hierarchical structure within documents or provide only a very low-level representation of it. How to represent and query HTML documents at a higher level is therefore an important issue. In this paper, we first propose a novel conceptual model for HTML. The model has only a few simple constructs, yet it can represent the complex hierarchical structure within HTML documents at a level close to how people conceptualize and visualize them. We also describe how to convert HTML documents into representations based on this conceptual model. Using the conceptual model and the conversion method, one can capture the essence (i.e., the semistructure) of HTML documents in a natural and simple way. Based on this conceptual model, we then present a rule-based language for querying HTML documents over the Internet. The language provides a simple but powerful way to query both intra-document and inter-document structures, and it allows query results to be restructured. Being rule-based, it naturally supports negation and recursion and is therefore more expressive than SQL-based languages. A logical semantics is also provided.
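The claim that a rule-based language naturally supports recursion and negation can be illustrated with a generic Datalog-style sketch. This is not the authors' query language or syntax: the `link` relation, the page names, and the functions `reachable_fixpoint` and `unreachable_pages` are hypothetical. The sketch computes the least fixpoint of a recursive reachability rule over hyperlinks (an inter-document structure) by naive bottom-up iteration, then applies a stratified negation rule that is evaluated only after the recursive relation is complete.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # hypothetical link(From, To) fact between two pages


def reachable_fixpoint(links: Set[Edge]) -> Set[Edge]:
    """Least fixpoint of the (illustrative) recursive rules:
         reachable(X, Y) :- link(X, Y).
         reachable(X, Z) :- reachable(X, Y), link(Y, Z).
    computed by naive bottom-up iteration."""
    reach: Set[Edge] = set(links)  # apply the non-recursive rule once
    while True:
        # Join reachable(X, Y) with link(Y, Z) to derive reachable(X, Z).
        derived = {(x, z) for (x, y) in reach for (y2, z) in links if y == y2}
        new = derived - reach
        if not new:  # no new facts: the fixpoint has been reached
            return reach
        reach |= new


def unreachable_pages(pages: Set[str], links: Set[Edge], start: str) -> Set[str]:
    """Stratified negation, evaluated after `reachable` is fully computed:
         orphan(P) :- page(P), P != start, not reachable(start, P)."""
    reach = reachable_fixpoint(links)
    return {p for p in pages if p != start and (start, p) not in reach}


if __name__ == "__main__":
    pages = {"index.html", "a.html", "b.html", "c.html", "old.html"}
    links = {
        ("index.html", "a.html"),
        ("a.html", "b.html"),
        ("b.html", "a.html"),  # a cycle; the fixpoint still terminates
        ("b.html", "c.html"),
        ("old.html", "index.html"),
    }
    print(sorted(unreachable_pages(pages, links, "index.html")))
```

Because the set of possible `(page, page)` pairs is finite and each iteration only adds facts, the loop is guaranteed to terminate even with cyclic links. Evaluating the negated condition only after the recursive relation is complete is what keeps the negation well defined.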

Keywords: data model for HTML; conceptual modeling; HTML query language; rule-based query language; fixpoint logical semantics

Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Mengchi Liu (1)
  • Tok Wang Ling (1)
  1. School of Computer Science, Carleton University, Ottawa, Canada