Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies

  • Yihong Ding
  • David W. Embley
  • Stephen W. Liddle
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4185)


The semantic web represents a major advance in web utility, but it is currently difficult to create semantic-web content because pages must be semantically annotated through processes that are mostly manual and require a high degree of engineering skill. Furthermore, users need an effective way to query the semantic web, but any burden placed on users to learn a query language is unlikely to garner sufficient user support and interest. Unfortunately, both the creation and use of semantic-web pages are difficult, and these are precisely the processes that must be made simple in order for the semantic web to truly succeed. We propose using information-extraction ontologies to handle both of these challenges. In this paper we show how a successful ontology-based data-extraction technique can (1) automatically generate semantic annotations for ordinary web pages, and (2) support free-form, textual queries that will be relatively simple for end users to write.
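As a rough illustration of the two uses named in the abstract, the sketch below applies one set of concept recognizers both to a page (to produce annotations) and to a free-form query (to find the concepts the user mentioned). It is a minimal sketch under stated assumptions, not the authors' system: the car-advertisement domain, the concept names, the regular expressions, and the functions `annotate` and `matches_query` are all hypothetical and do not appear in the paper's abstract.

```python
import re

# Hypothetical toy "extraction ontology": each concept is paired with a
# regular-expression recognizer for its values. The concepts and patterns
# are illustrative assumptions, not taken from the paper.
ONTOLOGY = {
    "Price": re.compile(r"\$\d[\d,]*"),
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "Make": re.compile(r"\b(?:Honda|Toyota|Ford)\b", re.IGNORECASE),
}


def annotate(text):
    """Return (concept, matched_text, start, end) tuples ordered by position."""
    found = []
    for concept, pattern in ONTOLOGY.items():
        for m in pattern.finditer(text):
            found.append((concept, m.group(), m.start(), m.end()))
    return sorted(found, key=lambda a: a[2])


def matches_query(page_text, query):
    """True if every concept value recognized in the query also occurs on the page."""
    page = {(c, v.lower()) for c, v, _, _ in annotate(page_text)}
    wanted = {(c, v.lower()) for c, v, _, _ in annotate(query)}
    # A query in which nothing is recognized matches no page.
    return bool(wanted) and wanted <= page


page = "2003 Honda Accord, one owner, $8,500"
print(annotate(page))
print(matches_query(page, "Find me a Honda from 2003"))
```

The design point the sketch tries to convey is that the same recognizers serve both sides: the user writes plain text rather than a formal query language, and the ontology decides which words in it carry meaning.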


Information Extraction, Data Frame, Semantic Annotation, External Representation, Participation Constraint
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yihong Ding (1)
  • David W. Embley (1)
  • Stephen W. Liddle (2)
  1. Department of Computer Science, Brigham Young University, Provo, U.S.A.
  2. Information Systems Department, Brigham Young University, Provo, U.S.A.
