Ontology Creation: Extraction of Domain Knowledge from Web Documents

  • Conference paper
Conceptual Modeling – ER 2005 (ER 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3716)

Abstract

Considerable research has gone into developing ontologies and applying them to a variety of applications. The extraction of domain knowledge for developing these ontologies is often performed manually. The World Wide Web contains a wealth of knowledge about an application domain; however, it is embedded within web pages. This research presents a methodology for semi-automatically extracting knowledge from the World Wide Web and organizing it into domain ontologies. Initial semantics of a target domain are provided by a set of keywords. From these, search engines are used to identify web pages that contain information relevant to the subject domain. Web data extraction techniques are employed to extract information from these web pages and infer how the information is related. Extracted knowledge is then organized into a domain ontology. Testing of the methodology on various application domains illustrates the feasibility of the approach.
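
The stages named in the abstract (seed keywords, page identification, extraction, relationship inference, organization into an ontology) can be illustrated with a toy sketch. This is not the authors' method, and the abstract does not specify their extraction techniques. Everything here is an assumption made for illustration: the in-memory `PAGES` dictionary stands in for search-engine results, frequency counting stands in for web data extraction, and sentence-level co-occurrence stands in for relationship inference. All function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for search-engine results (assumption: a real system
# would retrieve these pages from the web using the seed keywords).
PAGES = {
    "page1": "A hotel offers rooms. A guest books a room at the hotel.",
    "page2": "The airline sells a ticket. A guest books a ticket and a room.",
    "page3": "Gardening tips for roses and tulips.",
}

STOPWORDS = {"a", "an", "the", "at", "and", "for", "of", "in", "to"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def find_relevant_pages(keywords, pages):
    """Keep only pages that mention at least one seed keyword."""
    wanted = {k.lower() for k in keywords}
    return {name: text for name, text in pages.items()
            if wanted & set(tokenize(text))}


def extract_terms(pages, min_count=2):
    """Treat words seen at least min_count times as candidate concepts."""
    counts = Counter(t for text in pages.values() for t in tokenize(text))
    return {t for t, c in counts.items() if c >= min_count}


def infer_relations(pages, terms):
    """Relate two candidate concepts when they share a sentence."""
    pairs = Counter()
    for text in pages.values():
        for sentence in re.split(r"[.!?]", text):
            present = sorted(terms & set(tokenize(sentence)))
            pairs.update(combinations(present, 2))
    return pairs


def build_ontology(keywords, pages):
    """Run the stages in order and return concepts plus related pairs."""
    relevant = find_relevant_pages(keywords, pages)
    terms = extract_terms(relevant)
    relations = infer_relations(relevant, terms)
    return {"concepts": sorted(terms), "related": sorted(relations)}


if __name__ == "__main__":
    print(build_ontology(["hotel", "ticket"], PAGES))
```

Because the methodology is described as semi-automatic, output like this would presumably be reviewed by a person before being accepted into a domain ontology; the sketch only covers the automated part, and a simple frequency threshold will admit non-concept words that a real extraction technique would filter out.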

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Storey, V.C., Chiang, R., Chen, G.L. (2005). Ontology Creation: Extraction of Domain Knowledge from Web Documents. In: Delcambre, L., Kop, C., Mayr, H.C., Mylopoulos, J., Pastor, O. (eds) Conceptual Modeling – ER 2005. ER 2005. Lecture Notes in Computer Science, vol 3716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11568322_17

  • DOI: https://doi.org/10.1007/11568322_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29389-7

  • Online ISBN: 978-3-540-32068-5

  • eBook Packages: Computer Science, Computer Science (R0)
