Ontology Creation: Extraction of Domain Knowledge from Web Documents

  • Veda C. Storey
  • Roger Chiang
  • G. Lily Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3716)


Considerable research has gone into developing ontologies and applying them to a variety of applications. The domain knowledge needed to develop these ontologies is often extracted manually. The World Wide Web contains a wealth of knowledge about application domains; however, that knowledge is embedded within web pages. This research presents a methodology for semi-automatically extracting knowledge from the World Wide Web and organizing it into domain ontologies. The initial semantics of a target domain are provided by a set of keywords. Using search engines, these keywords identify web pages that contain information relevant to the domain. Web data extraction techniques are then employed to extract information from these web pages and to infer how the information is related. The extracted knowledge is then organized into a domain ontology. Testing the methodology on various application domains illustrates the feasibility of the approach.
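
The pipeline outlined in the abstract (seed keywords, page identification, extraction, organization into an ontology) can be illustrated with a minimal sketch. This is not the authors' system. It assumes an in-memory dictionary `PAGES` in place of real search-engine results, and it swaps in two simple lexical patterns ("X such as Y" and "a X has a Y") for the web data extraction techniques the abstract mentions. All names, URLs, and sample texts are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for search-engine results: page text keyed by URL.
PAGES = {
    "http://example.org/a": "Airlines such as Delta and United sell tickets. A ticket has a fare.",
    "http://example.org/b": "Gardening tips: plant roses in spring.",
    "http://example.org/c": "Each flight has a departure time. Airlines such as Lufthansa operate flights.",
}

# Assumed lexical patterns; real extraction would be far more robust.
IS_A = re.compile(r"(\w+) such as (\w+(?: and \w+)*)", re.IGNORECASE)
HAS_A = re.compile(r"\b(?:a|an|each) (\w+) has an? (\w+(?: \w+)?)", re.IGNORECASE)


def find_relevant_pages(keywords, pages):
    """Keep pages whose text mentions at least one seed keyword."""
    lowered = [k.lower() for k in keywords]
    return {
        url: text
        for url, text in pages.items()
        if any(k in text.lower() for k in lowered)
    }


def extract_triples(text):
    """Return (subject, relation, object) triples matched by the two patterns."""
    triples = set()
    for m in IS_A.finditer(text):
        parent = m.group(1).lower()
        for child in m.group(2).split(" and "):
            triples.add((child.lower(), "is-a", parent))
    for m in HAS_A.finditer(text):
        triples.add((m.group(1).lower(), "has", m.group(2).lower()))
    return triples


def build_ontology(keywords, pages):
    """Organize extracted triples as concept -> relation -> set of related terms."""
    ontology = defaultdict(lambda: defaultdict(set))
    for text in find_relevant_pages(keywords, pages).values():
        for subject, relation, obj in extract_triples(text):
            ontology[subject][relation].add(obj)
    return ontology


ontology = build_ontology(["airline", "flight", "ticket"], PAGES)
for concept in sorted(ontology):
    for relation in sorted(ontology[concept]):
        print(concept, relation, sorted(ontology[concept][relation]))
```

The nested dictionary is only one possible representation. It keeps the sketch short, whereas a fuller implementation would likely distinguish concepts, instances, and relationship types explicitly.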


Domain Knowledge; Natural Language Processing; Target Domain; Domain Ontology; Internet Address
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Veda C. Storey (1)
  • Roger Chiang (2)
  • G. Lily Chen (1)
  1. Department of Computer Information Systems, J. Mack Robinson College of Business, Georgia State University, Atlanta
  2. Information Systems Department, College of Business, University of Cincinnati, Cincinnati
