AllRight: Automatic Ontology Instantiation from Tabular Web Documents

  • Kostyantyn Shchekotykhin
  • Dietmar Jannach
  • Gerhard Friedrich
  • Olga Kozeruk
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4825)

Abstract

Manually instantiating an ontology with high-quality, up-to-date instance information is both time-consuming and error-prone. Automatic ontology instantiation from Web sources is one possible solution to this problem: it aims at the computer-supported population of an ontology by exploiting the (redundant) information available on the Web.

In this paper we present AllRight, a comprehensive ontology instantiation system. In particular, the techniques implemented in AllRight are designed for application scenarios in which the desired instance information is given in the form of tables and for which existing Information Extraction (IE) approaches based on statistical or natural language processing methods are not directly applicable.

Within AllRight, we have therefore developed new techniques for dealing with tabular instance data and combined them with existing methods. The system supports all steps necessary for ontology instantiation, i.e. web crawling, name extraction, document clustering, fact extraction, and validation. AllRight has been successfully evaluated in the popular domains of digital cameras and notebooks, reaching an accuracy of about eighty percent for the extracted facts given only a very limited amount of seed knowledge.
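To make the idea of fact extraction from tabular pages more concrete, the following is a minimal sketch and not the AllRight implementation. It assumes a simple two-column HTML specification table and a small dictionary of seed keywords per ontology attribute. The names `TableRowParser`, `extract_facts`, `SEED` and `PAGE` are hypothetical and introduced only for illustration.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the cell texts of every <tr> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and normalise whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_facts(html, seed_keywords):
    """Map label/value rows to attributes via seed keywords (assumed format)."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    facts = {}
    for row in parser.rows:
        if len(row) != 2:  # keep only label/value rows
            continue
        label, value = row
        for attribute, keywords in seed_keywords.items():
            if any(k in label.lower() for k in keywords):
                facts.setdefault(attribute, value)  # first match wins
    return facts


# Hypothetical seed knowledge and sample page, for illustration only.
SEED = {"resolution": ["resolution", "megapixel"], "zoom": ["optical zoom"]}
PAGE = """
<table>
  <tr><th>Effective resolution</th><td>10.1 MP</td></tr>
  <tr><td>Optical zoom</td><td>3x</td></tr>
  <tr><td colspan="2">Specifications may change</td></tr>
</table>
"""
print(extract_facts(PAGE, SEED))
```

Keyword matching on row labels is only one simple way to link table rows to ontology attributes. A real system would additionally need the clustering and validation steps named above to cope with differing table layouts and conflicting values across sources.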

Keywords

Recommender System · Domain Ontology · Document Cluster · Fact Extraction · Core Ontology
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Kostyantyn Shchekotykhin (1)
  • Dietmar Jannach (1)
  • Gerhard Friedrich (1)
  • Olga Kozeruk (1)
  1. Universität Klagenfurt, Universitätsstrasse 65, 9020 Klagenfurt, Austria, Europe