WebOMSIE: An Ontology-Based Multi Source Web Information Extraction

  • Zineb Younsi
  • Mohamed Quafafou
  • Redouane Ouzegane
  • Abdelkamel Tari
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 185)


The Web contains a huge volume of information supplied by diverse sources such as e-commerce sites, electronic directories, and search engines. Automating information extraction from these sources is difficult because they were designed for human access (manual navigation), and the difficulty grows with the number of sources involved. In this paper, we address the problem of information extraction (IE) from several sources. A first approach to this problem is to devise a new IE method and apply it to all the sources. This approach is not very successful and is difficult to implement, especially when the sources are very heterogeneous. We therefore propose a more effective alternative that benefits from existing methods and tools by applying to each source the tool that suits it best. For that purpose, we exploit a domain ontology to deduce the appropriate tool for each source. We present the WebOMSIE system, an ontology-based framework for multi-source information extraction from the Web.
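The idea of matching each source to the tool that suits it can be pictured roughly as follows. This is a minimal sketch, not the WebOMSIE implementation: it replaces ontology-based deduction with plain set inclusion, and every name in it (`Tool`, `Source`, `select_tool`, the feature labels, `tool_a`, `tool_b`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    handles: frozenset  # source features this tool is assumed to cope with


@dataclass(frozen=True)
class Source:
    url: str
    features: frozenset  # features observed for this source


# Hypothetical tool descriptions; a real system would hold these in a knowledge base.
TOOLS = [
    Tool("tool_a", frozenset({"html_tables", "multi_record"})),
    Tool("tool_b", frozenset({"html_tables", "multi_record", "free_text"})),
]


def select_tool(source, tools):
    """Pick a tool whose capabilities cover every feature of the source.

    Among the covering tools, prefer the most specific one (the fewest
    capabilities the source does not need). Returns None if no tool fits.
    """
    candidates = [t for t in tools if source.features <= t.handles]
    return min(
        candidates,
        key=lambda t: len(t.handles - source.features),
        default=None,
    )


if __name__ == "__main__":
    catalog = Source("http://example.org/catalog", frozenset({"html_tables"}))
    chosen = select_tool(catalog, TOOLS)
    print(chosen.name if chosen else "no suitable tool")
```

Set inclusion stands in here for the subsumption check that an ontology reasoner would perform; it captures only the "tool covers source" relation and none of the richer inference a domain ontology allows.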


Information extraction, WETDL, ontology, knowledge base, reasoner, description logic





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Zineb Younsi (1)
  • Mohamed Quafafou (2)
  • Redouane Ouzegane (1)
  • Abdelkamel Tari (1)
  1. Bejaia University, Bejaia, Algeria
  2. Marseille University, Saint Jérôme, Marseille, France
