WebOMSIE: An Ontology-Based Multi Source Web Information Extraction

Younsi, Zineb; Quafafou, Mohamed; Ouzegane, Redouane; Tari, Abdelkamel

doi:10.1007/978-3-642-32518-2_19

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 185)

Abstract

The Web contains a huge volume of information supplied by diverse sources such as e-commerce sites, electronic directories, and search engines. Automating information extraction (IE) from these sources is difficult because they were designed for human access (manual navigation), and the difficulty grows with the number of sources involved. In this paper, we address the problem of IE from several sources. A first approach to this problem is to design a new IE method and apply it to all the sources. This approach is not very successful and is difficult to implement, especially when the sources are very heterogeneous. We therefore propose a more effective alternative that benefits from existing methods and tools by applying to each source the tool that suits it best. For that purpose, we exploit a domain ontology to deduce the appropriate tool for each source. In this paper, we present the WebOMSIE system, an ontology-based framework for multi-source information extraction from the Web.
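
The idea of matching each source to the most suitable existing tool can be illustrated with a minimal sketch. This is not the WebOMSIE implementation: the abstract states only that a domain ontology is used to deduce the tool, so the sketch replaces ontology reasoning with a plain set-inclusion check. The class names, feature labels, and tool names (`Source`, `Tool`, `select_tool`, `template_tool`, and so on) are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Source:
    """A Web source described by a set of (hypothetical) feature labels."""
    name: str
    features: FrozenSet[str]


@dataclass(frozen=True)
class Tool:
    """An existing extraction tool and the source features it requires."""
    name: str
    requires: FrozenSet[str]


def select_tool(source: Source, tools: Iterable[Tool]) -> Optional[Tool]:
    """Pick the applicable tool with the most specific requirements.

    A tool is applicable when every feature it requires is present in the
    source. This subset test is an assumed stand-in for the ontology-based
    deduction mentioned in the abstract.
    """
    candidates = [t for t in tools if t.requires <= source.features]
    if not candidates:
        return None
    return max(candidates, key=lambda t: len(t.requires))


# Hypothetical tool catalogue and sources, for illustration only.
TOOLS = [
    Tool("template_tool", frozenset({"template_generated"})),
    Tool("list_pattern_tool", frozenset({"template_generated", "repeated_records"})),
    Tool("free_text_tool", frozenset({"free_text"})),
]

shop = Source("shop", frozenset({"template_generated", "repeated_records"}))
blog = Source("blog", frozenset({"free_text"}))
unknown = Source("unknown", frozenset())

for src in (shop, blog, unknown):
    tool = select_tool(src, TOOLS)
    print(src.name, "->", tool.name if tool else "no suitable tool")
```

Preferring the tool with the largest satisfied requirement set is one simple way to express "the tool that suits it best"; a real ontology reasoner could rank candidates on richer criteria.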

Author information

Authors and Affiliations

Bejaia University, Bejaia, 06000, Algeria
Zineb Younsi, Redouane Ouzegane & Abdelkamel Tari
Marseille University, Saint Jérôme, Marseille, France
Mohamed Quafafou

Corresponding author

Correspondence to Zineb Younsi.

Editor information

Editors and Affiliations

Department of Computer Science, Eindhoven University of Technology, Eindhoven, 5600, Netherlands
Mykola Pechenizkiy
Institute of Computing Science, Poznan University of Technology, ul. Piotrowo 2, Poznan, 60-965, Poland
Marek Wojciechowski

About this paper

Cite this paper

Younsi, Z., Quafafou, M., Ouzegane, R., Tari, A. (2013). WebOMSIE: An Ontology-Based Multi Source Web Information Extraction. In: Pechenizkiy, M., Wojciechowski, M. (eds) New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32518-2_19

DOI: https://doi.org/10.1007/978-3-642-32518-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32517-5
Online ISBN: 978-3-642-32518-2
eBook Packages: Engineering, Engineering (R0)
