WebOMSIE: An Ontology-Based Multi Source Web Information Extraction

  • Conference paper
New Trends in Databases and Information Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 185)

Abstract

The Web contains a huge volume of information supplied by diverse sources such as e-commerce sites, electronic directories, and search engines. Automating information extraction (IE) from these sources is difficult because they were designed for human access (manual navigation), and the difficulty grows with the number of sources involved. In this paper, we address the problem of IE from several sources. A first approach to this problem is to propose a new IE method and apply it to all the sources. This approach is not very successful and is difficult to implement, especially when the sources are very heterogeneous. We therefore propose a more effective alternative that benefits from existing methods and tools by applying to each source the tool that suits it best. For that purpose, we exploit a domain ontology to deduce the appropriate tool for each source. We present the WebOMSIE system, an ontology-based framework for multi-source information extraction from the Web.
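The idea of assigning to each source the tool that suits it best can be illustrated with a minimal sketch. This is not the WebOMSIE implementation, which the abstract does not detail: the tool names, source URLs, property labels, and the `select_tool` function are all hypothetical, and plain set inclusion stands in for whatever reasoning the domain ontology supports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical description of an extraction tool (not from the paper)."""
    name: str
    handles: frozenset  # source properties the tool is assumed to cope with


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical description of a Web source (not from the paper)."""
    url: str
    properties: frozenset  # properties observed for this source


def select_tool(source, tools):
    """Pick a tool whose profile covers every property of the source.

    Among covering tools, prefer the most specific one (fewest handled
    properties, ties broken by name). Return None if no tool covers it.
    """
    candidates = [t for t in tools if source.properties <= t.handles]
    if not candidates:
        return None
    return min(candidates, key=lambda t: (len(t.handles), t.name))


# Illustrative tool catalogue; names and properties are assumptions.
TOOLS = [
    ToolProfile("template_wrapper",
                frozenset({"html", "template_generated"})),
    ToolProfile("example_based_wrapper",
                frozenset({"html", "template_generated", "labeled_examples"})),
    ToolProfile("free_text_extractor",
                frozenset({"html", "free_text"})),
]

# Illustrative heterogeneous sources; URLs are placeholders.
SOURCES = [
    SourceProfile("https://shop.example/catalog",
                  frozenset({"html", "template_generated"})),
    SourceProfile("https://news.example/articles",
                  frozenset({"html", "free_text"})),
    SourceProfile("https://api.example/feed",
                  frozenset({"json"})),
]

# One tool per source, instead of one method applied to all sources.
assignment = {s.url: select_tool(s, TOOLS) for s in SOURCES}

for url, tool in assignment.items():
    print(url, "->", tool.name if tool else "no suitable tool")
```

In this sketch each source gets its own tool, and a source that no tool covers is reported rather than forced through an ill-suited extractor. An ontology-based system would presumably replace the set-inclusion test with class and property reasoning over the domain ontology.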


Corresponding author

Correspondence to Zineb Younsi.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Younsi, Z., Quafafou, M., Ouzegane, R., Tari, A. (2013). WebOMSIE: An Ontology-Based Multi Source Web Information Extraction. In: Pechenizkiy, M., Wojciechowski, M. (eds) New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32518-2_19

  • DOI: https://doi.org/10.1007/978-3-642-32518-2_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32517-5

  • Online ISBN: 978-3-642-32518-2

  • eBook Packages: Engineering, Engineering (R0)
