Automatic Extraction of Semantically-Meaningful Information from the Web

  • Conference paper
Adaptive Hypermedia and Adaptive Web-Based Systems (AH 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2347)

Abstract

The semantic Web will bring meaning to the Internet, making it possible for web agents to understand the information it contains. However, current trends suggest that the semantic Web is unlikely to be adopted in the coming years. Until then, extracting meaningful information from the web remains a handicap for web agents. In this article, we present a framework for the automatic extraction of semantically meaningful information from the current web. Separating the extraction process from the business logic of an agent enhances modularity, adaptability, and maintainability. Our approach is novel in that it combines different technologies to extract information, navigate the web, and automatically adapt to changes in web pages.
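The separation between extraction and business logic mentioned in the abstract can be illustrated with a minimal sketch. This is not the framework described in the paper: the names `Book`, `Extractor`, `TableExtractor`, and `cheapest` are hypothetical, and the sketch assumes a page that lists items as two-cell table rows. It only shows the general idea that an agent's logic can depend on typed objects while the markup handling stays behind a replaceable interface.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Protocol


@dataclass
class Book:
    """Hypothetical domain object the agent reasons about."""
    title: str
    price: float


class Extractor(Protocol):
    """Anything that can turn raw HTML into domain objects."""
    def extract(self, html: str) -> list[Book]: ...


class _CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


class TableExtractor:
    """Assumed page layout: one item per row, as (title, price) cells."""

    def extract(self, html: str) -> list[Book]:
        collector = _CellCollector()
        collector.feed(html)
        # Rows that do not have exactly two cells (e.g. headers) are skipped.
        return [Book(title=r[0], price=float(r[1]))
                for r in collector.rows if len(r) == 2]


def cheapest(extractor: Extractor, html: str) -> Book:
    """Business logic: works on Book objects and never touches markup."""
    return min(extractor.extract(html), key=lambda b: b.price)
```

If the page layout changes, only the extractor would need to be replaced; `cheapest` stays untouched, which is the kind of modularity and maintainability the abstract refers to.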

The work reported in this article was supported by the Spanish Inter-ministerial Commission on Science and Technology under grant TIC2000-1106-C02-01.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Arjona, J., Corchuelo, R., Ruiz, A., Toro, M. (2002). Automatic Extraction of Semantically-Meaningful Information from the Web. In: De Bra, P., Brusilovsky, P., Conejo, R. (eds) Adaptive Hypermedia and Adaptive Web-Based Systems. AH 2002. Lecture Notes in Computer Science, vol 2347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47952-X_5

  • DOI: https://doi.org/10.1007/3-540-47952-X_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43737-6

  • Online ISBN: 978-3-540-47952-9

  • eBook Packages: Springer Book Archive
