A Template-Based Information Extraction from Web Sites with Unstable Markup

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 475)

Abstract

This paper presents the results of work on crawling the CEUR Workshop Proceedings web site (http://ceur-ws.org) and converting it into a Linked Open Data (LOD) dataset within the framework of the ESWC 2014 Semantic Publishing Challenge (http://2014.eswc-conferences.org/semantic-publishing-challenge). Our approach uses an extensible template-dependent crawler, together with DBpedia for linking extracted entities such as the names of universities and countries.
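As a rough illustration of the two ideas named in the abstract, the sketch below tries a list of extraction templates in order (so that pages with differing markup can still be parsed) and then links an extracted name to a DBpedia resource by fuzzy string matching. Everything concrete here is an assumption made for illustration and is not taken from the paper: the regular-expression templates, the HTML class name, the small `KNOWN_LABELS` table, the function names, and the 0.8 threshold are all hypothetical. The similarity measure is Python's `difflib.SequenceMatcher`, which is similar in spirit to the gestalt pattern matching of Ratcliff and Metzener; the abstract does not say how the authors actually match names.

```python
import re
from difflib import SequenceMatcher

# Hypothetical templates, tried in order. Each one targets a different
# markup variant of the same logical field (here, a volume title).
TITLE_TEMPLATES = [
    re.compile(r'<span class="title">(?P<title>.*?)</span>', re.S),
    re.compile(r"<h1>(?P<title>.*?)</h1>", re.S),
]

# Illustrative label-to-URI table standing in for a DBpedia lookup.
KNOWN_LABELS = {
    "ITMO University": "http://dbpedia.org/resource/ITMO_University",
    "Russia": "http://dbpedia.org/resource/Russia",
}


def extract_title(html):
    """Return the title found by the first matching template, or None."""
    for template in TITLE_TEMPLATES:
        match = template.search(html)
        if match:
            # Collapse runs of whitespace left over from the markup.
            return " ".join(match.group("title").split())
    return None


def link_entity(name, labels=KNOWN_LABELS, threshold=0.8):
    """Return the URI of the most similar known label, or None if too dissimilar."""
    best_label, best_score = None, 0.0
    for label in labels:
        score = SequenceMatcher(None, name.lower(), label.lower()).ratio()
        if score > best_score:
            best_label, best_score = label, score
    return labels[best_label] if best_score >= threshold else None
```

Adding a new markup variant then amounts to appending one more template to the list, which is one plausible reading of "extensible" in this context.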

Keywords

Information extraction, Semantic publishing, Linked open data, Semantic web

Notes

Acknowledgments

This work was partially financially supported by the Government of the Russian Federation, Grant #074-U01.

References

  1. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Seman. Web J. (2014). http://www.semantic-web-journal.net/content/dbpedia-large-scale-multilingual-knowledge-base-extracted-wikipedia-0
  2. Ratcliff, J.W., Metzener, D.E.: Pattern matching: the gestalt approach. Dr. Dobb's J. (DDJ) 13(7), 1–46 (1988)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. ITMO University, St. Petersburg, Russia
