A Template-Based Information Extraction from Web Sites with Unstable Markup

Conference paper

DOI: 10.1007/978-3-319-12024-9_11

Part of the Communications in Computer and Information Science book series (CCIS, volume 475)
Cite this paper as:
Kolchin M., Kozlov F. (2014) A Template-Based Information Extraction from Web Sites with Unstable Markup. In: Presutti V. et al. (eds) Semantic Web Evaluation Challenge. SemWebEval 2014. Communications in Computer and Information Science, vol 475. Springer, Cham

Abstract

This paper presents the results of work on converting the CEUR Workshop Proceedings web site (http://ceur-ws.org) into a Linked Open Data (LOD) dataset within the framework of the ESWC 2014 Semantic Publishing Challenge (http://2014.eswc-conferences.org/semantic-publishing-challenge). Our approach is based on an extensible template-dependent crawler and uses DBpedia to link extracted entities, such as the names of universities and countries.
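As an illustration only, the idea of template-dependent extraction over unstable markup can be sketched as an ordered list of templates, where the first one that matches a page wins. The sketch below is a minimal example under stated assumptions, not the crawler described in the paper: the regular expressions, the `vol-title` class name, and the helpers `extract_title` and `dbpedia_candidate_uri` are all hypothetical. The DBpedia step only builds a naive candidate URI from a label and does not query DBpedia.

```python
import re
from typing import Optional
from urllib.parse import quote

# Hypothetical templates, one per assumed markup variant of a title line.
# They are illustrative assumptions, not the real CEUR-WS markup.
TITLE_TEMPLATES = [
    re.compile(r'<span class="vol-title">(?P<title>.*?)</span>', re.S),
    re.compile(r'<h1[^>]*>(?P<title>.*?)</h1>', re.S),
    re.compile(r'<title>(?P<title>.*?)</title>', re.S),
]


def extract_title(html: str) -> Optional[str]:
    """Try each template in order and return the first match, or None."""
    for template in TITLE_TEMPLATES:
        match = template.search(html)
        if match:
            # Drop nested tags, then collapse runs of whitespace.
            text = re.sub(r"<[^>]+>", "", match.group("title"))
            return " ".join(text.split())
    return None


def dbpedia_candidate_uri(label: str) -> str:
    """Build a naive candidate DBpedia resource URI from an entity label.

    Assumption: the label equals the resource name. A real linker would
    verify the candidate against DBpedia instead of trusting it.
    """
    name = "_".join(label.split())
    return "http://dbpedia.org/resource/" + quote(name)


if __name__ == "__main__":
    page = '<h1 id="top">Workshop on\n Semantic Publishing</h1>'
    print(extract_title(page))
    print(dbpedia_candidate_uri("ITMO University"))
```

Under this design, supporting a new markup variant means appending one more template to the list, which is one way to read "extensible" here.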

Keywords

Information extraction, Semantic publishing, Linked open data, Semantic web

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. ITMO University, St. Petersburg, Russia
