Chapter

Unsupervised Information Extraction by Text Segmentation

Part of the SpringerBriefs in Computer Science series, pp 33-52

ONDUX

  • Eli Cortez, Instituto de Computação, Universidade Federal do Amazonas
  • Altigran S. da Silva, Instituto de Computação, Universidade Federal do Amazonas

Abstract

This chapter presents ONDUX (On Demand Unsupervised Information Extraction), a method that applies the unsupervised approach presented earlier to the Information Extraction by Text Segmentation (IETS) problem. ONDUX was first presented in Cortez et al. (2010) and in Cortez and da Silva (2010); a tool based on ONDUX was later presented in Porto et al. (2011). Like other unsupervised IETS approaches, ONDUX relies on information available in pre-existing data. Unlike previously proposed methods, however, it also relies on an effective set of content-based features to bootstrap the learning of structure-based features. More specifically, structure-based features are exploited to disambiguate the extraction of certain attributes through a reinforcement step, which relies on the sequencing and positioning of attribute values learned on demand directly from test data. The chapter first gives an overview of ONDUX and describes the main steps involved in its operation. Each step is then discussed in detail. An experimental evaluation of ONDUX follows, reporting its performance on different datasets and domains. Finally, a tool that implements the ONDUX method is described.

Keywords

Information extraction, Unsupervised approach, Text segmentation, Databases, Knowledge bases, On demand