Abstract
This chapter presents ONDUX (On Demand Unsupervised Information Extraction) a method that relies on the presented unsupervised approach to deal with the Information Extraction by Text Segmentation problem. ONDUX was first presented in Cortez et al. (2010) and in Cortez and da Silva (2010). Following, a tool based on ONDUX was presented in Porto et al. (2011). As other unsupervised IETS approaches, ONDUX relies on information available on pre-existing data, but, unlike previously proposed methods, it also relies on a very effective set of content-based features to bootstrap the learning of structure-based features. More specifically, structure-based features are exploited to disambiguate the extraction of certain attributes through a reinforcement step. The reinforcement step relies on sequencing and positioning of attribute values directly learned on-demand from test data. In the following, it is presented an overview of ONDUX and describe the main steps involved in its functioning. Next, each step is discussed in turn with detail. It also reported an experimental evaluation of ONDUX presenting its performance in different datasets and domains. Finally, it is described as a tool that implements the ONDUX method.
This chapter has previously been published as Cortez et al. (2010); reprinted with permission.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
References
Agichtein, E., & Ganti, V. (2004). Mining reference tables for automatic text segmentation. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 20–29), Seattle, USA.
Anderson, T., & Finn, J. (1996). The new statistical analysis of data. Berlin: Springer.
Borkar, V., Deshmukh, K., & Sarawagi, S. (2001). Automatic segmentation of text into structured records. In Proceedings of the ACM SIGMOD International Conference on Management of Data Conference (pp. 175–186), Santa Barbara, USA.
Cortez, E., & da Silva, A. S. (2010). Unsupervised strategies for information extraction by text segmentation. In Proceedings of the SIGMOD PhD Workshop on Innovative Database Research (pp. 49–54), Indianapolis, USA.
Cortez, E., da Silva, A., Gonçalves, M., & de Moura, E. (2010). ONDUX: On-demand unsupervised learning for information extraction. In Proceedings of the ACM SIGMOD International Conference on Management of Data Conference (pp. 807–818), Indianapolis, USA.
Kaelbling, L. P., Littman, M. L., & Moore, A. P. (1996). Reinforcement learning: A survey. Journal Artificial Intelligence Research, 4(1), 237–285.
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the ICML International Conference on Machine Learning (pp. 282–289), Williamstown, USA.
Mansuri, I. R., & Sarawagi, S. (2006). Integrating unstructured data into relational databases. In Proceedings of the IEEE ICDE International Conference on Data Engineering (pp. 29–41), Atlanta, USA.
McCallum, A. (2012). Cora information extraction collection. http://www.cs.umass.edu/~mccallum/data/cora-ie.tar.gz
Muslea, I. (2012). Rise—a repository of online information sources used in information extraction tasks. http://www.isi.edu/info-agents/RISE/index.html
Peng, F., & McCallum, A. (2006). Information extraction from research papers using conditional random fields. Information Processing and Management, 42(4), 963–979.
Porto, A., Cortez, E., da Silva, A. S., & de Moura, E. S. (2011). Unsupervised information extraction with the ondux tool. In Simpsio Brasileiro de Banco de Dados, Florianpolis, Brasil.
Sarawagi, S. (2008). Information extraction. Foundations and Trends in Databases, 1(3), 261–377.
Zhao, C., Mahmud, J., & Ramakrishnan, I. (2008). Exploiting structured reference data for unsupervised text segmentation with conditional random fields. In Proceedings of the SIAM International Conference on Data Mining (pp. 420–431), Atlanta, USA.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
Copyright information
© 2013 The Author(s)
About this chapter
Cite this chapter
Cortez, E., da Silva, A.S. (2013). ONDUX . In: Unsupervised Information Extraction by Text Segmentation. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-02597-1_4
Download citation
DOI: https://doi.org/10.1007/978-3-319-02597-1_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02596-4
Online ISBN: 978-3-319-02597-1
eBook Packages: Computer ScienceComputer Science (R0)