ONDUX

Cortez, Eli; da Silva, Altigran S.

doi:10.1007/978-3-319-02597-1_4

Part of the book series: SpringerBriefs in Computer Science

Abstract

This chapter presents ONDUX (On Demand Unsupervised Information Extraction), a method that builds on the unsupervised approach presented previously to address the Information Extraction by Text Segmentation (IETS) problem. ONDUX was first presented in Cortez et al. (2010) and in Cortez and da Silva (2010). Subsequently, a tool based on ONDUX was presented in Porto et al. (2011). Like other unsupervised IETS approaches, ONDUX relies on information available in pre-existing data but, unlike previously proposed methods, it also relies on a very effective set of content-based features to bootstrap the learning of structure-based features. More specifically, structure-based features are exploited to disambiguate the extraction of certain attributes through a reinforcement step. The reinforcement step relies on the sequencing and positioning of attribute values, learned directly and on demand from test data. In the following, an overview of ONDUX is presented and the main steps involved in its functioning are described. Next, each step is discussed in turn in detail. An experimental evaluation of ONDUX is also reported, presenting its performance on different datasets and domains. Finally, a tool that implements the ONDUX method is described.
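
The two-phase idea in the abstract (content-based features first, then a reinforcement step driven by sequencing and positioning learned from the test data) can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the knowledge base, the term-overlap score, the voting rule, and all function names are hypothetical stand-ins, not the feature functions or models actually used by ONDUX, and the input records are assumed to be already split into blocks.

```python
from collections import Counter, defaultdict

# Hypothetical knowledge base built from pre-existing data: attribute -> known values.
KNOWLEDGE_BASE = {
    "name": ["john smith", "mary jones"],
    "street": ["main street", "oak avenue"],
    "city": ["springfield", "seattle"],
}


def content_scores(block, kb):
    """Score each attribute by the fraction of the block's terms found in its known values."""
    terms = block.lower().split()
    scores = {}
    for attr, values in kb.items():
        vocab = {term for value in values for term in value.split()}
        scores[attr] = sum(term in vocab for term in terms) / len(terms)
    return scores


def match(blocks, kb):
    """Content-based step: label a block only if one attribute is the clear, non-zero winner."""
    labels = []
    for block in blocks:
        scores = content_scores(block, kb)
        best = max(scores.values())
        winners = [a for a, s in scores.items() if s == best and s > 0]
        labels.append(winners[0] if len(winners) == 1 else None)
    return labels


def learn_structure(labeled_records):
    """Count label positions and label-to-label transitions in the test data itself."""
    position = defaultdict(Counter)    # position[i][label]
    transition = defaultdict(Counter)  # transition[previous_label][label]
    for labels in labeled_records:
        for i, label in enumerate(labels):
            if label is None:
                continue
            position[i][label] += 1
            if i > 0 and labels[i - 1] is not None:
                transition[labels[i - 1]][label] += 1
    return position, transition


def reinforce(labels, position, transition):
    """Reinforcement step (toy version): fill unlabeled blocks by position/transition votes."""
    out = list(labels)
    for i, label in enumerate(out):
        if label is not None:
            continue
        votes = Counter(position[i])
        if i > 0 and out[i - 1] is not None:
            votes.update(transition[out[i - 1]])
        if votes:
            out[i] = votes.most_common(1)[0][0]
    return out


# Hypothetical test records, already split into candidate blocks.
records = [
    ["John Smith", "Main Street", "Springfield"],
    ["Mary Jones", "Oak Avenue", "Seattle"],
    ["Ann Lee", "Main Street", "Seattle"],  # "Ann Lee" is absent from the knowledge base
]

first_pass = [match(blocks, KNOWLEDGE_BASE) for blocks in records]
position, transition = learn_structure(first_pass)
final = [reinforce(labels, position, transition) for labels in first_pass]
```

In this toy run the block "Ann Lee" receives no label from the content-based step because none of its terms occur in the knowledge base; the structural counts gathered from the other records then suggest that the first block of a record is usually a name. Plain counting is used here only to keep the sketch short.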

This chapter has previously been published as Cortez et al. (2010); reprinted with permission.

Notes

1. http://crf.sourceforge.net/

References

Agichtein, E., & Ganti, V. (2004). Mining reference tables for automatic text segmentation. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 20–29), Seattle, USA.
Anderson, T., & Finn, J. (1996). The new statistical analysis of data. Berlin: Springer.
Borkar, V., Deshmukh, K., & Sarawagi, S. (2001). Automatic segmentation of text into structured records. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 175–186), Santa Barbara, USA.
Cortez, E., & da Silva, A. S. (2010). Unsupervised strategies for information extraction by text segmentation. In Proceedings of the SIGMOD PhD Workshop on Innovative Database Research (pp. 49–54), Indianapolis, USA.
Cortez, E., da Silva, A., Gonçalves, M., & de Moura, E. (2010). ONDUX: On-demand unsupervised learning for information extraction. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 807–818), Indianapolis, USA.
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4(1), 237–285.
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the ICML International Conference on Machine Learning (pp. 282–289), Williamstown, USA.
Mansuri, I. R., & Sarawagi, S. (2006). Integrating unstructured data into relational databases. In Proceedings of the IEEE ICDE International Conference on Data Engineering (pp. 29–41), Atlanta, USA.
McCallum, A. (2012). Cora information extraction collection. http://www.cs.umass.edu/~mccallum/data/cora-ie.tar.gz
Muslea, I. (2012). Rise—a repository of online information sources used in information extraction tasks. http://www.isi.edu/info-agents/RISE/index.html
Peng, F., & McCallum, A. (2006). Information extraction from research papers using conditional random fields. Information Processing and Management, 42(4), 963–979.
Porto, A., Cortez, E., da Silva, A. S., & de Moura, E. S. (2011). Unsupervised information extraction with the ONDUX tool. In Simpósio Brasileiro de Banco de Dados, Florianópolis, Brasil.
Sarawagi, S. (2008). Information extraction. Foundations and Trends in Databases, 1(3), 261–377.
Zhao, C., Mahmud, J., & Ramakrishnan, I. (2008). Exploiting structured reference data for unsupervised text segmentation with conditional random fields. In Proceedings of the SIAM International Conference on Data Mining (pp. 420–431), Atlanta, USA.

Author information

Authors and Affiliations

Instituto de Computação, Universidade Federal do Amazonas, Manaus, AM, Brazil
Eli Cortez & Altigran S. da Silva

Corresponding author

Correspondence to Eli Cortez.

About this chapter

Cite this chapter

Cortez, E., da Silva, A.S. (2013). ONDUX. In: Unsupervised Information Extraction by Text Segmentation. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-02597-1_4

DOI: https://doi.org/10.1007/978-3-319-02597-1_4
Published: 24 October 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02596-4
Online ISBN: 978-3-319-02597-1
eBook Packages: Computer Science, Computer Science (R0)