ONDUX

Chapter
Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)

Abstract

This chapter presents ONDUX (On Demand Unsupervised Information Extraction), a method that relies on the unsupervised approach presented previously to deal with the Information Extraction by Text Segmentation (IETS) problem. ONDUX was first presented in Cortez et al. (2010) and in Cortez and da Silva (2010). Subsequently, a tool based on ONDUX was presented in Porto et al. (2011). Like other unsupervised IETS approaches, ONDUX relies on information available in pre-existing data but, unlike previously proposed methods, it also relies on a very effective set of content-based features to bootstrap the learning of structure-based features. More specifically, structure-based features are exploited to disambiguate the extraction of certain attributes through a reinforcement step. The reinforcement step relies on the sequencing and positioning of attribute values, learned on demand directly from the test data. The chapter first gives an overview of ONDUX and describes the main steps involved in its operation. Each step is then discussed in detail. An experimental evaluation of ONDUX is also reported, showing its performance on different datasets and domains. Finally, a tool that implements the ONDUX method is described.
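The two-stage idea in the abstract can be illustrated with a small Python sketch. It is not the ONDUX feature set or its actual scoring: the knowledge base `KB`, the term-overlap score in `content_label`, the threshold, and the transition counts used in `reinforce` are all simplifying assumptions made for illustration, and the input records are assumed to be already split into candidate blocks. A content-based pass labels blocks by overlap with pre-existing data; a second pass learns label sequencing from the labelled test records themselves and uses it to label the blocks left undecided.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge base: attribute -> known values (pre-existing data).
KB = {
    "author": ["J. Smith", "A. Silva", "M. Jones"],
    "title": ["Mining reference tables", "Text segmentation methods"],
    "year": ["2004", "2008", "2010"],
}

# Vocabulary of lower-cased terms seen for each attribute in the knowledge base.
VOCAB = {
    attr: {term.lower() for value in values for term in value.split()}
    for attr, values in KB.items()
}

def content_label(block, threshold=0.5):
    """Content-based pass (assumed scoring): fraction of block terms known per attribute."""
    terms = [t.lower() for t in block.split()]
    if not terms:
        return None
    scores = {a: sum(t in v for t in terms) / len(terms) for a, v in VOCAB.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

def learn_transitions(labelled_records):
    """Count label-to-label transitions between adjacent labelled blocks."""
    trans = defaultdict(Counter)
    for labels in labelled_records:
        for prev, cur in zip(labels, labels[1:]):
            if prev is not None and cur is not None:
                trans[prev][cur] += 1
    return trans

def reinforce(labels, trans):
    """Structure-based pass: give each unlabelled block the most frequent successor label."""
    out = list(labels)
    for i, label in enumerate(out):
        if label is None and i > 0 and out[i - 1] in trans:
            out[i] = trans[out[i - 1]].most_common(1)[0][0]
    return out

# Test records, assumed to be pre-split into candidate blocks.
records = [
    ["J. Smith", "Mining reference tables", "2004"],
    ["A. Silva", "Text segmentation methods", "2008"],
    ["M. Jones", "Unsupervised extraction approaches", "2010"],
]

first_pass = [[content_label(b) for b in rec] for rec in records]
transitions = learn_transitions(first_pass)
final = [reinforce(labels, transitions) for labels in first_pass]
```

In this toy run the title of the third record shares no terms with the knowledge base, so the content-based pass leaves it unlabelled; the ordering learned from the other two records (author followed by title) is what resolves it in the second pass.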

Keywords

Information extraction, Unsupervised approach, Text segmentation, Databases, Knowledge bases, On demand

References

  1. Agichtein, E., & Ganti, V. (2004). Mining reference tables for automatic text segmentation. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 20–29), Seattle, USA.
  2. Anderson, T., & Finn, J. (1996). The new statistical analysis of data. Berlin: Springer.
  3. Borkar, V., Deshmukh, K., & Sarawagi, S. (2001). Automatic segmentation of text into structured records. In Proceedings of the ACM SIGMOD International Conference on Management of Data Conference (pp. 175–186), Santa Barbara, USA.
  4. Cortez, E., & da Silva, A. S. (2010). Unsupervised strategies for information extraction by text segmentation. In Proceedings of the SIGMOD PhD Workshop on Innovative Database Research (pp. 49–54), Indianapolis, USA.
  5. Cortez, E., da Silva, A., Gonçalves, M., & de Moura, E. (2010). ONDUX: On-demand unsupervised learning for information extraction. In Proceedings of the ACM SIGMOD International Conference on Management of Data Conference (pp. 807–818), Indianapolis, USA.
  6. Kaelbling, L. P., Littman, M. L., & Moore, A. P. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4(1), 237–285.
  7. Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the ICML International Conference on Machine Learning (pp. 282–289), Williamstown, USA.
  8. Mansuri, I. R., & Sarawagi, S. (2006). Integrating unstructured data into relational databases. In Proceedings of the IEEE ICDE International Conference on Data Engineering (pp. 29–41), Atlanta, USA.
  9. McCallum, A. (2012). Cora information extraction collection. http://www.cs.umass.edu/~mccallum/data/cora-ie.tar.gz
  10. Muslea, I. (2012). Rise—a repository of online information sources used in information extraction tasks. http://www.isi.edu/info-agents/RISE/index.html
  11. Peng, F., & McCallum, A. (2006). Information extraction from research papers using conditional random fields. Information Processing and Management, 42(4), 963–979.
  12. Porto, A., Cortez, E., da Silva, A. S., & de Moura, E. S. (2011). Unsupervised information extraction with the ONDUX tool. In Simpósio Brasileiro de Banco de Dados, Florianópolis, Brasil.
  13. Sarawagi, S. (2008). Information extraction. Foundations and Trends in Databases, 1(3), 261–377.
  14. Zhao, C., Mahmud, J., & Ramakrishnan, I. (2008). Exploiting structured reference data for unsupervised text segmentation with conditional random fields. In Proceedings of the SIAM International Conference on Data Mining (pp. 420–431), Atlanta, USA.

Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. Instituto de Computação, Universidade Federal do Amazonas, Manaus, Brazil
