Introduction

Cortez, Eli; da Silva, Altigran S.

doi:10.1007/978-3-319-02597-1_1

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

The Information Extraction (IE) problem refers to the automatic extraction of structured information from noisy, unstructured textual sources. IE is a research topic in several Computer Science communities, such as Databases, Information Retrieval, and Artificial Intelligence. This chapter introduces the problem and gives an overview of how information extraction fits into the broader topic of data management. It also lists the main contributions of this book.

Notes

1. http://www.pbct.inweb.org.br/pbct/

References

Agichtein, E., & Ganti, V. (2004). Mining reference tables for automatic text segmentation. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 20–29), Seattle, USA.
Banko, M., Cafarella, M., Soderland, S., Broadhead, M., & Etzioni, O. (2009). Open information extraction for the web. PhD thesis, University of Washington.
Barbosa, L., & Freire, J. (2007). An adaptive crawler for locating hidden-web entry points. In Proceedings of the WWW International World Wide Web Conferences (pp. 441–450), Alberta, Canada.
Borkar, V., Deshmukh, K., & Sarawagi, S. (2001). Automatic segmentation of text into structured records. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 175–186), Santa Barbara, USA.
Cafarella, M., Halevy, A., Wang, D., Wu, E., & Zhang, Y. (2008). WebTables: Exploring the power of tables on the web. Proceedings of the VLDB Endowment, 1(1), 538–549.
Chang, K., He, B., Li, C., Patel, M., & Zhang, Z. (2004). Structured databases on the web: Observations and implications. ACM SIGMOD Record, 33(3), 61–70.
Chang, C., Kayed, M., Girgis, M., & Shaalan, K. (2006). A survey of web information extraction systems. IEEE Transactions on Knowledge and Data Engineering, 18(10), 1411–1428.
Chuang, S., Chang, K., & Zhai, C. (2007). Context-aware wrapping: Synchronized data extraction. In Proceedings of the VLDB International Conference on Very Large Data Bases (pp. 699–710), Vienna, Austria.
Cortez, E., & da Silva, A. S. (2010). Unsupervised strategies for information extraction by text segmentation. In Proceedings of the SIGMOD PhD Workshop on Innovative Database Research (pp. 49–54), Indianapolis, USA.
Cortez, E., da Silva, A., Gonçalves, M., & de Moura, E. (2010). ONDUX: On-demand unsupervised learning for information extraction. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 807–818), Indianapolis, USA.
Cortez, E., da Silva, A. S., de Moura, E. S., & Laender, A. H. F. (2011). Joint unsupervised structure discovery and information extraction. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 541–552), Athens, Greece.
Fader, A., Soderland, S., & Etzioni, O. (2011). Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1535–1545), Edinburgh, UK.
Freitag, D., & McCallum, A. (2000). Information extraction with HMM structures learned by stochastic optimization. In Proceedings of the National Conference on Artificial Intelligence and Conference on Innovative Applications of Artificial Intelligence (pp. 584–589), Austin, USA.
Halevy, A. (2012). Towards an ecosystem of structured data on the web. In Proceedings of the International Conference on Extending Database Technology (pp. 1–2), Berlin, Germany.
Jin, W., Ho, H., & Srihari, R. (2009). OpinionMiner: A novel machine learning system for web opinion mining and extraction. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1195–1204), Paris, France.
Laender, A. H. F., Ribeiro-Neto, B. A., da Silva, A. S., & Teixeira, J. S. (2002). A brief survey of web data extraction tools. SIGMOD Record, 31(2), 84–93.
Laender, A., Moro, M., Gonçalves, M., Davis Jr., C., da Silva, A., Silva, A., et al. (2011a). Building a research social network from an individual perspective. In Proceedings of the International ACM/IEEE Joint Conference on Digital Libraries (pp. 427–428), Ottawa, Canada.
Laender, A., Moro, M., Gonçalves, M., Davis Jr., C., da Silva, A., Silva, A., et al. (2011b). Ciência Brasil: The Brazilian portal of science and technology. In Integrated Seminar of Software and Hardware (SEMISH), Natal, Brazil.
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the ICML International Conference on Machine Learning (pp. 282–289), Williamstown, USA.
Madhavan, J., Jeffery, S., Cohen, S., Dong, X., Ko, D., Yu, C., et al. (2007). Web-scale data integration: You can only afford to pay as you go. In Proceedings of the CIDR Biennial Conference on Innovative Data Systems Research (pp. 342–350), Asilomar, USA.
Mansuri, I. R., & Sarawagi, S. (2006). Integrating unstructured data into relational databases. In Proceedings of the IEEE ICDE International Conference on Data Engineering (pp. 29–41), Atlanta, USA.
Mausam, Schmitz, M., Soderland, S., Bart, R., & Etzioni, O. (2012). Open language learning for information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 523–534), Jeju Island, Korea.
Mesquita, F., & Barbosa, D. (2011). Extracting meta statements from the blogosphere. In Proceedings of the International Conference on Weblogs and Social Media, Barcelona, Spain.
Peng, F., & McCallum, A. (2006). Information extraction from research papers using conditional random fields. Information Processing and Management, 42(4), 963–979.
Porto, A., Cortez, E., da Silva, A. S., & de Moura, E. S. (2011). Unsupervised information extraction with the ONDUX tool. In Simpósio Brasileiro de Banco de Dados, Florianópolis, Brazil.
Ratinov, L., & Roth, D. (2009). Design challenges and misconceptions in named entity recognition. In Proceedings of the Conference on Computational Natural Language Learning (pp. 147–155), Stroudsburg, USA.
Ritter, A., Clark, S., & Etzioni, O. (2011). Named entity recognition in tweets: An experimental study. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1524–1534), Edinburgh, UK.
Sarawagi, S. (2008). Information extraction. Foundations and Trends in Databases, 1(3), 261–377.
Sardi Mergen, S., Freire, J., & Heuser, C. (2010). Indexing relations on the web. In Proceedings of the International Conference on Extending Database Technology (pp. 430–440), Lausanne, Switzerland.
Serra, E., Cortez, E., da Silva, A., & de Moura, E. (2011). On using Wikipedia to build knowledge bases for information extraction by text segmentation. Journal of Information and Data Management, 2(3), 259.
Toda, G., Cortez, E., Mesquita, F., da Silva, A., Moura, E., & Neubert, M. (2009). Automatically filling form-based web interfaces with free text inputs. In Proceedings of the WWW International World Wide Web Conferences (pp. 1163–1164), Madrid, Spain.
Toda, G., Cortez, E., da Silva, A. S., & de Moura, E. S. (2010). A probabilistic approach for automatically filling form-based web interfaces. Proceedings of the VLDB Endowment, 4(3), 151–160.
Vidal, M., da Silva, A., de Moura, E., & Cavalcanti, J. (2006). Structure-driven crawler generation by example. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 292–299), Seattle, USA.
Zhao, C., Mahmud, J., & Ramakrishnan, I. (2008). Exploiting structured reference data for unsupervised text segmentation with conditional random fields. In Proceedings of the SIAM International Conference on Data Mining (pp. 420–431), Atlanta, USA.

Author information

Authors and Affiliations

Instituto de Computação, Universidade Federal do Amazonas, Manaus, AM, Brazil
Eli Cortez & Altigran S. da Silva

Corresponding author

Correspondence to Eli Cortez.

About this chapter

Cite this chapter

Cortez, E., da Silva, A.S. (2013). Introduction. In: Unsupervised Information Extraction by Text Segmentation. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-02597-1_1

DOI: https://doi.org/10.1007/978-3-319-02597-1_1
Published: 24 October 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02596-4
Online ISBN: 978-3-319-02597-1
eBook Packages: Computer Science, Computer Science (R0)