Introduction

Rodrigues, Mário; Teixeira, António

doi:10.1007/978-3-319-15563-0_1

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

Chapter 1 introduces the problem of extracting information from unstructured natural language documents, a problem of growing relevance in our “document society”. Although the information in these documents could enable many useful applications, obtaining the desired information is becoming increasingly difficult. A major obstacle is that many documents are in a format that neither humans nor machines can readily use. Ways are therefore needed to extract relevant information from the vast number of natural language sources.

The chapter then briefly presents background on semantics, knowledge representation, and Natural Language Processing to support its presentation of the area of Information Extraction (IE), defined as “the analysis of unstructured text in order to extract information about pre-specified types of events, entities or relationships, such as the relationship between disease and genes or disease and food items; in so doing value and insight are added to the data.” (Text mining of web-based medical content, Berlin, p 50). It covers the challenges of IE, the different approaches, and the general architecture, which is organized as a processing pipeline. The pipeline includes domain-independent components (tokenization, morphological analysis, part-of-speech tagging, syntactic parsing) and domain-specific IE components (named entity recognition and co-reference resolution, relation identification, and information fusion, among others).
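As a rough illustration of the pipeline organization described above, the following Python sketch chains three toy stages: tokenization, named entity recognition, and relation identification. It is not the chapter's implementation. The names `Document`, `GAZETTEER`, `PIPELINE`, and `run_pipeline`, the dictionary lookup used for entity recognition, and the "works at" pattern are all assumptions made for this example, and the remaining stages (morphological analysis, part-of-speech tagging, syntactic parsing, co-reference resolution, information fusion) are omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Carries the text and the annotations added by each stage."""
    text: str
    tokens: list = field(default_factory=list)
    entities: list = field(default_factory=list)   # (surface form, label)
    relations: list = field(default_factory=list)  # (subject, relation, object)


# Hypothetical gazetteer standing in for a real named entity recognizer.
GAZETTEER = {
    "Mário Rodrigues": "PERSON",
    "University of Aveiro": "ORGANIZATION",
}


def tokenize(doc):
    # Domain-independent stage: split into word and punctuation tokens.
    doc.tokens = re.findall(r"\w+|[^\w\s]", doc.text)
    return doc


def recognize_entities(doc):
    # Domain-specific stage (toy version): exact lookup of known names.
    doc.entities = [(name, label) for name, label in GAZETTEER.items()
                    if name in doc.text]
    return doc


def identify_relations(doc):
    # Domain-specific stage (toy version): one hand-written pattern,
    # "<PERSON> works at [the] <ORGANIZATION>".
    people = [e for e, label in doc.entities if label == "PERSON"]
    orgs = [e for e, label in doc.entities if label == "ORGANIZATION"]
    for person in people:
        for org in orgs:
            pattern = (re.escape(person) + r"\s+works at\s+(?:the\s+)?"
                       + re.escape(org))
            if re.search(pattern, doc.text):
                doc.relations.append((person, "works_at", org))
    return doc


# The pipeline is an ordered list of stages; each stage enriches the document.
PIPELINE = [tokenize, recognize_entities, identify_relations]


def run_pipeline(text):
    doc = Document(text)
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


doc = run_pipeline("Mário Rodrigues works at the University of Aveiro.")
print(doc.relations)
```

Keeping the stages as an ordered list of functions over a shared document object makes it possible to swap a toy stage for a real component without touching the others, which is the main appeal of a pipeline organization.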

References

Allen JF (2000) Natural language processing. In: Ralston A, Reilly ED, Hemmendinger D (eds) Encyclopedia of computer science, 4th edn. Wiley, Chichester, pp 1218–1222
Andersen PM et al (1992) Automatic extraction of facts from press releases to generate news stories. In: Proceedings of the third conference on applied natural language processing. pp 170–177
Antoniou G, van Harmelen F (2009) Web Ontology Language: OWL. In: Staab S, Studer R (eds) Handbook on ontologies, 2nd edn. International handbooks on information systems. Springer, Berlin, pp 91–110
Appelt DE et al (1993) FASTUS: a finite-state processor for information extraction from real-world text. In: IJCAI. pp 1172–1178
Buckland M (2013) The quality of information in the web. BiD: textos universitaris de biblioteconomia i documentació (31)
Califf ME, Mooney RJ (1999) Relational learning of pattern-match rules for information extraction. In: AAAI/IAAI. pp 328–334
Chang C-H, Hsu C-N, Lui S-C (2003) Automatic information extraction from semi-structured web pages by pattern discovery. Decis Support Syst 35(1):129–147
Chiticariu L, Li Y, Reiss FR (2013) Rule-based information extraction is dead! Long live rule-based information extraction systems! In: EMNLP. pp 827–832
Ciravegna F (2001) Adaptive information extraction from text by rule induction and generalisation. In: International joint conference on artificial intelligence. pp 1251–1256
Gaizauskas R, Wilks Y (1998) Information extraction: beyond document retrieval. J Doc 54(1):70–105
Gruber TR (1993) A translation approach to portable ontology specifications. Knowl Acquis 5(2):199–220
Guarino N (1998) Formal ontology and information systems. In: FOIS 98—proceedings of the international conference on formal ontology in information systems. IOS Press, Amsterdam, pp 3–15
Guha R, McCool R, Miller E (2003) Semantic search. In: The twelfth international World Wide Web conference (WWW), Budapest. p 779
Hsu C-N, Dung M-T (1998) Generating finite-state transducers for semi-structured data extraction from the web. Inf Syst 23(8):521–538
Jurafsky D, Martin JH (2008) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edn. Prentice Hall, New York
Kasneci G et al (2008) The YAGO-NAGA approach to knowledge discovery. ACM SIGMOD 37:7
Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the international conference on machine learning (ICML-2001)
Lee L (2004) “I’m sorry Dave, I’m afraid I can’t do that”: linguistics, statistics, and natural language processing circa 2001. In: Committee on the Fundamentals of Computer Science: Challenges and Computer Science Opportunities and National Research Council Telecommunications Board (ed) Computer science: reflections on the field, reflections from the field. The National Academies Press, Washington, pp 111–118
Lehnert W et al (1993) UMass/Hughes: description of the CIRCUS system used for Tipster text. In: Proceedings of TIPSTER’93, 19–23 September 1993. pp 241–256
Makhoul J et al (1999) Performance measures for information extraction. In: Proceedings of DARPA broadcast news workshop. pp 249–252
Màrquez L et al (2008) Semantic role labeling: an introduction to the special issue. Comput Linguist 34(2):145–159
McNaught J, Black W (2006) Information extraction. In: Ananiadou S, McNaught J (eds) Text mining for biology and biomedicine. Artech House, Boston
Muslea I (1999) Extraction patterns for information extraction tasks: a survey. In: Proceedings of the AAAI 99 workshop on machine learning for information extraction, Orlando, July 1999. pp 1–6
Neustein A et al (2014) Application of text mining to biomedical knowledge extraction: analyzing clinical narratives and medical literature. In: Neustein A (ed) Text mining of web-based medical content. De Gruyter, Berlin, p 50
Piskorski J, Yangarber R (2013) Information extraction: past, present and future. In: Multi-source, multilingual information extraction and summarization. Springer, Berlin, pp 23–49
Ratnaparkhi A (1999) Learning to parse natural language with maximum entropy models. Mach Learn 34(1–3):151–175
Santos D (1992) Natural language and knowledge representation. In: Proceedings of the ERCIM workshop on theoretical and experimental aspects of knowledge representation. pp 195–197
Sarawagi S (2008) Information extraction. Found Trends Database 1(3):261–377
Soderland S (1999) Learning information extraction rules for semi-structured and free text. Mach Learn 34(1–3):233–272
Sowa JF (2000) Knowledge representation: logical, philosophical, and computational foundations. Brooks Cole, Pacific Grove
Suchanek F, Kasneci G, Weikum G (2007) Yago: a core of semantic knowledge. In: Proceedings of the 16th international conference on World Wide Web. ACM Press, New York, p 697
Teixeira A, Ferreira L, Rodrigues M (2014) Online health information semantic search and exploration: reporting on two prototypes for performing extraction on both a hospital intranet and the world wide web. In: Neustein A (ed) Text mining of web-based medical content. De Gruyter, Berlin, p 50
Viola P, Narasimhan M (2005) Learning to extract information from semi-structured text using a discriminative context free grammar. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval. pp 330–337
Wimalasuriya DC, Dou D (2010) Ontology-based information extraction: an introduction and a survey of current approaches. J Inf Sci 36(3):306–323

Author information

Authors and Affiliations

ESTGA/IEETA, University of Aveiro, Aveiro, Portugal
Mário Rodrigues
DETI/IEETA, University of Aveiro, Aveiro, Portugal
António Teixeira

About this chapter

Cite this chapter

Rodrigues, M., Teixeira, A. (2015). Introduction. In: Advanced Applications of Natural Language Processing for Performing Information Extraction. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-15563-0_1

DOI: https://doi.org/10.1007/978-3-319-15563-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15562-3
Online ISBN: 978-3-319-15563-0
eBook Packages: Engineering, Engineering (R0)