Abstract
Chapter 1 introduces the problem of extracting information from unstructured natural language documents, which is increasingly relevant in our “document society”. Although the information in these documents could enable many useful applications, obtaining the desired information is becoming harder. A major problem is that many of the documents are in a format that is not usable by humans or machines. There is thus a need for ways to extract relevant information from the vast number of natural language sources.
The chapter then briefly presents background on semantics, knowledge representation, and Natural Language Processing, which supports the presentation of the area of Information Extraction [IE, “the analysis of unstructured text in order to extract information about pre-specified types of events, entities or relationships, such as the relationship between disease and genes or disease and food items; in so doing value and insight are added to the data.” (Text mining of web-based medical content, Berlin, p 50)]. It covers the challenges of IE, its different approaches, and its general architecture, which is organized as a processing pipeline. The pipeline includes domain-independent components (tokenization, morphological analysis, part-of-speech tagging, syntactic parsing) and domain-specific IE components (named entity recognition, co-reference resolution, relation identification, and information fusion, among others).
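As a rough illustration of the pipeline idea, the sketch below chains one domain-independent step (tokenization) with two domain-specific steps (named entity recognition and relation identification). It is a toy under stated assumptions, not the architecture the chapter describes in detail: the gazetteer `GAZETTEER`, the `Entity` type, the trigger word "associated", and all function names are hypothetical and chosen only for this example. Real systems would typically replace the dictionary lookup and the trigger rule with trained models or richer rules.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer (an assumption for this sketch); a real system
# would rely on trained models or curated lexicons instead.
GAZETTEER = {
    "diabetes": "DISEASE",
    "tcf7l2": "GENE",
    "sugar": "FOOD",
}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    position: int  # index of the token in the token list


def tokenize(text):
    """Domain-independent step: split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def recognize_entities(tokens):
    """Domain-specific step: label tokens that appear in the gazetteer."""
    return [
        Entity(tok, GAZETTEER[tok.lower()], i)
        for i, tok in enumerate(tokens)
        if tok.lower() in GAZETTEER
    ]


def identify_relations(tokens, entities, trigger="associated"):
    """Domain-specific step: link a non-disease entity to a later disease
    entity when the (assumed) trigger word occurs between them."""
    relations = []
    for left in entities:
        for right in entities:
            if (
                left.position < right.position
                and left.label != "DISEASE"
                and right.label == "DISEASE"
            ):
                between = [t.lower() for t in tokens[left.position + 1:right.position]]
                if trigger in between:
                    relations.append((left.text, "associated_with", right.text))
    return relations


def extract(text):
    """Run the pipeline stages in order and return the extracted relations."""
    tokens = tokenize(text)
    entities = recognize_entities(tokens)
    return identify_relations(tokens, entities)
```

Calling `extract("TCF7L2 is associated with diabetes.")` passes the sentence through all three stages. Each stage consumes the output of the previous one, which is the property that lets individual components be swapped for stronger ones without changing the rest of the pipeline.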
References
Allen JF (2000) Natural language processing. In: Ralston A, Reilly ED, Hemmendinger D (eds) Encyclopedia of computer science, 4th edn. Wiley, Chichester, pp 1218–1222
Andersen PM et al (1992) Automatic extraction of facts from press releases to generate news stories. In: Proceedings of the third conference on applied natural language processing. pp 170–177
Antoniou G, van Harmelen F (2009) Web Ontology Language: OWL. In: Staab S, Studer R (eds) Handbook on ontologies, 2nd edn. International handbooks on information systems. Springer, Berlin, pp 91–110
Appelt DE et al (1993) FASTUS: a finite-state processor for information extraction from real-world text. In: IJCAI. pp 1172–1178
Buckland M (2013) The quality of information in the web. BiD: textos universitaris de biblioteconomia i documentació (31)
Califf ME, Mooney RJ (1999) Relational learning of pattern-match rules for information extraction. In: AAAI/IAAI. pp 328–334
Chang C-H, Hsu C-N, Lui S-C (2003) Automatic information extraction from semi-structured web pages by pattern discovery. Decis Support Syst 35(1):129–147
Chiticariu L, Li Y, Reiss FR (2013) Rule-based information extraction is dead! Long live rule-based information extraction systems! In: EMNLP. pp 827–832
Ciravegna F (2001) Adaptive information extraction from text by rule induction and generalisation. In: International joint conference on artificial intelligence. pp 1251–1256
Gaizauskas R, Wilks Y (1998) Information extraction: beyond document retrieval. J Doc 54(1):70–105
Gruber TR (1993) A translation approach to portable ontology specifications. Knowl Acquis 5(2):199–220
Guarino N (1998) Formal ontology and information systems. In: FOIS 98—proceedings of the international conference on formal ontology in information systems. IOS Press, Amsterdam, pp 3–15
Guha R, McCool R, Miller E (2003) Semantic search. In: The twelfth international World Wide Web conference (WWW), Budapest. p 779
Hsu C-N, Dung M-T (1998) Generating finite-state transducers for semi-structured data extraction from the web. Inf Syst 23(8):521–538
Jurafsky D, Martin JH (2008) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edn. Prentice Hall, New York
Kasneci G et al (2008) The YAGO-NAGA approach to knowledge discovery. ACM SIGMOD 37:7
Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the international conference on machine learning (ICML-2001)
Lee L (2004) “I’m sorry Dave, I’m afraid I can’t do that”: linguistics, statistics, and natural language processing circa 2001. In: Committee on the Fundamentals of Computer Science: Challenges and Computer Science Opportunities and National Research Council Telecommunications Board (ed) Computer science: reflections on the field, reflections from the field. The National Academies Press, Washington, pp 111–118
Lehnert W et al (1993) UMass/Hughes: description of the CIRCUS system used for Tipster text. In: Proceedings of TIPSTER’93, 19–23 September 1993. pp 241–256
Makhoul J et al (1999) Performance measures for information extraction. In: Proceedings of DARPA broadcast news workshop. pp 249–252
Màrquez L et al (2008) Semantic role labeling: an introduction to the special issue. Comput Linguist 34(2):145–159
McNaught J, Black W (2006) Information extraction. In: Ananiadou S, McNaught J (eds) Text mining for biology and biomedicine. Artech House, Boston
Muslea I (1999) Extraction patterns for information extraction tasks: a survey. In: Proceedings of the AAAI 99 workshop on machine learning for information extraction, Orlando, July 1999. pp 1–6
Neustein A et al (2014) Application of text mining to biomedical knowledge extraction: analyzing clinical narratives and medical literature. In: Neustein A (ed) Text mining of web-based medical content. De Gruyter, Berlin, p 50
Piskorski J, Yangarber R (2013) Information extraction: past, present and future. In: Multi-source, multilingual information extraction and summarization. Springer, Berlin, pp 23–49
Ratnaparkhi A (1999) Learning to parse natural language with maximum entropy models. Mach Learn 34(1–3):151–175
Santos D (1992) Natural language and knowledge representation. In: Proceedings of the ERCIM workshop on theoretical and experimental aspects of knowledge representation. pp 195–197
Sarawagi S (2008) Information extraction. Found Trends Database 1(3):261–377
Soderland S (1999) Learning information extraction rules for semi-structured and free text. Mach Learn 34(1–3):233–272
Sowa JF (2000) Knowledge representation: logical, philosophical, and computational foundations. Brooks Cole, Pacific Grove
Suchanek F, Kasneci G, Weikum G (2007) Yago: a core of semantic knowledge. In: Proceedings of the 16th international conference on World Wide Web. ACM Press, New York, p 697
Teixeira A, Ferreira L, Rodrigues M (2014) Online health information semantic search and exploration: reporting on two prototypes for performing extraction on both a hospital intranet and the world wide web. In: Neustein A (ed) Text mining of web-based medical content. De Gruyter, Berlin, p 50
Viola P, Narasimhan M (2005) Learning to extract information from semi-structured text using a discriminative context free grammar. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval. pp 330–337
Wimalasuriya DC, Dou D (2010) Ontology-based information extraction: an introduction and a survey of current approaches. J Inf Sci 36(3):306–323
Copyright information
© 2015 The Authors
About this chapter
Cite this chapter
Rodrigues, M., Teixeira, A. (2015). Introduction. In: Advanced Applications of Natural Language Processing for Performing Information Extraction. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-15563-0_1
DOI: https://doi.org/10.1007/978-3-319-15563-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15562-3
Online ISBN: 978-3-319-15563-0
eBook Packages: Engineering, Engineering (R0)