Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

Chapter 1 introduces the problem of extracting information from unstructured natural language documents, a problem that is becoming more and more relevant in our “document society”. Although the information in these documents could enable many useful applications, obtaining the wanted information is increasingly difficult. A major obstacle is that many of the documents are in a format that neither humans nor machines can readily use. There is therefore a need for ways to extract relevant information from the vast number of natural language sources.

The chapter then briefly presents background information on semantics, knowledge representation and Natural Language Processing, to support the presentation of the area of Information Extraction (IE), defined as “the analysis of unstructured text in order to extract information about pre-specified types of events, entities or relationships, such as the relationship between disease and genes or disease and food items; in so doing value and insight are added to the data.” (Text mining of web-based medical content, Berlin, p 50). It goes on to cover the challenges of IE, its different approaches and its general architecture, which is organized as a processing pipeline that includes domain-independent components (tokenization, morphological analysis, part-of-speech tagging, syntactic parsing) and domain-specific IE components (named entity recognition and co-reference resolution, relation identification, information fusion, among others).
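
The pipeline organization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the stage functions, the `GAZETTEER` lexicon, the `associated_with` relation label and the example sentence are all hypothetical, chosen only to show one domain-independent step (tokenization) feeding two domain-specific steps (a toy named entity recognizer and a toy relation identifier). A real IE system would replace each stage with trained models or curated rules.

```python
import re
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]

# Hypothetical toy lexicon mapping lowercase words to entity types.
GAZETTEER = {"diabetes": "DISEASE", "sugar": "FOOD"}


def tokenize(doc: Doc) -> Doc:
    # Domain-independent stage: split text into word and punctuation tokens.
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"])
    return doc


def recognize_entities(doc: Doc) -> Doc:
    # Domain-specific stage: label tokens that appear in the toy lexicon.
    doc["entities"] = [
        (index, token, GAZETTEER[token.lower()])
        for index, token in enumerate(doc["tokens"])
        if token.lower() in GAZETTEER
    ]
    return doc


def identify_relations(doc: Doc) -> Doc:
    # Domain-specific stage: pair every DISEASE with every FOOD in the text.
    diseases = [e for e in doc["entities"] if e[2] == "DISEASE"]
    foods = [e for e in doc["entities"] if e[2] == "FOOD"]
    doc["relations"] = [
        (disease[1], "associated_with", food[1])
        for disease in diseases
        for food in foods
    ]
    return doc


# Stages run in order; each one reads and extends the shared document record.
PIPELINE: List[Callable[[Doc], Doc]] = [
    tokenize,
    recognize_entities,
    identify_relations,
]


def run_pipeline(text: str) -> Doc:
    doc: Doc = {"text": text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


result = run_pipeline("Diabetes is linked to sugar intake.")
print(result["relations"])  # [('Diabetes', 'associated_with', 'sugar')]
```

Keeping each stage as a function over a shared record means a stage such as part-of-speech tagging or co-reference resolution could be inserted into `PIPELINE` without changing the others, which is the main appeal of a pipeline architecture.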

References

  • Allen JF (2000) Natural language processing. In: Ralston A, Reilly ED, Hemmendinger D (eds) Encyclopedia of computer science, 4th edn. Wiley, Chichester, pp 1218–1222

  • Andersen PM et al (1992) Automatic extraction of facts from press releases to generate news stories. In: Proceedings of the third conference on applied natural language processing. pp 170–177

  • Antoniou G, van Harmelen F (2009) Web Ontology Language: OWL. In: Staab S, Studer R (eds) Handbook on ontologies, 2nd edn. International handbooks on information systems. Springer, Berlin, pp 91–110

  • Appelt DE et al (1993) FASTUS: a finite-state processor for information extraction from real-world text. In: IJCAI. pp 1172–1178

  • Buckland M (2013) The quality of information in the web. BiD: textos universitaris de biblioteconomia i documentació (31)

  • Califf ME, Mooney RJ (1999) Relational learning of pattern-match rules for information extraction. In: AAAI/IAAI. pp 328–334

  • Chang C-H, Hsu C-N, Lui S-C (2003) Automatic information extraction from semi-structured web pages by pattern discovery. Decis Support Syst 35(1):129–147

  • Chiticariu L, Li Y, Reiss FR (2013) Rule-based information extraction is dead! Long live rule-based information extraction systems! In: EMNLP. pp 827–832

  • Ciravegna F (2001) Adaptive information extraction from text by rule induction and generalisation. In: International joint conference on artificial intelligence. pp 1251–1256

  • Gaizauskas R, Wilks Y (1998) Information extraction: beyond document retrieval. J Doc 54(1):70–105

  • Gruber TR (1993) A translation approach to portable ontology specifications. Knowl Acquis 5(2):199–220

  • Guarino N (1998) Formal ontology and information systems. In: FOIS 98—proceedings of the international conference on formal ontology in information systems. IOS Press, Amsterdam, pp 3–15

  • Guha R, McCool R, Miller E (2003) Semantic search. In: The twelfth international World Wide Web conference (WWW), Budapest. p 779

  • Hsu C-N, Dung M-T (1998) Generating finite-state transducers for semi-structured data extraction from the web. Inf Syst 23(8):521–538

  • Jurafsky D, Martin JH (2008) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edn. Prentice Hall, New York

  • Kasneci G et al (2008) The YAGO-NAGA approach to knowledge discovery. ACM SIGMOD 37:7

  • Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the international conference on machine learning (ICML-2001)

  • Lee L (2004) “I’m sorry Dave, I’m afraid I can’t do that”: linguistics, statistics, and natural language processing circa 2001. In: Committee on the Fundamentals of Computer Science: Challenges and Computer Science Opportunities and National Research Council Telecommunications Board (ed) Computer science: reflections on the field, reflections from the field. The National Academies Press, Washington, pp 111–118

  • Lehnert W et al (1993) UMass/Hughes: description of the CIRCUS system used for Tipster text. In: Proceedings of TIPSTER’93, 19–23 September 1993. pp 241–256

  • Makhoul J et al (1999) Performance measures for information extraction. In: Proceedings of DARPA broadcast news workshop. pp 249–252

  • Màrquez L et al (2008) Semantic role labeling: an introduction to the special issue. Comput Linguist 34(2):145–159

  • McNaught J, Black W (2006) Information extraction. In: Ananiadou S, McNaught J (eds) Text mining for biology and biomedicine. Artech House, Boston

  • Muslea I (1999) Extraction patterns for information extraction tasks: a survey. In: Proceedings of the AAAI 99 workshop on machine learning for information extraction, Orlando, July 1999. pp 1–6

  • Neustein A et al (2014) Application of text mining to biomedical knowledge extraction: analyzing clinical narratives and medical literature. In: Neustein A (ed) Text mining of web-based medical content. De Gruyter, Berlin, p 50

  • Piskorski J, Yangarber R (2013) Information extraction: past, present and future. In: Multi-source, multilingual information extraction and summarization. Springer, Berlin, pp 23–49

  • Ratnaparkhi A (1999) Learning to parse natural language with maximum entropy models. Mach Learn 34(1–3):151–175

  • Santos D (1992) Natural language and knowledge representation. In: Proceedings of the ERCIM workshop on theoretical and experimental aspects of knowledge representation. pp 195–197

  • Sarawagi S (2008) Information extraction. Found Trends Database 1(3):261–377

  • Soderland S (1999) Learning information extraction rules for semi-structured and free text. Mach Learn 34(1–3):233–272

  • Sowa JF (2000) Knowledge representation: logical, philosophical, and computational foundations. Brooks Cole, Pacific Grove

  • Suchanek F, Kasneci G, Weikum G (2007) Yago: a core of semantic knowledge. In: Proceedings of the 16th international conference on World Wide Web. ACM Press, New York, p 697

  • Teixeira A, Ferreira L, Rodrigues M (2014) Online health information semantic search and exploration: reporting on two prototypes for performing extraction on both a hospital intranet and the world wide web. In: Neustein A (ed) Text mining of web-based medical content. De Gruyter, Berlin, p 50

  • Viola P, Narasimhan M (2005) Learning to extract information from semi-structured text using a discriminative context free grammar. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval. pp 330–337

  • Wimalasuriya DC, Dou D (2010) Ontology-based information extraction: an introduction and a survey of current approaches. J Inf Sci 36(3):306–323

Copyright information

© 2015 The Authors

About this chapter

Cite this chapter

Rodrigues, M., Teixeira, A. (2015). Introduction. In: Advanced Applications of Natural Language Processing for Performing Information Extraction. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-15563-0_1

  • DOI: https://doi.org/10.1007/978-3-319-15563-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15562-3

  • Online ISBN: 978-3-319-15563-0

  • eBook Packages: Engineering, Engineering (R0)
