Multi-source, Multilingual Information Extraction and Summarization

Part of the series Theory and Applications of Natural Language Processing, pp. 23-49


Information Extraction: Past, Present and Future

  • Jakub Piskorski, Institute for Computer Science, Polish Academy of Sciences
  • Roman Yangarber, Department of Computer Science, University of Helsinki


In this chapter we present a brief overview of Information Extraction, an area of natural language processing that deals with finding factual information in free text. In formal terms, facts are structured objects, such as database records. Such a record may capture a real-world entity together with its attributes as mentioned in the text, or a real-world event, occurrence, or state together with its arguments or actors: who did what to whom, where, and when. Information is typically sought in a particular target setting, e.g., corporate mergers and acquisitions. Searching for specific, targeted factual information constitutes a large proportion of all searching activity on the part of information consumers. Interest in Information Extraction has been sustained for over two decades, owing to its conceptual simplicity on the one hand and its potential utility on the other. Although the targeted nature of the task makes it more tractable than some of the more open-ended tasks in NLP, it remains replete with challenges as the information landscape evolves, which also makes it an exciting research subject.
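
To make the notion of a fact as a structured record concrete, the following is a minimal sketch in Python. It is not taken from the chapter: the class names `Entity` and `Event`, the helper `to_record`, the field names, the type labels, and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entity:
    """A real-world entity with the attributes mentioned in the text."""
    name: str
    entity_type: str  # hypothetical label, e.g. "ORGANIZATION"
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Event:
    """An event with its arguments: who did what to whom, where and when."""
    event_type: str                   # what
    agent: Entity                     # who
    patient: Optional[Entity] = None  # to whom
    location: Optional[str] = None    # where
    date: Optional[str] = None        # when


def to_record(event: Event) -> Dict[str, Optional[str]]:
    """Flatten an event into a single database-style row."""
    return {
        "event_type": event.event_type,
        "agent": event.agent.name,
        "patient": event.patient.name if event.patient else None,
        "location": event.location,
        "date": event.date,
    }


# Invented sentence for illustration:
# "Acme Corp., a manufacturer, acquired Widget Ltd. in London in May 2010."
acme = Entity("Acme Corp.", "ORGANIZATION", {"sector": "manufacturing"})
widget = Entity("Widget Ltd.", "ORGANIZATION")
acquisition = Event("ACQUISITION", agent=acme, patient=widget,
                    location="London", date="2010-05")
record = to_record(acquisition)
```

Under these assumptions, `record` is one flat row for the mergers-and-acquisitions setting, with missing arguments represented as `None`. A real extraction system would fill such records automatically from text rather than by hand.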