International Conference of the Pacific Association for Computational Linguistics

Computational Linguistics pp 193-208

Detecting Vital Documents Using Negative Relevance Feedback in Distributed Realtime Computation Framework

Conference paper

DOI: 10.1007/978-981-10-0515-2_14

Part of the Communications in Computer and Information Science book series (CCIS, volume 593)
Cite this paper as:
Kawahara S., Seki K., Uehara K. (2016) Detecting Vital Documents Using Negative Relevance Feedback in Distributed Realtime Computation Framework. In: Hasida K., Purwarianti A. (eds) Computational Linguistics. Communications in Computer and Information Science, vol 593. Springer, Singapore

Abstract

Existing knowledge bases, including Wikipedia, are typically written and maintained by groups of volunteer editors. Meanwhile, numerous web documents are being published, partly due to the popularization of online news and social media. Some of these web documents, called “vital documents”, contain novel information that should be taken into account when updating knowledge base articles. However, it is virtually impossible for the editors to manually monitor all the relevant web documents. As a result, there is a considerable time lag between the publication of a web document and the corresponding edit to the knowledge base. This paper proposes a framework for the realtime detection of web documents containing novel information in massive document streams. The framework consists of a two-step filter using statistical language models. Further, the framework is implemented on Apache Storm, a distributed and fault-tolerant realtime computation system, in order to process the sheer volume of web documents. The validity of the proposed framework is demonstrated on a publicly available web document data set, the TREC KBA Stream Corpus.
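
The abstract does not spell out how the two-step filter works, so the following is only an illustrative sketch of one way such a filter could be built from unigram language models, not the authors' implementation. It assumes that step one keeps documents that fit a model of entity-related text better than a background model, and that step two applies negative relevance feedback by comparing a model of known vital documents against a model of known non-vital ones. All function names, the smoothing parameter `mu`, and the toy texts are hypothetical, and the Apache Storm deployment is not modelled here.

```python
import math
from collections import Counter


def unigram_model(texts):
    """Build a unigram model as (token counts, total token count)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return counts, sum(counts.values())


def avg_log_likelihood(doc, model, background, mu=10.0):
    """Average per-token log-likelihood of doc under a Dirichlet-smoothed model.

    mu is an assumed smoothing weight; the background probability uses
    add-one smoothing so that unseen tokens never get zero probability.
    """
    counts, total = model
    bg_counts, bg_total = background
    vocab_size = len(bg_counts) + 1  # one extra slot for unseen tokens
    tokens = doc.lower().split()
    score = 0.0
    for w in tokens:
        p_bg = (bg_counts[w] + 1) / (bg_total + vocab_size)
        score += math.log((counts[w] + mu * p_bg) / (total + mu))
    return score / max(len(tokens), 1)


def is_relevant(doc, relevant_model, background, margin=0.0):
    """Step 1 (assumed): doc fits entity-related text better than background."""
    rel = avg_log_likelihood(doc, relevant_model, background)
    bg = avg_log_likelihood(doc, background, background)
    return rel - bg > margin


def is_vital(doc, vital_model, negative_model, background):
    """Step 2 (assumed): negative feedback, vital model must beat non-vital model."""
    pos = avg_log_likelihood(doc, vital_model, background)
    neg = avg_log_likelihood(doc, negative_model, background)
    return pos > neg


def two_step_filter(doc, relevant_model, vital_model, negative_model, background):
    """Pass a document only if it clears both filtering steps."""
    return (is_relevant(doc, relevant_model, background)
            and is_vital(doc, vital_model, negative_model, background))


# Hypothetical toy data, invented purely for illustration.
vital_texts = ["kobe university opens new research lab",
               "kobe university announces research award"]
nonvital_texts = ["kobe university campus map and opening hours",
                  "kobe university contact address and opening hours"]
other_texts = ["weather forecast rain tomorrow",
               "stock market closes higher today"]

background = unigram_model(vital_texts + nonvital_texts + other_texts)
relevant_model = unigram_model(vital_texts + nonvital_texts)
vital_model = unigram_model(vital_texts)
negative_model = unigram_model(nonvital_texts)

for doc in ["kobe university announces new research lab",
            "kobe university campus opening hours",
            "stock market weather forecast"]:
    print(doc, two_step_filter(doc, relevant_model, vital_model,
                               negative_model, background))
```

In this sketch the cheap relevance test runs first so that the negative-feedback comparison is only evaluated on documents that already mention the target entity. In a streaming setting each step could plausibly be a separate processing stage, though that mapping is an assumption rather than something the abstract states.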

Keywords

Negative feedback, Realtime processing, Text data streams, Wikipedia

Copyright information

© Springer Science+Business Media Singapore 2016

Authors and Affiliations

  1. Graduate Schools of System Informatics, Kobe University, Kobe, Japan
  2. Faculty of Intelligence and Informatics, Konan University, Kobe, Japan
