Information Retrieval, Volume 9, Issue 5, pp 521–541

The nature of novelty detection

  • Le Zhao
  • Min Zhang
  • Shaoping Ma

Abstract

Sentence-level novelty detection aims at spotting sentences that carry novel information in an ordered sentence list: sentences that appear later in the list and add no new meaning are eliminated. The contributions of this paper to the task are three-fold. First, conceptually, this paper reveals the computational nature of the task, currently overlooked by the Novelty community: novelty as a combination of partial overlap (PO) and complete overlap (CO) relations between sentences. We define partial overlap between two sentences as the sharing of common facts, while complete overlap holds when one sentence covers all of the meanings of the other. Second, technically, a novel approach, the selected pool method, is provided, which follows naturally from the PO-CO computational structure. We provide a formal error analysis for selected pool and methods based on this PO-CO framework, and we address the question of how accurate the PO judgments must be to outperform the baseline pool method. Third, experimentally, results are presented for all three novelty datasets currently available. The results show that selected pool is significantly better than, or no worse than, the current methods, an indication that the term overlap criterion for the PO judgments could be adequately accurate.
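
As a rough illustration of the PO-CO structure described in the abstract, the Python sketch below first selects the earlier sentences that partially overlap the current one, pools their terms, and then asks whether that pool completely covers the current sentence. It is not the authors' implementation: the bag-of-terms representation, the `min_shared` test for PO, the `co_threshold` coverage test for CO, and every function name are assumptions made for illustration only.

```python
import re


def terms(sentence):
    """Lowercased word set; a stand-in for real term extraction (no stop-word removal)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def partial_overlap(a, b, min_shared=2):
    """Assumed PO test: the two term sets share at least `min_shared` terms."""
    return len(a & b) >= min_shared


def selected_pool_novelty(sentences, min_shared=2, co_threshold=0.8):
    """Return the sentences judged novel, in their original order.

    For each sentence, pool the terms of the earlier sentences that pass
    the assumed PO test, then treat the sentence as completely overlapped
    (CO) when the pool covers at least `co_threshold` of its terms.
    """
    seen = []   # term sets of all earlier sentences
    novel = []
    for sentence in sentences:
        current = terms(sentence)
        pool = set()
        for previous in seen:
            if partial_overlap(current, previous, min_shared):
                pool |= previous
        # An empty sentence carries no terms, so it is treated as redundant.
        coverage = len(current & pool) / len(current) if current else 1.0
        if coverage < co_threshold:
            novel.append(sentence)
        seen.append(current)
    return novel


example = [
    "The storm hit the coast on Monday.",
    "Officials closed the airport.",
    "The storm hit the coast.",
    "Monday the airport reopened after the storm.",
]
result = selected_pool_novelty(example)
```

With these assumed settings, the third example sentence is dropped because every one of its terms already appears in the first sentence, while the fourth is kept because "reopened" and "after" are not covered by the selected pool.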

Keywords

Novelty detection, Overlap relations, Meanings, TREC

Notes

Acknowledgments

Special thanks to Ellen Voorhees and the anonymous reviewers of the Journal of Information Retrieval for numerous suggestions about the organization, presentation, and evaluation methodologies of this paper, and for raising several important issues related to the PO relation property (5) and the semantic modeling of sentences in Section 4.1, all of which greatly improved this paper. The authors would also like to thank Prof. Xiaotie Deng for help with the final revision of the paper, which considerably improved its readability.

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. State Key Lab of Intelligent Technologies and System, Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China
