Abstract
Online novelty detection is an important technology in understanding and exploiting streaming data. One application of online novelty detection is First Story Detection (FSD) which attempts to find the very first story about a new topic, e.g. the first news report discussing the “Beast from the East” hitting Ireland. Although hundreds of FSD models have been developed, the vast majority of these only aim at improving the performance of the detection for some specific dataset, and very few focus on the insight of novelty itself. We believe that online novelty detection, framed as an unsupervised learning problem, always requires a clear definition of novelty. Indeed, we argue the definition of novelty is the key issue in designing a good detection model. Within the context of FSD, we first categorise online novelty detection models into three main categories, based on different definitions of novelty scores, and then compare the performances of these model categories in different features spaces. Our experimental results show that the challenge of FSD varies across novelty scores (and corresponding model categories); and, furthermore, that the detection of novelty in the very popular Word2Vec feature space is more difficult than in a normal frequency-based feature space because of a loss of word specificity.
Keywords
- Online novelty detection
- First Story Detection (FSD)
- Unsupervised learning
- Novelty score
- Feature space
- Word2Vec
This is a preview of subscription content, access via your institution.
Buying options


References
Allan, J., et al.: Topic detection and tracking pilot study final report (1998)
Allan, J., et al.: Detections, bounds, and timelines: UMass and TDT-3. In: Proceedings of Topic Detection and Tracking Workshop. sn (2000)
Fiscus, J., et al.: NISTs 1998 topic detection and tracking evaluation (TDT2). In: Proceedings of the 1999 DARPA Broadcast News Workshop (1999)
Ma, J., Perkins, S.: Online novelty detection on temporal sequences. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2003)
Martin, A., et al.: The DET curve in assessment of detection task performance. National Institute of Standards and Technology, Gaithersburg, MD (1997)
Mikolov, T, et al.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Moran, S., et al.: Enhancing first story detection using word embeddings. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM (2016)
Petrovic, S., Osborne, M., Lavrenko, V.: Streaming first story detection with application to Twitter. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics (2010)
Petrovic, S., Osborne, M., Lavrenko, V.: Using paraphrases for improving first story detection in news and Twitter. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics (2012)
Pimentel, M.A.F.: A review of novelty detection. Signal Process. 99, 215–249 (2014)
Qiu, Y., et al.: Time-aware first story detection in Twitter stream. In: IEEE International Conference on Data Science in Cyberspace (DSC). IEEE (2016)
Schlkopf, B.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001)
Wang, F., Franco-Penya, H.-H., Kelleher, J.D., Pugh, J., Ross, R.: An analysis of the application of simplified silhouette to the evaluation of k-means clustering validity. In: Perner, P. (ed.) MLDM 2017. LNCS (LNAI), vol. 10358, pp. 291–305. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62416-7_21
Wurzer, D., Lavrenko, V., Osborne, M.: Twitter-scale new event detection via K-term hashing. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (2015)
Yang, Y., Pierce, T., Carbonell, J.: A study of retrospective and on-line event detection. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM (1998)
Acknowledgement
The authors wish to acknowledge the support of the ADAPT Research Centre. The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Funds.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, F., Ross, R.J., Kelleher, J.D. (2018). Exploring Online Novelty Detection Using First Story Detection Models. In: Yin, H., Camacho, D., Novais, P., Tallón-Ballesteros, A. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2018. IDEAL 2018. Lecture Notes in Computer Science(), vol 11314. Springer, Cham. https://doi.org/10.1007/978-3-030-03493-1_12
Download citation
DOI: https://doi.org/10.1007/978-3-030-03493-1_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03492-4
Online ISBN: 978-3-030-03493-1
eBook Packages: Computer ScienceComputer Science (R0)