Abstract
Since datasets annotated for novelty at the document and/or word level are not readily available, we present a simulation framework that lets us create textual datasets in which we control how novelty occurs. We also present a benchmark of existing methods for novelty detection in textual data streams. We define a few tasks and compare several state-of-the-art methods on them. The simulation framework allows us to evaluate their performance on a limited set of scenarios and to test their sensitivity to some parameters. Finally, we experiment with the same methods on different kinds of novelty in the New York Times Annotated Dataset.
Keywords
- Novelty Detection
- Text mining
- Evaluation framework
- Natural Language Processing
Notes
- 1. The code for simulation is available at https://github.com/clechristophe/NoveltySimulator.
- 2.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Christophe, C., Velcin, J., Cugliari, J., Suignard, P., Boumghar, M. (2020). How to Detect Novelty in Textual Data Streams? A Comparative Study of Existing Methods. In: Lemaire, V., Malinowski, S., Bagnall, A., Bondu, A., Guyet, T., Tavenard, R. (eds) Advanced Analytics and Learning on Temporal Data. AALTD 2019. Lecture Notes in Computer Science, vol 11986. Springer, Cham. https://doi.org/10.1007/978-3-030-39098-3_9
DOI: https://doi.org/10.1007/978-3-030-39098-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39097-6
Online ISBN: 978-3-030-39098-3
eBook Packages: Computer Science, Computer Science (R0)
Published in cooperation with
http://www.ecmlpkdd.org/