Abstract
Stream classification algorithms traditionally treat arriving instances as independent. In many applications, however, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances (“observations”) into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge about the observations belonging to its entity and to entities similar to it. To compute entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain or feature space different from the one in which predictions are made. To distinguish cases where this knowledge transfer benefits stream classification from cases where knowledge about the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
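The idea of predicting from an entity's own seed labels plus those of similar entities can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Euclidean distance over static entity features, the unweighted majority vote, and all function and variable names are assumptions made for illustration, and the random-entity variant is shown only as a simple comparison point for the kRE idea.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equally long feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def majority_label(labels):
    """Most frequent label in the pool (assumed voting rule)."""
    return Counter(labels).most_common(1)[0][0]


def pooled_seed_labels(target, neighbours, seed_labels):
    """Seed labels of the target entity plus those of the chosen entities."""
    pool = list(seed_labels[target])
    for entity in neighbours:
        pool.extend(seed_labels[entity])
    return pool


def predict_knn_entities(target, entity_features, seed_labels, k):
    """Vote over the target entity and its k most similar entities."""
    others = [e for e in entity_features if e != target]
    others.sort(
        key=lambda e: euclidean(entity_features[target], entity_features[e])
    )
    return majority_label(pooled_seed_labels(target, others[:k], seed_labels))


def predict_k_random_entities(target, entity_features, seed_labels, k, rng):
    """Same vote, but over k entities drawn uniformly at random."""
    others = [e for e in entity_features if e != target]
    chosen = rng.sample(others, min(k, len(others)))
    return majority_label(pooled_seed_labels(target, chosen, seed_labels))


# Hypothetical toy data: four entities, each with m = 2 labelled seed observations.
entity_features = {
    "A": (0.0, 0.0),
    "B": (0.1, 0.0),
    "C": (5.0, 5.0),
    "D": (5.1, 5.0),
}
seed_labels = {
    "A": ["pos", "neg"],
    "B": ["pos", "pos"],
    "C": ["neg", "neg"],
    "D": ["neg", "neg"],
}

print(predict_knn_entities("A", entity_features, seed_labels, k=1))
print(predict_k_random_entities("A", entity_features, seed_labels, 1, random.Random(0)))
```

If the similarity-based vote predicts future labels no better than the random-entity vote, that suggests the entity-level information adds little for the data at hand.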
Acknowledgements
The work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG project OSCAR (Opinion Stream Classification with Ensembles and Active Learners). The last two authors are the project’s principal investigators.
We use the terms observations and instances interchangeably, since we speak of a time series of property values that exist at all times but are observed at different points in time.
Cite this article
Unnikrishnan, V., Beyer, C., Matuszyk, P. et al. Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. Int J Data Sci Anal 9, 1–15 (2020). https://doi.org/10.1007/s41060-019-00177-1
Keywords
- Stream classification
- kNN
- Entity similarity