Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity

Abstract

Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge of the observations belonging to that entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from the one in which predictions are made. To distinguish cases where this knowledge transfer is beneficial for stream classification from cases where knowledge about the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive, and we attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
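To make the entity-centric idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes observations arrive as (entity_id, feature_vector, label), keeps only the first m labelled observations of each entity as a seed, represents an entity by the mean of its seed vectors, and labels future observations of an entity by a majority vote over the seed labels of that entity and its k most similar entities under cosine similarity. The class name EntityLevelKNN and the parameters k and m are illustrative assumptions.

```python
# Minimal sketch of entity-level, kNN-inspired stream classification.
# Assumptions (illustrative, not taken from the paper's implementation):
# observations arrive as (entity_id, feature_vector, label); only the first m
# observations of each entity are labelled (the "seed"); an entity is
# represented by the mean of its seed vectors; entity similarity is the
# cosine similarity of these means.
from collections import defaultdict

import numpy as np


class EntityLevelKNN:                       # illustrative name
    def __init__(self, k=5, m=10):
        self.k = k                          # number of similar entities to consult
        self.m = m                          # labelled seed observations per entity
        self.seeds = defaultdict(list)      # entity_id -> list of (vector, label)

    def observe_labelled(self, entity_id, x, y):
        """Store one of the initial m labelled observations of an entity."""
        if len(self.seeds[entity_id]) < self.m:
            self.seeds[entity_id].append((np.asarray(x, dtype=float), y))

    def _profile(self, entity_id):
        """Entity profile: mean of the entity's seed vectors."""
        return np.mean([v for v, _ in self.seeds[entity_id]], axis=0)

    def _similarity(self, a, b):
        """Cosine similarity between two entity profiles."""
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b) / denom if denom else 0.0

    def predict(self, entity_id):
        """Label a future observation of entity_id by a majority vote over the
        seed labels of that entity and of its k most similar entities."""
        if not self.seeds.get(entity_id):
            return None                     # unseen entity: no seed to fall back on
        profile = self._profile(entity_id)
        ranked = sorted(
            (eid for eid, seed in self.seeds.items() if eid != entity_id and seed),
            key=lambda eid: self._similarity(profile, self._profile(eid)),
            reverse=True,
        )
        labels = [y
                  for eid in [entity_id] + ranked[: self.k]
                  for _, y in self.seeds[eid]]
        return max(set(labels), key=labels.count)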



Acknowledgements

The work of Authors 1 and 2 was partially supported by the German Research Foundation (DFG) within the DFG project OSCAR (Opinion Stream Classification with Ensembles and Active Learners). The last two authors are the project’s principal investigators.

Author information


Corresponding author

Correspondence to Vishnu Unnikrishnan.


Footnotes

We use the terms observations and instances interchangeably, since we speak of a time series of property values that exist at all times but are observed at different points in time.


About this article


Cite this article

Unnikrishnan, V., Beyer, C., Matuszyk, P. et al. Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. Int J Data Sci Anal 9, 1–15 (2020). https://doi.org/10.1007/s41060-019-00177-1


Keywords

  • Stream classification
  • kNN
  • Entity similarity