Ensemble Clustering for Novelty Detection in Data Streams

  • Kemilly Dearo Garcia
  • Elaine Ribeiro de Faria
  • Cláudio Rebelo de Sá
  • João Mendes-Moreira
  • Charu C. Aggarwal
  • André C. P. L. F. de Carvalho
  • Joost N. Kok
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11828)

Abstract

In data streams, new classes can appear over time due to changes in the statistical distribution of the data. Consequently, models can become outdated, which requires incremental learning algorithms capable of detecting and learning these changes over time. However, when a single classification model is used for novelty detection, there is a risk that its bias may not suit new data distributions. One solution is to combine several models into an ensemble. In addition, because models can only be updated when labeled data arrives, we propose two unsupervised ensemble approaches: one that combines clustering partitions produced by the same clustering technique, and another that combines partitions produced by different clustering techniques. We compare the performance of the proposed methods with well-known novelty detection algorithms. The methods were tested on datasets commonly used in the novelty detection literature. The experimental results show that the proposed ensembles have competitive performance for novelty detection in data streams.
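
As a rough illustration of the first idea in the abstract (an ensemble built from the same clustering technique), the following Python sketch may help. It is not the authors' algorithm: the use of k-means with different random initialisations, the cluster-radius test, the majority vote, and the names `kmeans`, `fit_ensemble` and `is_novel` are all assumptions made for this example.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns centroids and per-cluster radii."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # Recompute assignments for the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Radius = distance from the centroid to its farthest member (assumption).
    radii = np.array([
        dists[labels == j, j].max() if np.any(labels == j) else 0.0
        for j in range(k)
    ])
    return centers, radii

def fit_ensemble(X, k=2, n_members=5, seed=0):
    """Same technique, different random initialisations (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    return [kmeans(X, k, rng) for _ in range(n_members)]

def is_novel(x, ensemble):
    """Majority vote: a member votes 'unknown' if x lies outside all its clusters."""
    votes = 0
    for centers, radii in ensemble:
        d = np.linalg.norm(centers - x, axis=1)
        if np.all(d > radii):
            votes += 1
    return votes > len(ensemble) / 2

# Toy "known" data: two well-separated Gaussian blobs (synthetic, not from the paper).
rng = np.random.default_rng(42)
known = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.3, size=(100, 2)),
])
ensemble = fit_ensemble(known)
```

Under these assumptions, `is_novel(np.array([20.0, -20.0]), ensemble)` flags a point far from both blobs, while a point near the centre of a blob is treated as known. A stream setting would additionally buffer the flagged points and cluster them before declaring a new class, which this sketch leaves out.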

Keywords

Novelty detection, Ensembles, Clustering, Data streams

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Kemilly Dearo Garcia (1, 2), email author
  • Elaine Ribeiro de Faria (3)
  • Cláudio Rebelo de Sá (1)
  • João Mendes-Moreira (4)
  • Charu C. Aggarwal (5)
  • André C. P. L. F. de Carvalho (2)
  • Joost N. Kok (1)

  1. University of Twente, Enschede, The Netherlands
  2. University of São Paulo, São Paulo, Brazil
  3. Fed. University of Uberlandia, Uberlandia, Brazil
  4. LIAAD-INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
  5. IBM T.J. Watson Research Center, Yorktown, USA
