Tapped Delay Lines for GP Streaming Data Classification with Label Budgets

  • Conference paper
Genetic Programming (EuroGP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9025)

Abstract

Streaming data classification requires that a model be available for classifying stream content while simultaneously detecting and reacting to changes in the underlying process generating the data. Given that only a fraction of the stream is ‘visible’ at any point in time (i.e. through some form of window interface), it is difficult to guarantee that a classifier will encounter a ‘well mixed’ distribution of classes across the stream. Moreover, streaming data classifiers are also required to operate under a limited label budget, because labelling all the data is too expensive. We take these requirements to motivate the use of an active learning strategy for decoupling genetic programming training epochs from stream throughput. The content of a data subset is controlled by a combination of Pareto archiving and stochastic sampling. In addition, a significant benefit is attributed to support for a tapped delay line (TDL) interface to the stream, although this also increases the dimensionality of the task. We demonstrate that the benefits of assuming the TDL can be maintained through the use of oversampling without recourse to additional label information. Benchmarking on four datasets demonstrates that the approach is particularly effective when reacting to shifts in the underlying properties of the stream. Moreover, an online formulation for class-wise detection rate is assumed, which robustly characterizes classifier performance throughout the stream.
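
As an illustration of two ideas named in the abstract, the sketch below shows one plausible way to build a tapped delay line over a stream and to track a class-wise detection rate incrementally. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `tapped_delay_line` and `ClasswiseDetectionRate`, the newest-first ordering of taps, and the choice to average per-class rates over the classes seen so far are all hypothetical choices made for this example.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Iterable, Iterator, List, Sequence


def tapped_delay_line(
    stream: Iterable[Sequence[float]], taps: int
) -> Iterator[List[float]]:
    """Yield x(t), x(t-1), ..., x(t-taps+1) concatenated into one vector.

    Assumption: taps are ordered newest first, and nothing is yielded
    until `taps` exemplars have arrived. The input dimensionality grows
    by a factor of `taps`.
    """
    window: Deque[Sequence[float]] = deque(maxlen=taps)
    for x in stream:
        window.appendleft(x)  # a full deque drops the oldest exemplar
        if len(window) == taps:
            yield [value for past in window for value in past]


class ClasswiseDetectionRate:
    """Incremental per-class detection rate (hypothetical formulation).

    For each class c seen so far, rate_c = tp_c / (tp_c + fn_c); the
    reported score is the mean of rate_c over those classes, so a
    frequent class cannot mask poor performance on a rare one.
    """

    def __init__(self) -> None:
        self.tp: Dict[Hashable, int] = {}
        self.fn: Dict[Hashable, int] = {}

    def update(self, true_label: Hashable, predicted_label: Hashable) -> float:
        self.tp.setdefault(true_label, 0)
        self.fn.setdefault(true_label, 0)
        if predicted_label == true_label:
            self.tp[true_label] += 1
        else:
            self.fn[true_label] += 1
        return self.score()

    def score(self) -> float:
        rates = [self.tp[c] / (self.tp[c] + self.fn[c]) for c in self.tp]
        return sum(rates) / len(rates)


if __name__ == "__main__":
    stream = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    for vector in tapped_delay_line(stream, taps=2):
        print(vector)
```

In this sketch only the newest exemplar in each window would carry a label request, which keeps the label budget independent of the number of taps.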

Notes

  1. Note that the label information is limited to that of \(\mathbf {x}(t)\) alone.

  2. The only source of labelled data.

  3. http://web.cs.dal.ca/~mheywood/Code/SBB/Stream/StreamData.html.

  4. MOA prerelease 2014.03; http://moa.cms.waikato.ac.nz/overview/.

  5. Also referred to as the Wilcoxon rank-sum test or Wilcoxon-Mann-Whitney test.

Acknowledgments

The authors gratefully acknowledge support from NSERC Discovery and CRD programs (Canada) and RUAG Schweiz AG (Switzerland) while conducting this research.

Corresponding author

Correspondence to Malcolm I. Heywood.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Vahdat, A., Morgan, J., McIntyre, A.R., Heywood, M.I., Zincir-Heywood, A.N. (2015). Tapped Delay Lines for GP Streaming Data Classification with Label Budgets. In: Machado, P., et al. Genetic Programming. EuroGP 2015. Lecture Notes in Computer Science, vol 9025. Springer, Cham. https://doi.org/10.1007/978-3-319-16501-1_11

  • DOI: https://doi.org/10.1007/978-3-319-16501-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16500-4

  • Online ISBN: 978-3-319-16501-1

  • eBook Packages: Computer Science, Computer Science (R0)
