Optimizing Stream Data Classification Using Improved Hoeffding Bound

  • Conference paper
Advances in Communication and Computational Technology (ICACCT 2019)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 668)

Abstract

Classification of online stream data is essential for network analysis and for providing Quality of Service (QoS). Stream data has properties that require a classification algorithm to be incremental and to handle concept drift. Traffic classification is a prominent approach to handling bulk data streams in order to provide services such as packet filtering, routing policies, traffic shaping, and traffic limiting. Many stream-based classification algorithms exist in the literature that meet requirements such as scanning the data only once, anytime analysis with fast response, and limited memory utilization. Still, more accurate, faster, and more memory-efficient algorithms and concepts are required to handle the ever-increasing volume of data on the Internet. This research work proposes an improvement in classification accuracy while using fewer training instances to decide a split during induction of the decision tree (Hoeffding tree). Jensen's inequality is used to reduce the Hoeffding bound, minimizing the bound on bad events (i.e., it limits the margin of error of the algorithm). The reduced number of examples results in faster execution and lower memory use.
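For orientation, the split decision that the abstract refers to can be sketched with the standard Hoeffding bound, as commonly stated for Hoeffding trees: with probability 1 − δ, the true mean of a variable with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of the mean observed over n examples. The sketch below is a minimal illustration under that assumption only. It does not implement the refined bound proposed in this paper, whose formula is not given in the abstract, and the function names and sample values are hypothetical.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R), delta, and n are illustrative parameter names.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of examples")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split once the gap between the two best attributes exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def examples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the standard bound is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta)
                     / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Hypothetical values: gain range R = 1, confidence 1 - 1e-7.
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))
    print(examples_needed(1.0, 1e-7, 0.05))
```

Because n grows with 1/ε², any tightening of the bound lowers the number of examples a leaf must see before it can split, which is the effect on execution time and memory that the abstract describes.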

Author information

Corresponding author

Correspondence to Arvind Pillania.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pillania, A., Singh, P., Gupta, V. (2021). Optimizing Stream Data Classification Using Improved Hoeffding Bound. In: Hura, G.S., Singh, A.K., Siong Hoe, L. (eds) Advances in Communication and Computational Technology. ICACCT 2019. Lecture Notes in Electrical Engineering, vol 668. Springer, Singapore. https://doi.org/10.1007/978-981-15-5341-7_19

  • DOI: https://doi.org/10.1007/978-981-15-5341-7_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-5340-0

  • Online ISBN: 978-981-15-5341-7

  • eBook Packages: Engineering, Engineering (R0)
