Abstract
Classification of online stream data is essential for network analysis and for providing Quality of Service (QoS). The properties of stream data require classification algorithms to be incremental and to handle concept drift. Traffic classification is the prominent solution for handling bulk data streams in order to provide services such as packet filtering, routing policies, traffic shaping, and traffic limiting. Many stream-based classification algorithms in the literature meet requirements such as scanning the data only once, anytime analysis with fast response, and limited memory utilization. Still, algorithms that are more accurate and faster, and that use limited memory, are needed to handle the ever-increasing volume of data on the Internet. This research work proposes to improve classification accuracy while using fewer training instances to decide a split during induction of the decision tree (Hoeffding tree). The concept of Jensen's inequality is used to reduce the Hoeffding bound, minimizing the bound on bad events (i.e., limiting the algorithm's margin of error). Requiring fewer examples results in faster execution and lower memory usage.
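The split decision described in the abstract builds on the Hoeffding bound used by Hoeffding trees (VFDT). The sketch below shows only the standard bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and the usual split test. It does not reproduce this chapter's Jensen-based refinement, whose exact form is not given here. The function names are illustrative assumptions. The sketch shows why a tighter bound matters: the number of examples a leaf must see before splitting grows with R^2 ln(1/delta) / epsilon^2, so any smaller bound lets the tree split sooner.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 ln(1/delta) / (2n)).

    With probability at least 1 - delta, the true mean of a variable with
    range R lies within epsilon of the mean observed over n samples.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """VFDT-style test: split when the observed gain gap exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def examples_needed(gain_gap: float, value_range: float, delta: float) -> int:
    """Smallest n for which epsilon(n) is strictly below the given gain gap."""
    # epsilon(n) < gap  <=>  n > R^2 ln(1/delta) / (2 * gap^2)
    threshold = value_range ** 2 * math.log(1.0 / delta) / (2.0 * gain_gap ** 2)
    return math.floor(threshold) + 1
```

For a two-class problem, information gain has range R = log2(2) = 1. With delta = 1e-7 and an observed gain gap of 0.05, `examples_needed(0.05, 1.0, 1e-7)` returns 3224. A bound with a smaller constant, such as the refined bound the chapter proposes, would lower this count for the same confidence.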
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pillania, A., Singh, P., Gupta, V. (2021). Optimizing Stream Data Classification Using Improved Hoeffding Bound. In: Hura, G.S., Singh, A.K., Siong Hoe, L. (eds) Advances in Communication and Computational Technology. ICACCT 2019. Lecture Notes in Electrical Engineering, vol 668. Springer, Singapore. https://doi.org/10.1007/978-981-15-5341-7_19
DOI: https://doi.org/10.1007/978-981-15-5341-7_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5340-0
Online ISBN: 978-981-15-5341-7
eBook Packages: Engineering, Engineering (R0)