Abstract
This research addresses and resolves issues with the confidence level of samples drawn from big streaming data, a level that varies with the speed of the stream and with a sample space that changes over time. Building on the preliminary work and results in [8], it focuses on the confidence level and on thresholds for the dynamic population size, with the aim of ensuring a better confidence level for the sampled data with respect to several variables: the speed of the streaming data, the population size as it changes over time, the sample space (or size), the speed of the sampling algorithm, the size of the streaming data, and the duration of the stream. Theoretical thresholds for processing big streaming data are identified with respect to these variables for the purpose of optimization. Simulation and experimental results are provided to validate the efficacy of the proposed theoretical thresholds.
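To make the relationship between population size, sample size, and confidence level concrete, the following is a minimal sketch using standard textbook tools, not the thresholds derived in this chapter. It assumes Cochran's sample-size formula with a finite-population correction and classic reservoir sampling (Algorithm R). The function names, the 5% margin of error, and the worst-case proportion p = 0.5 are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def required_sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Cochran's sample size with finite-population correction (assumed model)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided z-score
    n0 = (z * z * p * (1.0 - p)) / (margin * margin)  # infinite-population size
    n = n0 / (1.0 + (n0 - 1.0) / population)          # correct for finite N
    return min(population, math.ceil(n))


def achieved_margin(population, sample_size, confidence=0.95, p=0.5):
    """Margin of error of a fixed-size sample as the population grows."""
    if sample_size >= population:
        return 0.0  # the sample is the whole population
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    fpc = math.sqrt((population - sample_size) / (population - 1.0))
    return z * math.sqrt(p * (1.0 - p) / sample_size) * fpc


def reservoir_sample(stream, k, rng=random):
    """Algorithm R: uniform sample of k items without replacement, one pass."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                reservoir[j] = item  # replace with probability k / (i + 1)
    return reservoir


if __name__ == "__main__":
    # Hypothetical stream: rate (items/s) times duration (s) gives the population.
    rate, duration = 1000, 10
    population = rate * duration
    k = required_sample_size(population)
    sample = reservoir_sample(range(population), k)
    print(k, len(sample), achieved_margin(population, k))
```

Under these assumptions the population at time t is simply the stream rate multiplied by t, so the required sample size can be recomputed as the stream grows. It rises quickly at first and then levels off: roughly 385 items suffice at 95% confidence and a 5% margin even for very large populations.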
References
Tang, F., Li, L., Barolli, L., Tang, C.: An efficient sampling and classification approach for flow detection in SDN-based big data centers. Journal 2(5), 99–110 (2016)
Gadepally, V., Herr, T., Johnson, L., Milechin, L., Milosavljevic, M., Miller, B.A.: Sampling operations on big data. In: 2015 49th Asilomar Conference on Signals, Systems and Computers, 8 November 2015
Xu, K., Wang, F., Jia, X., Wang, H.: The impact of sampling on big data analysis of social media: a case study on Flu and Ebola. In: 2015 49th Asilomar Conference on Signals, Systems and Computers, 6 December 2015
Johnson, T., Muralikrishnan, S., Rozenbaum, I.: Sampling algorithms in a stream operator. In: SIGMOD Conference (2005)
Zafar, M.B., Bhattacharya, P., Ganguly, N., Gummadi, K.P., Ghosh, S.: Sampling content from online social networks: comparing random vs. expert sampling of the Twitter stream. ACM Trans. Web 9(3), 12 (2015)
Teddlie, C., Yu, F.: Mixed methods sampling: a typology with examples. J. Mixed Methods Res. 1(1), 77–100 (2007)
Park, B.H., Ostrouchov, G., Samatova, N.F., Geist, A.: Reservoir based random sampling with replacement from data stream. In: Proceedings of 2004 SIAM International Conference on Data Mining (2004)
Kancharla, A., Kim, J., Park, N.-J., Park, N.: Big streaming data buffering optimization. In: International Conference on Computational Science/Intelligence/Applied Informatics (2016)
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kancharala, A., Park, N., Kim, J., Park, N. (2018). Big Streaming Data Sampling and Optimization. In: Kim, K., Kim, H., Baek, N. (eds) IT Convergence and Security 2017. Lecture Notes in Electrical Engineering, vol 449. Springer, Singapore. https://doi.org/10.1007/978-981-10-6451-7_27
DOI: https://doi.org/10.1007/978-981-10-6451-7_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6450-0
Online ISBN: 978-981-10-6451-7
eBook Packages: Engineering, Engineering (R0)