A Bottom-Up Tree Based Storage Approach for Efficient IoT Data Analytics in Cloud Systems

Journal of Grid Computing

Abstract

The Internet of Things (IoT) has been widely applied in various domains, e.g., environmental monitoring, intelligent transport systems, and video surveillance. In most IoT applications, data is generated by many data sources rather than a single one. In addition, IoT data comes in various types with different processing requirements: high-priority IoT data should be stored and processed more favorably than low-priority IoT data. The objective of this paper is to propose an efficient cloud storage approach that considers these multi-aspect requirements of IoT data. In the approach, a lightweight data structure is used to depict the distribution and calculate the size of each IoT subset (type) across all data sources. We then form a number of storage-locality groups from cloud storage blocks. However, the storage-locality groups differ in storage size and locality capability, and we would like to place high-priority IoT subsets in the storage-locality groups with strong locality capability. This gives rise to a combinatorial placement problem between IoT subsets and storage-locality groups. To solve this placement problem efficiently, we propose a bottom-up tree-based approach associated with the solution of the well-known combinatorial problem, the knapsack problem. Because the knapsack problem is NP-hard, we also propose a heuristic placement approach.
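To make the placement problem concrete, the following minimal sketch treats a single storage-locality group as a knapsack and chooses which IoT subsets to place in it. It is an illustration under stated assumptions, not the paper's bottom-up tree algorithm, whose details are not given in the abstract. The tuple layout `(name, size, priority)`, the integer block sizes, and the function names `knapsack_place` and `greedy_place` are all hypothetical. The first function is the textbook 0/1 knapsack dynamic program; the second is a simple priority-per-size greedy rule standing in for a heuristic.

```python
from typing import List, Tuple

# Hypothetical representation of an IoT subset:
# (name, size in storage blocks, priority value). Sizes must be positive integers.
Subset = Tuple[str, int, int]


def knapsack_place(subsets: List[Subset], capacity: int) -> Tuple[int, List[str]]:
    """Exact 0/1 knapsack: maximize total priority placed in one group
    whose capacity is given in blocks."""
    n = len(subsets)
    # best[i][c] = best total priority using the first i subsets within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, prio) in enumerate(subsets, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip subset i
            if size <= c:                # option 2: place subset i if it fits
                best[i][c] = max(best[i][c], best[i - 1][c - size] + prio)

    # Walk the table backwards to recover which subsets were placed.
    chosen: List[str] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, size, _ = subsets[i - 1]
            chosen.append(name)
            c -= size
    chosen.reverse()
    return best[n][capacity], chosen


def greedy_place(subsets: List[Subset], capacity: int) -> Tuple[int, List[str]]:
    """Heuristic stand-in: take subsets in decreasing priority-per-block
    order while they still fit. Fast, but not guaranteed optimal."""
    order = sorted(subsets, key=lambda s: s[2] / s[1], reverse=True)
    total, remaining = 0, capacity
    chosen: List[str] = []
    for name, size, prio in order:
        if size <= remaining:
            chosen.append(name)
            total += prio
            remaining -= size
    return total, chosen
```

With the made-up input `[("a", 6, 60), ("b", 5, 45), ("c", 5, 45)]` and a capacity of 10 blocks, the exact method places `b` and `c` for a total priority of 90, while the greedy rule places only `a` for a total of 60. The dynamic program runs in time proportional to the number of subsets times the capacity, which is pseudo-polynomial, so a cheaper heuristic is a natural complement when capacities are large.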

Data Availability

The datasets generated and analysed during the current study are available from the corresponding authors.


Acknowledgments

This research was supported by the Ministry of Science and Technology, Taiwan, R.O.C., under Grant MOST 109-2221-E-030-015-.

Author information

Corresponding author

Correspondence to Jenn-Wei Lin.

About this article

Cite this article

Lin, JW., Arul, J.M. & Kao, JT. A Bottom-Up Tree Based Storage Approach for Efficient IoT Data Analytics in Cloud Systems. J Grid Computing 19, 10 (2021). https://doi.org/10.1007/s10723-021-09553-3

  • DOI: https://doi.org/10.1007/s10723-021-09553-3
