The Journal of Supercomputing, Volume 75, Issue 2, pp 662–687

Aggregating correlated cold data to minimize the performance degradation and power consumption of cold storage nodes

  • Cheng Hu
  • Yuhui Deng


In the era of big data, traditional storage systems face a major energy-consumption challenge. A typical way to reduce energy consumption is to switch storage nodes that are not serving workloads to a low-power state. This approach divides the storage nodes into an active group and a low-power group: frequently accessed data are stored in the active group, whose nodes remain active to offer service, while infrequently accessed cold data are stored in the low-power group. The storage nodes in the low-power group are usually called cold nodes, because they can be switched to a low-power state for a certain amount of time to save energy. One often-neglected fact is that the placement of cold data within the cold nodes has a significant impact on system performance and power consumption, because switching a storage node from a low-power state to an active state incurs considerable delay and energy consumption. This paper proposes aggregating correlated cold data and storing them in the same cold node within the low-power group. Because correlated data are normally accessed together, our approach can greatly reduce the number of power state transitions and lengthen the idle periods that the cold nodes experience, thereby minimizing both performance degradation and power consumption. Experimental results demonstrate that, compared with some state-of-the-art methods, this method effectively reduces energy consumption while maintaining system performance at an acceptable level.
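The intuition behind aggregating correlated cold data can be illustrated with a small sketch. It is not the authors' algorithm: the session-based trace format, the `min_support` threshold, the union-find grouping, and all function names are assumptions made here for illustration. The sketch also assumes that every cold node returns to the low-power state between sessions and that a correlated group always fits on a single node.

```python
from collections import defaultdict
from itertools import combinations


def correlated_groups(sessions, min_support=2):
    """Group files that co-occur in at least `min_support` sessions.

    Hypothetical correlation measure: plain co-access counting,
    with groups merged transitively via union-find.
    """
    pair_counts = defaultdict(int)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            pair_counts[(a, b)] += 1

    parent = {f: f for session in sessions for f in session}

    def find(x):
        # Find the group representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in pair_counts.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for f in parent:
        groups[find(f)].append(f)
    return sorted(sorted(g) for g in groups.values())


def place_aggregated(groups, num_nodes):
    """Store each correlated group whole on one cold node."""
    placement = {}
    for i, group in enumerate(groups):
        for f in group:
            placement[f] = i % num_nodes
    return placement


def place_round_robin(files, num_nodes):
    """Baseline: spread files over nodes, ignoring correlation."""
    return {f: i % num_nodes for i, f in enumerate(sorted(files))}


def count_wakeups(sessions, placement):
    """Count node wake-ups, assuming nodes sleep between sessions."""
    return sum(len({placement[f] for f in s}) for s in sessions)


if __name__ == "__main__":
    trace = [["a", "b", "c"], ["a", "b", "c"], ["d", "e"],
             ["d", "e"], ["a", "b", "c"]]
    groups = correlated_groups(trace)
    files = {f for s in trace for f in s}
    print(count_wakeups(trace, place_aggregated(groups, 2)))
    print(count_wakeups(trace, place_round_robin(files, 2)))
```

On this toy trace, the aggregated placement wakes one node per session (5 wake-ups in total), whereas the correlation-oblivious round-robin placement wakes both nodes in every session (10 wake-ups). Fewer wake-ups mean fewer power state transitions and longer idle periods, which is the effect the abstract describes.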


Keywords: Big data; Clustered storage system; Power state switching; Energy-aware; Data placement; Data correlation



This work is supported in part by the National Natural Science Foundation (NSF) of China under Grant No. 61572232, in part by the Science and Technology Planning Project of Guangzhou under Grant 201604016100, in part by the Science and Technology Planning Project of Nansha (2016CX007), and in part by the Open Research Fund of Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CARCH201705). The corresponding author is Yuhui Deng from Jinan University.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Jinan University, Guangzhou, China
  2. State Key Laboratory of Computer Architecture, Institute of Computing, Chinese Academy of Sciences, Beijing, China
