Operating system level data tiering using online workload characterization

The Journal of Supercomputing

Abstract

Over the past decade, storage has been the main performance bottleneck in I/O-intensive programs such as online transaction processing applications. To alleviate this bottleneck with minimal cost penalty, the cost-effective design of a high-performance disk subsystem is of decisive importance in enterprise applications. Data tiering is an efficient way to optimize cost, performance, and reliability in storage servers. Given the advantages of solid-state drives (SSDs) over hard disk drives (HDDs), such as lower power consumption and higher performance, traditional data tiering techniques should be revisited to use SSDs more efficiently. Previously proposed tiering solutions have attempted to enhance performance based on various parameters such as request size or randomness. These solutions, however, are mostly optimized for a single type of I/O workload and are not applicable to workloads with different characteristics. This paper presents an online data tiering technique at the operating system level that uses a linear weighted formulation to enhance I/O performance with minimal cost overhead. The proposed technique characterizes the workload access pattern with respect to metadata versus user data, access frequency, random versus sequential accesses, and read versus write accesses. For evaluation, the technique is implemented in Linux 3.1.4 with the ext2 filesystem. Experimental results on I/O-intensive workloads show that the proposed technique improves performance by up to 30% compared with previous techniques while imposing negligible memory overhead on the system.
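The abstract names four workload features and a linear weighted formulation but does not give the formula itself. As an illustration only, the Python sketch below shows one way such a score could be maintained online per block and used to pick blocks for a capacity-limited SSD tier. Everything here is an assumption: the weights (`W_METADATA`, `W_FREQUENCY`, `W_RANDOM`, `W_READ`), the sequential-detection window `seq_window`, and the names `WorkloadCharacterizer`, `record`, `score`, and `ssd_candidates` are hypothetical and are not the authors' kernel-level implementation.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical weights for illustration only; the abstract does not
# state the actual coefficients used in the paper.
W_METADATA = 4.0   # bonus if the block holds filesystem metadata
W_FREQUENCY = 1.0  # multiplied by the number of accesses to the block
W_RANDOM = 2.0     # multiplied by the fraction of accesses that were random
W_READ = 1.0       # multiplied by the fraction of accesses that were reads


@dataclass
class BlockStats:
    is_metadata: bool = False
    accesses: int = 0
    random_accesses: int = 0
    reads: int = 0


class WorkloadCharacterizer:
    """Tracks per-block access features online and ranks blocks for the SSD tier."""

    def __init__(self, seq_window: int = 8) -> None:
        self.stats: dict[int, BlockStats] = {}
        self.seq_window = seq_window      # max gap (in blocks) still treated as sequential
        self.last_end: int | None = None  # first block after the previous request

    def record(self, start: int, length: int, is_write: bool,
               is_metadata: bool = False) -> None:
        # A request counts as sequential if it begins at, or shortly after,
        # the block where the previous request ended; otherwise it is random.
        sequential = (self.last_end is not None
                      and 0 <= start - self.last_end <= self.seq_window)
        for blk in range(start, start + length):
            s = self.stats.setdefault(blk, BlockStats())
            s.is_metadata = s.is_metadata or is_metadata
            s.accesses += 1
            if not sequential:
                s.random_accesses += 1
            if not is_write:
                s.reads += 1
        self.last_end = start + length

    def score(self, blk: int) -> float:
        # Linear weighted sum of the four features for one block.
        s = self.stats[blk]
        return (W_METADATA * float(s.is_metadata)
                + W_FREQUENCY * s.accesses
                + W_RANDOM * (s.random_accesses / s.accesses)
                + W_READ * (s.reads / s.accesses))

    def ssd_candidates(self, capacity_blocks: int) -> list[int]:
        # Highest scores first; ties broken by block number for determinism.
        ranked = sorted(self.stats, key=lambda b: (-self.score(b), b))
        return ranked[:capacity_blocks]
```

In this sketch, a metadata read that arrives at a random location outranks a block touched only by sequential writes, which mirrors the intuition that small random reads and metadata benefit most from SSD placement. Recomputing scores on demand keeps the per-block state to four small fields; a real kernel implementation would also need aging and a migration policy, which are not modeled here.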

Notes

  1. This workload will be detailed in Sect. 4.2.1.

Author information

Correspondence to Hossein Asadi.

About this article

Cite this article

Salkhordeh, R., Asadi, H. & Ebrahimi, S. Operating system level data tiering using online workload characterization. J Supercomput 71, 1534–1562 (2015). https://doi.org/10.1007/s11227-015-1377-0

  • DOI: https://doi.org/10.1007/s11227-015-1377-0