
Stabilizing and boosting I/O performance for file systems with journaling on NVMe SSD

  • Research Paper
  • Published in: Science China Information Sciences

Abstract

Many journaling file systems use Non-Volatile Memory Express (NVMe) solid-state drives (SSDs) as external journal devices to improve input/output (I/O) performance. However, under microwrite workloads, which are typical of many applications, they suffer severe I/O fluctuations, and NVMe SSD utilization is extremely low. Our experimental results indicate that this phenomenon arises mainly because writing data back to backend file systems on hard disk drives is much slower than journal writing, so the two-phase journaling mechanism frequently freezes journal writes. We therefore propose a merging-in-memory (MIM) acceleration architecture that stabilizes and boosts I/O performance for such journaling file systems. MIM employs an efficient in-memory data structure of hash-table-based multiple linked lists, which not only merges random microwrites into large sequential blocks to speed up writebacks but also reduces the frequency of write addressing and of object opening and closing. Using a prototype implementation in Ceph FileStore, we experimentally show that MIM not only eliminates the severe fluctuations but also improves I/O operations per second by roughly 1×–12× and reduces write latency by 75%–98%.
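The merging idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation (which lives inside Ceph FileStore and whose internals are not given here); the class name `MIMBuffer` and its layout are hypothetical, showing only the general technique of a hash table keyed by object ID whose entries coalesce adjacent microwrites into larger sequential extents before writeback:

```python
class MIMBuffer:
    """Illustrative sketch (not the paper's code) of a merging-in-memory
    buffer: a hash table keyed by object ID whose entries hold sorted
    (offset, data) extents, so random microwrites coalesce into larger
    sequential blocks before they are written back."""

    def __init__(self):
        self.table = {}  # object_id -> sorted list of [offset, bytearray]

    def write(self, obj_id, offset, data):
        extents = self.table.setdefault(obj_id, [])
        extents.append([offset, bytearray(data)])
        extents.sort(key=lambda e: e[0])
        # Coalesce extents that touch or overlap into one sequential block.
        merged = [extents[0]]
        for off, buf in extents[1:]:
            last_off, last_buf = merged[-1]
            if off <= last_off + len(last_buf):         # adjacent or overlapping
                start = off - last_off
                last_buf[start:start + len(buf)] = buf  # merge into previous extent
            else:
                merged.append([off, buf])
        self.table[obj_id] = merged

    def flush(self, obj_id):
        # One flush per object: the backend file is opened once and each
        # merged extent written sequentially, instead of one open/seek/
        # write/close cycle per microwrite.
        return self.table.pop(obj_id, [])
```

For example, buffering writes at offsets 0 and 4 of the same object yields a single 8-byte sequential extent at flush time; this is the mechanism by which MIM both speeds up writebacks and cuts the number of object open/close operations.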



Acknowledgements

This work was supported in part by National Key R&D Program of China (Grant No. 2018YFB1004704), National Natural Science Foundation of China (Grant Nos. 61832005, 61872171), Natural Science Foundation of Jiangsu Province (Grant No. BK20190058), Key R&D Program of Jiangsu Province (Grant No. BE2017152), Science and Technology Program of State Grid Corporation of China (Grant No. 52110418001M), and Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Correspondence to Bin Tang or Baoliu Ye.

About this article

Cite this article

Qian, L., Tang, B., Ye, B. et al. Stabilizing and boosting I/O performance for file systems with journaling on NVMe SSD. Sci. China Inf. Sci. 65, 132102 (2022). https://doi.org/10.1007/s11432-019-2808-x
