Loop transformations for flash memory: cost models and performance effects

Abstract

Loop optimization, which applies a sequence of loop transformations, plays an important role in improving the performance of data-centric applications. Programs that use flash memory are no exception. Under certain conditions, however, careless application of specific loop transformations can produce unexpected results because of the characteristics of flash memory and its underlying management systems. In this article, we analyze how loop transformations affect performance in flash translation layers (FTLs). First, we choose four loop structures with distinct reference patterns and propose a cost model for each structure that reflects the properties of flash memory. Then, using these cost models, we investigate how loop transformations affect the internal operations of block associative sector translation (BAST) and fully associative sector translation (FAST), and we evaluate the performance effects of loop transformations experimentally. We find that, under certain conditions, some of the major loop transformations cause unexpected performance effects in these major FTLs.
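The toy sketch below shows how a loop transformation can change FTL behaviour. It is not the authors' cost model. It assumes the following:

- a row-major 2D array in which each flash page holds a fixed number of elements;
- one page-write request per element, in the spirit of the explicit iowrite() calls mentioned in Note 1;
- a heavily simplified BAST-like FTL that replaces log blocks in FIFO order.

The constants ELEMS_PER_PAGE and PAGES_PER_BLOCK and the functions write_trace and bast_merges are hypothetical names introduced only for this illustration.

```python
from collections import OrderedDict

# Illustrative geometry (assumptions, not taken from the paper).
ELEMS_PER_PAGE = 4   # array elements stored in one flash page
PAGES_PER_BLOCK = 4  # pages per erase block


def write_trace(n_rows, n_cols, interchange=False):
    """Return the logical page numbers (lpns) written, in order, by a loop
    nest that writes every element of a row-major n_rows x n_cols array,
    issuing one page write per element (hypothetical I/O model).

    interchange=False: for i: for j  (row-wise, matches the storage order)
    interchange=True:  for j: for i  (column-wise, after loop interchange)
    """
    trace = []
    if interchange:
        for j in range(n_cols):
            for i in range(n_rows):
                trace.append((i * n_cols + j) // ELEMS_PER_PAGE)
    else:
        for i in range(n_rows):
            for j in range(n_cols):
                trace.append((i * n_cols + j) // ELEMS_PER_PAGE)
    return trace


def bast_merges(trace, n_log_blocks=2):
    """Count log-block merges in a simplified BAST-like FTL (assumption).

    Each log block serves exactly one logical block (lbn). A merge is
    counted when (a) a write targets an lbn without a log block while all
    log blocks are in use, so the oldest one (FIFO) is merged and freed,
    or (b) the lbn's own log block is already full.
    """
    logs = OrderedDict()  # lbn -> number of pages used in its log block
    merges = 0
    for lpn in trace:
        lbn = lpn // PAGES_PER_BLOCK
        if lbn not in logs:
            if len(logs) == n_log_blocks:
                logs.popitem(last=False)  # merge and free the oldest log block
                merges += 1
            logs[lbn] = 0
        elif logs[lbn] == PAGES_PER_BLOCK:
            merges += 1                   # full log block: merge, start a fresh one
            logs[lbn] = 0
        logs[lbn] += 1                    # append this page write to the log block
    return merges


if __name__ == "__main__":
    for flag in (False, True):
        label = "interchanged" if flag else "original"
        print(label, bast_merges(write_trace(8, 8, interchange=flag)))
```

For an 8 x 8 array, this toy model counts 14 merges for the original row-wise nest and 30 for the interchanged column-wise nest. The difference arises because each column walk touches all four data blocks before any log block fills, so the two log blocks are evicted almost immediately. Real BAST implementations and the paper's cost models account for more detail.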

Notes

  1. To simplify the analysis and the cost models, our work does not consider any layers between applications and the FTL. We believe this assumption is a reasonable starting point for analyzing complex models. In addition, the assumption applies directly to explicit I/O operations [19, 20]. In Fig. 1, iowrite() denotes a function that issues an explicit write request [19].

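  2. The logical block number (lbn) is calculated by dividing the logical page number (lpn) by the number of pages per block (integer division). In Fig. 3, the spare area of each page stores its lpn, and each block contains 4 pages. Thus, the lbn of the data in physical block 10 (pbn 10) is 1 (4 div 4).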

    Fig. 3 Distinct FTL designs between BAST and FAST [4]

  3. A log block is in the immature state when it still has free pages available for writes.
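The address arithmetic in Note 2 and the log-block state in Note 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions:

- lbn_of, page_offset and LogBlock are hypothetical names.
- The in-block offset (lpn mod pages per block) is the usual companion of the lbn computation in block-level mapping; the notes themselves do not state it.
- The log block only tracks which lpns it holds.

```python
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 4  # matches the Fig. 3 example described in Note 2


def lbn_of(lpn, pages_per_block=PAGES_PER_BLOCK):
    """Logical block number: integer division of lpn by pages per block."""
    return lpn // pages_per_block


def page_offset(lpn, pages_per_block=PAGES_PER_BLOCK):
    """Position of the page inside its logical block (assumed, see lead-in)."""
    return lpn % pages_per_block


@dataclass
class LogBlock:
    """Hypothetical log block that records the lpns appended to it."""
    pages_per_block: int = PAGES_PER_BLOCK
    lpns: list = field(default_factory=list)

    def is_immature(self):
        # Immature (Note 3): the log block still has at least one free page.
        return len(self.lpns) < self.pages_per_block

    def append(self, lpn):
        if not self.is_immature():
            raise RuntimeError("log block is full; it must be merged first")
        self.lpns.append(lpn)


if __name__ == "__main__":
    print(lbn_of(4), page_offset(4))  # Note 2 example: lpn 4 -> lbn 1, offset 0
    log = LogBlock()
    for lpn in (4, 5, 6):
        log.append(lpn)
    print(log.is_immature())  # True: one free page left
    log.append(7)
    print(log.is_immature())  # False: all 4 pages used
```

In this sketch, a full log block refuses further appends. An FTL such as BAST or FAST would instead trigger a merge at that point.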

References

  1. Intel Corporation (1998) Understanding the flash translation layer (FTL) specification. Intel Technical Report AP-684

  2. Chung TS, Park DJ, Park S, Lee DH, Lee SW, Song HJ (2009) A survey of flash translation layer. J Syst Archit 55:332–343

  3. Kim JS, Kim JM, Noh SH, Min SL, Cho YK (2002) A space-efficient flash translation layer for CompactFlash systems. IEEE Trans Consum Electron 48:366–375

  4. Lee SW, Park DJ, Chung TS, Lee DH, Park S, Song HJ (2007) A log buffer-based flash translation layer using fully-associative sector translation. ACM Trans Embed Comput Syst doi:10.1145/1275986.1275990

  5. Jung D, Kang J, Jo H, Kim JS, Lee J (2010) Superblock FTL: a superblock-based flash translation layer with a hybrid address translation scheme. ACM Trans Embed Comput Syst 9:1–41

  6. Park C, Cheon W, Kang J, Roh K, Cho W, Kim JS (2008) A re-configurable FTL architecture for NAND flash-based applications. ACM Trans Embed Comput Syst 7:1–23

  7. Kim H, Ahn S (2008) BPLRU: a buffer management scheme for improving random writes in flash storage. In: Proceedings of USENIX conference on file and storage technologies (FAST), pp 1–14

  8. Jo H, Kang JU, Park SY, Kim JS, Lee J (2006) FAB: flash-aware buffer management policy for portable media players. IEEE Trans Consum Electron 52:485–493

  9. Kang S, Park S, Jung H, Shim H, Cha J (2009) Performance trade-offs in using NVRAM write buffer for flash memory-based storage devices. IEEE Trans Comput 58:744–758

  10. Shi L, Li J, Xue CJ, Yang C, Zhou X (2011) ExLRU: a unified write buffer cache management for flash memory. In: Proceedings of ACM international conference on embedded software (EMSOFT), pp 339–348

  11. Lee SW, Moon B (2007) Design of flash-based DBMS: an in-page logging approach. In: Proceedings of SIGMOD conference, pp 55–66

  12. Kang WH, Lee SW, Moon B, Oh GH, Min C (2013) X-FTL: transactional FTL for SQLite databases. In: Proceedings of SIGMOD conference, pp 97–108

  13. Chen F, Luo T, Zhang X (2011) CAFTL: a content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In: Proceedings of USENIX conference on file and storage technologies (FAST), pp 77–90

  14. Kim J, Lee C, Lee S, Son S, Choi J, Yoon S, Lee HU, Kang S, Won Y, Cha J (2012) Deduplication in SSDs: model and quantitative analysis. In: Proceedings of IEEE symposium on mass storage systems and technologies (MSST), pp 1–12

  15. Gupta A, Pisolkar R, Urgaonkar B, Sivasubramaniam A (2011) Leveraging value locality in optimizing NAND flash-based SSDs. In: Proceedings of USENIX conference on file and storage technologies (FAST), pp 91–103

  16. Debnath B, Sengupta S, Li J (2010) ChunkStash: speeding up inline storage deduplication using flash memory. In: Proceedings of USENIX conference on file and storage technologies (FAST)

  17. Muchnick SS (1998) Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., San Francisco

  18. McKinley KS, Carr S, Tseng CW (1996) Improving data locality with loop transformations. ACM Trans Program Lang Syst 18(4):424–453

  19. Kandemir M, Choudhary A, Ramanujam J (2002) An I/O-conscious tiling strategy for disk-resident data sets. J Supercomput 21(3):257–284

  20. Mowry TC, Demke AK, Krieger O (1996) Automatic compiler-inserted I/O prefetching for out-of-core applications. In: Proceedings of USENIX symposium on operating systems design and implementation (OSDI), pp 3–17

  21. Samsung Electronics (2004) NAND flash memory & SmartMedia data book

  22. Shin D (2010) Workload-driven adaptive log buffer-based FTL. IEICE Electron Express 7(11):804–809

  23. Lee D, Shin F, Kim Y, Kim J (2008) LAST: locality-aware sector translation for NAND flash memory-based storage systems. Proc ACM SIGOPS Oper Syst Rev 42(6):36–42

  24. Shin H, Jung D, Kim J, Kim J, Maeng S (2010) Co-optimization of buffer layer and FTL in high-performance flash-based storage systems. Des Autom Embed Syst 14:415–443

  25. Bouganim L, Jonsson B, Bonnet P (2009) uFLIP: understanding flash IO patterns. In: Proceedings of conference on innovative data systems research (CIDR)

  26. Paik JY, Chung TS, Cho ES (2013) Cost model based analyses on performance effects of loop transformations in block associative sector translation. In: Proceedings of IEEE/IFIP international conference on embedded and ubiquitous computing (EUC), pp 1998–2005

  27. Zhang W, Leiss EL (2001) A compiler driven out-of-core programming approach for optimizing data locality in loop nests. In: Proceedings of international conference on parallel and distributed processing techniques and applications (PDPTA), pp 25–28

  28. Johnson T, Shasha D (1994) 2Q: a low overhead high performance buffer management replacement algorithm. In: Proceedings of international conference on very large data bases (VLDB), pp 439–450

  29. Megiddo N, Modha DS (2003) ARC: a self-tuning, low overhead replacement cache. In: Proceedings of USENIX conference on file and storage technologies (FAST), pp 115–130

  30. Jo H, Kang K, Park S, Kim J, Lee J (2006) FAB: flash-aware buffer management replacement algorithm. IEEE Trans Consum Electron 52(2):485–493

  31. Kim H, Ahn S (2008) BPLRU: a buffer management scheme for improving random writes in flash storage. In: Proceedings of USENIX conference on file and storage technologies (FAST), pp 1–14

  32. Laforest E (2010) Survey of loop transformation techniques. Technical Report, University of Toronto, Toronto

  33. Qiu M, Wu J, Xue CJ, Hu JA, Tseng W-C, Sha EH-M (2008) Loop scheduling and assignment to minimize energy while hiding latency for heterogeneous multi-bank memory. In: Proceedings of IEEE international conference on field programmable logic and applications (FPL), pp 459–462

  34. Xue C, Shao Z, Chen Y, Sha EH-M (2005) Optimizing DSP scheduling via address assignment with array and loop transformation. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 85–88

  35. Xue CJ, Sha EH-M, Shao Z, Qiu M (2008) Effective loop partitioning and scheduling under memory and register dual constraints. In: Proceedings of IEEE/ACM design, automation and test in Europe (DATE), pp 1202–1207

  36. Hosomi M, Yamagishi H, Yamamoto Y, Bessho K, Higo Y, Yamane K, Yamada H, Shoji M, Hachino H, Fukumoto C, Nagao H, Kano H (2005) A novel nonvolatile memory with spin torque transfer magnetization switching: spin-RAM. In: Proceedings of international electron devices meeting, pp 459–462

  37. Qiu K, Zhao M, Fu C, Shi L, Xue CJ (2013) Migration-aware loop retiming for STT-RAM based hybrid cache for embedded systems. In: Proceedings of IEEE international conference on application-specific systems, architectures and processors (ASAP), pp 83–89

  38. Paik JY, Cho ES, Chung TS (2009) Performance improvement for flash memories using loop optimization. In: Proceedings of IEEE international conference on computational science and engineering (CSE), pp 508–513

  39. Kim S, Kwon K, Kim C, Jang C, Lee J, Min SL (2010) Demand paging techniques for flash memory using compiler post-pass optimizations. ACM Trans Embed Comput Syst 10(4):40

  40. Lin CC, Chen CL, Tseng CH (2007) Source code arrangement of embedded Java virtual machine for NAND flash memory. In: Proceedings of international symposium on communications and information technologies (ISCIT), pp 152–157

  41. Gupta A, Kim Y, Urgaonkar B (2009) DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. In: Proceedings of international conference on architectural support for programming languages and operating systems (ASPLOS), pp 229–240

  42. Qin Z, Wang Y, Liu D, Shao Z (2010) Demand-based block-level address mapping in large-scale NAND flash storage systems. In: Proceedings of CODES+ISSS, pp 173–182

  43. Qin Z, Wang Y, Liu D, Shao Z (2011) A two-level caching mechanism for demand-based page-level address mapping in NAND flash memory storage systems. In: Proceedings of IEEE real-time and embedded technology and applications symposium, pp 157–166

  44. Gniady C, Butt AR, Hu YC (2004) Program-counter-based pattern classification in buffer caching. In: Proceedings of USENIX symposium on operating systems design and implementation (OSDI), pp 27–27

  45. Brock J, Gu X, Bao B, Ding C (2013) Pacman: program-assisted cache management. In: Proceedings of international symposium on memory management (ISMM), pp 39–50

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2010-0013386).

Author information

Corresponding author

Correspondence to Eun-Sun Cho.

About this article

Cite this article

Paik, JY., Chung, TS. & Cho, ES. Loop transformations for flash memory: cost models and performance effects. Des Autom Embed Syst 17, 627–667 (2013). https://doi.org/10.1007/s10617-014-9144-7

Keywords

  • Loop structures
  • Loop transformations
  • Flash translation layer (FTL)
  • Cost models
  • Performance