PAHDFS: Preference-Aware HDFS for Hybrid Storage

  • Wei Zhou
  • Dan Feng
  • Zhipeng Tan (corresponding author)
  • Yingfei Zheng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9529)


To satisfy the requirements of real-time processing and large capacity put forward by big data, hybrid storage has become a trend. Storage devices exhibit asymmetric read/write performance, and data exhibit asymmetric read/write access characteristics. Because access characteristics fluctuate, the same data may obtain different access performance on the same device, and the device best suited to a piece of data may change over time. Since data prefer to reside on the device on which they obtain the highest access performance, this paper places data on the device with the highest preference degree, improving the performance and efficiency of the whole storage system. We implement a Preference-Aware HDFS (PAHDFS) with high efficiency and scalability. PAHDFS shows good performance in experiments.
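The abstract does not state how PAHDFS computes a preference degree. The following is a minimal sketch under assumed definitions, not the authors' method: each device is described by hypothetical read and write bandwidths, each piece of data by the fraction of its accessed bytes that are reads, and the preference degree is taken to be the effective throughput that data would obtain on a device. The names `Device`, `preference_degree`, `preferred_device`, `placement_over_time`, `dev_a` and `dev_b`, as well as all bandwidth numbers, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Storage device with asymmetric read/write bandwidth (hypothetical values, MB/s)."""
    name: str
    read_mbps: float
    write_mbps: float


def preference_degree(read_ratio: float, dev: Device) -> float:
    """Assumed preference metric: effective MB/s for a given read/write mix.

    read_ratio is the fraction of accessed bytes that are reads (0.0 to 1.0).
    The expected time to serve 1 MB is the read-weighted plus write-weighted
    transfer time; its inverse is used as the preference degree.
    """
    if not 0.0 <= read_ratio <= 1.0:
        raise ValueError("read_ratio must be in [0, 1]")
    seconds_per_mb = read_ratio / dev.read_mbps + (1.0 - read_ratio) / dev.write_mbps
    return 1.0 / seconds_per_mb


def preferred_device(read_ratio: float, devices: list[Device]) -> Device:
    """Return the device on which data with this access mix has the highest preference degree."""
    return max(devices, key=lambda d: preference_degree(read_ratio, d))


def placement_over_time(read_ratios: list[float], devices: list[Device]) -> list[str]:
    """Re-evaluate the preferred device for each time window as access characteristics change."""
    return [preferred_device(r, devices).name for r in read_ratios]


if __name__ == "__main__":
    devices = [
        Device("dev_a", read_mbps=400.0, write_mbps=100.0),  # read-optimized (hypothetical)
        Device("dev_b", read_mbps=200.0, write_mbps=200.0),  # balanced (hypothetical)
    ]
    # The same data is read-heavy, then write-heavy, then read-heavy again.
    print(placement_over_time([0.9, 0.3, 0.95], devices))  # ['dev_a', 'dev_b', 'dev_a']
```

With these made-up bandwidths the break-even read ratio is 2/3: above it `dev_a` is preferred, below it `dev_b` is. This illustrates how the most suitable device for the same data can change as its access characteristics shift. The sketch ignores the cost of migrating data between devices, which a real placement policy would have to weigh.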


Keywords: Hybrid storage · HDFS · Big data · Preference-aware · Access characteristics



This work is supported by National Basic Research 973 Program of China under Grant No. 2011CB302301, National University’s Special Research Fee No. 2015XJGH010, NSFC No. 61173043.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Wei Zhou¹
  • Dan Feng¹
  • Zhipeng Tan¹ (corresponding author)
  • Yingfei Zheng¹
  1. School of Computer Science and Technology, Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China
