Building Your Private Cloud Storage on Public Cloud Service Using Embedded GPUs

  • Wangzhao Cheng
  • Fangyu Zheng
  • Wuqiong Pan
  • Jingqiang Lin
  • Huorong Li
  • Bingyu Li
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 254)


Abstract

When a public cloud provides infrastructure as a service (IaaS), customers can outsource their data to it and relieve themselves of the burden of storing data locally. However, customers then cannot guarantee the security of their data in the public cloud. Encrypting data before uploading it to cloud storage is a viable solution, but frequent encryption operations further strain the customer's already limited local computing resources. In this paper, we use an NVIDIA Jetson TX1 to build a client-side data encryption device that performs data encryption and decryption on the customer's behalf. First, we carefully schedule a GPU-based SM4 implementation on the Jetson TX1's integrated GPU, applying instruction-level optimizations and an improved arrangement of variables and data. Second, by using zero-copy access on the device, we reduce the impact of explicit data transfers on overall performance. As a result, our SM4 kernel encrypts data at 30.30 Gbps on the Jetson TX1, 26.6 times faster than the CPU-based implementation on the same platform. Furthermore, the device's overall data processing throughput reaches 30.19 Gbps, so a single Jetson TX1 has ample spare computing capacity for a customer in a 10 Gigabit fiber network environment.
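To put the reported figures in perspective, here is a back-of-the-envelope calculation. It assumes the 26.6x speedup is a plain ratio of throughputs measured the same way, that 1 Gbps means 10^9 bit/s, and that each SM4 block is 128 bits; the derived numbers are not reported in the abstract itself.

$$
\begin{aligned}
\text{implied CPU throughput} &\approx \frac{30.30\ \text{Gbps}}{26.6} \approx 1.14\ \text{Gbps},\\
\text{SM4 blocks per second (kernel)} &= \frac{30.30\times 10^{9}\ \text{bit/s}}{128\ \text{bit/block}} \approx 2.37\times 10^{8},\\
\text{headroom over a 10 Gbps link} &= \frac{30.19\ \text{Gbps}}{10\ \text{Gbps}} \approx 3.0.
\end{aligned}
$$

The last ratio is what the claim of spare capacity rests on: by this nominal comparison, which ignores protocol overhead, the device processes data about three times faster than a 10 Gigabit link can deliver it.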


Keywords

Symmetric cryptographic algorithm, Jetson TX1, CUDA, SM4 implementation, Virtual private cloud storage

Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2018

Authors and Affiliations

  • Wangzhao Cheng (1, 2, 3)
  • Fangyu Zheng (1, 2)
  • Wuqiong Pan (1, 2)
  • Jingqiang Lin (1, 2)
  • Huorong Li (1, 2, 3)
  • Bingyu Li (1, 2, 3)

  1. Data Assurance and Communication Security Research Center, Beijing, China
  2. State Key Laboratory of Information Security, Institute of Information Engineering, CAS, Beijing, China
  3. School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
