Abstract
As data-based studies continue to increase, privacy protection has become a crucial issue. Homomorphic encryption (HE) is one proposed solution; however, elementary operations on HE ciphertexts take far longer than the same operations on plaintexts. Handling ciphertexts is therefore much more complex than handling plaintexts, which limits many subsequent data analyses. This paper proposes a quantile estimation method for encrypted data, where quantiles are core statistics for understanding the data distribution in statistical analysis. We developed an HE-friendly method for large homomorphically encrypted data using an approximate quantile loss function. Numerical studies show that the proposed method significantly reduces the calculation time for simulated and real homomorphically encrypted data. Specifically, the proposed method takes approximately 26 minutes on a dataset of size four million, which is about 14 times faster than the sorting method. Furthermore, we applied the proposed method to construct boxplots for homomorphically encrypted data.
Availability of data and materials
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
Hosik Choi was supported by the 2020 Research Fund of the University of Seoul. Cheolwoo Park’s work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (NRF-2021R1A2C1092925, NRF-2022M3J6A1063021). Sungchul Shin’s work was supported by the Ministry of Land, Infrastructure and Transport (RS-2022-0144012).
Ethics declarations
Conflicts of interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Ethical standard
The authors state that this research complies with ethical standards. This research does not involve either human participants or animals.
Appendix
The pseudo-code of the NAG method is described in Algorithm 1.
The pseudo-code of the Newton method is described in Algorithm 2.
1.1 Choosing the best epoch
Because it can be challenging to monitor the objective value under HE, the number of epochs must be fixed in advance. To choose this number and to examine the convergence of the objective values, we ran both methods on randomly generated plaintext data. Fig. 3 shows the trajectories of the absolute errors at each epoch for \(\tau =0.25\), using the NAG method (blue, dashed line) with \(\alpha =0.1\) and \(\eta =0.4\), and the Newton method (red, solid line) with \(\alpha =0.1\). Across multiple values of s, the NAG method converges within 10 epochs and the Newton method within 5 epochs, indicating that Newton converges faster than NAG. Based on this observation, we used 20 and 10 epochs for NAG and Newton, respectively, in our numerical study, twice the observed number in each case, to ensure convergence. We find that this strategy works well.
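The plaintext check described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the smoothed check loss \(S(u)=\tau u + s\log(1+e^{-u/s})\), the parameter names `smooth`, `step`, and `momentum`, their values, and the starting point `q0` are all assumptions made here for illustration, and no mapping to the paper's \(\alpha\), \(\eta\), or s is implied. The sketch records the absolute error after each epoch for both optimizers, measured against the sorted-sample quantile.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad_hess(q, xs, tau, smooth):
    """Gradient and Hessian in q of mean_i S(x_i - q) for the ASSUMED
    smoothed check loss S(u) = tau*u + smooth*log(1 + exp(-u/smooth))."""
    g = h = 0.0
    for x in xs:
        p = sigmoid((q - x) / smooth)  # smooth stand-in for the indicator 1{x < q}
        g += p
        h += p * (1.0 - p)
    n = len(xs)
    return g / n - tau, h / (n * smooth)


def nag(xs, tau, smooth, step, momentum, epochs, q0=0.0):
    """Nesterov accelerated gradient; returns the iterate after each epoch."""
    q, v, path = q0, 0.0, []
    for _ in range(epochs):
        g, _ = grad_hess(q + momentum * v, xs, tau, smooth)  # look-ahead gradient
        v = momentum * v - step * g
        q += v
        path.append(q)
    return path


def newton(xs, tau, smooth, epochs, q0=0.0):
    """Undamped Newton iteration; returns the iterate after each epoch."""
    q, path = q0, []
    for _ in range(epochs):
        g, h = grad_hess(q, xs, tau, smooth)
        q -= g / h
        path.append(q)
    return path


def empirical_quantile(xs, tau):
    """Reference value obtained by sorting (possible only in plaintext)."""
    ordered = sorted(xs)
    return ordered[min(len(ordered) - 1, int(tau * len(ordered)))]


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic plaintext data
tau = 0.25
q_ref = empirical_quantile(data, tau)

# Hyperparameter values below are illustrative assumptions, not the paper's.
nag_path = nag(data, tau, smooth=0.1, step=2.0, momentum=0.4, epochs=20)
newton_path = newton(data, tau, smooth=0.1, epochs=10)

nag_err = [abs(q - q_ref) for q in nag_path]
newton_err = [abs(q - q_ref) for q in newton_path]
for epoch, err in enumerate(nag_err, start=1):
    newton_part = f"  Newton |error| = {newton_err[epoch - 1]:.4f}" if epoch <= 10 else ""
    print(f"epoch {epoch:2d}  NAG |error| = {err:.4f}{newton_part}")
```

Because smoothing shifts the minimizer slightly, the errors level off at a small nonzero value rather than reaching zero; the epoch at which each trajectory flattens is the quantity of interest when fixing the epoch count in advance.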
About this article
Cite this article
Park, M., Kim, J., Shin, S. et al. Quantile estimation for encrypted data. Appl Intell 53, 24782–24791 (2023). https://doi.org/10.1007/s10489-023-04837-5
DOI: https://doi.org/10.1007/s10489-023-04837-5