
Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling


Abstract

There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user’s context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user’s historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an “optimized” intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity. We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users as well as within specific users in the study.


Data availability

Under the current data policies for HeartSteps V2/V3, the research team cannot make the data publicly available.

Code availability

The code used for generating resampled trajectories and reproducing the plots is available at the following link.

Notes

  1. This is our informal working definition of personalization. In Sect. 2.3 we formally define personalization and ways to measure it.

  2. Details on this feature and others are provided in Table 1.

  3. We remark that throughout this paper, we use the terms resampling and resimulation interchangeably. In particular, our methodology resimulates user trajectories, which we generate by resampling states and rewards using generative models and by resampling actions by re-running the RL algorithm.

  4. Note that the underlying problem might restrict the allowed actions depending on the value of s. For example, in HeartSteps, sending a notification is not allowed when the user is driving.

  5. We demonstrate the stability of our results to the choice of \(\delta\) in Appendix D. Also, note that the numbers of users with \(\texttt {Score\_int}_{1} \ge 0.9\) and \(\texttt {Score\_int}_{1} \le 0.1\) are 17 and 1, respectively; these counts are slightly obscured in the histogram in Fig. 2.

References

  • Albers, N., Neerincx, M. A., & Brinkman, W.-P. (2022). Addressing people’s current and future states in a reinforcement learning algorithm for persuading to quit smoking and to be physically active. PLoS ONE, 17(12), e0277295.

  • Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3), 235–256. https://doi.org/10.1023/A:1013689704352

  • Auer, P., & Ortner, R. (2010). UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1–2), 55–65. https://doi.org/10.1007/s10998-010-3055-6

  • Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6, 679–684.

  • Bibaut, A., Chambaz, A., Dimakopoulou, M., Kallus, N., & van der Laan, M. (2021). Post-contextual-bandit inference.

  • Boger, J., Poupart, P., Hoey, J., Boutilier, C., Fernie, G. R., & Mihailidis, A. (2005). A decision-theoretic approach to task assistance for persons with dementia. In: Kaelbling, L.P., Saffiotti, A. (eds.), IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, July 30–August 5, 2005, pp. 1293–1299. Professional Book Center, UK. http://ijcai.org/Proceedings/05/Papers/1186.pdf.

  • Boruvka, A., Almirall, D., Witkiewitz, K., & Murphy, S. A. (2018). Assessing time-varying causal effect moderation in mobile health. Journal of the American Statistical Association, 113(523), 1112–1121.

  • Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. The Journal of Machine Learning Research, 2, 499–526.

  • Buja, A., Cook, D., & Swayne, D. F. (1996). Interactive high-dimensional data visualization. Journal of Computational and Graphical Statistics. https://doi.org/10.2307/1390754

  • Dempsey, W., Liao, P., Klasnja, P., Nahum-Shani, I., & Murphy, S. A. (2015). Randomised trials for the fitbit generation. Significance, 12(6), 20–23. https://doi.org/10.1111/j.1740-9713.2015.00863.x

  • Ding, P., Feller, A., & Miratrix, L. (2016). Randomization inference for treatment effect variation. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 78, 655–671.

  • Dwaracherla, V., Lu, X., Ibrahimi, M., Osband, I., Wen, Z., & Roy, B. V. (2020). Hypermodels for exploration. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, Addis Ababa. https://openreview.net/forum?id=ryx6WgStPB.

  • Dwivedi, R., Tian, K., Tomkins, S., Klasnja, P., Murphy, S., & Shah, D. (2022). Counterfactual inference for sequential experiments.

  • Eckles, D., & Kaptein, M. (2019). Bootstrap Thompson sampling and sequential decision problems in the behavioral sciences. SAGE Open. https://doi.org/10.1177/2158244019851675

  • Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. Monographs on Statistics and applied Probability. https://doi.org/10.1201/9780429246593

  • Elmachtoub, A. N., McNellis, R., Oh, S., & Petrik, M. (2017). A practical method for solving contextual bandit problems using decision trees. https://doi.org/10.48550/arxiv.1706.04687.

  • Fisher, R. A. (1935). The design of experiments. Oliver and Boyd.

  • Forman, E. M., Berry, M. P., Butryn, M. L., Hagerman, C. J., Huang, Z., Juarascio, A. S., LaFata, E. M., Ontañón, S., Tilford, J. M., & Zhang, F. (2023). Using artificial intelligence to optimize delivery of weight loss treatment: Protocol for an efficacy and cost-effectiveness trial. Contemporary Clinical Trials, 124, 107029.

  • Forman, E. M., Kerrigan, S. G., Butryn, M. L., Juarascio, A. S., Manasse, S. M., Ontañón, S., Dallal, D. H., Crochiere, R. J., & Moskow, D. (2019). Can the artificial intelligence technique of reinforcement learning use continuously-monitored digital data to optimize treatment for weight loss? Journal of Behavioral Medicine, 42(2), 276–290.

  • Gelman, A. (2004). Exploratory data analysis for complex models. Journal of Computational and Graphical Statistics. https://doi.org/10.1198/106186004X11435

  • Good, P. I. (2006). Resampling methods. Springer.

  • Hadad, V., Hirshberg, D. A., Zhan, R., Wager, S., & Athey, S. (2019). Confidence intervals for policy evaluation in adaptive experiments.

  • Hanna, J. P., Stone, P., & Niekum, S. (2017). Bootstrapping with models: Confidence intervals for off-policy evaluation. In Proceedings of the international joint conference on autonomous agents and multiagent systems, AAMAS, vol. 1.

  • Hao, B., Abbasi-Yadkori, Y., Wen, Z., & Cheng, G. (2019). Bootstrapping upper confidence bound. In Advances in neural information processing systems, vol. 32.

  • Hao, B., Ji, X., Duan, Y., Lu, H., Szepesvari, C., & Wang, M. (2021). Bootstrapping fitted q-evaluation for off-policy inference. In Proceedings of the 38th international conference on machine learning, vol. 139.

  • Hoey, J., Poupart, P., Boutilier, C., & Mihailidis, A.(2005). POMDP models for assistive technology. In: Bickmore, T.W. (ed.), Caring machines: AI in Eldercare, Papers from the 2005 AAAI Fall Symposium, Arlington, Virginia, USA, November 4-6, 2005. AAAI Technical Report, vol. FS-05-02, pp. 51–58. AAAI Press, Washington, D.C. https://www.aaai.org/Library/Symposia/Fall/2005/fs05-02-009.php.

  • Liang, D., Charlin, L., McInerney, J., & Blei, D. M. (2016). Modeling user exposure in recommendation. In Proceedings of the 25th international conference on World Wide Web. WWW ’16, pp. 951–961. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE. https://doi.org/10.1145/2872427.2883090.

  • Liao, P., Greenewald, K., Klasnja, P., & Murphy, S. (2020). Personalized HeartSteps: A reinforcement learning algorithm for optimizing physical activity. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 4(1), 1–22. https://doi.org/10.1145/3381007

  • Piette, J. D., Newman, S., Krein, S. L., Marinec, N., Chen, J., Williams, D. A., Edmond, S. N., Driscoll, M., LaChappelle, K. M., Maly, M., et al. (2022). Artificial intelligence (AI) to improve chronic pain care: Evidence of AI learning. Intelligence-Based Medicine, 6, 100064.

  • Qian, T., Yoo, H., Klasnja, P., Almirall, D., & Murphy, S. A. (2021). Estimating time-varying causal excursion effects in mobile health with binary outcomes. Biometrika, 108(3), 507–527.

  • Ramprasad, P., Li, Y., Yang, Z., Wang, Z., Sun, W. W., & Cheng, G. (2021). Online bootstrap inference for policy evaluation in reinforcement learning.

  • Rosenbaum, P. (2002). Observational studies. Springer.

  • Russo, D., & Roy, B. V. (2014). Learning to optimize via information-directed sampling. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.), Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 1583–1591. https://proceedings.neurips.cc/paper/2014/hash/301ad0e3bd5cb1627a2044908a42fdc2-Abstract.html.

  • Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2018). A tutorial on Thompson sampling. Foundations and Trends in Machine Learning, 11(1), 1–96. https://doi.org/10.1561/2200000070

  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.

  • Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285–294.

  • Tomkins, S., Liao, P., Yeung, S., Klasnja, P., & Murphy, S. (2019). Intelligent pooling in Thompson sampling for rapid personalization in mobile health.

  • Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.

  • Vapnik, V., & Chervonenkis, A. (1974). Theory of pattern recognition. Nauka.

  • Wang, C.-H., Yu, Y., Hao, B., & Cheng, G. (2020). Residual bootstrap exploration for bandit algorithms.

  • White, M., & White, A. (2010). Interval estimation for reinforcement-learning algorithms in continuous-state domains. In Advances in neural information processing systems 23: 24th annual conference on neural information processing systems 2010, NIPS 2010.

  • Yang, J., Eckles, D., Dhillon, P., & Aral, S. (2020). Targeting for long-term outcomes. arXiv:2010.15835.

  • Yom-Tov, E., Feraru, G., Kozdoba, M., Mannor, S., Tennenholtz, M., & Hochberg, I. (2017). Encouraging physical activity in patients with diabetes: Intervention using a reinforcement learning system. Journal of Medical Internet Research, 19(10), e338. https://doi.org/10.2196/jmir.7994

  • Zhang, K. W., Janson, L., & Murphy, S. A. (2020). Inference for batched bandits.

  • Zhang, K. W., Janson, L., & Murphy, S. A. (2023). Statistical inference after adaptive sampling for longitudinal data.

  • Zhou, M., Mintz, Y., Fukuoka, Y., Goldberg, K., Flowers, E., Kaminsky, P. M., Castillejo, A., & Aswani, A. (2018). Personalizing mobile fitness apps using reinforcement learning. In: Said, A., Komatsu, T. (eds.), Joint Proceedings of the ACM IUI 2018 Workshops Co-located with the 23rd ACM Conference on Intelligent User Interfaces (ACM IUI 2018), Tokyo, Japan, March 11. CEUR Workshop Proceedings, vol. 2068. CEUR-WS.org, Tokyo (2018). https://ceur-ws.org/Vol-2068/humanize7.pdf.


Funding

RK is supported by NIGMS Biostatistics Training Grant Program under Grant No. T32GM135117. SG and SM acknowledge support by NIH/NIDA P50DA054039, and NIH/NIBIB and OD P41EB028242. RD acknowledges support by NSF DMS-2022448, and DSO National Laboratories grant DSO-CO21070. RD and SM also acknowledge support by NSF under Grant No. DMS-2023528 for the Foundations of Data Science Institute (FODSI). PK acknowledges support by NIH NHLBI R01HL125440 and 1U01CA229445. SM also acknowledges support by NIH/NCI U01CA229437, and NIH/NIDCR UH3DE028723. KWZ is supported by the Siebel Foundation and by NSF CBET-2112085 and by the NSF Graduate Research Fellowship Program under Grant No. DGE1745303.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to conceptualization and methodology. Data Preparation and Software: PC, PL, RK, and SG. Analysis: RK. Writing: RK, RD, SM, and SG. Funding and Administration: PK, SM. Supervision: RD, KZ, SM. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Raphael Kim, Raaz Dwivedi or Susan Murphy.

Ethics declarations

Conflict of interest

KWZ worked as a summer intern at Apple. The other authors declare no conflict of interest.

Ethical approval

We adhere to the policies outlined in https://www.springer.com/gp/editorial-policies/ethical-responsibilities-of-authors.

Consent to participate

The reported study was approved by the Kaiser Permanente Washington Institutional Review Board (IRB 1257484-16). All participants provided written informed consent to take part in the study.

Consent for publication

In adherence to the IRB protocol and consent forms, individual-level data that could be used for identification are not released.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work of authors Peng Liao and Prasidh Chhabria was done while they were at Harvard University.

Editors: Emma Brunskill, Minmin Chen, Omer Gottesman, Lihong Li, Yuxi Li, Yao Liu, Zongqing Lu, Niranjani Prasad, Zhiwei Qin, Csaba Szepesvari, Matthew Taylor.

Appendices

Details on the RL framework for HeartSteps

We now provide further details regarding the RL framework for the HeartSteps study discussed in Sect. 3.2 (Table 1).

Table 1 Description of state features used in the HeartSteps study

Clipping probabilities

The function h appearing in (10) is given by

$$\begin{aligned} h(p) \triangleq \min \left\{ 0.8,\; 0.2+ \frac{0.8}{0.5} \cdot \max \{ p-0.5,\,0 \} \right\} \quad \text {for}\quad p \in [0, 1]. \end{aligned}$$
(13)
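For concreteness, a minimal Python sketch of the clipping function (13) follows; the function name h and its scalar interface are our own choices, not from the study codebase.

```python
def h(p: float) -> float:
    """Clipping function (13): maps a posterior probability p in [0, 1]
    to a randomization probability clipped to the range [0.2, 0.8]."""
    return min(0.8, 0.2 + (0.8 / 0.5) * max(p - 0.5, 0.0))

# For example, h(0.5) = 0.2, h(0.75) = 0.6, and h(1.0) = 0.8.
```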

Prior and posterior formulation

Using the notation \(\theta ^\top \triangleq (\alpha _0^\top , \alpha _1^\top , \beta ^\top )\), the prior for \(\theta\) was specified as \(\mathcal {N}(\overline{\mu }_0, \overline{\Sigma }_0)\), where

$$\begin{aligned} \overline{\mu }_0 = \begin{bmatrix} \mu _{\alpha _0} \\ \mu _{\beta } \\ \mu _{\beta } \end{bmatrix} \quad \text {and}\quad \overline{\Sigma }_0 = \begin{bmatrix} \Sigma _{\alpha _0} & & \\ & \Sigma _{\beta } & \\ & & \Sigma _{\beta } \end{bmatrix} \end{aligned}$$
(14)

and \(\{ \mu _{\alpha _0}, \mu _\beta , \Sigma _{\alpha _0}, \Sigma _{\beta }\}\) were computed from the prior study; see Liao et al. (2020, Sect. 6) for details on how the priors were constructed and (17) for the specific values used by us. (Note that the HeartSteps team decided to update the priors from those presented in Liao et al. (2020).) Given the Gaussian prior and the Gaussian working model, the posterior for \(\theta\) on day d is also Gaussian and is given by \(\mathcal {N}(\overline{\mu }_d, \overline{\Sigma }_d)\), where the posterior parameters are recursively updated as

$$\begin{aligned} \overline{\Sigma }_{d}&= \left( \frac{1}{\sigma ^2} \sum _{t=5(d-1)+1}^{5d} I_t \phi (S_t, A_t) \phi (S_t,A_t)^\top + \overline{\Sigma }_{d-1}^{-1} \right) ^{-1}, \quad \text {and}\quad \end{aligned}$$
(15a)
$$\begin{aligned} \overline{\mu }_{d}&= \overline{\Sigma }_d\left( \frac{1}{\sigma ^2} \sum _{t=5(d-1)+1}^{5d} I_t \phi (S_t, A_t) R_t + \overline{\Sigma }_{d-1}^{-1} \overline{\mu }_{d-1} \right) \end{aligned}$$
(15b)

where \(\phi (S_t, A_t)^\top \triangleq [g(S_t)^\top , \pi _t f(S_t)^\top , (A_t-\pi _t)f(S_t)^\top ]\) collects all the feature vectors from the working model (8). The updates (15a) and (15b) are denoted by PosteriorUpdate in Algorithm 2. For k-dimensional \(\beta\), the posterior parameters \(\mu _{d, \beta }, \Sigma _{d, \beta }\) for \(\beta\) are given by the last k entries of \(\overline{\mu }_d\) and the \(k\times k\) sub-matrix formed by taking the last k rows and columns of \(\overline{\Sigma }_d\), respectively.
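To make the recursion concrete, the following is a minimal numpy sketch of one PosteriorUpdate step implementing (15a) and (15b). The function name, the argument layout, and the assumption that the day's action-centered features are pre-assembled into a matrix are ours, not the study implementation.

```python
import numpy as np

def posterior_update(mu_prev, Sigma_prev, Phi, R, I, sigma2):
    """One day's PosteriorUpdate, cf. (15a)-(15b).

    mu_prev, Sigma_prev : posterior mean/covariance after day d-1
    Phi    : (n, p) array with rows phi(S_t, A_t) for day d's decision times
    R, I   : (n,) rewards and availability indicators for those times
    sigma2 : noise variance of the Gaussian working model
    """
    prec_prev = np.linalg.inv(Sigma_prev)
    W = I[:, None] * Phi                                       # availability zeroes out rows
    Sigma_d = np.linalg.inv(W.T @ Phi / sigma2 + prec_prev)    # (15a)
    mu_d = Sigma_d @ (W.T @ R / sigma2 + prec_prev @ mu_prev)  # (15b)
    return mu_d, Sigma_d
```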

Model estimates used by ParaSim for resampling trajectories

For a user trajectory \((S_{t}, A_{t}, R_{t})_{t= 1}^{T}\) from the reward model (8), we estimate the parameters \((\alpha , \beta )\) using the updates (15), albeit without action centering. Hence the estimates \((\hat{\alpha }_T ^\top , \hat{\beta }_T^\top )\) (which, after the suitable modifications described in Sect. 3.3, inform the model parameters used by ParaSim) are given by

$$\begin{aligned} \begin{bmatrix} \hat{\alpha }_{T} \\ \hat{\beta }_{T} \end{bmatrix}&= \left( \frac{1}{\sigma ^2} \sum _{t=1}^{T} I_t \widetilde{\phi }(S_t, A_t) \widetilde{\phi }(S_t,A_t)^\top + \begin{bmatrix} \Sigma _{\alpha _0} & \\ & \Sigma _{\beta } \end{bmatrix}^{-1}\right) ^{-1}\nonumber \\&\quad \left( \frac{1}{\sigma ^2} \sum _{t=1}^{T} I_t \widetilde{\phi }(S_t, A_t) R_t + \begin{bmatrix} \Sigma _{\alpha _0} & \\ & \Sigma _{\beta } \end{bmatrix}^{-1}\begin{bmatrix} \mu _{\alpha _0} \\ \mu _{\beta } \end{bmatrix} \right) \end{aligned}$$
(16)

where \(\widetilde{\phi }(S_t, A_t)^\top \triangleq \left[ g(S_t)^\top , A_t f(S_t)^\top \right]\).
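A sketch of the batch estimate (16) as prior-regularized least squares is given below; again, the function name and array layout are our assumptions rather than the study code.

```python
import numpy as np

def fit_reward_model(Phi_tilde, R, I, sigma2, Sigma_prior, mu_prior):
    """Batch estimate (16) for (alpha, beta), fit without action centering.

    Phi_tilde   : (T, p) array with rows phi~(S_t, A_t) = [g(S_t), A_t f(S_t)]
    R, I        : (T,) rewards and availability indicators
    Sigma_prior : block-diagonal prior covariance of (alpha_0, beta)
    mu_prior    : stacked prior mean [mu_alpha0, mu_beta]
    """
    W = I[:, None] * Phi_tilde                 # availability zeroes out rows
    prior_prec = np.linalg.inv(Sigma_prior)
    A = W.T @ Phi_tilde / sigma2 + prior_prec
    b = W.T @ R / sigma2 + prior_prec @ mu_prior
    return np.linalg.solve(A, b)               # stacked [alpha_hat_T, beta_hat_T]
```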

Prior means and variances

We now summarize the exact values of prior parameters used by the RL algorithm. For \(\ell \in \mathbb {N}\), let \(\textrm{diag}(a_1, \ldots , a_{\ell }) \in \mathbb {R}^{\ell \times \ell }\) denote an \(\ell \times \ell\) diagonal matrix with its j-th diagonal entry equal to \(a_j\) for \(j=1, \ldots , \ell\). Then the mean and variance parameters used in (14) are given by

$$\begin{aligned} \begin{aligned} \mu _{\alpha _0}&= [0.82, 1.95, 3.81, -0.19, 0.76, 0.0, -0.92, 0.0]^\top \in \mathbb {R}^{8}, \\ \mu _{\beta }&= [0.47, 0.0, 0.0, 0.0, 0.0]^\top \in \mathbb {R}^{5}, \\ \Sigma _{\alpha _0}&= \textrm{diag}(14.24, 13.35, 3.24, 0.57, 19.00, 0.26, 17.00, 7.35) \in \mathbb {R}^{8\times 8}, \quad \text {and}\quad \\ \Sigma _{\beta }&= \textrm{diag}(4.93, 24.56, 4.95, 0.67, 0.82) \in \mathbb {R}^{5\times 5}, \end{aligned} \end{aligned}$$
(17)

where the features of \(\alpha _0\) are ordered as (Intercept, temperature, prior 30-min step count, yesterday's step count, \(\texttt {dosage}\), \(\texttt {engagement}\), \(\texttt {location}\), \(\texttt {variation}\)) and those for \(\beta\) and \(\alpha _1\) are ordered as (Intercept, \(\texttt {dosage}\), \(\texttt {engagement}\), \(\texttt {location}\), \(\texttt {variation}\)).
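Assembled in code, the stacked prior (14) with the values in (17) could look as follows (a sketch; the variable names are ours):

```python
import numpy as np
from scipy.linalg import block_diag

# Prior parameters from (17), with features ordered as described above.
mu_alpha0 = np.array([0.82, 1.95, 3.81, -0.19, 0.76, 0.0, -0.92, 0.0])
mu_beta = np.array([0.47, 0.0, 0.0, 0.0, 0.0])
Sigma_alpha0 = np.diag([14.24, 13.35, 3.24, 0.57, 19.00, 0.26, 17.00, 7.35])
Sigma_beta = np.diag([4.93, 24.56, 4.95, 0.67, 0.82])

# Stacked prior (14) for theta = (alpha_0, alpha_1, beta); the alpha_1
# block shares the prior of beta.
mu0 = np.concatenate([mu_alpha0, mu_beta, mu_beta])
Sigma0 = block_diag(Sigma_alpha0, Sigma_beta, Sigma_beta)
```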

Details on \(\texttt {Score\_int}_{}\) computation for HeartSteps

We now describe the smoothed versions of \(\texttt {Score\_int}_{1}\) (1) and \(\texttt {Score\_int}_{2,\textsf{z}}\) (3) that we use to add stability to our HeartSteps results in Sects. 3.4 and 3.5.

At a high level, we use the following steps: (i) We use moving windows to average the advantage forecasts and work with an averaged forecast on a daily scale, i.e., for \(d=\lceil {t/5}\rceil\) rather than for each decision time \(t\in [T]\) as in (1) and (3). (ii) While computing the interestingness scores, we omit days when the quality of the data is poor due to low availability or low diversity of features. (iii) We compute the forecasts without changing any state feature from the observed data other than \(\texttt {dosage}\). That is, if the observed feature value \(\textsf{z} _t=1\), we do not compute a counterfactual forecast by artificially forcing \(\textsf{z} _t = 0\). (iv) Finally, we do not consider a user’s interestingness score if they have only a few days of good data.

Fig. 10 Histogram of fraction of good days for the users in the original data for HeartSteps. In panel (a), the count on the vertical axis represents the number of users with the value of \(\frac{1}{D} \sum _{d=1}^{D} G_{d,1 }\) on the horizontal axis. In panels (b) to (d), the count on the vertical axis represents the number of users with the value of \(\frac{1}{D} \sum _{d=1}^{D} G_{d,2,\textsf{z}}\) on the horizontal axis, respectively for \(\textsf{z} \in \{ \texttt {variation}, \texttt {location}, \texttt {engagement} \}\)

We now describe these steps in detail for a user with a total of T decision times in their data trajectory (and a total of \(D\triangleq \lfloor {T/5}\rfloor\) days of data); note that the total number of decision times may vary across users.

  1. Sliding window: For each day \(d \in \{ 1, \ldots , D\}\), we define a sliding window \(W_d\) using all 5 decision times on day d when computing \(\texttt {Score\_int}_{1}\), and all 5 decision times on each of the days \(\{ d-1, d, d+1\}\) (15 decision times in total) when computing \(\texttt {Score\_int}_{2, \textsf{z}}\). That is,

     $$\begin{aligned} W_d \triangleq {\left\{ \begin{array}{ll} \{ 5(d-1)+1, 5(d-1)+2, \ldots , 5d\} \cap [T] &\quad \text {for}\quad \texttt {Score\_int}_{1}, \\ \{ 5(d-2)+1, 5(d-2)+2, \ldots , 5(d+1)\} \cap [T] &\quad \text {for}\quad \texttt {Score\_int}_{2, \textsf{z}}. \end{array}\right. } \end{aligned}$$
  2. Characterizing a good data day: Next, when considering \(\texttt {Score\_int}_{1}\), we define an indicator variable \(G_{d, 1}\) to denote a good day. It is set to 1 if the following two conditions hold: (a) the user was available for at least 2 decision times in \(W_d\), i.e., \(\sum _{t\in W_d} I_t \ge 2\), and (b) the RL algorithm posterior was updated on the night of day \(d-1\); we impose this additional constraint to deal with the real-time and missing-data update issues. We set \(G_{d, 1}=0\) in all other cases. When considering \(\texttt {Score\_int}_{2, \textsf{z}}\), we define the variable \(G_{d, 2, \textsf{z}}\) to denote a good data day based on whether the user’s observed states exhibit enough diversity in the value of the variable \(\textsf{z}\) over the decision times in \(W_d\). In particular, we set \(G_{d, 2, \textsf{z}}=1\) when the following two conditions hold: (a) the feature \(\textsf{z}\) takes each of the values 1 and 0 at least twice among the decision times in \(W_d\) at which the user was available for randomization (\(I_t=1\)), i.e.,

     $$\begin{aligned} \sum _{t\in W_d} I_t \textsf{z} _t \ge 2 \quad \text {and}\quad \sum _{t\in W_d} I_t (1-\textsf{z} _t) \ge 2, \end{aligned}$$

     where \(\textsf{z} _t\) denotes the value of the variable \(\textsf{z}\) for the user at decision time t, and (b) the RL algorithm posterior was updated on the night of at least one of the days in \(\{ d-1, d, d+1\}\). In all other cases, we set \(G_{d,2,\textsf{z}}=0\).

  3. Interestingness score for a user trajectory: We consider a user for interestingness only if the fraction of good days is greater than a certain threshold, i.e.,

     $$\begin{aligned} \frac{1}{D} \sum _{d=1}^{D} G_{d,1 } \ge (1-\gamma ) \ \text {for}\ \texttt {Score\_int}_{1} \quad \text {or}\quad \frac{1}{D} \sum _{d=1}^{D} G_{d, 2, \textsf{z}} \ge (1-\gamma ) \ \text {for}\ \texttt {Score\_int}_{2, \textsf{z}}, \end{aligned}$$
     (18)

     for a suitable \(\gamma \in (0, 1)\). (Note that increasing the value of \(\gamma\) lowers the cutoff for a user to become eligible for being considered for interestingness.) For such a user, we define the interestingness scores as follows (a code sketch of this computation appears right after this list):

     $$\begin{aligned} \texttt {Score\_int}_{1} (\mathcal {U})&\triangleq \frac{1}{\sum _{d=1}^{D} G_{d, 1}} \sum _{d=1}^{D} G_{d, 1} \varvec{1}\left( \frac{\sum _{t\in W_d} I_t \hat{\Delta }_t(S_t) }{\sum _{t\in W_d}I_t}> 0 \right) \\ \texttt {Score\_int}_{2, \textsf{z}} (\mathcal {U})&\triangleq \frac{1}{\sum _{d=1}^{D} G_{d, 2, \textsf{z}}} \sum _{d=1}^{D} G_{d, 2, \textsf{z}} \varvec{1}\left( \frac{\sum _{t\in W_d} I_t \textsf{z} _t \hat{\Delta }_t(S_t) }{\sum _{t\in W_d}I_t \textsf{z} _t} > \frac{\sum _{t\in W_d}I_t (1-\textsf{z} _t) \hat{\Delta }_t(S_t) }{\sum _{t\in W_d}I_t (1-\textsf{z} _t)} \right) , \end{aligned}$$

     where we multiply by the indicators \(G_{d, 1}\) and \(G_{d, 2, \textsf{z}}\) to include only “good days” in our score computations. Note that \(\frac{\sum _{t\in W_d} I_t \textsf{z} _t \hat{\Delta }_t(S_t) }{\sum _{t\in W_d}I_t \textsf{z} _t}\) is the stable proxy (without counterfactual imputation) for the quantity \(\hat{\Delta }_{t}(S_t(\textsf{z} =1))\) in (3).
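As referenced in step 3 above, the following is a minimal Python sketch of how the smoothed \(\texttt {Score\_int}_{1}\) could be computed from a user's forecasts; the array layout, the function name, and the assumption that the windows and good-day indicators are precomputed are ours.

```python
import numpy as np

def score_int_1(delta_hat, I, G, windows):
    """Smoothed Score_int_1: fraction of good days on which the
    availability-weighted average advantage forecast is positive.

    delta_hat : (T,) advantage forecasts Delta_hat_t(S_t)
    I         : (T,) availability indicators I_t
    G         : (D,) good-day indicators G_{d,1}
    windows   : list of D index arrays; windows[d] holds the decision
                times in the sliding window W_{d+1}
    """
    flags = []
    for d, W in enumerate(windows):
        if G[d]:  # a good day guarantees at least 2 available times in W_d
            flags.append(np.average(delta_hat[W], weights=I[W]) > 0)
    return float(np.mean(flags))  # average over good days only
```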

Remark 3

Note that we omit from our results all users who do not satisfy the good-day requirement (18). The number of omitted users depends on the value of \(\gamma\); see Fig. 10 for the histogram of \(\sum _{d}G_{d, 1}/D\) and \(\sum _{d}G_{d, 2, \textsf{z}}/D\) across the 91 users in HeartSteps. For the results in Sects. 3.4 and 3.5, we use \(\gamma =0.75\), which allows 63, 60, 12, and 43 users to be considered, respectively, for interestingness of type 1, and of type 2 for the features \(\texttt {variation}\), \(\texttt {location}\), and \(\texttt {engagement}\).

Another look at user 2’s advantage forecasts

Figure 11a reproduces the advantage forecasts for user 2 from Fig. 1b. In addition, panels (b) and (c) of Fig. 11 show analogs of panel (a), with user 2’s standardized advantage forecasts color-coded by the values of \(\texttt {location}\) and \(\texttt {engagement}\), respectively. Overall, we observe from the three panels of Fig. 11 that user 2 does not appear interesting of type 2 for \(\textsf{z} \in \{ \texttt {location}, \texttt {engagement} \}\), since the standardized advantage forecasts are not well separated between \(\textsf{z} = 0\) and \(\textsf{z} = 1\), in contrast to \(\textsf{z} = \texttt {variation}\) in panel (a) (or equivalently in Fig. 1b). In particular, for this user, we have \(\texttt {Score\_int}_{2, \texttt {location}} = 0.38\) and \(\texttt {Score\_int}_{2, \texttt {engagement}} =0.38\), while \(\texttt {Score\_int}_{2,\texttt {variation}} =0\).

Fig. 11 User 2’s standardized advantage forecasts from Fig. 1b, color-coded by values of \(\textsf{z}\) = \(\texttt {variation}\), \(\texttt {location}\), and \(\texttt {engagement}\) in panels (a), (b), and (c) respectively. The value on the vertical axis represents the RL algorithm’s forecast of the standardized advantage of sending an activity message for the user if the user was available for sending a message on the day marked on the horizontal axis. (Note each day has 5 decision times.) The forecasts are marked as blue circles if \(\textsf{z}\) = 1 and red triangles if \(\textsf{z}\) = 0 at the decision time. Panels (a) to (c) exhibit, respectively, \(\texttt {Score\_int}_{2, \texttt {variation}} =0\), \(\texttt {Score\_int}_{2, \texttt {location}} =0.38\), and \(\texttt {Score\_int}_{2, \texttt {engagement}} =0.38\). Note that we reproduced Fig. 1b in panel (a) for the reader’s convenience; the three panels plot the same data and differ only in the color coding (Color figure online)

Deeper dive into interestingness of type 1 for HeartSteps

To further refine the conclusions from Fig. 3, we consider one-sided variants of the definition (2) of the number of interesting users. For the reader’s convenience, we reproduce Fig. 3 in panel (a) of Fig. 12, alongside the corresponding results with one-sided interesting-user counts, defined by separately counting the users with \(\texttt {Score\_int}_{1} \ge 0.9\) and with \(\texttt {Score\_int}_{1} \le 0.1\) in panels (b) and (c), respectively.

Fig. 12 Histogram of the number of interesting users of type 1, \(\texttt {\#User\_int}_{1}\) (2), and its one-sided variants across 500 trials. Here panel (a) with \(\texttt {\#User\_int}_{1} \triangleq \sum _{i=1}^n \varvec{1}(|\texttt {Score\_int}_{1} -0.5|\ge 0.4)\) reproduces Fig. 3 for the reader’s convenience. In panels (b) and (c), the one-sided interesting-user counts of type 1 are defined as \(\texttt {\#User\_int}_{1} ^+ \triangleq \sum _{i=1}^n \varvec{1}(\texttt {Score\_int}_{1} \ge 0.9)\) and \(\texttt {\#User\_int}_{1} ^- \triangleq \sum _{i=1}^n \varvec{1}(\texttt {Score\_int}_{1} \le 0.1)\) respectively. The trial data is the same as that in Fig. 3; that is, each trial is composed of 63 resampled trajectories, generated such that the true advantage is zero. The proportions on the vertical axis represent the fraction of the 500 trials with the value of \(\texttt {\#User\_int}_{1}, \texttt {\#User\_int}_{1} ^+,\) and \(\texttt {\#User\_int}_{1} ^-\) on the horizontal axis. The vertical blue dashed line in each panel marks the corresponding interesting-user count observed in the original data

From Fig. 12b, we find that in the original data 17 users exhibit \(\texttt {Score\_int}_{1} \ge 0.9\); we denote this user count by \(\texttt {\#User\_int}_{1} ^+\). However, the value of \(\texttt {\#User\_int}_{1} ^+\) is always significantly smaller than 17 across the 500 trials with resampled trajectories. In Table 2, we denote this analysis as Type \(1^+\).

Table 2 Summary of results from our resampling-based exploratory data analyses for HeartSteps data

On the other hand, Fig. 12c shows that one user exhibits \(\texttt {Score\_int}_{1} \le 0.1\) in the original data; we denote this count by \(\texttt {\#User\_int}_{1} ^-\). We also find that all 500 trials have \(\texttt {\#User\_int}_{1} ^->1\). In Table 2, we denote this analysis as Type \(1^-\).

Overall, we conclude that the data presents evidence that the RL algorithm is potentially personalizing by learning that many users benefit from being sent an activity message. However, many users might exhibit \(\texttt {Score\_int}_{1} \le 0.1\), so that sending the message can appear less beneficial than not sending it for these users, simply due to algorithmic stochasticity. Consequently, the value of \(\texttt {\#User\_int}_{1}\) (the number of interesting users with \(|\texttt {Score\_int}_{1}-0.5| \ge 0.4\), which also equals \(\texttt {\#User\_int}_{1} ^+ + \texttt {\#User\_int}_{1} ^-\)) can be as high as 18 (the observed value in the original data) simply due to algorithmic stochasticity.

Stability of conclusions with respect to the choice of \((\delta , \gamma )\)

Next, we investigate the stability of the claims made above for \(\texttt {Score\_int}_{1}\) and its one-sided variants with respect to the choice of the hyperparameters \(\delta\) and \(\gamma\), appearing in (2) and (18), respectively. Note that for a given definition of \(\texttt {Score\_int}_{}\), increasing \(\gamma\) in (18) for a fixed \(\delta\) in (2) allows more users to become eligible for being considered interesting, both in the original data and in the resampled trials. Similarly, decreasing \(\delta\) in (2) for a fixed \(\gamma\) in (18) typically leads to a larger number of interesting users, both in the original data and in the resampled trials.

The results of this exploration for the choices \(\delta \in \{ 0.35, 0.40, 0.45\}\) and \(\gamma \in \{ 0.65, 0.70, 0.75\}\) are presented in Fig. 13. For a given panel, the value in the cell corresponding to the value of \(\delta\) on the horizontal axis and \(\gamma\) on the vertical axis equals the fraction of the 500 trials for which the number of interesting users \(\texttt {\#User\_int}_{}\), computed using those hyperparameter choices, was at least as large as that in the original data. Looking at Fig. 13b, c, we find that the conclusions drawn from Fig. 12b, c with \((\delta , \gamma )=(0.4, 0.75)\) about \(\texttt {\#User\_int}_{1} ^+\) and \(\texttt {\#User\_int}_{1} ^-\) remain stable even if we slightly perturb the values of \(\delta\) and \(\gamma\). In particular, across the \(3\times 3\) choices for \(\delta\) and \(\gamma\), the value of \(\texttt {\#User\_int}_{1} ^+\) would not appear as high as the observed value in the original data just by chance. On the other hand, the value of \(\texttt {\#User\_int}_{1} ^-\) might appear higher than the observed value in the original data simply due to algorithmic stochasticity. Given the competing nature of these two quantities and the fact that \(\texttt {\#User\_int}_{1} = \texttt {\#User\_int}_{1} ^+ + \texttt {\#User\_int}_{1} ^-\), the resulting fraction of trials with a count at least as high as \(\texttt {\#User\_int}_{1}\) in the original data is quite sensitive to the particular choice of \((\delta , \gamma )\), as Fig. 13a illustrates.
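For intuition, each cell of the heat maps in Figs. 13 and 14 could be computed along the following lines. This is a sketch with hypothetical names; the observed and resampled scores are assumed to be precomputed, with the eligibility filtering of (18) (governed by \(\gamma\)) already applied upstream.

```python
import numpy as np

def n_users_int(scores, delta):
    """#User_int: number of users whose score is at least delta away from 0.5."""
    return int(np.sum(np.abs(np.asarray(scores) - 0.5) >= delta))

def stability_cell(observed_scores, resampled_scores, delta):
    """Fraction of resampled trials whose interesting-user count is at
    least as large as the count in the original data.

    observed_scores  : (n,) interestingness scores from the original data
    resampled_scores : (n_trials, n) scores, one row per resampled trial
    """
    observed = n_users_int(observed_scores, delta)
    counts = np.array([n_users_int(row, delta) for row in resampled_scores])
    return float(np.mean(counts >= observed))
```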

Fig. 13 Stability of conclusions from Figs. 3 and 12 for interestingness of type 1 with respect to the choice of hyperparameters \((\delta , \gamma )\). Panels (a) to (c) respectively plot the results for the number of interesting users of type 1, \(\texttt {\#User\_int}_{1} \triangleq \sum _{i=1}^n \varvec{1}(|\texttt {Score\_int}_{1} -0.5|\ge \delta )\), and its one-sided variants, namely \(\texttt {\#User\_int}_{1} ^+\triangleq \sum _{i=1}^n \varvec{1}(\texttt {Score\_int}_{1} \ge 0.5+\delta )\), and \(\texttt {\#User\_int}_{1} ^-\triangleq \sum _{i=1}^n \varvec{1}(\texttt {Score\_int}_{1} \le 0.5-\delta )\). For a given panel, the value in the cell corresponding to \(\delta\) on the horizontal axis and \(\gamma\) on the vertical axis is equal to the fraction of the 500 trials for which the number of interesting users was at least as large as that in the original data. Recall that the full histograms of \(\texttt {\#User\_int}_{1},\texttt {\#User\_int}_{1} ^+,\) and \(\texttt {\#User\_int}_{1} ^-\) across these 500 trials in Figs. 3 and 12 correspond to the choice of \(\delta =0.4\) and \(\gamma =0.75\)

Stability of HeartSteps results for interestingness of type 2

We perform a stability analysis for \(\texttt {\#User\_int}_{2,\textsf{z}}\) with respect to the choice of \((\delta , \gamma )\), similar to that done for \(\texttt {\#User\_int}_{1}\) above in Fig. 13, and provide the results in Fig. 14.

Panels (a), (b), and (c) display the results for \(\textsf{z} = \texttt {variation}, \texttt {location}\), and \(\texttt {engagement}\), respectively. In a given panel, the value in the cell corresponding to the value of \(\delta\) on the horizontal axis and \(\gamma\) on the vertical axis equals the fraction of the 500 trials for which the number of interesting users \(\texttt {\#User\_int}_{2, \textsf{z}}\), computed using those hyperparameter choices, was at least as large as that in the original data. Across the \(3\times 3\) choices for \(\delta\) and \(\gamma\), we notice that for interestingness of type 2 for the features \(\texttt {variation}\), \(\texttt {location}\), and \(\texttt {engagement}\), the fraction remains stable around 0, 0, and 1, respectively, the same as the fractions in Fig. 7 for \((\delta , \gamma )=(0.4, 0.75)\).

Fig. 14 Stability of conclusions from Fig. 7 for interestingness of type 2 with respect to the choice of hyperparameters \((\delta , \gamma )\). Panels (a) to (c) respectively plot the results for \(\texttt {\#User\_int}_{2,\textsf{z}}\) for interestingness of type 2 for feature \(\textsf{z} \in \{ \texttt {variation}, \texttt {location}, \texttt {engagement} \}\). For a given panel, the value in the cell corresponding to \(\delta\) on the horizontal axis and \(\gamma\) on the vertical axis is equal to the fraction of the 500 trials for which the number of interesting users \(\texttt {\#User\_int}_{2, \textsf{z}}\) was at least as large as that in the original data. Recall that the full histogram of \(\texttt {\#User\_int}_{2,\textsf{z}}\) across these 500 trials in Fig. 7 corresponds to the choice of \(\delta =0.4\) and \(\gamma =0.75\)

Interesting users of type 2 for \(\texttt {location}\) and \(\texttt {engagement}\)

We now demonstrate the analysis (as in Figs. 1b and 8) for two different users, who exhibit potential interestingness of type 2 for \(\texttt {location}\) and \(\texttt {engagement}\), respectively.

A potentially interesting user of type 2 for \(\texttt {location}\)

Figure 15 displays the advantage forecasts for a user, whom we call user 3 to distinguish them from the two users associated with Fig. 1. The three panels in Fig. 15 plot user 3’s advantage forecasts color-coded by the values of the three features for that user. We find that this user admits \(\texttt {Score\_int}_{2, \texttt {variation}} =0.68\), \(\texttt {Score\_int}_{2, \texttt {location}} =0\), and \(\texttt {Score\_int}_{2, \texttt {engagement}} =0.43\), so that this user would be deemed potentially interesting of type 2 for \(\texttt {location}\) (and not for the other features) as per our definition (4).

Fig. 15 Standardized advantage forecasts of user 3, an interesting user of type 2 for \(\texttt {location}\), color-coded by \(\textsf{z}\) = \(\texttt {variation}\), \(\texttt {location}\), and \(\texttt {engagement}\) in panels (a), (b), and (c) respectively. The value on the vertical axis represents the RL algorithm’s forecast of the standardized advantage of sending an activity message for the user if the user was available for sending a message on the day marked on the horizontal axis. (Note each day has 5 decision times.) The forecasts are marked as blue circles if \(\textsf{z}\) = 1 and red triangles if \(\textsf{z}\) = 0 at the decision time. Panels (a) to (c) exhibit, respectively, \(\texttt {Score\_int}_{2, \texttt {variation}} =0.68\), \(\texttt {Score\_int}_{2, \texttt {location}} =0\), and \(\texttt {Score\_int}_{2, \texttt {engagement}} =0.43\). Note that the three panels plot the same data and differ only in the color coding

Next, we evaluate how likely the pattern in Fig. 15b would be to appear just by chance. Panels (a) and (b) of Fig. 16 visualize two resampled trajectories of user 3 (chosen uniformly at random from user 3’s 500 resampled trajectories) generated under the generative model in which there is no differential advantage of sending a message based on the value of \(\texttt {location}\). The color coding is as in Fig. 15b, namely, the forecasts are marked as blue circles if \(\texttt {location}\) = 1 and red triangles if \(\texttt {location}\) = 0. In panel (c) of Fig. 16, we plot the histogram of \(\texttt {Score\_int}_{2, \texttt {location}}\) for this user across all 500 resampled trajectories and mark the value observed in the original data with a vertical dashed line.

Figure 16a, b show that the resampled trajectories do not appear interesting of type 2 for \(\texttt {location}\), unlike Fig. 15b; the two trajectories have \(\texttt {Score\_int}_{2,\texttt {location}} =\) 0.50 and 0.42, respectively. Moreover, panel (c) shows that the interestingness score of 0, which was observed for user 3 in the original data (Fig. 15b), never appears across the resampled trajectories. Thus we can conclude that the data presents evidence that the RL algorithm potentially personalized for user 3 by learning to treat the user differentially based on \(\texttt {location}\), and that this personalization would not likely arise simply due to algorithmic stochasticity.

Fig. 16 Resampling results for user 3 considered in Fig. 15, whose original trajectory exhibits interestingness of type 2 for \(\texttt {location}\). Panels (a) and (b) plot two randomly chosen (out of 500) resampled trajectories generated with zero advantage; the two trajectories, respectively, have \(\texttt {Score\_int}_{2,\texttt {location}} =\) 0.50 and 0.42. In panel (c), the vertical axis represents the fraction of the 500 resampled trajectories for this user with the value of \(\texttt {Score\_int}_{2,\texttt {location}}\) on the horizontal axis; and the vertical blue dashed line marks the observed \(\texttt {Score\_int}_{2,\texttt {location}}\) (value 0) for this user

A potentially interesting user of type 2 for \(\texttt {engagement}\)

Figure 17 displays the advantage forecasts for a user, whom we call user 4 to distinguish them from the three users associated with Figs. 1 and 15. The three panels in Fig. 17 plot user 4’s advantage forecasts color-coded by the three features, which admit \(\texttt {Score\_int}_{2, \texttt {variation}} =0.65\), \(\texttt {Score\_int}_{2, \texttt {location}} =0.9\), and \(\texttt {Score\_int}_{2, \texttt {engagement}} =0.037\), respectively. Thus, based on our definition (4), this user is potentially interesting of type 2 for \(\texttt {engagement}\), but not for \(\texttt {variation}\). The user does not satisfy the criterion (\(\gamma = 0.75\) in (18)) for being considered a potentially interesting user for \(\texttt {location}\), due to a lack of diversity in the values taken by their \(\texttt {location}\) feature.

Fig. 17 Standardized advantage forecasts of user 4, an interesting user of type 2 for \(\texttt {engagement}\), color-coded by \(\textsf{z}\) = \(\texttt {variation}\), \(\texttt {location}\), and \(\texttt {engagement}\) in panels (a), (b), and (c) respectively. The value on the vertical axis represents the RL algorithm’s forecast of the standardized advantage of sending an activity message for the user if the user was available for sending a message on the day marked on the horizontal axis. (Note each day has 5 decision times.) The forecasts are marked as blue circles if \(\textsf{z}\) = 1 and red triangles if \(\textsf{z}\) = 0 at the decision time. Panels (a) to (c) exhibit, respectively, \(\texttt {Score\_int}_{2, \texttt {variation}} =0.65\), \(\texttt {Score\_int}_{2, \texttt {location}} =0.9\), and \(\texttt {Score\_int}_{2, \texttt {engagement}} =0.037\). Note that the three panels plot the same data and differ only in the color coding

Next, we evaluate how likely the pattern in Fig. 17c would be to appear just by chance. Panels (a) and (b) of Fig. 18 visualize two resampled trajectories of user 4 (chosen uniformly at random from user 4’s 500 resampled trajectories) generated under the generative model in which there is no differential advantage of sending a message based on the value of \(\texttt {engagement}\). The color coding is as in Fig. 17c, namely, the forecasts are marked as blue circles if \(\texttt {engagement}\) = 1 and red triangles if \(\texttt {engagement}\) = 0. In panel (c) of Fig. 18, we plot the histogram of \(\texttt {Score\_int}_{2, \texttt {engagement}}\) for this user across all 500 resampled trajectories and mark the value observed in the original data with a vertical dashed line.

Figure 18a, b show that the resampled trajectories do not appear interesting of type 2 for \(\texttt {engagement}\), unlike Fig. 17c; the two trajectories have \(\texttt {Score\_int}_{2,\texttt {engagement}} =\) 0.94 and 0.41, respectively. However, panel (c) shows that the interestingness score of 0.037, which was observed for user 4 in the original data (Fig. 17c), appears for around 20% of the resampled trajectories. Thus we conclude that the data presents evidence that user 4’s interestingness score for \(\texttt {engagement}\) might appear extreme simply due to algorithmic stochasticity.

Fig. 18 Resampling results for user 4 considered in Fig. 17, whose original trajectory exhibits interestingness of type 2 for \(\texttt {engagement}\). Panels (a) and (b) plot two randomly chosen (out of 500) resampled trajectories generated with zero advantage; the two trajectories, respectively, have \(\texttt {Score\_int}_{2,\texttt {engagement}} =\) 0.94 and 0.41. In panel (c), the vertical axis represents the fraction of the 500 resampled trajectories for this user with the value of \(\texttt {Score\_int}_{2,\texttt {engagement}}\) on the horizontal axis; and the vertical blue dashed line marks the observed \(\texttt {Score\_int}_{2,\texttt {engagement}}\) (value 0.037) for this user (Color figure online)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Cite this article

Ghosh, S., Kim, R., Chhabria, P. et al. Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling. Mach Learn (2024). https://doi.org/10.1007/s10994-024-06526-x
