Abstract
Purpose
As Big Data methods enter health care settings, and particularly the assessment of patient-reported outcomes, novel analytics are needed to address the unique challenges they raise. One such challenge is coding transcribed interview data, typically free-text records of statements made during a face-to-face interview. Latent Dirichlet Allocation (LDA) offers statistical rigor and consistency in automating the interpretation of patients’ expressed concerns and coping strategies.
Methods
LDA was applied to interview data collected as part of a prospective, longitudinal study of quality of life (QOL) in N = 211 patients undergoing radical cystectomy and urinary diversion for bladder cancer. Personal goal statements were analyzed to extract latent topics and themes, stratified by assessment time and by goal type (things patients wanted to accomplish and things they wanted to prevent). Model comparison metrics determined the number of topics to extract.
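To make the Methods step concrete, here is a minimal sketch of fitting LDA to short free-text goal statements. It is not the study's implementation: the abstract does not say which software or estimation settings were used, so this sketch assumes a generic collapsed Gibbs sampler written with the Python standard library only. The goal statements, the stopword list, the function names (`tokenize`, `fit_lda`, `top_words`), and the hyperparameter values are all hypothetical, and the number of topics is fixed at 2 for the toy data rather than chosen by model comparison metrics.

```python
import random

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"to", "the", "a", "my", "and", "from", "with", "of", "through"}

def tokenize(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]

def fit_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of token lists.

    Returns (vocab, theta, phi): theta[d][k] is the estimated share of
    topic k in document d; phi[k][v] is the probability of word v in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]           # counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts per topic
    topic_total = [0] * n_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return vocab, theta, phi

def top_words(phi, vocab, n=3):
    """Return the n most probable words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]

# Hypothetical goal statements, NOT data from the study.
statements = [
    "Recover from surgery quickly",
    "Get through surgery and recovery",
    "Return to work after recovery",
    "Spend more time with family",
    "Enjoy life with family and friends",
    "Prevent the cancer from coming back",
]
docs = [tokenize(s) for s in statements]
vocab, theta, phi = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words(phi, vocab)):
    print(f"topic {k}: {', '.join(words)}")
```

With so few statements the topics are unstable and depend on the random seed, so the printed words are only illustrative. In practice the number of topics would be compared across several candidate values, as the Methods describe.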
Results
LDA extracted seven latent topics. Prior to surgery, patients’ priorities centered on cancer surgery and recovery. Six months after surgery, these were replaced by goals of regaining a sense of normalcy, resuming work, enjoying life more fully, and appreciating friends and family more. LDA model parameters showed changing priorities; e.g., immediate concerns about surgery and resuming employment decreased post-surgery and were replaced by concerns over cancer recurrence and a desire to remain healthy and strong.
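One way such shifts in priorities could be read off a fitted model is to average each topic's document-level proportion within each assessment time. The sketch below assumes that approach; the abstract does not state which model parameters were compared or how. The function name `mean_topic_share`, the time labels, and the proportions are hypothetical.

```python
def mean_topic_share(theta, groups):
    """Average per-document topic proportions within each group label."""
    sums, counts = {}, {}
    for row, g in zip(theta, groups):
        if g not in sums:
            sums[g] = [0.0] * len(row)
            counts[g] = 0
        sums[g] = [s + p for s, p in zip(sums[g], row)]
        counts[g] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}

# Hypothetical document-topic proportions (each row sums to 1), NOT study results.
# Column 0 could stand for a "surgery" topic, column 1 for a "staying healthy" topic.
theta = [
    [0.8, 0.2],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.1, 0.9],
]
time_point = ["pre", "pre", "post", "post"]

shares = mean_topic_share(theta, time_point)
for label, row in shares.items():
    print(label, [round(p, 2) for p in row])
```

In this toy input the first topic's average share falls from the pre-surgery to the post-surgery statements while the second rises, which is the kind of pattern the Results describe.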
Conclusions
Novel Big Data analytics such as LDA offer the possibility of summarizing personal goals without the need for conventional fixed-length measures and resource-intensive qualitative data coding.
Acknowledgements
The authors thank Patient-Centered Outcomes Research Institute Grant ME-1306-00781 (PI: Rapkin); National Institutes of Health Grant P30 CA008748 to Memorial Sloan Kettering Cancer Center; Sidney Kimmel Center for Prostate and Urological Cancers at Memorial Sloan Kettering Cancer Center, Pin Down Bladder Cancer; and the Michael A. and Zena Wiener Research and Therapeutics Program in Bladder Cancer.
Funding
This study was funded by (1) Patient-Centered Outcomes Research Institute Grant ME-1306-00781 (PI: Rapkin); (2) National Institutes of Health Grant P30 CA008748 to Memorial Sloan Kettering Cancer Center; and (3) Sidney Kimmel Center for Prostate and Urological Cancers at Memorial Sloan Kettering Cancer Center, Pin Down Bladder Cancer, and the Michael A. and Zena Wiener Research and Therapeutics Program in Bladder Cancer (PI: Bochner).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval for human subject research
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Cite this article
Li, Y., Rapkin, B., Atkinson, T.M. et al. Leveraging Latent Dirichlet Allocation in processing free-text personal goals among patients undergoing bladder cancer surgery. Qual Life Res 28, 1441–1455 (2019). https://doi.org/10.1007/s11136-019-02132-w