TIQS: Targeted Iterative Question Selection for Health Interventions


While healthcare has traditionally existed within the confines of formal clinical environments, the emergence of population health initiatives has given rise to a new and diverse set of community interventions. As the number of interventions continues to grow, the ability to quickly and accurately identify those most relevant to an individual’s specific need has become essential to the care process. However, because the interventions are so diverse, determining need often requires non-clinical social and behavioral information that must be collected from the individuals themselves. Although survey tools have proven successful in collecting these data, time restrictions and diminishing respondent interest present barriers to obtaining up-to-date information on a regular basis. In response, researchers have turned to analytical approaches to optimize surveys and quantify the importance of each question. To date, the majority of these works have approached the task from a univariate standpoint, identifying the next most important question to ask. However, such an approach fails to address the interconnected nature of the health conditions inherently captured by the broader set of survey questions. Using data mining and machine learning methodology, this work demonstrates the value of capturing these relations. We present a novel framework that identifies a variable-length subset of survey questions most relevant in determining the need for a particular health intervention for a given individual. We evaluate the framework using a large national longitudinal dataset centered on aging, demonstrating the ability to identify the questions with the highest impact across a variety of interventions.






This work was done as part of an internship at IBM Research Ireland. This work was also supported in part by the National Science Foundation (NSF) Grant IIS-1447795 at the University of Notre Dame.

Author information



Corresponding author

Correspondence to Nitesh V. Chawla.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Feldman, K., Kotoulas, S. & Chawla, N.V. TIQS: Targeted Iterative Question Selection for Health Interventions. J Healthc Inform Res 2, 205–227 (2018). https://doi.org/10.1007/s41666-018-0015-z



Keywords

  • Healthcare informatics
  • Personalized health
  • Community health
  • Data mining