Journal of Healthcare Informatics Research

Volume 2, Issue 3, pp 205–227

TIQS: Targeted Iterative Question Selection for Health Interventions

  • Keith Feldman
  • Spyros Kotoulas
  • Nitesh V. Chawla
Research Article
Part of the following topical collections:
  1. Special Issue on Data Mining in Healthcare Informatics


While healthcare has traditionally existed within the confines of formal clinical environments, the emergence of population health initiatives has given rise to a new and diverse set of community interventions. As the number of interventions continues to grow, the ability to quickly and accurately identify those most relevant to an individual’s specific need has become essential in the care process. However, due to the diverse nature of the interventions, determining need often requires non-clinical social and behavioral information that must be collected from the individuals themselves. Although survey tools have demonstrated success in collecting these data, time restrictions and diminishing respondent interest have presented barriers to obtaining up-to-date information on a regular basis. In response, researchers have turned to analytical approaches to optimize surveys and quantify the importance of each question. To date, the majority of these works have approached the task from a univariate standpoint, identifying the next most important question to ask. However, such an approach fails to address the interconnected nature of the health conditions inherently captured by the broader set of survey questions. Utilizing data mining and machine learning methodology, this work demonstrates the value of capturing these relations. We present a novel framework that identifies a variable-length subset of survey questions most relevant in determining the need for a particular health intervention for a given individual. We evaluate the framework using a large national longitudinal dataset centered on aging, demonstrating the ability to identify the questions with the highest impact across a variety of interventions.


Healthcare informatics, Personalized health, Community health, Data mining


Funding Information

This work was done as part of an internship at IBM Research Ireland. This work was also supported in part by the National Science Foundation (NSF) Grant IIS-1447795 at the University of Notre Dame.

Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Keith Feldman (1)
  • Spyros Kotoulas (2)
  • Nitesh V. Chawla (1, 3)

  1. Department of Computer Science and Engineering, and iCeNSA, University of Notre Dame, Notre Dame, USA
  2. IBM Research Ireland, Dublin, Ireland
  3. Wrocław University of Science and Technology, Wrocław, Poland
