A relative similarity based method for interactive patient risk prediction

Qian, Buyue; Wang, Xiang; Cao, Nan; Li, Hongfei; Jiang, Yu-Gang

doi:10.1007/s10618-014-0379-5

Published: 09 September 2014

Data Mining and Knowledge Discovery, volume 29, pages 1070–1093 (2015)

Buyue Qian¹, Xiang Wang¹, Nan Cao¹, Hongfei Li¹ & Yu-Gang Jiang²

1248 Accesses
48 Citations
3 Altmetric

Abstract

This paper investigates the patient risk prediction problem in the context of active learning with relative similarities. Active learning has been extensively studied and successfully applied to real problems. Active learning methods typically query absolute questions. In a medical application where the goal is to predict patients' risk of a certain disease using Electronic Health Records (EHR), absolute questions take the form of “Will this patient suffer from Alzheimer’s later in his/her life?” or “Are these two patients similar or not?”. Because they demand extensive domain knowledge, such absolute questions are usually difficult to answer, even for experienced medical experts. In addition, active learning methods built on absolute questions are less stable, since incorrect answers often occur and can be detrimental to the risk prediction model. In this paper we instead focus on designing relative questions that domain experts can answer easily. The proposed relative queries take the form of “Is patient A or patient B more similar to patient C?”, which medical experts can answer with more confidence. These questions poll relative rather than absolute information, and in some cases can even be answered by non-experts. We propose an interactive patient risk prediction method that actively queries medical experts about the relative similarity of patients. We evaluate our method on both benchmark and real clinical datasets and make several interesting discoveries, including that querying relative similarities is effective in patient risk prediction and can sometimes yield better prediction accuracy than asking absolute questions.
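To make the form of a relative query concrete, the following is a minimal sketch, not the authors' algorithm. It assumes each patient is a numeric feature vector and stands in for the medical expert with a simulated oracle that compares Euclidean distances. The function name `relative_query`, the toy feature values, and the choice of Euclidean distance are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def relative_query(features, a, b, c):
    """Simulated expert (an assumption): answer the question
    "Is patient a or patient b more similar to patient c?"
    by returning whichever of a or b is closer to c in Euclidean distance.
    Ties are resolved in favour of a.
    """
    dist_a = np.linalg.norm(features[a] - features[c])
    dist_b = np.linalg.norm(features[b] - features[c])
    return a if dist_a <= dist_b else b


# Toy, made-up feature vectors; real EHR features would be far richer.
patients = np.array([
    [0.0, 0.0],  # patient 0, plays the role of A
    [5.0, 5.0],  # patient 1, plays the role of B
    [1.0, 0.5],  # patient 2, plays the role of C
])

answer = relative_query(patients, a=0, b=1, c=2)
print(f"Patient {answer} is judged more similar to patient 2")
```

The answer to such a query is a choice between two candidates rather than a label or a yes/no similarity judgement, which is what distinguishes it from the absolute questions described in the abstract.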

Author information

Authors and Affiliations

IBM T. J. Watson Research, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, USA
Buyue Qian, Xiang Wang, Nan Cao & Hongfei Li
Fudan University, 825 Zhangheng Road, Shanghai, 201203, China
Yu-Gang Jiang

Corresponding authors

Correspondence to Buyue Qian or Nan Cao.

Additional information

Responsible editors: Fei Wang, Gregor Stiglic, Ian Davidson and Zoran Obradovic.

About this article

Cite this article

Qian, B., Wang, X., Cao, N. et al. A relative similarity based method for interactive patient risk prediction. Data Min Knowl Disc 29, 1070–1093 (2015). https://doi.org/10.1007/s10618-014-0379-5

Received: 01 April 2014
Accepted: 20 August 2014
Published: 09 September 2014
Issue Date: July 2015
DOI: https://doi.org/10.1007/s10618-014-0379-5
