Parallel and Distributed Processing for Unsupervised Patient Phenotype Representation

García Heano, John Anderson; Precioso, Frédéric; Staccini, Pascal; Riveill, Michel

doi:10.1007/978-3-030-16205-4_1

John Anderson García Heano¹², Frédéric Precioso¹², Pascal Staccini¹³ & Michel Riveill¹²

Part of the book series: Communications in Computer and Information Science (CCIS, volume 979)

Included in the following conference series:

Latin American High Performance Computing Conference


Abstract

The value of data-driven healthcare lies in the possibility of detecting new patterns for inpatient care, treatment, prevention, and understanding of disease, or of predicting the length of a hospital stay, its cost, or whether death is likely to occur during the stay.

Modeling a precise patient phenotype representation from clinical data is challenging because the data are high-dimensional, noisy, and incomplete, and must be projected into a new low-dimensional space. Likewise, training unsupervised learning models on growing volumes of clinical data raises algorithmic-complexity issues, such as the time to model convergence and the required memory capacity.

This paper presents the DiagnoseNET framework, which automates the extraction of patient phenotypes and applies them to predict different medical targets. It provides three high-level features: a full-workflow orchestration into pipeline stages for mining clinical data and for using unsupervised feature representations to initialize supervised models; a data-resource management for training parallel and distributed deep neural networks.

As a case study, we used a clinical dataset from admission and hospital services to build a general-purpose inpatient phenotype representation for use with different medical targets; the first target is to classify the main purpose of inpatient care.

The research focuses on managing the data according to its dimensions, the model complexity, the number of workers selected, and the memory capacity when training unsupervised stacked denoising autoencoders on a Jetson TX2 mini-cluster.
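The abstract names stacked denoising autoencoders as the unsupervised model. As a hedged illustration of that general technique only, and not of DiagnoseNET's implementation, the following pure-Python sketch corrupts each input with masking noise, trains one layer to reconstruct the clean input, and stacks layers greedily so that each one denoises the codes of the previous layer. The noise level (0.3), sigmoid units, untied weights, squared-error loss, plain SGD, the toy data, and the names `DenoisingAutoencoder` and `train_stack` are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class DenoisingAutoencoder:
    """One denoising autoencoder layer (hypothetical sketch): sigmoid units,
    untied encoder/decoder weights, squared-error loss, plain SGD."""

    def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.5, seed=0):
        self.rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        self.corruption, self.lr = corruption, lr
        s = 1.0 / math.sqrt(n_in)
        self.W = [[self.rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.V = [[self.rng.uniform(-s, s) for _ in range(n_hidden)] for _ in range(n_in)]
        self.c = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(v * hj for v, hj in zip(row, h)) + ci)
                for row, ci in zip(self.V, self.c)]

    def corrupt(self, x):
        # Masking noise: each input component is zeroed with probability `corruption`.
        return [0.0 if self.rng.random() < self.corruption else xi for xi in x]

    def reconstruction_error(self, data):
        # Mean squared error between clean inputs and reconstructions of clean inputs.
        total = 0.0
        for x in data:
            r = self.decode(self.encode(x))
            total += sum((ri - xi) ** 2 for ri, xi in zip(r, x)) / self.n_in
        return total / len(data)

    def train_step(self, x):
        x_tilde = self.corrupt(x)  # the encoder only sees the corrupted input
        h = self.encode(x_tilde)
        r = self.decode(h)         # the reconstruction is compared with the CLEAN x
        d_out = [(ri - xi) * ri * (1.0 - ri) for ri, xi in zip(r, x)]
        d_hid = [sum(self.V[i][j] * d_out[i] for i in range(self.n_in)) * h[j] * (1.0 - h[j])
                 for j in range(self.n_hidden)]
        for i in range(self.n_in):  # decoder update
            for j in range(self.n_hidden):
                self.V[i][j] -= self.lr * d_out[i] * h[j]
            self.c[i] -= self.lr * d_out[i]
        for j in range(self.n_hidden):  # encoder update
            for k in range(self.n_in):
                self.W[j][k] -= self.lr * d_hid[j] * x_tilde[k]
            self.b[j] -= self.lr * d_hid[j]

    def fit(self, data, epochs):
        for _ in range(epochs):
            for x in data:
                self.train_step(x)
        return self


def train_stack(data, hidden_sizes, epochs=300):
    """Greedy layer-wise training: each layer learns to denoise the previous layer's codes."""
    layers, inputs, n_in = [], data, len(data[0])
    for depth, n_hidden in enumerate(hidden_sizes):
        layers.append(DenoisingAutoencoder(n_in, n_hidden, seed=depth).fit(inputs, epochs))
        inputs = [layers[-1].encode(x) for x in inputs]  # clean codes feed the next layer
        n_in = n_hidden
    return layers, inputs


# Toy binary "patient" vectors with two obvious groups (made-up data).
data = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1]]
before = DenoisingAutoencoder(6, 4, seed=0).reconstruction_error(data)  # same init as layer 1
layers, codes = train_stack(data, [4, 2])
print(f"layer-1 reconstruction MSE: {before:.3f} -> {layers[0].reconstruction_error(data):.3f}")
print("2-d codes:", [[round(v, 2) for v in c] for c in codes])
```

In the setting the abstract describes, trained encoder layers like these would then serve to initialize a supervised model; the sketch stops at producing the low-dimensional codes.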

Mapping tasks so that they fit the available computational resources is therefore a key factor in minimizing the number of epochs needed for the model to converge, reducing execution time, and maximizing energy efficiency.
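The abstract ties task mapping to data dimensions, model complexity, number of workers, and memory capacity, without giving a formula. One way to make that trade-off concrete is a back-of-the-envelope estimate of the largest per-worker batch that fits in memory. The sketch below is a hypothetical planning helper, not the paper's method: the cost model (parameters stored three times, activations twice, 4-byte floats), the function names, the layer sizes, the record count, and the 64 MiB per-worker budget are illustrative assumptions.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of fully connected layers between consecutive sizes."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))


def max_batch_per_worker(layer_sizes, mem_budget_bytes, bytes_per_value=4,
                         state_copies=3, activation_copies=2):
    """Largest batch whose rough memory footprint fits in one worker's budget.

    Fixed cost: parameters kept `state_copies` times (e.g. weights, gradients,
    one optimizer slot). Per-sample cost: every layer's activations kept
    `activation_copies` times (forward values and their gradients).
    """
    fixed = dense_param_count(layer_sizes) * state_copies * bytes_per_value
    per_sample = sum(layer_sizes) * activation_copies * bytes_per_value
    free = mem_budget_bytes - fixed
    return max(free, 0) // per_sample  # 0 means even one sample does not fit


def plan(n_records, layer_sizes, n_workers, mem_budget_bytes):
    """Split records evenly across workers, then derive batch size and steps per epoch."""
    shard = -(-n_records // n_workers)  # integer ceiling division
    batch = min(shard, max_batch_per_worker(layer_sizes, mem_budget_bytes))
    steps = -(-shard // batch) if batch else None  # None: the model does not fit
    return {"records_per_worker": shard, "batch_size": batch, "steps_per_epoch": steps}


# One autoencoder layer 1000 -> 500 -> 1000, 100k records, 4 workers, 64 MiB each.
print(plan(100_000, [1000, 500, 1000], n_workers=4, mem_budget_bytes=64 * 2**20))
# {'records_per_worker': 25000, 'batch_size': 2754, 'steps_per_epoch': 10}
```

A larger per-worker batch means fewer steps per epoch; whether it also reduces the number of epochs to convergence depends on the model and optimizer, which this estimate does not capture.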



Acknowledgments

This work is partly funded by the French government labelled PIA program under its IDEX UCAJEDI project (ANR-15-IDEX-0001). The PhD thesis of John Anderson García Henao is funded by the French government labelled PIA program under its LABEX UCN@Sophia project (ANR-11-LABX-0031-01).

Author information

Authors and Affiliations

Université Côte d’Azur, CNRS, Laboratoire I3S, Sophia Antipolis, France
John Anderson García Heano, Frédéric Precioso & Michel Riveill
Université Côte d’Azur, CHU Nice, Nice, France
Pascal Staccini


Corresponding authors

Correspondence to John Anderson García Heano, Frédéric Precioso, Pascal Staccini or Michel Riveill.

Editor information

Editors and Affiliations

Instituto Tecnológico de Costa Rica, Centro Nacional de Alta Tecnología, Pavas, Costa Rica
Esteban Meneses
Universidad de los Andes, Bogotá, Colombia
Harold Castro
Universidad Industrial de Santander, Bucaramanga, Colombia
Carlos Jaime Barrios Hernández
Universidad de Antioquia, Medellín, Colombia
Raul Ramos-Pollan


About this paper

Cite this paper

García Heano, J.A., Precioso, F., Staccini, P., Riveill, M. (2019). Parallel and Distributed Processing for Unsupervised Patient Phenotype Representation. In: Meneses, E., Castro, H., Barrios Hernández, C., Ramos-Pollan, R. (eds) High Performance Computing. CARLA 2018. Communications in Computer and Information Science, vol 979. Springer, Cham. https://doi.org/10.1007/978-3-030-16205-4_1


DOI: https://doi.org/10.1007/978-3-030-16205-4_1
Published: 31 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16204-7
Online ISBN: 978-3-030-16205-4
eBook Packages: Computer Science, Computer Science (R0)
