Abstract
Determining an optimal generalization model with deep neural networks for a medical task is an expensive process that generally requires large amounts of data and computing power. Furthermore, the programming expressiveness needed to scale deep learning workflows over new heterogeneous system architectures, to train each model and to configure the computing resources efficiently, grows in complexity. We introduce DiagnoseNET, an automatic framework designed to scale deep learning models over heterogeneous systems applied to medical diagnosis. DiagnoseNET is designed as a modular framework that enables deep learning workflow management and preserves the expressiveness of neural networks written in TensorFlow, while the DiagnoseNET runtime abstracts data locality, micro-batching and distributed orchestration to scale the neural network model from a GPU workstation to multiple nodes. The main approach is composed of a set of gradient computation modes that adapt the neural network according to the memory capacity, the number of workers, the coordination method and the communication protocol (gRPC or MPI), to achieve a balance between accuracy and energy consumption. The experiments carried out evaluate computational performance in terms of accuracy, convergence time and worker scalability to determine an optimal neural architecture over a mini-cluster of Jetson TX2 nodes. These experiments were performed using two medical case studies: the first dataset is composed of clinical descriptors collected during the first week of hospitalization of patients in the Provence-Alpes-Côte d'Azur region; the second dataset uses short ECG records, between 30 and 60 s, obtained as part of the PhysioNet 2017 Challenge.
Acknowledgements
We thank DU Ziqing, Mohamed Younes, Arno Gobbin and the IADB team for their help. This work is partly funded by the French government labelled PIA program under its IDEX UCAJEDI project (ANR-15-IDEX-0001). The PhD thesis of John Anderson Garcia Henao is funded by the French government labelled PIA program under its LABEX UCN@Sophia project (ANR-11-LABX-0031-01).
Appendices
Appendix 1: DiagnoseNET MPI Synchronous Algorithm
Algorithm 1 describes the MPI synchronous coordination training with a parameter server. It uses node ranks to assign roles, defining rank 0 as the parameter server (PS) and the remaining ranks as workers. When the program is launched, the PS performs the necessary pre-processing tasks, such as loading the dataset and compiling the model, and then sends the model to the workers, which are ready to receive it. At each training step, the PS sends a different subset of the data to every worker to be used for loss optimization. At the end of an epoch, the PS gathers the new weights from every worker; workers receive the collection of weights and compute the average weight for the global update. All other computation proceeds as in the desktop version.
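The epoch-level gather-and-average step described above can be sketched as follows. This is a minimal single-process illustration with hypothetical names (`toy_worker`, `synchronous_epoch`), not the DiagnoseNET code itself, which distributes these roles over MPI ranks with rank 0 as the PS.

```python
# Sketch of Algorithm 1's synchronous coordination (hypothetical names).
# In DiagnoseNET this runs over MPI with rank 0 as the parameter server;
# here the per-epoch gather/average step is simulated in one process.

def average_weights(worker_weights):
    """Element-wise mean of the weight vectors gathered from all workers."""
    n = len(worker_weights)
    return [sum(ws) / n for ws in zip(*worker_weights)]

def synchronous_epoch(global_weights, workers, shards):
    """One epoch: every worker trains on its own data subset starting from
    the same global weights, then the results are averaged for the update."""
    new_weights = [worker(global_weights, shard)
                   for worker, shard in zip(workers, shards)]
    return average_weights(new_weights)

def toy_worker(weights, shard):
    """Stand-in for local training: one step toward the shard's mean value."""
    target = sum(shard) / len(shard)
    return [w - 0.1 * (w - target) for w in weights]
```

Because all workers start each epoch from the same global weights, the averaged update is equivalent to a single synchronized step, at the cost of waiting for the slowest worker.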
Appendix 2: DiagnoseNET MPI Asynchronous Algorithm
Algorithm 2 trains multiple model replicas in parallel on different nodes with different subsets of the data. Each model replica processes a mini-batch to compute gradients and sends them to the parameter server, which applies a function (mean, weighted average) between the previous and received weights, updates the global weights accordingly and sends them back to the workers. Every worker computes its gradients individually until convergence; convergence occurs when overfitting begins, that is, when the training loss keeps decreasing while the validation loss increases. The master, which is responsible for computing the weighted average of the received weights and its own weights, stops when all workers have converged. To track the workers' convergence status, the master keeps a queue of converged workers; when its length equals the number of workers, the master knows that all workers have converged and stops training. Since each node computes gradients independently and requires no interaction with the others, nodes can work at their own pace, which gives greater robustness to machine failure.
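The asynchronous update rule and the convergence queue can be sketched as below. The class name, the blend factor `alpha` and the use of a set in place of the queue are illustrative assumptions; the actual DiagnoseNET runtime performs these exchanges over MPI.

```python
# Sketch of Algorithm 2's asynchronous update (hypothetical names).
# The master blends its current weights with each worker's update as it
# arrives, and stops once every worker has reported convergence.

class AsyncParameterServer:
    def __init__(self, weights, n_workers, alpha=0.5):
        self.weights = weights
        self.n_workers = n_workers
        self.alpha = alpha       # weight given to the incoming update
        self.converged = set()   # stands in for the convergence queue

    def push(self, worker_id, worker_weights, converged=False):
        """Weighted average of the previous and the received weights;
        the updated global weights are returned (sent back) to the worker."""
        self.weights = [(1 - self.alpha) * w + self.alpha * u
                        for w, u in zip(self.weights, worker_weights)]
        if converged:
            self.converged.add(worker_id)
        return self.weights

    def done(self):
        """True once the queue length equals the number of workers."""
        return len(self.converged) == self.n_workers
```

Because `push` never blocks on other workers, a slow or failed node delays only its own updates, which is the robustness property noted above.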
Appendix 3: Hyperparameter Search to Classify the Medical Task 1
The model space contains (d) hyperparameters and (n) hyperparameter configurations, defined in Table 3, while Table 4 shows the models by number of parameters. We fixed some hyperparameters and tuned the number of units per layer, the number of layers and the batch size, which are the hyperparameters that directly affect the computational cost. Each model was trained with the Adam optimizer for a maximum of 40 epochs, using cross-entropy as the loss function.
According to the model dimensions shown in Table 4, the models can be divided into fine-, middle- and coarse-grain groups. Figure 4 shows that middle-grain models, from 1.99 to 8.29 million parameters, converge quickly in validation loss and reach high accuracy levels for the majority of the 14 care-purpose labels, whereas the other models present greater variation in accuracy and need more epochs to converge.
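The grouping of configurations by parameter count can be sketched as follows. The grid values and the input/output dimensions are illustrative assumptions (not the actual Table 3 values); only the middle-grain boundaries (1.99 M to 8.29 M parameters) are taken from the analysis above.

```python
from itertools import product

# Hypothetical hyperparameter grid in the spirit of Table 3.
layers_opts = [2, 4, 8]
units_opts = [256, 512, 1024]

def n_params(n_features, units, layers, n_labels):
    """Parameter count of a fully connected net (weights + biases)."""
    sizes = [n_features] + [units] * layers + [n_labels]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

def grain(params, lo=1.99e6, hi=8.29e6):
    """Bucket a model as in Appendix 3: fine below 1.99M parameters,
    middle up to 8.29M, coarse above."""
    if params < lo:
        return "fine"
    return "middle" if params <= hi else "coarse"

# Label every configuration in the (assumed) grid by its grain.
space = {(layers, units): grain(n_params(1000, units, layers, 14))
         for layers, units in product(layers_opts, units_opts)}
```

A search of this kind makes the fine/middle/coarse split explicit before training, so the costly runs can be concentrated on the middle-grain region that Fig. 4 identifies as the best accuracy-convergence trade-off.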
Appendix 4: ECG Neural Architecture to Classify the Medical Task 2
A pure CNN model suffers from the problem that the last layer may not exploit the original features or those extracted in the first layers. Figure 5 shows the ECG neural architecture implemented with the DiagnoseNET framework, whose key architectural factor is the residual network connections, which solve the information-loss problem in the deep layers. To implement this, a second information stream is added to the model, so that deeper layers have access to the original features in addition to the information processed by the previous layers. Moreover, two different types of residual block are included to access the different states of the information: the normal residual block preserves the size of the input, while the sub-sampling residual block halves it. By using max pooling, the network extracts only the high values from an input, so that the size of its output is halved.
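The two block types can be illustrated on a 1-D sequence as below. This is a structural sketch with hypothetical names, not the TensorFlow implementation: `transform` stands in for the block's convolutional path, and the skip path carries the second information stream.

```python
# Sketch of the two residual block types described above (illustrative,
# not the DiagnoseNET/TensorFlow code). `transform` stands in for the
# block's convolutional layers.

def max_pool_halve(x):
    """Stride-2 max pooling on a 1-D sequence: keep the larger of each
    pair, so the output length is half the input length."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]

def normal_residual_block(x, transform):
    """Output size equals input size; the skip connection adds the
    original features back onto the transformed ones."""
    return [a + b for a, b in zip(transform(x), x)]

def subsampling_residual_block(x, transform):
    """Both the main path and the skip path are halved by max pooling
    before being summed, so the block's output is half the input size."""
    main = max_pool_halve(transform(x))
    skip = max_pool_halve(x)
    return [a + b for a, b in zip(main, skip)]
```

Halving the skip path with the same pooling as the main path keeps the two streams aligned in length, which is what allows the element-wise sum at the block's output.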
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Garcia Henao, J.A., Precioso, F., Staccini, P., Riveill, M. (2021). DiagnoseNET: Automatic Framework to Scale Neural Networks on Heterogeneous Systems Applied to Medical Diagnosis. In: Kim, H., Kim, K.J. (eds) IT Convergence and Security. Lecture Notes in Electrical Engineering, vol 712. Springer, Singapore. https://doi.org/10.1007/978-981-15-9354-3_1
Print ISBN: 978-981-15-9353-6
Online ISBN: 978-981-15-9354-3
eBook Packages: Intelligent Technologies and Robotics (R0)