A Diagnostic Report Generator from CT Volumes on Liver Tumor with Semi-supervised Attention Mechanism

  • Jiang Tian
  • Cong Li
  • Zhongchao Shi
  • Feiyu Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11071)


Automatically generating interpretable diagnostic reports from computed tomography (CT) volumes is a new challenge for computer-aided diagnosis (CAD). In this paper, we propose a novel multimodal framework that links data and knowledge between CT volumes and textual reports through a semi-supervised attention mechanism. The framework combines a CT slice segmentation model with a language model. The semi-supervised attention mechanism paves the way for visually interpreting the underlying evidence that supports the diagnosis results. The resulting multi-task deep neural network is trained end-to-end. We not only quantitatively evaluate system performance (76.6% in terms of BLEU@4) but also qualitatively visualize the attention heat maps of the framework on a liver tumor dataset.
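The reported 76.6% BLEU@4 score measures n-gram overlap between generated and reference reports. As a point of reference, the sketch below computes sentence-level BLEU@4 in plain Python (modified n-gram precision up to 4-grams, geometric mean, brevity penalty). It assumes whitespace tokenization, a single reference per candidate, and a simple floor-based smoothing for zero-count n-grams; the paper's exact evaluation setup (tokenizer, corpus- vs. sentence-level aggregation) is not specified in the abstract.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU@4 with brevity penalty.

    Assumes whitespace tokenization and one reference; zero precisions
    are floored at a small epsilon so the geometric mean stays defined.
    """
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        c_ngrams = ngram_counts(cand, n)
        r_ngrams = ngram_counts(ref, n)
        # Modified precision: clip candidate counts by reference counts.
        overlap = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())
        total = max(sum(c_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

A candidate identical to its reference scores 1.0; any token mismatch lowers the clipped precisions and hence the score.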



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. AI Lab, Lenovo Research, Beijing, China
