
Lw-TISNet: Light-Weight Convolutional Neural Network Incorporating Attention Mechanism and Multiple Supervision Strategy for Tongue Image Segmentation

  • Original Paper
Sensing and Imaging

Abstract

Segmenting the tongue body is an essential step in automated tongue diagnosis, and it is a challenging task because of the tongue body’s specificity and heterogeneity. Current deep-learning-based tongue image segmentation networks are bloated and computationally expensive. In this study, a light-weight segmentation network for tongue images is proposed within the basic encoder-decoder framework, with MobileNet v2 adopted as the backbone because of its small parameter count and low computational complexity. High-level semantic information and low-level positional information are combined to detect the tongue body’s boundary. Dilated convolutions are applied to the final feature maps of the network to enlarge the receptive field and thereby capture rich global semantic information. An attention mechanism is embedded to recalibrate the feature maps spatially and channel-wise, enhancing the features that matter for the segmentation task while suppressing irrelevant ones. Moreover, a supervision output is added to each level of the decoder to guide the network to capture both local and global image features, and all supervision outputs are fused to produce the final segmentation. Quantitative and qualitative results on two tongue datasets indicate that the proposed network achieves competitive performance with a smaller model size and lower computational cost. The proposed method accurately extracts the tongue body and can meet the requirements of practical applications.
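
The receptive-field argument for dilated convolution can be made concrete with a little arithmetic. The following is a minimal illustrative sketch, not the network described in the abstract: the function names, the three-layer stacks, and the dilation rates 1, 2 and 4 are assumptions chosen for the example, since the abstract does not state the rates actually used. It relies only on the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs (hypothetical values).
    Each stride-1 layer widens the receptive field by its span minus one.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel_size(kernel_size, dilation) - 1
    return rf


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated cross-correlation in pure Python."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


# Three 3-tap layers, without and with dilation (rates are assumptions).
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
print(plain, dilated)
```

Both stacks use the same number of weights per layer, so in this toy setting the wider receptive field of the dilated stack comes at no extra parameter cost, which is the property a light-weight network can exploit.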



Acknowledgements

This study was supported by the National Natural Science Foundation of China (No. 61871006). The authors thank Dr. Xiaopeng Zheng, Dr. Wenqiang Chen and Dr. Wenjing Chen from Beijing Xuanwu Hospital for their help in capturing the tongue images and for their guidance on Traditional Chinese Medicine.

Author information

Corresponding author

Correspondence to Li Zhuo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical Approval

This article does not contain any studies with human or animal participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Huang, X., Zhuo, L., Zhang, H. et al. Lw-TISNet: Light-Weight Convolutional Neural Network Incorporating Attention Mechanism and Multiple Supervision Strategy for Tongue Image Segmentation. Sens Imaging 23, 6 (2022). https://doi.org/10.1007/s11220-021-00375-x

  • DOI: https://doi.org/10.1007/s11220-021-00375-x
