
Visual question answering on blood smear images using convolutional block attention module powered object detection

  • Original article
  • Published in The Visual Computer

Abstract

One of the vital indicators of a person's health is the shape and number of the red blood cells, white blood cells and platelets in their blood. Abnormalities in these characteristics can indicate diseases such as anaemia, leukaemia or thrombocytosis. Blood cell counting is conventionally performed through microscopic examination of blood samples treated with suitable chemical reagents. These conventional methods are labour-intensive, time-consuming and costly, and they require highly skilled medical professionals. This paper proposes a novel scheme to analyse an individual's blood sample using a visual question answering (VQA) system, which accepts a blood smear image as input and rapidly answers questions about the sample, e.g. the number of blood cells or the nature of any abnormalities, without requiring the services of a skilled medical professional. In VQA, the computer generates textual answers to questions about an input image; solving this difficult problem requires visual understanding, question comprehension and deductive reasoning. The proposed approach employs a convolutional neural network for question categorisation and an object detector with an attention mechanism for visual comprehension. Experiments were conducted with two types of attention: (1) the convolutional block attention module (CBAM) and (2) the squeeze-and-excitation (SE) network, both of which yield fast and reliable results. A VQA dataset was created for this study because no public dataset was available. The proposed system achieves an accuracy of 94% on numeric-response and yes/no questions and a BLEU score of 0.91.
It is also observed that the attention-based object recognition model used for counting the blood characteristics achieves accuracies of 97%, 100% and 98% for red blood cell, white blood cell and platelet counts, respectively, improvements of 1%, 0.06% and 1.61% over the state-of-the-art model.
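The two attention mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the 7×7 spatial convolution of CBAM is approximated here by an element-wise mix of the pooled maps for brevity, and the bottleneck weights `w1`/`w2` are placeholder parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """CBAM channel attention: a shared bottleneck MLP applied to the
    average- and max-pooled channel descriptors of x (shape (C, H, W))."""
    avg = x.mean(axis=(1, 2))                                   # (C,)
    mx = x.max(axis=(1, 2))                                     # (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0)
                  + w2 @ np.maximum(w1 @ mx, 0))                # (C,)
    return x * att[:, None, None]

def spatial_attention(x):
    """CBAM spatial attention: channel-wise average and max maps, combined
    here by a simple mix (a stand-in for the learned 7x7 convolution)."""
    avg = x.mean(axis=0)                                        # (H, W)
    mx = x.max(axis=0)                                          # (H, W)
    att = sigmoid(0.5 * (avg + mx))                             # (H, W)
    return x * att[None, :, :]

def cbam(x, w1, w2):
    """CBAM applies channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(x, w1, w2))

def se_block(x, w1, w2):
    """Squeeze-and-excitation: global average pool (squeeze), then a
    bottleneck MLP with sigmoid gating (excitation) over the channels."""
    z = x.mean(axis=(1, 2))                                     # squeeze: (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))                     # excite: (C,)
    return x * s[:, None, None]
```

Both modules re-weight the detector's feature maps rather than change their shape, which is why they can be dropped into an existing object detection backbone: each output has the same dimensions as its input, scaled per channel (and, for CBAM, per spatial location) by gates in (0, 1).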


Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

The authors thank Mr. Arjun Babu Anand, Ms. Neha Ann Jacob and Mr. Vishnu Sajith (graduate students at NITC) for manually annotating the dataset and for the initial testing, and Mrs. Safnas (laboratory technician) for assisting with the manual annotation of the blood cell images by providing proper direction and expertise.

Author information


Corresponding author

Correspondence to A. Lubna.

Ethics declarations

Conflict of interest

All authors declare that they have no financial or personal relationships with other people or organisations that could inappropriately influence the presented work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lubna, A., Kalady, S. & Lijiya, A. Visual question answering on blood smear images using convolutional block attention module powered object detection. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03359-6
