
Visual question answering on blood smear images using convolutional block attention module powered object detection

  • Original article
  • Published in The Visual Computer

Abstract

One of the vital indicators of a person's health is the shape and number of the red blood cells, white blood cells and platelets in their blood. Abnormalities in these characteristics can indicate diseases such as anaemia, leukaemia or thrombocytosis. Blood cell counting is conventionally performed through microscopic examination of blood samples treated with suitable chemical reagents. These conventional methods are labour-intensive, time-consuming and costly, and they require highly skilled medical professionals. This paper proposes a novel scheme to analyse an individual's blood sample using a visual question answering (VQA) system, which accepts a blood smear image as input and rapidly answers questions about the sample, e.g. the number of blood cells or the nature of any abnormalities, without requiring the services of a skilled medical professional. In VQA, the computer generates textual answers to questions about an input image; solving this difficult problem requires visual understanding, question comprehension and deductive reasoning. The proposed approach employs a convolutional neural network for question categorisation and an object detector with an attention mechanism for visual comprehension. Experiments were conducted with two types of attention: (1) the convolutional block attention module (CBAM) and (2) the squeeze-and-excitation (SE) network, both of which yield fast and reliable results. A VQA dataset was created for this study because no public dataset was available. The proposed system achieves an accuracy of 94% on numeric-response and yes/no questions and a BLEU score of 0.91.
It is also observed that the attention-based object recognition model used for counting the blood characteristics achieves accuracies of 97%, 100% and 98% for red blood cell, white blood cell and platelet counts, respectively, improvements of 1%, 0.06% and 1.61% over the state-of-the-art model.
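The two attention mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the 7×7 spatial convolution of CBAM is approximated here by an element-wise mix of the pooled maps for brevity, and the bottleneck weights `w1`/`w2` are placeholder parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """CBAM channel attention: a shared bottleneck MLP applied to the
    average- and max-pooled channel descriptors of x (shape (C, H, W))."""
    avg = x.mean(axis=(1, 2))                                   # (C,)
    mx = x.max(axis=(1, 2))                                     # (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0)
                  + w2 @ np.maximum(w1 @ mx, 0))                # (C,)
    return x * att[:, None, None]

def spatial_attention(x):
    """CBAM spatial attention: channel-wise average and max maps, combined
    here by a simple mix (a stand-in for the learned 7x7 convolution)."""
    avg = x.mean(axis=0)                                        # (H, W)
    mx = x.max(axis=0)                                          # (H, W)
    att = sigmoid(0.5 * (avg + mx))                             # (H, W)
    return x * att[None, :, :]

def cbam(x, w1, w2):
    """CBAM applies channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(x, w1, w2))

def se_block(x, w1, w2):
    """Squeeze-and-excitation: global average pool (squeeze), then a
    bottleneck MLP with sigmoid gating (excitation) over the channels."""
    z = x.mean(axis=(1, 2))                                     # squeeze: (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))                     # excite: (C,)
    return x * s[:, None, None]
```

Both modules re-weight the detector's feature maps rather than change their shape, which is why they can be dropped into an existing object detection backbone: each output has the same dimensions as its input, scaled per channel (and, for CBAM, per spatial location) by gates in (0, 1).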


Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

The authors thank Mr. Arjun Babu Anand, Ms. Neha Ann Jacob and Mr. Vishnu Sajith (graduate students at NITC) for manually annotating the dataset and for the initial testing, and Mrs. Safnas (laboratory technician) for assisting with the manual annotation of the blood cell images by providing proper direction and expertise.

Author information


Corresponding author

Correspondence to A. Lubna.

Ethics declarations

Conflict of interest

All authors declare that they have no financial or personal relationships with other people or organisations that could inappropriately influence the presented work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lubna, A., Kalady, S. & Lijiya, A. Visual question answering on blood smear images using convolutional block attention module powered object detection. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03359-6
