Thick Data Analytics for Small Training Samples Using Siamese Neural Network and Image Augmentation

  • Conference paper
LISS 2021

Abstract

Although machine learning and deep learning have provided solutions and effective predictions for a variety of complex tasks, their models must be trained on large amounts of labeled data to reach high accuracy. In many applications, such as healthcare and medical imaging, collecting large amounts of data is sometimes not feasible. Thick data analytics attempts to address this challenge by incorporating additional qualitative interventions, such as using an expert's heuristics to annotate and augment the training data. In this article, we investigate how the heuristics of a human radiologist can be used to identify COVID-19 from a few CT-scan images through groups of image annotation and augmentation techniques. New COVID-19 cases are identified with a uniquely structured Siamese network that ranks the similarity between new CT-scan images and images that the radiologist has determined to be COVID-19. The Siamese network extracts the features of the augmented images and compares them with those of the new CT-scan image to determine, using a similarity ratio, whether the new image is COVID-19 positive. The results show that the proposed model, which uses the augmentation heuristics and is trained on a small dataset, outperforms advanced models trained on datasets containing large numbers of samples. The article begins by answering key questions: why CT scans are needed for COVID-19 diagnosis, what Thick Data means and how image augmentation serves as a heuristic, and what role the Siamese neural network plays in learning from small samples. Answering these questions provides the justification for the analytics method described in this paper.
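The comparison step described in the abstract can be pictured with a small sketch. This is not the authors' implementation. It assumes a toy setting in which images are small nested lists of grey values, the augmentations are simple flips and rotations, and a hand-written `embed` function stands in for the trained shared-weight branch of a Siamese network. The names `augment`, `embed`, `similarity`, and `similarity_ratio`, and the threshold of 0.8, are all hypothetical and chosen only for illustration.

```python
import math

def flip_horizontal(img):
    # Mirror each row left to right.
    return [row[::-1] for row in img]

def rotate_90(img):
    # Rotate the image 90 degrees clockwise.
    return [list(col) for col in zip(*img[::-1])]

def augment(img):
    # Hypothetical augmentation group: original, flip, 90 and 180 degree rotations.
    return [img, flip_horizontal(img), rotate_90(img), rotate_90(rotate_90(img))]

def embed(img):
    # Stand-in for the shared twin branch (assumption: a real system would use
    # a trained CNN). Features: mean, standard deviation, and the mean absolute
    # horizontal and vertical gradients.
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    h = [abs(row[i + 1] - row[i]) for row in img for i in range(len(row) - 1)]
    v = [abs(img[r + 1][c] - img[r][c])
         for r in range(len(img) - 1) for c in range(len(img[0]))]
    return [mean, std, sum(h) / len(h), sum(v) / len(v)]

def similarity(a, b):
    # Both inputs pass through the same embed function, as in a Siamese pair.
    # Euclidean distance between embeddings is mapped into (0, 1].
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(embed(a), embed(b))))
    return math.exp(-dist)

def similarity_ratio(new_img, confirmed, threshold=0.8):
    # Fraction of augmented, expert-confirmed images that the new image resembles.
    support = [aug for img in confirmed for aug in augment(img)]
    hits = sum(1 for s in support if similarity(new_img, s) >= threshold)
    return hits / len(support)

# Toy 3x3 "scans": one with a bright patch, one uniformly dark.
confirmed_case = [[0.1, 0.1, 0.1], [0.1, 0.9, 0.9], [0.1, 0.9, 0.9]]
uniform_scan = [[0.1] * 3 for _ in range(3)]
ratio_match = similarity_ratio(confirmed_case, [confirmed_case])
ratio_other = similarity_ratio(uniform_scan, [confirmed_case])
```

In this toy setting, a new image would be flagged as positive when its ratio against the augmented support set is high. How the paper actually defines and thresholds its similarity ratio is not stated in the abstract.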

Acknowledgment

The first author would like to thank NSERC for supporting this research through NSERC DDG-2020–00037.

Author information

Correspondence to Jinan Fiaidhi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Fiaidhi, J., Sawyer, D., Mohammed, S. (2022). Thick Data Analytics for Small Training Samples Using Siamese Neural Network and Image Augmentation. In: Shi, X., Bohács, G., Ma, Y., Gong, D., Shang, X. (eds) LISS 2021. Lecture Notes in Operations Research. Springer, Singapore. https://doi.org/10.1007/978-981-16-8656-6_6
