Abstract
Although machine learning and deep learning have provided effective solutions and predictions for a variety of complex tasks, they require large amounts of labeled training data for the learned models to achieve high accuracy. In many applications, such as healthcare and medical imaging, collecting large amounts of data is often not feasible. Thick data analytics attempts to address this challenge by incorporating additional qualitative interventions, such as using experts' heuristics to annotate and augment the training data. In this article, we investigate how the heuristics of a human radiologist can be involved in identifying COVID-19 from a small number of CT scan images through groups of image annotation and augmentation techniques. New COVID-19 cases are identified using a Siamese network with a unique structure that ranks the similarity between new CT scan images and the images the radiologist has labeled as COVID-19. The Siamese network extracts features from the augmented images and compares them with those of the new CT scan image, using a similarity ratio to determine whether the new image is COVID-19 positive. The results show that the proposed model, which uses the augmentation heuristics and is trained on a small dataset, outperforms advanced models trained on datasets containing large numbers of samples. The article begins by answering key questions: why CT scans are needed for COVID-19 diagnosis, what the notion of Thick Data is, how image augmentation can serve as a heuristic, and what role the Siamese neural network plays in learning from small samples. Answering these questions provides a better justification for the analytics method described in this paper.
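The abstract describes comparing a new CT slice against augmented, radiologist-confirmed COVID-19 slices with a Siamese network and a similarity ratio. The Python sketch below illustrates that flow under loudly stated assumptions: the augmentation set (two flips and a 90° rotation), the single fixed linear-plus-tanh layer standing in for a trained convolutional branch, cosine similarity as the comparison, the 0.9 match threshold, and the definition of the similarity ratio as the fraction of augmented references matched are all illustrative choices, not the authors' published implementation. The names `augment`, `make_twin_weights`, `embed`, `cosine` and `similarity_ratio` are hypothetical.

```python
import math
import random
from typing import List

Image = List[List[float]]  # one grayscale CT slice: rows of intensities in [0, 1]


def augment(image: Image) -> List[Image]:
    """Original slice plus three geometric variants (assumed augmentation set)."""
    h_flip = [row[::-1] for row in image]             # mirror left-right
    v_flip = [list(row) for row in image[::-1]]       # mirror top-bottom
    rot90 = [list(col) for col in zip(*image[::-1])]  # rotate 90 degrees clockwise
    return [image, h_flip, v_flip, rot90]


def make_twin_weights(n_inputs: int, n_features: int = 16, seed: int = 0) -> List[List[float]]:
    """Hypothetical stand-in for a trained CNN branch: one fixed random linear layer."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_features)]


def embed(image: Image, weights: List[List[float]]) -> List[float]:
    """Shared (twin) branch: the same weights embed every input image."""
    pixels = [p for row in image for p in row]
    return [math.tanh(sum(w * p for w, p in zip(w_row, pixels))) for w_row in weights]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def similarity_ratio(query: Image, positives: List[Image],
                     weights: List[List[float]], threshold: float = 0.9) -> float:
    """Fraction of augmented radiologist-confirmed slices the query closely resembles."""
    if not positives:
        raise ValueError("need at least one confirmed COVID-19 reference slice")
    q = embed(query, weights)
    refs = [embed(a, weights) for img in positives for a in augment(img)]
    hits = sum(1 for r in refs if cosine(q, r) >= threshold)
    return hits / len(refs)


if __name__ == "__main__":
    covid_ref = [[0.2, 0.8, 0.7], [0.1, 0.9, 0.6], [0.3, 0.7, 0.5]]  # toy 3x3 "slice"
    new_scan = [[0.2, 0.8, 0.6], [0.1, 0.9, 0.6], [0.3, 0.7, 0.5]]
    w = make_twin_weights(n_inputs=9)
    ratio = similarity_ratio(new_scan, [covid_ref], w)
    print(f"similarity ratio = {ratio:.2f}; flagged positive: {ratio >= 0.5}")
```

Both the query and every reference pass through the same `embed` call with the same weights, which is what makes the arrangement "Siamese". In a realistic setting the branch would be a convolutional network trained on pairs (for example with a contrastive loss), and the 0.5 decision cut-off used in the usage example would be tuned on validation data rather than fixed.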
Acknowledgment
The first author would like to thank NSERC for supporting this research through NSERC DDG-2020–00037.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Fiaidhi, J., Sawyer, D., Mohammed, S. (2022). Thick Data Analytics for Small Training Samples Using Siamese Neural Network and Image Augmentation. In: Shi, X., Bohács, G., Ma, Y., Gong, D., Shang, X. (eds) LISS 2021. Lecture Notes in Operations Research. Springer, Singapore. https://doi.org/10.1007/978-981-16-8656-6_6
DOI: https://doi.org/10.1007/978-981-16-8656-6_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8655-9
Online ISBN: 978-981-16-8656-6
eBook Packages: Business and Management, Business and Management (R0)