Prompt-MIL: Boosting Multi-instance Learning Schemes via Task-Specific Prompt Tuning

Zhang, Jingwei; Kapse, Saarthak; Ma, Ke; Prasanna, Prateek; Saltz, Joel; Vakalopoulou, Maria; Samaras, Dimitris

doi:10.1007/978-3-031-43993-3_60

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14227)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Whole slide image (WSI) classification is a critical task in computational pathology that requires processing gigapixel-sized images, which is challenging for current deep-learning methods. Current state-of-the-art methods are based on multi-instance learning (MIL) schemes, which usually rely on pretrained features to represent the instances. Due to the lack of task-specific annotated data, these features are obtained either from well-established backbones pretrained on natural images or, more recently, from self-supervised models pretrained on histopathology. However, both approaches yield task-agnostic features, resulting in a performance loss compared to appropriate task-related supervision, when it is available. In this paper, we show that when task-specific annotations are limited, we can inject such supervision into downstream task training to reduce the gap between fully task-tuned and task-agnostic features. We propose Prompt-MIL, an MIL framework that integrates prompts into WSI classification. Prompt-MIL adopts a prompt tuning mechanism in which only a small fraction of parameters calibrates the pretrained features to encode task-specific information, in contrast to conventional full fine-tuning approaches. Extensive experiments on three WSI datasets, TCGA-BRCA, TCGA-CRC, and BRIGHT, demonstrate the superiority of Prompt-MIL over conventional MIL methods, achieving a relative improvement of 1.49%–4.03% in accuracy and 0.25%–8.97% in AUROC while using fewer than 0.3% additional parameters. Compared to conventional full fine-tuning approaches, we fine-tune less than 1.3% of the parameters, yet achieve a relative improvement of 1.29%–13.61% in accuracy and 3.22%–27.18% in AUROC, reduce GPU memory consumption by 38%–45%, and train 21%–27% faster.
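The two kinds of figures quoted in the abstract, a trainable-parameter fraction and a relative improvement, can be illustrated with a rough sketch. Everything below is an assumption made for illustration: the embedding width, depth, number of prompt tokens, and the example metric values are hypothetical and are not taken from the paper, and the helper names (`vit_block_params`, `build_groups`, `trainable_fraction`, `relative_improvement`) are invented here. The sketch only counts the parameters of standard transformer encoder blocks (frozen) and of prompt tokens prepended to the input sequence (trainable); it does not attempt to reproduce the reported percentages.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParamGroup:
    name: str
    count: int
    trainable: bool


def vit_block_params(dim: int, mlp_ratio: int = 4) -> int:
    """Parameters (weights + biases) of one standard transformer encoder block."""
    attn = 4 * (dim * dim + dim)  # Q, K, V and output projections
    hidden = mlp_ratio * dim
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)  # two linear layers
    norms = 2 * 2 * dim  # two LayerNorms, each with scale and shift
    return attn + mlp + norms


def build_groups(dim: int, depth: int, num_prompts: int) -> List[ParamGroup]:
    """Frozen encoder blocks plus trainable prompt tokens (hypothetical setup).

    Patch embedding, positional embeddings, and the MIL classifier are
    deliberately left out to keep the accounting short.
    """
    return [
        ParamGroup("encoder_blocks", depth * vit_block_params(dim), trainable=False),
        ParamGroup("prompt_tokens", num_prompts * dim, trainable=True),
    ]


def trainable_fraction(groups: List[ParamGroup]) -> float:
    """Share of all counted parameters that receive gradient updates."""
    total = sum(g.count for g in groups)
    trainable = sum(g.count for g in groups if g.trainable)
    return trainable / total


def relative_improvement(baseline: float, new: float) -> float:
    """Relative improvement in percent: (new - baseline) / baseline * 100."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical configuration, NOT the paper's reported setup.
    groups = build_groups(dim=192, depth=12, num_prompts=1)
    print(f"trainable fraction: {trainable_fraction(groups):.6%}")
    # Hypothetical metric values, chosen only to show the formula.
    print(f"relative improvement: {relative_improvement(0.80, 0.82):.2f}%")
```

Note that a relative improvement is measured against the baseline value, so in this illustration a change from 0.80 to 0.82 is read as a relative gain rather than as an absolute difference of two points.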

Acknowledgements

This work was partially supported by the ANR Hagnodice ANR-21-CE45-0007, the NSF IIS-2212046, the NSF IIS-2123920, the NIH 1R21CA258493-01A1, the NCI UH3CA225021 and Stony Brook University Provost Funds. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Authors and Affiliations

Stony Brook University, Stony Brook, USA
Jingwei Zhang, Saarthak Kapse, Prateek Prasanna, Joel Saltz & Dimitris Samaras
Snap Inc., New York, USA
Ke Ma
CentraleSupélec, University of Paris-Saclay, Paris, France
Maria Vakalopoulou

Corresponding author

Correspondence to Jingwei Zhang.

Editor information

Editors and Affiliations

Icahn School of Medicine, Mount Sinai, NYC, NY, USA; Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Emory University, Atlanta, GA, USA
Anant Madabhushi
Queen’s University, Kingston, ON, Canada
Parvin Mousavi
The University of British Columbia, Vancouver, BC, Canada
Septimiu Salcudean
Yale University, New Haven, CT, USA
James Duncan
IBM Research, San Jose, CA, USA
Tanveer Syeda-Mahmood
Johns Hopkins University, Baltimore, MD, USA
Russell Taylor

About this paper

Cite this paper

Zhang, J. et al. (2023). Prompt-MIL: Boosting Multi-instance Learning Schemes via Task-Specific Prompt Tuning. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14227. Springer, Cham. https://doi.org/10.1007/978-3-031-43993-3_60

DOI: https://doi.org/10.1007/978-3-031-43993-3_60
Published: 01 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43992-6
Online ISBN: 978-3-031-43993-3
eBook Packages: Computer Science, Computer Science (R0)