Using SMOTE to Deal with Class-Imbalance Problem in Bioactivity Data to Predict mTOR Inhibitors

Kumari, Chetna; Abulaish, Muhammad; Subbarao, Naidu

doi:10.1007/s42979-020-00156-5

Original Research
Published: 06 May 2020

Volume 1, article number 150, (2020)

SN Computer Science

Chetna Kumari¹,
Muhammad Abulaish² &
Naidu Subbarao³

Abstract

Machine learning algorithms give sub-optimal performance when trained on class-imbalanced datasets. Mammalian target of rapamycin (mTOR) is a serine/threonine protein kinase that plays an integral role in the autophagy pathway. Autophagy is a cellular pathway for recycling macromolecules (proteins, lipids, and organelles), which enables eukaryotic cells to adapt their metabolism and survive adverse growth conditions. The possibility of therapeutic intervention in the autophagy pathway makes mTOR a promising pharmacological target for autophagy modulation in cancer. The bioactivity dataset of mTOR in ChEMBL, a compound bioactivity database maintained by the European Bioinformatics Institute, shows a disproportionate distribution of active and inactive classes. Predictive models based on this skewed dataset are biased towards predicting the majority class. Hence, we have used the Synthetic Minority Over-sampling TEchnique (SMOTE) to deal with the class-imbalance problem in bioactivity datasets. We have built and evaluated predictive models based on four commonly used classifiers using both class-imbalanced and class-balanced bioactivity datasets, and compared their performance on several metrics: accuracy, sensitivity, specificity, F1-measure, and AUC. We observe that the classification models based on the balanced dataset generally outperform those based on the class-imbalanced dataset, irrespective of the classifier used. We conclude that predictive models trained on a class-balanced dataset can be used to screen large compound bioactivity datasets for mTOR inhibitor-like compounds.
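
To make the oversampling step concrete, the following is a minimal sketch of the SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), together with the threshold-based metrics named in the abstract. It is an illustration only, not the authors' implementation: the function names `smote` and `classification_metrics`, the parameter defaults, and the toy two-dimensional "descriptor" vectors are all assumptions, and the base sample is drawn at random rather than iterating over every minority sample as in the original formulation. AUC is omitted because it needs ranked scores rather than hard labels.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and F1 from true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Guard against empty classes instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy, made-up 2-D descriptor vectors: 4 "actives" vs 12 "inactives".
actives = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
inactives = [[5.0 + 0.1 * i, 6.0 - 0.1 * i] for i in range(12)]
balanced_actives = actives + smote(actives, len(inactives) - len(actives), k=3)

# Made-up predictions, used only to exercise the metric formulas.
scores = classification_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(len(balanced_actives), len(inactives), scores)
```

In practice, oversampling of this kind is usually applied to the training portion only, so that synthetic points do not leak into the data used for evaluation.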

Author information

Authors and Affiliations

Department of Computer Science, Jamia Millia Islamia, New Delhi, India
Chetna Kumari
Department of Computer Science, South Asian University, New Delhi, India
Muhammad Abulaish
School of Computational and Integrative Biology, Jawaharlal Nehru University, New Delhi, India
Naidu Subbarao

Corresponding author

Correspondence to Muhammad Abulaish.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.

About this article

Cite this article

Kumari, C., Abulaish, M. & Subbarao, N. Using SMOTE to Deal with Class-Imbalance Problem in Bioactivity Data to Predict mTOR Inhibitors. SN COMPUT. SCI. 1, 150 (2020). https://doi.org/10.1007/s42979-020-00156-5

Received: 28 March 2020
Accepted: 08 April 2020
Published: 06 May 2020
DOI: https://doi.org/10.1007/s42979-020-00156-5
