Abstract
Semantic analysis of text collections was used to identify drugs with similar therapeutic activity. Natural language processing methods were applied to analyse more than 2.5 million English-language drug reviews posted on patient forums and discussion boards. To obtain distributed word representations from the input data, a continuous bag-of-words (CBOW) model was used; this is one of the word2vec models designed for analysing natural language semantics. The model assigns a numeric vector to each drug name, and a list of drug pairs with similar vectors was then compiled. Analysis of this list confirmed that similar word vectors correspond either to drugs with the same active compound or to drugs with close therapeutic effects that belong to the same therapeutic group. The chemical similarity within such drug pairs was found to be low. The proposed procedure was used to visualize the chemical drug space and to search for compounds with potentially similar biological effects among drugs from different therapeutic groups.
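The pair-forming step and the chemical-similarity check described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The drug names, the three-dimensional vectors, the fingerprint bit sets, and the 0.9 cosine threshold are hypothetical placeholders. In practice the vectors would come from a trained CBOW model (for example, gensim's `Word2Vec` with `sg=0`) and the fingerprints from a cheminformatics toolkit. The Tanimoto coefficient is used here as one common measure of chemical similarity; this section does not state which measure the authors used.

```python
import math
from itertools import combinations

# Hypothetical word vectors for drug names. Real CBOW vectors would have
# hundreds of dimensions and come from a trained word2vec model.
DRUG_VECTORS = {
    "drug_a": [0.9, 0.1, 0.0],
    "drug_b": [0.85, 0.15, 0.05],
    "drug_c": [0.0, 0.2, 0.95],
}

# Hypothetical structural fingerprints, each stored as the set of "on" bit indices.
FINGERPRINTS = {
    "drug_a": {1, 4, 7, 9},
    "drug_b": {2, 4, 8, 11, 15},
    "drug_c": {1, 3, 4, 7, 9},
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tanimoto(fp1, fp2):
    """Tanimoto (Jaccard) coefficient of two binary fingerprints given as bit sets."""
    union = len(fp1 | fp2)
    return len(fp1 & fp2) / union if union else 0.0


def similar_pairs(vectors, threshold=0.9):
    """Return (name1, name2, cosine) for every pair at or above the threshold, best first."""
    pairs = []
    # Sorting the names keeps the pair order deterministic.
    for a, b in combinations(sorted(vectors), 2):
        sim = cosine(vectors[a], vectors[b])
        if sim >= threshold:
            pairs.append((a, b, sim))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    # For each semantically similar pair, report how similar the structures are.
    for a, b, sim in similar_pairs(DRUG_VECTORS):
        print(f"{a} / {b}: cosine={sim:.3f}, "
              f"tanimoto={tanimoto(FINGERPRINTS[a], FINGERPRINTS[b]):.3f}")
```

With these toy inputs the script prints `drug_a / drug_b: cosine=0.996, tanimoto=0.125`: the pair is close in the embedding space but shares few structural bits. Conversely, `drug_a` and `drug_c` are structurally close (Tanimoto 0.8) but far apart semantically. For thousands of drug names, a matrix of normalized embeddings would replace the Python double loop.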
Additional information
Based on materials presented at the XX Mendeleev Congress on General and Applied Chemistry (September 26–30, 2016, Ekaterinburg, Russia).
Published in Russian in Izvestiya Akademii Nauk. Seriya Khimicheskaya, No. 11, pp. 2180–2189, November, 2017.
About this article
Cite this article
Tutubalina, E.V., Miftahutdinov, Z.S., Nugmanov, R.I. et al. Using semantic analysis of texts for the identification of drugs with similar therapeutic effects. Russ Chem Bull 66, 2180–2189 (2017). https://doi.org/10.1007/s11172-017-2000-8
DOI: https://doi.org/10.1007/s11172-017-2000-8