Abstract
Online social networks provide an effective platform for understanding the mass behaviour of people, which aids the development of techniques for the surveillance of Human Immunodeficiency Virus/Acquired Immunodeficiency Syndrome (HIV/AIDS). With the rapid growth of social sites such as Facebook, Twitter, and blogs, social networking has become one of the most promising avenues for HIV/AIDS investigation. Most recent works have implemented various frameworks to classify HIV/AIDS-related information using Social Media (SM) data. However, traditional techniques do not generalize well enough to handle the complex structure of SM data, and existing models are less effective because they lack annotation processes and pre-processing strategies. In this paper, a novel strategy is proposed to identify and classify HIV/AIDS-related SM data using the Levenshtein Distance-KMeans (LD-KMeans) algorithm and a Deep Belief Network-Long Short-Term Memory (DBN-LSTM) model. The work focuses on discussions of HIV- and AIDS-related issues taking place on Twitter. To classify HIV/AIDS-related tweets efficiently, the proposed work proceeds through the following steps. First, tweets are extracted using the Twitter API and pre-processed. Annotation extraction is then performed, and the tweets are separated into organization tweets and person tweets based on the annotation; organization tweets are the main focus of the proposed work. Next, text normalization produces cleaned, structured tweets. The hashtags related to HIV and AIDS are then identified and grouped using the LD-KMeans algorithm. Thereafter, word embedding is performed with M-Word2Vec, and once embedding is complete, the most important features are selected by the LS-DFO algorithm.
Finally, classification is performed on the selected features, assigning the HIV/AIDS-related tweets to categories such as symptoms, awareness, medicine, and reason. The experiments use Twitter data, and the outcomes of the proposed methodology are compared with those of prevailing algorithms. The analysis shows that the proposed methodology achieves an accuracy, sensitivity, and specificity of 94.65%, 94.56%, and 94.25%, respectively, with a tweet identification time of 72154 ms. The experimental outcomes demonstrate that the proposed model outperforms the prevailing systems in accuracy, sensitivity, and specificity when classifying HIV/AIDS-related tweets.
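The hashtag-grouping step can be illustrated with a minimal sketch. The abstract does not state how LD-KMeans combines Levenshtein distance with K-Means, so the code below is an assumption rather than the authors' algorithm: it runs a K-Means-style assign-and-update loop over strings, using edit distance as the metric and the cluster medoid in place of a mean (strings have no arithmetic mean). The function names `levenshtein` and `cluster_hashtags`, the seed centres, and the sample hashtags are all hypothetical.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cluster_hashtags(tags: List[str], seeds: List[str],
                     max_iter: int = 10) -> Dict[str, List[str]]:
    """Assumed LD-KMeans-like loop: assign each tag to the nearest centre,
    then move each centre to its cluster medoid, until centres stop changing.
    Seeds are assumed to be distinct; comparison is case-insensitive."""
    centres = list(seeds)
    clusters: Dict[str, List[str]] = {}
    for _ in range(max_iter):
        clusters = {c: [] for c in centres}
        for tag in tags:
            nearest = min(centres,
                          key=lambda c: levenshtein(tag.lower(), c.lower()))
            clusters[nearest].append(tag)
        new_centres = []
        for centre, members in clusters.items():
            if not members:
                new_centres.append(centre)  # keep an empty cluster's centre
                continue
            # Medoid: the member with the smallest total distance to the rest.
            medoid = min(members, key=lambda m: sum(
                levenshtein(m.lower(), o.lower()) for o in members))
            new_centres.append(medoid)
        if new_centres == centres:
            break
        centres = new_centres
    return clusters


# Hypothetical sample data, for illustration only.
sample_tags = ["#HIV", "#hiv", "#HIVtest", "#AIDS", "#aids", "#AIDSday"]
groups = cluster_hashtags(sample_tags, seeds=["#hiv", "#aids"])
```

Using a medoid keeps every centre an actual hashtag from the data, which makes the resulting groups easy to inspect; a production pipeline would also need a principled way to choose the number of clusters and the initial seeds.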
Data availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Abbreviations
- HIV: Human Immunodeficiency Virus
- AIDS: Acquired Immune Deficiency Syndrome
- ML: Machine Learning
- DL: Deep Learning
- LS: Linear Scaling
- KMeans: K-Means
- DBN: Deep Belief Network
- LSTM: Long Short-Term Memory
- DFO: Dispersive Flies Optimization
- DBN-LSTM: Deep Belief Network-Long Short-Term Memory
- LD-KMeans: Levenshtein Distance-KMeans
- LS-DFO: Linear Scaling based Dispersive Flies Optimization algorithm
- M-Word2Vec: Modified-Word2Vec
- Twitter API: Twitter Application Programming Interface
- SA: Sentiment Analysis
- LR: Logistic Regression
- CNN: Convolutional Neural Network
- SVM: Support Vector Machine
- TF-IDF: Term Frequency-Inverse Document Frequency
- FS: Feature Selection
- CM: Confusion Matrix
- DP: Data Points
Acknowledgements
We thank the anonymous referees for their useful suggestions.
Funding
This work received no funding.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Mageshwari V and Dr. I. Laurence Aroquiaraj. The first draft of the manuscript was written by Mageshwari V, and all authors commented on previous versions of the manuscript.
All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent of publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
About this article
Cite this article
Mageshwari, V., Aroquiaraj, I.L. A novel technique for identification and classification of HIV/AIDS related social media data using LD-KMEANS and DBN-LSTM. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19283-9
DOI: https://doi.org/10.1007/s11042-024-19283-9