A novel technique for identification and classification of HIV/AIDS related social media data using LD-KMEANS and DBN-LSTM

Multimedia Tools and Applications

Abstract

Online social networks provide an effective platform for understanding the mass behaviour of people, which aids in developing techniques for the surveillance of Human Immunodeficiency Virus/Acquired Immunodeficiency Syndrome (HIV/AIDS). With the rapid advancement of social sites such as Facebook, Twitter, and blogs, the social networking approach has become a highly promising avenue for HIV/AIDS investigation. Recent works have implemented various frameworks to classify HIV/AIDS-related information using Social Media (SM) data. However, the traditional techniques do not generalize well enough to handle the complex structure of SM data, and the existing models are less effective because they lack annotation processes and pre-processing strategies. This paper proposes a novel strategy to identify and classify HIV/AIDS-related SM data using the Levenshtein Distance-KMeans (LD-KMeans) algorithm and the Deep Belief Network-Long Short-Term Memory (DBN-LSTM) model. The proposed work focuses on discussions of HIV- and AIDS-related issues taking place on Twitter, and classifies HIV/AIDS-related tweets through the following steps. First, tweets are extracted using the Twitter API and pre-processed. Next, annotation extraction is performed, and the tweets are separated into organization tweets and person tweets based on the annotation; the proposed work concentrates on organization tweets. Text normalization is then performed, which yields cleaned, structured tweets. The hashtags related to HIV and AIDS are then identified and grouped together using the LD-KMeans algorithm. Thereafter, word embedding is performed by means of Modified-Word2Vec (M-Word2Vec). Once the embedding is complete, the most important features are selected by the Linear Scaling based Dispersive Flies Optimization (LS-DFO) algorithm. Finally, on the basis of the selected features, the classifier assigns the HIV/AIDS-related tweets to categories such as symptoms, awareness, medicine, and reason. The experiments use Twitter data, and the outcomes of the proposed methodology are compared with those of existing algorithms. The analysis shows that the proposed methodology obtained an accuracy, sensitivity, and specificity of 94.65%, 94.56%, and 94.25%, respectively, and reached a tweet identification time of 72154 ms. The experimental outcomes demonstrate that, in terms of sensitivity, specificity, and accuracy, the proposed model outperformed the existing systems in classifying HIV/AIDS-related tweets.
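
The hashtag grouping step can be illustrated with a minimal sketch. The abstract does not specify how LD-KMeans combines Levenshtein distance with K-Means, so everything below is an assumption made for illustration: the function names `levenshtein` and `group_hashtags`, the use of seed hashtags as initial centres, and the choice of a medoid (the member with the smallest total distance to the others) as the centre update, since strings have no arithmetic mean. It is not the authors' algorithm.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def group_hashtags(tags: List[str], seeds: List[str],
                   max_iter: int = 10) -> Dict[str, List[str]]:
    """K-Means-style grouping of strings (hypothetical sketch, see lead-in).

    Each hashtag is assigned to the nearest centre by edit distance; each
    centre is then moved to the medoid of its cluster. Seeds are assumed
    to be distinct after lowercasing.
    """
    # Lowercase and de-duplicate so distinct strings always differ.
    tags = list(dict.fromkeys(t.lower() for t in tags))
    centres = [s.lower() for s in seeds]
    clusters: Dict[str, List[str]] = {}
    for _ in range(max_iter):
        # Assignment step: nearest centre by Levenshtein distance.
        clusters = {c: [] for c in centres}
        for tag in tags:
            nearest = min(centres, key=lambda c: levenshtein(tag, c))
            clusters[nearest].append(tag)
        # Update step: medoid of each cluster; empty clusters keep their centre.
        new_centres = [
            min(members, key=lambda m: sum(levenshtein(m, o) for o in members))
            if members else centre
            for centre, members in clusters.items()
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return clusters


if __name__ == "__main__":
    example = ["#HIV", "#hivaids", "#hivtest", "#prep", "#PrEPnow", "#preps"]
    for centre, members in group_hashtags(example, ["#hiv", "#prep"]).items():
        print(centre, members)
```

Using a medoid keeps every centre a real hashtag from the data, which makes the groups easy to inspect; a production version would also need a principled way to choose the number of groups and the initial seeds.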

Data availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Abbreviations

HIV:

Human Immunodeficiency Virus

AIDS:

Acquired Immune Deficiency Syndrome

ML:

Machine Learning

DL:

Deep Learning

LS:

Linear Scaling

KMeans:

K-Means

DBN:

Deep Belief Network

LSTM:

Long Short-Term Memory

DFO:

Dispersive Flies Optimization

DBN-LSTM:

Deep Belief Network-Long Short-Term Memory

LD-KMeans:

Levenshtein Distance-KMeans

LS-DFO:

Linear Scaling based Dispersive Flies Optimization algorithm

M-Word2Vec:

Modified-Word2Vec

Twitter API:

Twitter Application Programming Interface

SA:

Sentiment Analysis

LR:

Logistic Regression

CNN:

Convolutional Neural Network

SVM:

Support Vector Machine

TF-IDF:

Term Frequency-Inverse Document Frequency

FS:

Feature Selection

CM:

Confusion Matrix

DP:

Data Points

Acknowledgements

We thank the anonymous referees for their useful suggestions.

Funding

This work received no funding.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by V. Mageshwari and I. Laurence Aroquiaraj. The first draft of the manuscript was written by V. Mageshwari, and all authors commented on previous versions of the manuscript.

All authors read and approved the final manuscript.

Corresponding author

Correspondence to V. Mageshwari.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mageshwari, V., Aroquiaraj, I.L. A novel technique for identification and classification of HIV/AIDS related social media data using LD-KMEANS and DBN-LSTM. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19283-9

  • DOI: https://doi.org/10.1007/s11042-024-19283-9
