Hypothesis Testing of Tweet Text Using NLP

  • Conference paper
Data Management, Analytics and Innovation (ICDMAI 2022)

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 137)

Abstract

Natural language processing (NLP) applications such as sentiment analysis, spam detection, and stance detection extract an author's emotions, feelings, and categorical positions (such as favor or denial) from a sentence, a passage, or a corpus. These areas remain under active research. This work instead uses the relations between target entities (for example, climate and population), the author's stance, and the country's level of economic growth to derive a statistical hypothesis. The hypothesis is tested, and the results are discussed and conclusions drawn. Missing country information for each Twitter account is filled in with a technique that uses meta-information from the tweets. A subset of the data is annotated using predefined seed features and rules, and the best-performing supervised machine learning model is then applied to label the remaining unlabeled tweets. Tweets are labeled as "believer" or "denier" for each country, and the hypothesis is tested on the statements made from rich and poor countries. The statistical results also show a positive correlation between the GDP growth rate and the number of deniers and believers in each country. The paper presents these techniques, the experimental findings, and the statistical analysis.
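The two statistical steps named in the abstract, a correlation between GDP growth rate and stance counts and a comparison between rich and poor countries, can be illustrated with a minimal sketch. The abstract does not say which test or correlation coefficient the authors used, so the Pearson coefficient and the Mann-Whitney U statistic here are assumptions, and every number below is hypothetical toy data rather than a result from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs (x in a, y in b) with x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical per-country values (NOT from the paper).
gdp_growth = [1.0, 2.0, 3.0, 4.0, 5.0]   # GDP growth rate, percent
denier_counts = [12, 15, 21, 24, 33]     # tweets labeled "denier"

r = pearson_r(gdp_growth, denier_counts)  # r > 0 indicates a positive correlation

# Hypothetical share of "denier" tweets in rich vs. poor countries.
rich_share = [0.30, 0.35, 0.40]
poor_share = [0.10, 0.15, 0.20]

u_rich = mann_whitney_u(rich_share, poor_share)
u_poor = mann_whitney_u(poor_share, rich_share)
```

A U value close to `len(a) * len(b)` or to zero suggests that one group tends to have larger values than the other. Turning U into a p-value needs its null distribution, which a statistics package would normally supply.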

Notes

  1. https://overpopulation-project.com/population-growth-is-a-threat-to-the-worlds-climate/

Author information

Corresponding author

Correspondence to Chongtham Rajen Singh.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Singh, C.R., Gobinath, R. (2023). Hypothesis Testing of Tweet Text Using NLP. In: Goswami, S., Barara, I.S., Goje, A., Mohan, C., Bruckstein, A.M. (eds) Data Management, Analytics and Innovation. ICDMAI 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 137. Springer, Singapore. https://doi.org/10.1007/978-981-19-2600-6_7
