Abstract
Natural language processing applications such as sentiment analysis, spam detection, and stance detection extract an author’s emotions, feelings, and stance (for example, favor or denial) from a sentence, a document, or a corpus. These areas remain under active research. This work instead uses the relations among target entities (for example, climate or population), the author’s stance, and the country’s level of economic growth to formulate a statistical hypothesis, which is then tested and discussed in light of the results obtained. Missing country information for each Twitter account is filled in with a technique that uses metadata from tweets. A subset of the data is annotated using predefined seed features and rules, and the best-performing supervised machine learning model is then applied to predict labels for the remaining unlabeled tweets. Tweets are labeled as “believer” or “denier” for each country, and the hypothesis is tested on the statements made in rich and poor countries. The statistical results also show a positive correlation between the GDP growth rate and the number of deniers and believers in each country. These techniques, experimental findings, and statistical analyses are presented in this paper.
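As a rough illustration of the two statistical steps mentioned above (comparing rich and poor countries, and correlating GDP growth with stance counts), the following minimal sketch computes a Mann-Whitney U statistic and a Pearson correlation coefficient using only the Python standard library. The abstract does not state which test or correlation measure was used, so both choices are assumptions. The function names and all numeric values are hypothetical and invented for illustration; they are not data or results from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mann_whitney_u(a, b):
    """U statistic for sample a: number of pairs (x in a, y in b)
    with x > y, counting ties as 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


# Toy inputs, invented for illustration only (NOT from the paper).
gdp_growth = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical GDP growth rates (%)
denier_counts = [12, 15, 21, 24, 30]        # hypothetical denier tweet counts
r = pearson_r(gdp_growth, denier_counts)

rich_denier_share = [0.30, 0.35, 0.40]      # hypothetical share of denier tweets
poor_denier_share = [0.10, 0.20, 0.32]
u = mann_whitney_u(rich_denier_share, poor_denier_share)

print(f"Pearson r = {r:.3f}, Mann-Whitney U = {u}")
```

A value of r close to 1 indicates a strong positive linear association in the toy data. To turn U into a p-value, it would need to be compared against its null distribution, for example with a statistics package.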
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Singh, C.R., Gobinath, R. (2023). Hypothesis Testing of Tweet Text Using NLP. In: Goswami, S., Barara, I.S., Goje, A., Mohan, C., Bruckstein, A.M. (eds) Data Management, Analytics and Innovation. ICDMAI 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 137. Springer, Singapore. https://doi.org/10.1007/978-981-19-2600-6_7
DOI: https://doi.org/10.1007/978-981-19-2600-6_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2599-3
Online ISBN: 978-981-19-2600-6
eBook Packages: Intelligent Technologies and Robotics (R0)