A Hybrid and Adaptive Approach for Classification of Indian Stock Market-Related Tweets

  • Sourav Malakar
  • Saptarsi Goswami
  • Amlan Chakrabarti
  • Basabi Chakraborty
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1016)


Abstract

Twitter generates an enormous amount of data daily. Studies over the years have concluded that tweets have a significant impact on predicting and understanding stock price movements. Designing a system that stores relevant tweets and extracts information about specific stocks and industries is a relevant and previously unattempted problem for the Indian stock market, which is the eighth largest in terms of market capitalization. Because people from diverse backgrounds tweet about many topics simultaneously, identifying the tweets that are relevant to the stock market is nontrivial. Such a system therefore needs two critical modules: one for extracting and storing tweets and another for text classification. In this study, we propose a hybrid text classification approach that combines lexicon-based and machine learning-based techniques. The proposed scheme handles class imbalance effectively and is adaptive: it automatically grows the lexicon both through WordNet and through machine learning techniques. On a corpus of 10,000 tweets, the system achieves an F1-score above 98% for the relevant class, compared with 60% for the baseline method. The coverage of tweets by the lexicons also improves by 8%.
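The abstract describes the hybrid, adaptive scheme only at a high level. The following Python sketch illustrates one plausible reading of it and is not the authors' implementation: a lexicon match labels a tweet as relevant, a classifier decides the tweets with no match, and the lexicon grows from synonyms and from terms the classifier strongly associates with the relevant class. Everything named here is a hypothetical assumption: the seed lexicon `SEED_LEXICON`, the fixed `SYNONYMS` table standing in for a WordNet lookup, the uniform-prior Naive Bayes `BalancedNB` (a swapped-in, simple way to counter class imbalance), the thresholds `min_count` and `min_llr`, and the toy `TRAIN` data.

```python
import math
from collections import Counter

# Hypothetical seed lexicon of market terms (illustrative, not the paper's lexicon).
SEED_LEXICON = {"sensex", "nifty", "stock", "shares", "bse", "nse"}

# Hypothetical stand-in for WordNet synonym lookup; a real system could
# query WordNet (for example through NLTK) instead of this fixed table.
SYNONYMS = {"stock": {"shares", "equity"}, "profit": {"gain", "earnings"}}


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation and #/@/$."""
    tokens = (word.strip("#@$.,!?:;").lower() for word in text.split())
    return [t for t in tokens if t]


class BalancedNB:
    """Multinomial Naive Bayes (1 = relevant, 0 = irrelevant) with Laplace
    smoothing and uniform class priors, one simple way to keep a large
    irrelevant class from dominating the decision."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        return self

    def log_likelihood(self, token, c):
        # Laplace-smoothed log P(token | class c).
        return math.log((self.counts[c][token] + 1) / (self.totals[c] + len(self.vocab)))

    def predict(self, tokens):
        # Priors are omitted (i.e. treated as equal); unseen tokens are skipped.
        score = {c: sum(self.log_likelihood(t, c) for t in tokens if t in self.vocab)
                 for c in (0, 1)}
        return 1 if score[1] > score[0] else 0


def grow_lexicon(lexicon, model, min_count=2, min_llr=1.0):
    """Adaptive step: add synonyms of existing terms, then add tokens the
    classifier strongly associates with the relevant class."""
    grown = set(lexicon)
    for term in lexicon:
        grown |= SYNONYMS.get(term, set())
    for token in model.vocab:
        if model.counts[1][token] >= min_count:
            llr = model.log_likelihood(token, 1) - model.log_likelihood(token, 0)
            if llr > min_llr:
                grown.add(token)
    return grown


def hybrid_classify(text, lexicon, model):
    """A lexicon match decides first; the classifier handles tweets with no match."""
    tokens = tokenize(text)
    if lexicon.intersection(tokens):
        return 1
    return model.predict(tokens)


def f1_relevant(y_true, y_pred):
    """F1-score of the relevant class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Tiny made-up training set (1 = market-relevant, 0 = irrelevant).
TRAIN = [
    ("Sensex rallies as bank shares gain", 1),
    ("Nifty closes higher, IT stocks gain", 1),
    ("Dividend announced, investors cheer the gain", 1),
    ("Watching cricket with friends tonight", 0),
    ("Great movie night with friends", 0),
    ("Traffic jam again in Kolkata", 0),
    ("Rain all day in Kolkata", 0),
]
model = BalancedNB().fit([tokenize(t) for t, _ in TRAIN], [y for _, y in TRAIN])
lexicon = grow_lexicon(SEED_LEXICON, model)
```

On this toy data the grown lexicon picks up "equity" from the synonym table and "gain" from the classifier, so later tweets mentioning either term are labeled by the cheap lexicon path. Uniform priors are only one option for class imbalance; resampling or class-weighted training would be equally reasonable choices in such a sketch.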


Keywords: Cross-validation, Stock market, Twitter, Text classification



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Sourav Malakar (1), corresponding author
  • Saptarsi Goswami (1)
  • Amlan Chakrabarti (1)
  • Basabi Chakraborty (2)

  1. A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, India
  2. Faculty of Software and Information Science, Iwate Prefectural University, Takizawa, Japan
