An Approach to Tweets Categorization by Using Machine Learning Classifiers in Oil Business

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)

Abstract

The rapid growth of social media data has motivated the development of real-time frameworks for understanding and extracting the meaning of that data. Text categorization, a well-known method for understanding text, automatically sorts a set of documents into predefined categories by extracting useful information from them, and it has applications such as authorship detection and text mining. Here, we propose a method for assigning tweets to categories according to who posted them. The task is performed by extracting key features from tweets and passing them to a machine learning classifier. The research shows that this multi-class classification task is very difficult, in particular the building of a domain-independent machine learning classifier. Our problem specifically concerned tweets about oil companies, most of which were noisy enough to affect classification accuracy. The analytical technique used here provided structured and valuable information for oil companies.
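
As an illustration of the pipeline the abstract outlines (feature extraction followed by a machine learning classifier), the following is a minimal sketch only. The abstract does not state which features or which classifier the authors used, so the bag-of-words tokens, the multinomial Naive Bayes model, the category names ("company", "news", "public"), and the toy tweets are all assumptions made for this example, not the paper's method or data.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(tweet):
    """Lowercase the tweet, drop URLs, and return word, hashtag, and mention tokens."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    return re.findall(r"[#@]?\w+", text)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed classifier)."""

    def fit(self, tweets, labels):
        self.class_counts = Counter(labels)      # number of tweets per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        for tweet, label in zip(tweets, labels):
            self.word_counts[label].update(extract_features(tweet))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tweet):
        tokens = extract_features(tweet)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; the category names are hypothetical.
train_tweets = [
    "Our Q3 results are out, read the full report http://example.com",
    "We are hiring engineers for our offshore team",
    "Breaking: oil prices fall as supply rises",
    "Report: regulator fines oil firm over spill",
    "Why is fuel so expensive again, this is ridiculous",
    "Stuck at the petrol station queue, so annoyed",
]
train_labels = ["company", "company", "news", "news", "public", "public"]

clf = NaiveBayes().fit(train_tweets, train_labels)
print(clf.predict("Breaking: oil prices rise"))
print(clf.predict("so annoyed at the fuel queue"))
```

With so few examples the model only memorizes vocabulary. Real tweets are much noisier, which is the difficulty the abstract points to.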



Author information

Corresponding author

Correspondence to Hanaa Aldahawi.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Aldahawi, H., Allen, S. (2015). An Approach to Tweets Categorization by Using Machine Learning Classifiers in Oil Business. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_40

  • DOI: https://doi.org/10.1007/978-3-319-18117-2_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18116-5

  • Online ISBN: 978-3-319-18117-2

  • eBook Packages: Computer Science (R0)
