Abstract
The rapid growth of social media data has motivated the development of real-time frameworks for understanding and extracting the meaning of that data. Text categorization is a well-known method for understanding text. It automatically sorts a set of documents into predefined categories by extracting useful information from them, and it can be applied in many forms, such as authorship detection and text mining. Here, we propose a method for classifying the authors of tweets into categories. The task is performed by extracting key features from tweets and passing them to a machine learning classifier. The research shows that this multi-class classification task is very difficult, particularly when building a domain-independent machine learning classifier. Our problem specifically concerned tweets about oil companies; most of these tweets were noisy enough to affect classification accuracy. The analytical technique used here provided structured and valuable information for oil companies.
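The pipeline described in the abstract (extract features from each tweet, then pass them to a machine learning classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names neither the features nor the classifier, so bag-of-words tokens and a multinomial Naive Bayes model are assumed here purely for illustration. The `tokenize` helper, the `NaiveBayes` class, the toy tweets, and the category names `company`, `news`, and `public` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(tweet):
    """Lowercase a tweet and keep words, @mentions and #hashtags as tokens."""
    return re.findall(r"[@#]?\w+", tweet.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Assumed classifier for illustration; the abstract does not name one.
    """

    def fit(self, tweets, labels):
        self.class_counts = Counter(labels)       # tweets per category
        self.word_counts = defaultdict(Counter)   # token counts per category
        for tweet, label in zip(tweets, labels):
            self.word_counts[label].update(tokenize(tweet))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tweet):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)           # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(tweet):
                if tok in self.vocab:             # tokens never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; the author categories are illustrative only.
train = [
    ("Our new offshore platform starts production today", "company"),
    ("We are proud to announce our quarterly results", "company"),
    ("Oil prices fall as supply rises, analysts say", "news"),
    ("Breaking: pipeline leak reported, analysts say cleanup ongoing", "news"),
    ("I hate paying so much for petrol", "public"),
    ("why is my local petrol station always closed", "public"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("petrol is so expensive I hate it"))
```

Noisy tweets of the kind the abstract mentions would mostly produce tokens outside the training vocabulary, which this sketch simply skips. That is one plausible reason accuracy degrades on such data, although the abstract does not give the cause.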
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Aldahawi, H., Allen, S. (2015). An Approach to Tweets Categorization by Using Machine Learning Classifiers in Oil Business. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_40
DOI: https://doi.org/10.1007/978-3-319-18117-2_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science, Computer Science (R0)