Abstract
Mobile devices are now the dominant medium for communication. Humans express various emotions when communicating with others, and these communications can be analyzed to deduce their emotional inclinations. Natural language processing techniques have been used to analyze sentiment in text. However, most research on sentiment analysis in the short message domain (SMS and Twitter) does not account for the presence of non-dictionary words. This chapter investigates the problem of sentiment analysis in short messages and the analysis of an individual's emotional swings over time. This provides an additional layer of information for forensic analysts when investigating suspects. The maximum entropy algorithm is used to classify short messages as positive, negative or neutral. Non-dictionary words are normalized, and the impact of normalization and other features on classification is evaluated; this approach improves the classification F-score relative to previous work. A forensic tool with an intuitive user interface has been developed to support the extraction and visualization of sentiment information pertaining to persons of interest. In particular, the tool presents an improved approach for identifying mood swings based on short messages sent by subjects. The timeline view provided by the tool helps pinpoint periods of emotional instability that may require further investigation. Additionally, the Apache Solr system used for indexing ensures that a forensic analyst can retrieve the desired information rapidly and efficiently using faceted search queries.
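The pipeline summarized above (normalize non-dictionary words, then classify with a maximum entropy model) can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not specify the feature set, the normalization method or the training data, so the lookup table `NORMALIZATION`, the six training messages in `TRAIN`, and the helper names `normalize`, `train` and `classify` are all hypothetical. The model is a plain multinomial logistic regression (the standard formulation of maximum entropy classification) over bag-of-words features, trained with stochastic gradient ascent.

```python
import math
from collections import defaultdict

# Ties (e.g. a message with only unseen words) fall back to the first label.
LABELS = ("neutral", "positive", "negative")

# Hypothetical lookup table mapping non-dictionary tokens to dictionary words.
NORMALIZATION = {
    "gr8": "great", "luv": "love", "u": "you",
    "h8": "hate", "2moro": "tomorrow", "thx": "thanks",
}

# Hypothetical toy training set; a real study would use a labeled corpus.
TRAIN = [
    ("luv u so much", "positive"),
    ("that was gr8 thx", "positive"),
    ("i h8 this", "negative"),
    ("this is awful", "negative"),
    ("meeting at 5 2moro", "neutral"),
    ("call me later", "neutral"),
]


def normalize(message):
    """Lowercase, split on whitespace, and replace known non-dictionary tokens."""
    return [NORMALIZATION.get(tok, tok) for tok in message.lower().split()]


def probabilities(weights, tokens):
    """Softmax over the per-label sums of (label, token) feature weights."""
    raw = [sum(weights[(label, tok)] for tok in tokens) for label in LABELS]
    top = max(raw)  # subtract the maximum for numerical stability
    exps = [math.exp(r - top) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


def train(examples, epochs=200, rate=0.5):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for message, gold in examples:
            tokens = normalize(message)
            probs = probabilities(weights, tokens)
            for label, p in zip(LABELS, probs):
                # Gradient of the log-likelihood: observed minus expected count.
                grad = (1.0 if label == gold else 0.0) - p
                for tok in tokens:
                    weights[(label, tok)] += rate * grad
    return weights


def classify(weights, message):
    """Return the most probable label for a message."""
    probs = probabilities(weights, normalize(message))
    return LABELS[probs.index(max(probs))]


if __name__ == "__main__":
    model = train(TRAIN)
    for text in ("gr8 luv", "h8", "2moro"):
        print(text, "->", classify(model, text))
```

Because normalization runs before feature extraction, "gr8" and "great" share a single feature, which is one plausible way normalization could affect classification quality. A sketch this small says nothing about the F-score gains reported in the chapter.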
Copyright information
© 2016 IFIP International Federation for Information Processing
About this paper
Cite this paper
Aboluwarin, O., Andriotis, P., Takasu, A., Tryfonas, T. (2016). Optimizing Short Message Text Sentiment Analysis for Mobile Device Forensics. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XII. DigitalForensics 2016. IFIP Advances in Information and Communication Technology, vol 484. Springer, Cham. https://doi.org/10.1007/978-3-319-46279-0_4
DOI: https://doi.org/10.1007/978-3-319-46279-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46278-3
Online ISBN: 978-3-319-46279-0
eBook Packages: Computer Science, Computer Science (R0)