Optimizing Short Message Text Sentiment Analysis for Mobile Device Forensics

Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 484)

Abstract

Mobile devices are now the dominant medium for communications. Humans express various emotions when communicating with others, and these communications can be analyzed to deduce their emotional inclinations. Natural language processing techniques have been used to analyze sentiment in text. However, most research on sentiment analysis in the short message domain (SMS and Twitter) does not account for the presence of non-dictionary words. This chapter investigates the problem of sentiment analysis in short messages and the analysis of an individual's emotional swings over time, which gives forensic analysts an additional layer of information when investigating suspects. The maximum entropy algorithm is used to classify short messages as positive, negative or neutral. Non-dictionary words are normalized, and the impact of normalization and other features on classification is evaluated; this approach enhances the classification F-score compared with previous work. A forensic tool with an intuitive user interface has been developed to support the extraction and visualization of sentiment information pertaining to persons of interest. In particular, the tool presents an improved approach for identifying mood swings based on short messages sent by subjects. The timeline view provided by the tool helps pinpoint periods of emotional instability that may require further investigation. Additionally, the Apache Solr system used for indexing ensures that a forensic analyst can retrieve the desired information rapidly and efficiently using faceted search queries.
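
The abstract does not give implementation details, so the following is only a minimal sketch of two of the ideas it names: normalizing non-dictionary words and locating mood swings on a timeline. It assumes that each message has already been labelled positive, negative or neutral by some classifier (the chapter uses maximum entropy). The `SLANG` table, the function names, the sample dates and the swing threshold are all hypothetical and are not taken from the chapter.

```python
from collections import defaultdict
from datetime import date

# Hypothetical lookup table; a real normalizer would use a much larger resource.
SLANG = {"gr8": "great", "u": "you", "2moro": "tomorrow", "luv": "love"}

# Assumed numeric encoding of the three sentiment classes.
POLARITY = {"negative": -1, "neutral": 0, "positive": 1}


def normalize(message):
    """Lowercase the message and replace known non-dictionary tokens."""
    return " ".join(SLANG.get(tok, tok) for tok in message.lower().split())


def daily_polarity(labelled):
    """Average the polarity of (day, label) pairs per day, in date order."""
    buckets = defaultdict(list)
    for day, label in labelled:
        buckets[day].append(POLARITY[label])
    return [(day, sum(v) / len(v)) for day, v in sorted(buckets.items())]


def mood_swings(timeline, threshold=1.0):
    """Return days whose mean polarity differs from the previous
    recorded day by at least `threshold` (an assumed cut-off)."""
    return [
        day
        for (_, prev), (day, cur) in zip(timeline, timeline[1:])
        if abs(cur - prev) >= threshold
    ]


# Illustrative input only: labels as a sentiment classifier might emit them.
labelled = [
    (date(2016, 1, 1), "positive"),
    (date(2016, 1, 1), "neutral"),
    (date(2016, 1, 2), "negative"),
    (date(2016, 1, 2), "negative"),
    (date(2016, 1, 3), "negative"),
]
timeline = daily_polarity(labelled)
swings = mood_swings(timeline)
```

In this toy input the mean polarity drops from 0.5 on the first day to -1.0 on the second, so only the second day is flagged. In practice the threshold and the aggregation window would need tuning against real message data.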

Keywords

  • Sentiment analysis
  • Text mining
  • SMS
  • Twitter
  • Normalization

Author information

Corresponding author

Correspondence to Panagiotis Andriotis.

Copyright information

© 2016 IFIP International Federation for Information Processing

About this paper

Cite this paper

Aboluwarin, O., Andriotis, P., Takasu, A., Tryfonas, T. (2016). Optimizing Short Message Text Sentiment Analysis for Mobile Device Forensics. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics XII. DigitalForensics 2016. IFIP Advances in Information and Communication Technology, vol 484. Springer, Cham. https://doi.org/10.1007/978-3-319-46279-0_4

  • DOI: https://doi.org/10.1007/978-3-319-46279-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46278-3

  • Online ISBN: 978-3-319-46279-0

  • eBook Packages: Computer Science, Computer Science (R0)