Filtering of Mobile Short Messaging Service Communication Using Latent Dirichlet Allocation with Social Network Analysis

  • Abiodun Modupe
  • Oludayo O. Olugbara
  • Sunday O. Ojo
Conference paper


In this study, we introduce Latent Dirichlet Allocation (LDA) combined with Social Network Analysis (SNA) to extract and evaluate latent features arising from mobile Short Messaging Service (SMS) communication. The aim is to automatically filter unsolicited SMS messages so that their delivery can be proactively prevented. Content-based filters can have their performance seriously degraded, because SMS messages are short and their meanings are often obscured by idioms, onomatopoeias, homophones, phonemes and acronyms. We therefore explore text mining to understand the linguistic or statistical properties of mobile SMS messages in order to improve the performance of filtering applications. Experiments were performed on time-stamped short messages collected via mobile phones across a number of different categories on the Internet, using an English-language platform that is available through streaming APIs. In future, the derived filtering system can contribute to optimal decision-making, for instance in a scenario where an imposter attempts to illegally obtain confidential information from a subscriber or an operator by sending SMS messages.


Keywords: Dirichlet, Filtering, Message, Mining, Mobile, Network, Topic



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Abiodun Modupe (1)
  • Oludayo O. Olugbara (2)
  • Sunday O. Ojo (3)

  1. College of Science, Engineering and Technology, School of Computing, Johannesburg, Florida, South Africa
  2. Department of Information Technology, Durban University of Technology, Durban, South Africa
  3. Faculty of Information and Communication Technology, Tshwane University of Technology, Pretoria, South Africa
