Detecting Suspicious Discussion on Online Forums Using Data Mining

  • Conference paper
Intelligent Technologies and Applications (INTAP 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 932)

Abstract

People use social, business, and many other online platforms for a wide range of purposes, and huge amounts of data are transferred over networks as a result. The internet has made communication and online business easy and fast, and it is used worldwide. However, the same technology that serves legitimate purposes is also used for illegal activities such as terrorism, threats, copyright violation, phishing scams, fraud, and spam. Law enforcement agencies and departments are trying to counter these problems using different techniques. This paper presents tools and techniques for detecting such illegal activities on online forums by identifying suspicious discussions, words, users, and groups. It discusses stop-word removal, stemming algorithms, suffix and affix stemmers, emotional algorithms, the Levenshtein algorithm, classification, brute-force algorithms, and some statistical formulas for detecting suspicious activity on online forums.
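
To make two of the techniques named in the abstract concrete, the following is a minimal sketch of stop-word removal followed by Levenshtein-distance matching against a watchlist. It is not the authors' implementation, which is not shown in this preview. The stop-word list, the watchlist, the distance threshold, and the function names `levenshtein` and `flag_suspicious` are all assumptions chosen for illustration.

```python
# Illustrative sketch only: the stop-word list, watchlist, and threshold
# below are hypothetical and are not taken from the paper.
STOP_WORDS = {"the", "a", "an", "is", "to", "at", "we", "will", "of", "and"}
WATCHLIST = {"attack", "bomb", "phishing"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def flag_suspicious(post: str, max_distance: int = 1) -> list[tuple[str, str]]:
    """Return (token, watchlist word) pairs within max_distance edits."""
    # Naive whitespace tokenization with punctuation stripping and lowercasing.
    tokens = [t.strip(".,!?;:").lower() for t in post.split()]
    # Stop-word removal: drop empty tokens and common function words.
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    hits = []
    for token in tokens:
        for word in sorted(WATCHLIST):
            if levenshtein(token, word) <= max_distance:
                hits.append((token, word))
    return hits


if __name__ == "__main__":
    print(flag_suspicious("We will atack the station at noon."))
```

The approximate match is what lets a deliberately misspelled token such as "atack" still be associated with the watchlist entry "attack". A real system would likely add stemming before matching and tune the threshold to limit false positives on short words.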

Corresponding author

Correspondence to Haroon ur Rasheed.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

ur Rasheed, H., Khan, F.H., Bashir, S., Fatima, I. (2019). Detecting Suspicious Discussion on Online Forums Using Data Mining. In: Bajwa, I., Kamareddine, F., Costa, A. (eds) Intelligent Technologies and Applications. INTAP 2018. Communications in Computer and Information Science, vol 932. Springer, Singapore. https://doi.org/10.1007/978-981-13-6052-7_23

  • DOI: https://doi.org/10.1007/978-981-13-6052-7_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6051-0

  • Online ISBN: 978-981-13-6052-7

  • eBook Packages: Computer Science (R0)
