Detecting Suspicious Discussion on Online Forums Using Data Mining

ur Rasheed, Haroon; Khan, Farhan Hassan; Bashir, Saba; Fatima, Irsa

doi:10.1007/978-981-13-6052-7_23

Part of the book series: Communications in Computer and Information Science (CCIS, volume 932)

Included in the following conference series:

International Conference on Intelligent Technologies and Applications

Abstract

People use social, business and many other online platforms for a wide range of purposes, and a huge amount of data is transferred over networks as a result. The internet has made online communication and business easy and fast, and it is used worldwide. The same technology that serves legitimate purposes is also used for negative or illegal activities, including terrorism, threats, copyright violation, phishing scams, fraud and spam. Law enforcement agencies and departments are trying to counter these problems using different techniques. This paper presents tools and techniques for detecting such illegal activities on online forums by identifying suspicious discussions, words, users and groups. It discusses stop-word removal, stemming algorithms, suffix and affix stemmers, emotional algorithms, the Levenshtein algorithm, classification, brute-force algorithms and some statistical formulas for detecting suspicious activity on online forums.
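
To make three of the techniques named in the abstract concrete, the following is a minimal sketch of how stop-word removal, suffix stripping and Levenshtein matching could be chained to flag watch-list terms in a forum post. It is not the authors' implementation: the stop-word list, the suffix list, the watch list, the distance threshold and all function names are hypothetical choices made for illustration, and the suffix stripper is a crude stand-in rather than the Porter stemmer.

```python
from __future__ import annotations

import re

# Hypothetical resources for illustration only; none of them come from the paper.
STOP_WORDS = {"the", "a", "an", "is", "are", "we", "will", "at", "to", "of", "and", "in", "on"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude suffix stripping, not the Porter stemmer
WATCH_LIST = {"attack", "bomb", "fraud"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def strip_suffix(word: str) -> str:
    """Remove the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(post: str) -> list[str]:
    """Lowercase, tokenize on letters, drop stop words, strip suffixes."""
    tokens = re.findall(r"[a-z]+", post.lower())
    return [strip_suffix(t) for t in tokens if t not in STOP_WORDS]


def flag_terms(post: str, max_distance: int = 1) -> list[tuple[str, str]]:
    """Return (token, watch-list term) pairs within max_distance edits."""
    hits = []
    for token in preprocess(post):
        for term in sorted(WATCH_LIST):
            if levenshtein(token, term) <= max_distance:
                hits.append((token, term))
    return hits
```

Calling `flag_terms("We will atack the station at noon")` pairs the misspelled token `atack` with the watch-list term `attack`, which shows why an edit-distance match can catch deliberate or accidental misspellings that an exact lookup would miss. A larger `max_distance` catches more variants at the cost of more false positives.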

Author information

Authors and Affiliations

Department of Computer Science, Federal Urdu University of Arts, Science and Technology, Islamabad, Pakistan
Haroon ur Rasheed, Saba Bashir & Irsa Fatima
Knowledge and Data Science Research Center, Department of Computer Science, College of E&ME, NUST, Islamabad, Pakistan
Farhan Hassan Khan & Saba Bashir

Corresponding author

Correspondence to Haroon ur Rasheed.

Editor information

Editors and Affiliations

Department of Computer Science and IT, Islamia University of Bahawalpur, Bahawalpur, Pakistan
Imran Sarwar Bajwa
Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
Fairouz Kamareddine
Department of Computer Engineering and Digital Systems, University of Sao Paulo, São Paulo, Brazil
Anna Costa

About this paper

Cite this paper

ur Rasheed, H., Khan, F.H., Bashir, S., Fatima, I. (2019). Detecting Suspicious Discussion on Online Forums Using Data Mining. In: Bajwa, I., Kamareddine, F., Costa, A. (eds) Intelligent Technologies and Applications. INTAP 2018. Communications in Computer and Information Science, vol 932. Springer, Singapore. https://doi.org/10.1007/978-981-13-6052-7_23

DOI: https://doi.org/10.1007/978-981-13-6052-7_23
Published: 12 March 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6051-0
Online ISBN: 978-981-13-6052-7
eBook Packages: Computer Science, Computer Science (R0)
