Web Scams Detection System

Badawi, Emad; Jourdan, Guy-Vincent; Onut, Iosif-Viorel

doi:10.1007/978-3-031-57537-2_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14551)

Included in the following conference series:

International Symposium on Foundations and Practice of Security

Abstract

Web-based scams rely on scam websites that offer fraudulent businesses or fake services to steal money and sensitive information from unsuspecting victims. Many researchers have worked on anti-scam techniques, focusing mainly on understanding, detecting, and analyzing scam sites. State-of-the-art anti-scam research still faces several challenges, such as acquiring a properly labeled scam dataset, especially when there is no blacklist, central repository, or previous large-scale analysis. Researchers have created labeled datasets in different ways, such as collecting and labeling the data manually or using a semi-automatic crawler followed by manual inspection. However, these approaches require prior knowledge and understanding of the scam, as well as considerable manual work.

In this paper, we propose a data-driven model to create a labeled training dataset for web-based scams that have a web presence. Given a small scam sample, our model formulates scam-related search queries and submits them to multiple search engines to find and collect potential scam pages. After collecting a sufficiently large corpus of web pages, our model semi-automatically clusters the search results and creates a labeled training dataset with minimal human interaction. We validated our model on two different scam types that we studied in our previous work. We tested our classifiers against the databases of web pages we collected during our previous analysis of these scams and successfully detected more than 87% of the scam pages while maintaining a false positive rate as low as 0.23%.
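The abstract does not say how the search queries are formulated, so the following is only a minimal sketch of one plausible approach, not the authors' method. It ranks terms by how many pages in a small sample contain them and pairs the top terms into queries. The function names, the stop-word list, and the sample page texts are all hypothetical and invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in",
              "for", "your", "you", "is", "with"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_terms(sample_pages, k=4):
    """Return the k terms that appear in the largest number of sample pages."""
    doc_freq = Counter()
    for page in sample_pages:
        # Count each term once per page (document frequency, not raw frequency).
        doc_freq.update(set(tokenize(page)))
    # Sort by descending document frequency, then alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def formulate_queries(sample_pages, k=4, terms_per_query=2):
    """Combine the top-k terms into candidate search queries."""
    terms = top_terms(sample_pages, k)
    return [" ".join(combo) for combo in combinations(terms, terms_per_query)]


if __name__ == "__main__":
    # Invented page texts standing in for a small scam sample.
    sample = [
        "Free bitcoin generator: generate bitcoin to your wallet",
        "Bitcoin generator online, free and instant",
        "Generate free bitcoin with the best bitcoin generator tool",
    ]
    for query in formulate_queries(sample, k=3):
        print(query)
```

In a full pipeline, each query would then be sent to several search engines and the returned pages stored for clustering. That step is omitted here because it depends on engine-specific interfaces that the abstract does not describe.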

Notes

1. https://www.scamwatch.gov.au/scam-statistics.
2. We describe applying the model to create and validate the training datasets for identifying BGS and GHS scams in a separate report hosted with our dataset: https://bit.ly/DatasetPaper.
3. https://trends.google.com/trends/?geo=US.
4. https://urlscan.io/.
5. https://website.informer.com/.
6. https://www.cutestat.com/.
7. https://web.archive.org/.
8. https://www.alexa.com/.
9. http://chromedriver.chromium.org/.
10. https://selenium-python.readthedocs.io.
11. https://pypi.org/project/beautifulsoup4/.
12. We describe the dataset creation and validation process in a separate report hosted with our dataset: https://bit.ly/DatasetPaperReport.
13. We collected the domains by crawling the cutestat.com search engine and a blacklist maintained by Bitcoin.fr.
14. In our analysis of both the manual and automated approaches, we excluded the time taken by automated processes, such as crawling. We counted only the time we spent manually searching, inspecting, and labeling the pages.
15. https://websitesetup.org/news/internet-facts-stats/, accessed in 2022.

Author information

Authors and Affiliations

University of Ottawa, Ottawa, ON, Canada
Emad Badawi, Guy-Vincent Jourdan & Iosif-Viorel Onut
IBM Centre for Advanced Studies, Ottawa, ON, Canada
Emad Badawi, Guy-Vincent Jourdan & Iosif-Viorel Onut

Corresponding author

Correspondence to Emad Badawi.

Editor information

Editors and Affiliations

University of Bordeaux, Bordeaux, France
Mohamed Mosbah
Toulouse III - Paul Sabatier University, Toulouse, France
Florence Sèdes
Université Laval, Québec, QC, Canada
Nadia Tawbi
University of Bordeaux, Bordeaux, France
Toufik Ahmed
Polytechnique Montréal, Montreal, QC, Canada
Nora Boulahia-Cuppens
Telecom SudParis, Palaiseau, France
Joaquin Garcia-Alfaro

Cite this paper

Badawi, E., Jourdan, GV., Onut, IV. (2024). Web Scams Detection System. In: Mosbah, M., Sèdes, F., Tawbi, N., Ahmed, T., Boulahia-Cuppens, N., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2023. Lecture Notes in Computer Science, vol 14551. Springer, Cham. https://doi.org/10.1007/978-3-031-57537-2_11

DOI: https://doi.org/10.1007/978-3-031-57537-2_11
Published: 25 April 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57536-5
Online ISBN: 978-3-031-57537-2
eBook Packages: Computer Science, Computer Science (R0)
