Abstract
Web-based scams rely on scam websites that offer fraudulent businesses or fake services in order to steal money and sensitive information from unsuspecting victims. Researchers have put considerable effort into anti-scam techniques, focusing mainly on understanding, detecting, and analyzing scam sites. State-of-the-art anti-scam research nevertheless still faces several challenges, one of which is acquiring a properly labeled scam dataset, especially when no blacklist, central repository, or previous large-scale analysis exists. Researchers have created labeled datasets in different ways, for example by collecting and labeling pages manually or by using a semi-automatic crawler followed by manual inspection. However, these approaches require prior knowledge and understanding of the scam, as well as considerable manual work.
In this paper, we propose a data-driven model to create a labeled training dataset for scams that have a web presence. Given a small scam sample, our model formulates scam-related search queries and submits them to multiple search engines to collect potential scam pages. After collecting a sufficiently large corpus of web pages, our model semi-automatically clusters the search results and creates a labeled training dataset with minimal human interaction. We validated our model on two scam types that we studied in our previous work. We tested our classifiers against the databases of web pages collected during our previous analysis of these scams and successfully detected more than 87% of the scam pages while keeping the false positive rate as low as 0.23%.
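The pipeline outlined above (seed sample, query formulation, clustering, labeling) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method used in this paper: the function names, the stopword list, the use of frequent bigrams as queries, greedy Jaccard clustering, and the 0.5 threshold are all hypothetical. The search-engine step is omitted entirely; the sketch assumes the page texts have already been collected.

```python
import re
from collections import Counter

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"the", "a", "an", "to", "and", "of", "in", "for", "your", "is", "no"}


def tokenize(text):
    """Return lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def formulate_queries(seed_pages, top_k=3):
    """Assumed heuristic: use the most frequent bigrams in the seed pages as queries."""
    counts = Counter()
    for page in seed_pages:
        tokens = tokenize(page)
        counts.update(zip(tokens, tokens[1:]))
    return [" ".join(bigram) for bigram, _ in counts.most_common(top_k)]


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pages(pages, threshold=0.5):
    """Greedy single-pass clustering on token-set similarity.

    A page joins the first cluster whose representative (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of page indices.
    """
    token_sets = [set(tokenize(p)) for p in pages]
    clusters = []
    for i, tokens in enumerate(token_sets):
        for cluster in clusters:
            if jaccard(tokens, token_sets[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def propagate_labels(clusters, representative_labels):
    """Give every page the label a human assigned to its cluster's representative."""
    labels = {}
    for cluster, label in zip(clusters, representative_labels):
        for idx in cluster:
            labels[idx] = label
    return labels
```

In this sketch the human effort is limited to labeling one representative page per cluster, which is one way to read "minimal human interaction"; a real system would need a stronger similarity measure and a review step for mixed clusters.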
Notes
- 2. We describe how we applied the model to create and validate the training datasets for identifying BGS and GHS scams in a separate report hosted with our dataset: https://bit.ly/DatasetPaper.
- 12. We describe the dataset creation and validation process in a separate report hosted with our dataset: https://bit.ly/DatasetPaperReport.
- 13. We collected the domains by crawling the cutestat.com search engine and a blacklist maintained by Bitcoin.fr.
- 14. In our analysis of both the manual and automated approaches, we excluded the time taken by automated processes such as crawling. We counted only the time we spent manually searching, inspecting, and labeling the pages.
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Badawi, E., Jourdan, GV., Onut, IV. (2024). Web Scams Detection System. In: Mosbah, M., Sèdes, F., Tawbi, N., Ahmed, T., Boulahia-Cuppens, N., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2023. Lecture Notes in Computer Science, vol 14551. Springer, Cham. https://doi.org/10.1007/978-3-031-57537-2_11
DOI: https://doi.org/10.1007/978-3-031-57537-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57536-5
Online ISBN: 978-3-031-57537-2
eBook Packages: Computer Science (R0)