Early-stage phishing detection on the Ethereum transaction network

Abstract

As cryptocurrency becomes widely accepted and used, the illegal activities that accompany it have attracted extensive attention, especially phishing scams, which cause great losses to both customers and countries. From a crime-prevention perspective, early warning of such behavior is of great significance. However, most existing studies focus on detecting phishing scams that have already occurred and been reported. In addition, previous studies ignore the temporal order in which users appear and thus cannot accurately extract features that reflect users' transaction patterns. In this paper, we propose a framework called early-stage phishing detection to address this problem. According to the phishing amount, we first divide the process of a phishing scam into three stages: early, middle, and late. We then develop a feature extraction method that captures features from both the local network structure and the time series of transactions. In the experiments, the dataset is strictly partitioned in time order, and the results show that the proposed method outperforms existing graph embedding methods on a real-world Ethereum transaction dataset. Finally, we select the ten most important features and analyze how phishing users and normal users differ on them, which provides useful insights for regulators and platforms seeking to detect phishing scams in advance.
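
The stage division and the time-ordered partition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thresholds on the phishing amount, so the cut points used here (one third and two thirds of the total amount received) are assumptions, as are the function names `label_stages` and `chronological_split` and the `(timestamp, amount)` record layout. This is not the authors' implementation.

```python
# Hypothetical cut points on the cumulative share of the phishing amount.
# The abstract does not state the real thresholds; these are placeholders.
EARLY_CUT, MIDDLE_CUT = 1 / 3, 2 / 3


def label_stages(transactions, early_cut=EARLY_CUT, middle_cut=MIDDLE_CUT):
    """Label each incoming transaction of one address as early/middle/late.

    transactions: iterable of (timestamp, amount) pairs.
    Returns a list of (timestamp, amount, stage) sorted by timestamp.
    """
    ordered = sorted(transactions)  # oldest transaction first
    total = sum(amount for _, amount in ordered)
    labeled = []
    received = 0.0
    for timestamp, amount in ordered:
        received += amount
        # Share of the total amount received up to and including this transfer.
        share = received / total if total > 0 else 0.0
        if share <= early_cut:
            stage = "early"
        elif share <= middle_cut:
            stage = "middle"
        else:
            stage = "late"
        labeled.append((timestamp, amount, stage))
    return labeled


def chronological_split(records, split_time):
    """Split records so that every training record precedes every test record."""
    train = [r for r in records if r[0] < split_time]
    test = [r for r in records if r[0] >= split_time]
    return train, test


if __name__ == "__main__":
    txs = [(3, 1.0), (1, 1.0), (4, 1.0), (2, 1.0)]
    labeled = label_stages(txs)
    train, test = chronological_split(labeled, split_time=3)
    print(labeled)
    print(train, test)
```

Splitting on a timestamp rather than at random keeps information from later transactions out of the training set, which matters when the goal is to flag an address before most of the phishing amount has been collected.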

Data availability

The dataset analyzed during the current study is available from XBlock, a blockchain data platform for the academic community: https://xblock.pro/#/.

Notes

  1. https://coinmarketcap.com/.

  2. https://go.chainalysis.com/2020-crypto-crime-report.

  3. https://etherscan.io/.

Funding

The work described in this paper was supported by the National Natural Science Foundation of China (72025104), the Fundamental Research Funds for the Central Universities (JBK2103009), the fund of Financial Innovation Center, SWUFE and the achievements transformation projects of SWUFE Jiaozi Institute of Fintech Innovation.

Author information

Contributions

The initial idea and theoretical framework were first proposed by Yun Wan. Material preparation, data collection, analysis and experiment design were performed by Yun Wan, Dapeng Zhang and Feng Xiao. The first draft of the manuscript was written by Yun Wan and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Feng Xiao.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wan, Y., Xiao, F. & Zhang, D. Early-stage phishing detection on the Ethereum transaction network. Soft Comput 27, 3707–3719 (2023). https://doi.org/10.1007/s00500-022-07661-0

  • DOI: https://doi.org/10.1007/s00500-022-07661-0

Keywords

  • Early-stage
  • Phishing detection
  • Cryptocurrency
  • Feature extraction