Abstract
Bitcoin is a digital currency that allows users to transact without a trusted intermediary; however, privacy remains an issue. Although Bitcoin addresses are not linked to a specific identity, numerous deanonymization approaches have been proposed. In this work, blockchain transactions are deanonymized using ensemble learning. More than four million labeled samples covering user activities such as pools, services, gambling, and exchanges were gathered from various repositories and prepared for training and validation. The main aim is to deanonymize blockchain transactions via classification and to separate legitimate transactions from illegitimate ones. On the class-imbalanced dataset, eXtreme Gradient Boosting (XGBoost) attained high cross-validation accuracy with default parameters and with hyperparameter tuning. On the class-balanced dataset, XGBoost, Random Forest, and Bagging produced the best cross-validation accuracy with default parameters and with hyperparameter tuning. Empirical results indicate that the proposed ensemble learning model achieves up to 98.45% accuracy.
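The general workflow summarized above (balance the classes, train a bagged ensemble, score it with cross-validation) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the one-dimensional synthetic data, the decision-stump base learner, and every function name here are assumptions made for illustration, and stumps stand in for the XGBoost and Random Forest models used in the study. Oversampling is applied to the training folds only, so duplicated rows do not leak into the held-out fold.

```python
import random
from collections import Counter


def make_data(rng, n_major=80, n_minor=20):
    """Hypothetical toy data: class 0 centred at 0.0, class 1 at 3.0."""
    data = [([rng.gauss(0.0, 1.0)], 0) for _ in range(n_major)]
    data += [([rng.gauss(3.0, 1.0)], 1) for _ in range(n_minor)]
    rng.shuffle(data)
    return data


def oversample(rows, rng):
    """Random oversampling: duplicate minority rows until classes are equal."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced += members + extra
    return balanced


def fit_stump(rows):
    """Fit a single-threshold classifier on feature 0 by exhaustive search."""
    best = None
    for x, _ in rows:
        t = x[0]
        for left, right in ((0, 1), (1, 0)):
            correct = sum((left if r[0][0] <= t else right) == r[1] for r in rows)
            if best is None or correct > best[0]:
                best = (correct, t, left, right)
    _, t, left, right = best
    return lambda x: left if x[0] <= t else right


def fit_bagging(rows, rng, n_estimators=7):
    """Bagging: train each stump on a bootstrap sample, then majority-vote."""
    stumps = [
        fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_estimators)
    ]

    def predict(x):
        return Counter(s(x) for s in stumps).most_common(1)[0][0]

    return predict


def cross_val_accuracy(data, rng, k=5, balance=True):
    """Mean accuracy over k folds; sorting by label keeps folds stratified."""
    ordered = sorted(data, key=lambda r: r[1])
    folds = [ordered[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        if balance:
            train = oversample(train, rng)  # balance training folds only
        model = fit_bagging(train, rng)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return sum(scores) / k


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(rng)
    print("imbalanced CV accuracy:", cross_val_accuracy(data, rng, balance=False))
    print("balanced CV accuracy:  ", cross_val_accuracy(data, rng, balance=True))
```

In practice one would likely swap the hand-written pieces for library equivalents (for example a gradient-boosting or random-forest classifier and a stratified k-fold splitter); the sketch only shows how the three steps fit together.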
Data availability
The data set generated and/or analyzed during the current study is available from the corresponding author upon reasonable request. The source data sets are also available as open source.
Funding
The authors received no funding from any source.
Author information
Contributions
The idea, problem formulation, proposed solution, and result analysis were contributed by the corresponding author and supervisor, and verified by all other authors.
Ethics declarations
Conflict of interest
This work has not been submitted to any other journal. The authors declare no conflict of interest.
About this article
Cite this article
Saxena, R., Arora, D., Nagar, V. et al. Blockchain transaction deanonymization using ensemble learning. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19233-5
DOI: https://doi.org/10.1007/s11042-024-19233-5