Alarab, I. and Prakoonwit, S., 2022. Effect of data resampling on feature importance in imbalanced blockchain data: Comparison studies of resampling techniques. Data Science and Management, 5 (2), 66-76.
Full text available as:
PDF (OPEN ACCESS ARTICLE): 1-s2.0-S2666764922000145-main.pdf - Published Version. Available under License Creative Commons Attribution. 2MB
PDF: Effect of Data Resampling on Feature Importance in Imbalanced Blockchain Data Comparison Studies of Resampling Techniques.pdf - Accepted Version. Restricted to Repository staff only. Available under License Creative Commons Attribution Non-commercial. 998kB
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
DOI: 10.1016/j.dsm.2022.04.003
Abstract
Cryptocurrency blockchain data suffers from a class-imbalance problem because only a few labels of illicit or fraudulent activity are known in the blockchain network. To address this, we compare various resampling methods applied to two highly imbalanced datasets derived from the Bitcoin and Ethereum blockchains after further dimensionality reduction, unlike previous studies on these datasets. First, we study the performance of various classical supervised learning methods in classifying illicit transactions in the Bitcoin dataset and illicit accounts in the Ethereum dataset. Next, we apply a variety of resampling techniques to each dataset using its best-performing learning algorithm. We then study the feature importance of the resulting models and find that resampling directly influences the explainability of the model. Our main finding is that undersampling with the edited nearest-neighbour technique attains an accuracy of more than 99% on the given datasets by removing noisy data points from the whole dataset. Moreover, the best-performing learning algorithms perform better after feature reduction on these datasets than in the original studies. The distinctive contribution lies in discussing the effect of data resampling on feature importance, which is closely tied to explainable artificial intelligence techniques.
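To illustrate the edited nearest-neighbour idea named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes Euclidean distance, k = 3, a simple majority vote, and a hypothetical `clean_classes` parameter that restricts removal to the majority class; the toy data and labels are invented for illustration only.

```python
from collections import Counter
from math import dist


def edited_nearest_neighbours(X, y, k=3, clean_classes=None):
    """Drop samples whose label disagrees with the majority of their k nearest neighbours.

    clean_classes: labels eligible for removal (hypothetical parameter);
    None means samples of every class may be removed.
    """
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Samples of protected classes are always kept.
        if clean_classes is not None and yi not in clean_classes:
            keep_X.append(xi)
            keep_y.append(yi)
            continue
        # Indices of the k closest other samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(xi, X[j]),
        )[:k]
        majority_label, _ = Counter(y[j] for j in neighbours).most_common(1)[0]
        # Keep the sample only if its neighbourhood agrees with its label.
        if majority_label == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


# Toy data (invented): label 0 = licit (majority), label 1 = illicit (minority).
X = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # licit cluster
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),              # illicit cluster
    (1.05, 1.05),  # licit-labelled point sitting inside the illicit cluster
]
y = [0, 0, 0, 0, 1, 1, 1, 0]

# Undersample only the majority class, leaving minority samples untouched.
X_res, y_res = edited_nearest_neighbours(X, y, k=3, clean_classes={0})
print(len(X), "->", len(X_res), "samples; labels:", y_res)
```

In this toy case only the licit-labelled point inside the illicit cluster is removed, which is the sense in which the technique discards noisy points. For real experiments a maintained implementation such as `EditedNearestNeighbours` in the imbalanced-learn package would normally be preferable; whether the paper used that package is not stated in this record.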
Item Type: | Article |
---|---|
ISSN: | 2666-7649 |
Additional Information: | Funded by Artificial intelligence assisted virtual reality system for blockchain network |
Uncontrolled Keywords: | Resampling techniques; Cryptocurrency data; Bitcoin blockchain; Ethereum blockchain |
Group: | Faculty of Science & Technology |
ID Code: | 37046 |
Deposited By: | Symplectic RT2 |
Deposited On: | 10 Jun 2022 11:05 |
Last Modified: | 07 Sep 2022 14:35 |