
Effect of data resampling on feature importance in imbalanced blockchain data: Comparison studies of resampling techniques.

Alarab, I. and Prakoonwit, S., 2022. Effect of data resampling on feature importance in imbalanced blockchain data: Comparison studies of resampling techniques. Data Science and Management, 5 (2), 66-76.

Full text available as:

PDF (OPEN ACCESS ARTICLE)
1-s2.0-S2666764922000145-main.pdf - Published Version
Available under License Creative Commons Attribution.

2MB
PDF
Effect of Data Resampling on Feature Importance in Imbalanced Blockchain Data Comparison Studies of Resampling Techniques.pdf - Accepted Version
Restricted to Repository staff only
Available under License Creative Commons Attribution Non-commercial.

998kB

DOI: 10.1016/j.dsm.2022.04.003

Abstract

Cryptocurrency blockchain data suffers from a class-imbalance problem because only a few illicit or fraudulent activities in the blockchain network carry known labels. To address this, we compare various resampling methods applied to two highly imbalanced datasets derived from the Bitcoin and Ethereum blockchains after further dimensionality reduction, unlike previous studies on these datasets. First, we study the performance of various classical supervised learning methods in classifying illicit transactions on the Bitcoin dataset and illicit accounts on the Ethereum dataset. Next, we apply a variety of resampling techniques to each dataset using the best-performing learning algorithm for that dataset. We then study the feature importance of the resulting models and find that resampling directly influences the explainability of the model. Our main finding is that undersampling with the edited nearest-neighbour technique attains an accuracy of more than 99% on the given datasets by removing noisy data points from the whole dataset. Moreover, after feature reduction, the best-performing learning algorithms outperform the results reported in the original studies of these datasets. The main novel contribution lies in discussing the effect of data resampling on feature importance, which is interconnected with explainable artificial intelligence techniques.
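To illustrate how edited nearest-neighbour (ENN) undersampling removes noisy points, here is a minimal sketch of Wilson's majority-vote editing rule applied only to the majority (licit) class. It is not the authors' code: the record does not say which implementation or parameters they used, and the function name `edited_nearest_neighbours`, the choice `k=3`, the Euclidean distance, and the convention that label `0` is the majority class are all assumptions for illustration. Production libraries such as imbalanced-learn's `EditedNearestNeighbours` offer stricter variants (for example, requiring all neighbours to agree).

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3, majority_label=0):
    """Hypothetical sketch of ENN undersampling (Wilson's majority-vote rule).

    A majority-class sample is dropped when the majority label among its
    k nearest neighbours (excluding itself) differs from its own label.
    Minority-class samples (e.g. illicit transactions) are always kept.
    X is a list of numeric tuples; y is a list of class labels.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # never remove minority samples
            continue
        # Euclidean distance from sample i to every other sample
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbour_labels = [y[j] for _, j in dists[:k]]
        vote, _ = Counter(neighbour_labels).most_common(1)[0]
        if vote == yi:
            keep.append(i)  # neighbours agree: treat as clean, keep it
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: a licit cluster near the origin, an illicit cluster near (5, 5),
# and one "noisy" licit point sitting inside the illicit cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (5.5, 5.5)]
y = [0, 0, 0, 0, 1, 1, 1, 0]
X_kept, y_kept = edited_nearest_neighbours(X, y)
print(len(y_kept), (5.5, 5.5) in X_kept)
```

In this toy example the licit point at (5.5, 5.5) is surrounded by illicit neighbours, so it is removed while both clean clusters survive; this is the sense in which ENN cleans the class boundary rather than simply discarding majority samples at random.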

Item Type: Article
ISSN: 2666-7649
Additional Information: Funded by Artificial intelligence assisted virtual reality system for blockchain network
Uncontrolled Keywords: Resampling techniques; Cryptocurrency data; Bitcoin blockchain; Ethereum blockchain
Group: Faculty of Science & Technology
ID Code: 37046
Deposited By: Symplectic RT2
Deposited On: 10 Jun 2022 11:05
Last Modified: 07 Sep 2022 14:35
