A New Generative Adversarial Network for Improving Classification Performance for Imbalanced Data.

Strelcenia, E., 2024. A New Generative Adversarial Network for Improving Classification Performance for Imbalanced Data. Doctoral Thesis (Doctoral). Bournemouth University.

Full text available as:

PDF
STRELCENIA, Emilija_Ph.D._2023.pdf
Available under License Creative Commons Attribution Non-commercial.
13MB

Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk.

Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.

Abstract

Imbalanced data is a common issue in many industries, particularly in fields such as fraud detection and medical diagnosis. It refers to datasets in which the classes are not equally distributed, so that one class is over-represented and another is under-represented. This can lead to biased and inaccurate machine learning models, as the algorithm may favour the majority class and overlook important patterns in the minority class. Various sectors have utilised deep neural networks for data synthesis. However, according to research papers in these fields, deep neural networks trained on balanced data outperform those trained on imbalanced data. Although deep generative approaches, such as Generative Adversarial Networks (GANs), are an efficient method of augmenting high-dimensional data, there is a lack of research on their effectiveness with credit card or breast cancer data, and the current methods demonstrate limitations. Our research focuses on obtaining a large amount of valid data that resembles the minority class, in this case fraudulent or malignant samples. This additional data can be used to train a binary classifier so that it is effective at detecting fraud or diagnosing cancer. To overcome the challenges posed by existing methods, we have developed a novel GAN-based method called K-CGAN, which has been tested on credit card fraud and breast cancer data. K-CGAN is designed to generate synthetic data that resembles the minority class, effectively balancing the dataset and improving the performance of binary classifiers. Our research demonstrates the effectiveness of K-CGAN in handling complex data imbalance problems often encountered in practical applications. In addition, the experiments performed on different datasets indicate that K-CGAN can be used for various purposes.
The application of machine learning algorithms in various industries has become increasingly popular in recent years. However, the quality and quantity of available data are crucial factors that directly affect the accuracy and reliability of these models. The scarcity and imbalance of datasets in certain domains pose challenges for researchers and practitioners, and the need for effective solutions is more pressing than ever. In this context, K-CGAN provides a promising approach to addressing data imbalance and improving the performance of machine learning models. Our results show that K-CGAN can be applied to different datasets with different characteristics, making it a valuable tool for data scientists and practitioners in various fields.

Item Type:	Thesis (Doctoral)
Additional Information:	If you feel that this work infringes your copyright please contact the BURO Manager.
Uncontrolled Keywords:	generative adversarial network; artificial intelligence; machine learning
Group:	Faculty of Science & Technology
ID Code:	39677
Deposited By:	Symplectic RT2
Deposited On:	05 Apr 2024 15:03
Last Modified:	05 Apr 2024 15:03
