Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data.

Tong, D. L. and Schierz, A. C., 2011. Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data. Artificial Intelligence in Medicine, 53 (1), pp. 47-56.

Official URL: http://www.sciencedirect.com/science/article/pii/S...

DOI: 10.1016/j.artmed.2011.06.008

Abstract

Objective: Suitable techniques for microarray analysis have been widely researched, particularly for the study of marker genes expressed in a specific type of cancer. Most existing methods focus on classification ability and require the microarray data to be preprocessed. The objective of this study is to develop a hybrid Genetic Algorithm-Neural Network (GANN) model that emphasizes feature extraction and can operate on raw, unprocessed microarray data.

Method: The GANN is a hybrid model in which the fitness value of the genetic algorithm (GA) is based on the number of samples correctly labeled by a standard feedforward artificial neural network (ANN). The model is evaluated on two benchmark microarray datasets with different array platforms and different numbers of classes (a 2-class oligonucleotide microarray dataset for leukaemiagenesis and a 4-class cDNA microarray dataset for small round blue cell tumors). The underlying concept of the GANN algorithm is to select highly informative genes by co-evolving the GA fitness function and the ANN weights at the same time.

Results: For both datasets, the novel GANN selected approximately 50% of the same genes as previous studies. This may indicate that these common genes are more biologically significant than other genes in the datasets. The remaining 50% of the significant genes identified were used to build predictive models and, for both datasets, the models based on the set of genes extracted by the GANN method produced more accurate results. The results also suggest that the GANN method can not only detect genes that are exclusively associated with a single cancer type but can also explore genes that are differentially expressed in multiple cancer types.

Conclusions: The results show that the GANN model has successfully extracted statistically significant genes from the unprocessed microarray data, as well as known biologically significant genes. We also show that assessing the statistical significance of genes based on classification accuracy may be misleading. Although the GANN’s set of extra genes proves to be more statistically significant than those selected by other methods, a biological assessment of these genes is highly recommended to confirm their functionality.
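
The hybrid scheme described under Method can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows one way a GA chromosome could carry both a gene subset and the weights of a small feedforward network, with fitness defined as the number of correctly labeled samples. All function names, the network size, the mutation-only reproduction (no crossover), and every rate and population setting are assumptions made for illustration.

```python
import numpy as np

def forward(X, w_hidden, w_out):
    """One-hidden-layer feedforward pass; returns the predicted class per sample."""
    hidden = np.tanh(X @ w_hidden)
    return np.argmax(hidden @ w_out, axis=1)

def fitness(chrom, X, y):
    """Fitness = number of samples labeled correctly using only the chosen genes."""
    genes, w_hidden, w_out = chrom
    return int(np.sum(forward(X[:, genes], w_hidden, w_out) == y))

def random_chromosome(rng, n_genes, k, n_hidden, n_classes):
    """A chromosome holds k distinct gene indices plus the network weights."""
    genes = rng.choice(n_genes, size=k, replace=False)
    w_hidden = rng.normal(0, 1, size=(k, n_hidden))
    w_out = rng.normal(0, 1, size=(n_hidden, n_classes))
    return genes, w_hidden, w_out

def mutate(rng, chrom, n_genes, rate=0.2, sigma=0.3):
    """Return a mutated copy: swap some genes and perturb all weights."""
    genes, w_hidden, w_out = (a.copy() for a in chrom)
    for i in range(len(genes)):
        if rng.random() < rate:
            candidate = rng.integers(n_genes)
            if candidate not in genes:  # keep the gene subset free of duplicates
                genes[i] = candidate
                w_hidden[i] = rng.normal(0, 1, size=w_hidden.shape[1])
    w_hidden += rng.normal(0, sigma, size=w_hidden.shape)
    w_out += rng.normal(0, sigma, size=w_out.shape)
    return genes, w_hidden, w_out

def evolve(X, y, k=5, n_hidden=4, pop_size=30, generations=40, seed=0):
    """Evolve gene subsets and weights together; the top half survives unchanged."""
    rng = np.random.default_rng(seed)
    n_genes, n_classes = X.shape[1], int(y.max()) + 1
    pop = [random_chromosome(rng, n_genes, k, n_hidden, n_classes)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, X, y), reverse=True)
        elite = pop[: pop_size // 2]
        children = [mutate(rng, elite[rng.integers(len(elite))], n_genes)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    best = max(pop, key=lambda c: fitness(c, X, y))
    return best, fitness(best, X, y)
```

Calling `evolve(X, y)` on a samples-by-genes matrix `X` and an integer label vector `y` returns the best chromosome and its fitness; the first element of the chromosome is the selected gene subset. Because no gradient-based training is involved, the expression values are passed to the network as they are, which loosely mirrors the idea of working on unprocessed data.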

Item Type: Article
ISSN: 0933-3657
Uncontrolled Keywords: Genetic Algorithm (GA), Artificial Neural Network (ANN), Microarray Data, Feature Extraction, Predictive Modeling
Subjects: Science > Biology and Botany; Generalities > Computer Science and Informatics > Artificial Intelligence
Group: School of Design, Engineering & Computing > Smart Technology Research Centre
ID Code: 18456
Deposited By: Dr Amanda C. Schierz LEFT
Deposited On: 09 Sep 2011 12:47
Last Modified: 07 Mar 2013 15:48