Affiliation: Department of Software & Information Science
Iwate Prefectural University
Iwate Ken, Takizawa, Japan 020-0193
Analysis of Gene-expression Data: Complexities, Solutions and Pending Issues
In recent years, microarray technology has advanced to the point where the expression levels of several thousand genes can be obtained in a single experiment. Tens of thousands of mRNAs can be measured simultaneously, comparing gene expression in two samples. Depending on the source of the two samples, analysis of DNA microarray data supports important investigations such as disease progression, diagnosis, and drug response. When one sample comes from a healthy cell and the other from a cancerous one, it is possible to identify changes in the expression of particular genes as the disease progresses. The aim is to find the smallest set of genes whose expression data, taken as features, classify the target disease with minimum classification error. Viewing genes as features, microarray data is of extremely high dimension, and the expression values of most genes are irrelevant to the investigation at hand. Moreover, the number of samples ranges from tens to at most around a hundred. Under such conditions, identifying and eliminating irrelevant genes is of utmost importance. In this paper, we present a two-stage reduction. In Stage 1, the number of genes is reduced from thousands to around a hundred, using a new algorithm that we propose for this phase. In Stage 2, the selection is narrowed to only a few genes. We propose two ways to achieve that optimization, one based on an artificial neural network and the other using a genetic algorithm.
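To make the Stage 1 idea concrete, the following is a minimal sketch of a filter that cuts thousands of genes down to around a hundred. It is not the algorithm proposed in this work, whose details are not given here. It substitutes a common stand-in, ranking genes by the absolute signal-to-noise ratio between the two classes. The function names (`snr_score`, `stage1_filter`, `make_toy_data`), the synthetic data, and the choice of 2000 genes, 40 samples, and 100 retained genes are all assumptions made for illustration.

```python
import random
import statistics


def snr_score(values, labels):
    """Absolute signal-to-noise ratio of one gene across two classes (0 and 1).

    This ranking criterion is an illustrative stand-in, not the proposed method.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    gap = abs(statistics.mean(a) - statistics.mean(b))
    spread = statistics.stdev(a) + statistics.stdev(b)
    if spread == 0:
        # Constant within both classes: uninformative if the means agree,
        # perfectly separating otherwise.
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


def stage1_filter(expr, labels, keep):
    """Return the indices of the `keep` highest-scoring genes.

    expr[g][s] is the expression value of gene g in sample s.
    """
    scores = [snr_score(row, labels) for row in expr]
    ranked = sorted(range(len(expr)), key=lambda g: scores[g], reverse=True)
    return ranked[:keep]


def make_toy_data(n_genes=2000, n_samples=40, informative=(3, 17, 256), seed=0):
    """Synthetic data: Gaussian noise, plus a class-dependent shift on a few genes."""
    rng = random.Random(seed)
    labels = [s % 2 for s in range(n_samples)]
    expr = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_genes)]
    for g in informative:
        for s, y in enumerate(labels):
            expr[g][s] += 4.0 * y
    return expr, labels


expr, labels = make_toy_data()
selected = stage1_filter(expr, labels, keep=100)
```

A per-gene filter of this kind is cheap enough to run over the full gene set. That leaves the costlier search for a final handful of genes, by neural network or genetic algorithm in Stage 2, to operate on roughly a hundred candidates. Stage 2 is not sketched here.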
Goutam Chakraborty is a Professor and Head of the Intelligent Informatics Laboratory, Department of Software and Information Science, Iwate Prefectural University, Takizawa, Japan. His main research interests are soft computing algorithms and their applications to pattern recognition, prediction, scheduling, and optimization problems, including problems in wired and wireless networking.