Abstract
Background: The aim of this study was to apply a new method for selecting a few genes, out of thousands, as plausible markers of a disease. Methods: Hierarchical clustering was used along with Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers to select marker genes of three types of breast cancer. In this method, at each step, one subject is left out, and the algorithm iteratively selects some clusters of genes from the remaining subjects and selects a representative gene from each cluster. Classifiers are then constructed based on these genes, and the accuracy of each classifier in predicting the class of the left-out subject is recorded. The classifier with the higher precision is considered superior. Results: Combining classification techniques with the clustering method resulted in fewer genes with a high degree of statistical precision. Although all classifiers selected only a few genes from the pre-determined highly ranked genes, the precision did not decrease. SVM precision was 100% with 22 genes instead of 50 genes, while NB resulted in a higher precision of 97.95% in this case. When 20 highly ranked genes were selected to be fed to the algorithm, the same precision was obtained using 6 and 5 genes with the SVM and NB classifiers, respectively. Conclusion: Using the hybrid method could be effective in choosing a smaller number of plausible marker genes so that the classification precision of these markers is increased. In addition, this method enables the detection of new plausible markers whose association with the disease under study has not been biologically proven.
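The leave-one-out procedure described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: genes are clustered by average linkage on the distance 1 - |Pearson r|, the number of clusters is fixed in advance rather than chosen iteratively, gene columns are assumed to be ordered by rank so that the lowest index in a cluster serves as its representative, and only a Gaussian Naïve Bayes classifier is shown (the SVM is omitted). All function names and the toy expression values are hypothetical.

```python
import math
from statistics import mean, pvariance

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def cluster_genes(X, n_clusters):
    """Average-linkage agglomerative clustering of gene columns (assumed distance: 1 - |r|)."""
    cols = [list(col) for col in zip(*X)]
    n = len(cols)
    dist = {(i, j): 1 - abs(pearson(cols[i], cols[j]))
            for i in range(n) for j in range(i + 1, n)}
    clusters = [[g] for g in range(n)]

    def linkage(c1, c2):
        return mean(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        pairs = [(p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))]
        a, b = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] += clusters[b]   # merge the two closest clusters
        del clusters[b]
    return clusters

def fit_nb(X, y, genes):
    """Gaussian Naive Bayes: per class, a prior plus (mean, variance) for each selected gene."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = [(mean([r[g] for r in rows]), pvariance([r[g] for r in rows]) + 1e-6)
                 for g in genes]     # 1e-6 is an assumed variance floor
        model[label] = (len(rows) / len(X), stats)
    return model

def predict_nb(model, x, genes):
    """Return the class with the highest log posterior for subject x."""
    def log_post(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (x[g] - m) ** 2 / (2 * v)
            for g, (m, v) in zip(genes, stats))
    return max(model, key=log_post)

def loo_accuracy(X, y, n_clusters):
    """Leave one subject out, select genes on the rest, then classify the left-out subject."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Assumption: columns are rank-ordered, so min(cluster) is the representative gene.
        genes = sorted(min(c) for c in cluster_genes(X_train, n_clusters))
        model = fit_nb(X_train, y_train, genes)
        hits += predict_nb(model, X[i], genes) == y[i]
    return hits / len(X)

# Hypothetical toy data: 9 subjects, 4 genes, 3 classes (not taken from the study).
X = [[1.0, 1.1, 1.0, 0.9], [1.2, 1.3, 1.1, 1.0], [0.9, 1.0, 1.2, 1.1],
     [5.0, 5.1, 1.1, 1.0], [5.2, 5.3, 0.9, 0.8], [4.9, 5.0, 1.0, 0.9],
     [1.1, 1.2, 5.0, 4.9], [1.0, 1.1, 5.2, 5.1], [0.8, 0.9, 4.9, 4.8]]
y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

accuracy = loo_accuracy(X, y, n_clusters=2)
print(f"Leave-one-out accuracy with 2 representative genes: {accuracy:.2%}")
```

Gene selection is repeated inside every fold, so the left-out subject never influences which genes are chosen. Running the same loop with a second classifier and comparing the resulting accuracies would mirror the comparison between SVM and NB described above.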