Bioinformatics & Evolutionary Genomics
 • SUPPLEMENTARY DATA • 

A review of feature selection techniques in bioinformatics.

Saeys, Y., Inza, I. & Larrañaga, P.

Abstract

Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques.
In this paper, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques, and discussing their use, variety and potential in a number of both common as well as upcoming bioinformatics applications.

FS Review → Sorted by year

  1978

• J. Kittler (1978) "Feature set search algorithms." In Pattern Recognition and Signal Processing, Sijthoff and Noordhoff, Alphen aan den Rijn, Netherlands

  1988

• W. Siedlecki and J. Sklansky (1988) "On automatic feature selection." International Journal of Pattern Recognition and Artificial Intelligence, Vol. 2, pages 197-220

  1994

• F.J. Ferri and P. Pudil and M. Hatef and J. Kittler (1994) "Comparative study of techniques for large-scale feature selection." In Pattern Recognition in Practice IV, Multiple Paradigms, Comparative Studies and Hybrid Systems, Elsevier

• D.B. Skalak (1994) "Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms." In Proceedings of the Eleventh International Conference on Machine Learning, pages 293-301.

  1997

• R. Kohavi and G. John (1997) "Wrappers for feature subset selection." Artificial Intelligence, Vol. 97, Nr. 1-2, pages 273-324

• F.J. Provost and T. Fawcett (1997) "Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions." In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pages 43-48.

  1998

• N.A. Chuzhanova and A.J. Jones and S. Margetts (1998) "Feature selection for genetic sequence classification." Bioinformatics, Vol. 14, Nr. 2, pages 139-143

• H. Liu and H. Motoda (1998) "Feature Selection for Knowledge Discovery and Data Mining." Kluwer Academic Publishers

• S.L. Salzberg and A. Delcher and S. Kasif and O. White (1998) "Microbial gene identification using interpolated Markov models." Nucleic Acids Research, Vol. 26, pages 544-548

  1999

• A.L. Delcher and D. Harmon and S. Kasif and O. White and S.L. Salzberg (1999) "Improved microbial gene identification with GLIMMER." Nucleic Acids Research, Vol. 27, pages 4636-4641

• R. Etxebarria and P. Larrañaga (1999) "Global optimization with Bayesian networks." In II Symposium on Artificial Intelligence. CIMAF99. Special Session on Distributions and Evolutionary Optimization, pages 332-339.

• M.A. Hall (1999) "Correlation-based Feature Selection for Machine Learning." PhD thesis. University of Waikato.

• I. Inza and P. Larrañaga and R. Etxebarria and B. Sierra (1999) "Feature subset selection by Bayesian networks based optimization." Artificial Intelligence, Vol. 27, pages 143-164

• T.R. Golub and D.K. Slonim and P. Tamayo and C. Huard and M. Gaasenbeek and J.P. Mesirov and H. Coller and M.L. Loh and J.R. Downing and M.A. Caligiuri and C.D. Bloomfield and E.S. Lander (1999) "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring." Science, Vol. 286, pages 531-537

• U. Alon and N. Barkai and D. Notterman and K. Gish and S. Ybarra and D. Mack and A.J. Levine (1999) "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays." In Proceedings of the National Academy of Sciences, USA, pages 6745-6750.

• H. Zhang and C.Y. Yu and B. Singer and M. Xiong (1999) "Recursive partitioning for tumor classification with gene expression microarray data." In Proceedings of the National Academy of Sciences, USA, pages 6730-6735.

  2000

• M. Beibel (2000) "Selection of informative genes in gene expression based diagnosis: a nonparametric approach." In Proceedings of the First International Symposium on Medical Data Analysis. Lecture Notes in Computer Science, pages 300-307.

• D.T. Ross and U. Scherf and M.B. Eisen and C.M. Perou and C. Rees and P. Spellman and V. Iyer and S.S. Jeffrey and M. Van de Rijn and M. Waltham and A. Pergamenschikov and J.C.F. Lee and D. Lashkari and D. Shalon and T.G. Myers and J.N. Weinstein and D. Botstein and P.O. Brown (2000) "Systematic variation in gene expression patterns in human cancer cell lines." Nature Genetics, Vol. 24, Nr. 3, pages 227-234

• A. Ben-Dor and L. Bruhn and N. Friedman and I. Nachman and M. Schummer and Z. Yakhini (2000) "Tissue classification with gene expression profiles." Journal of Computational Biology, Vol. 7, Nr. 3-4, pages 559-584

• T.S. Furey and N. Cristianini and N. Duffy and D.W. Bednarski and M. Schummer and D. Haussler (2000) "Support vector machine classification and validation of cancer tissue samples using microarray expression data." Bioinformatics, Vol. 16, Nr. 10, pages 906-914

• S. Raychaudhuri and J.M. Stuart and R.B. Altman (2000) "Principal component analysis to summarize microarray experiments: application to sporulation time series." In Pacific Symposium on Biocomputing 5, pages 452-463.

• N. Friedman and M. Linial and I. Nachman and D. Pe'er (2000) "Using Bayesian networks to analyze expression data." Journal of Computational Biology, Vol. 7, pages 601-620

• F.M. Stefanini and A. Camussi (2000) "The reduction of large molecular profiles to informative components using a genetic algorithm." Bioinformatics, Vol. 16, pages 923-931

• I. Inza and P. Larrañaga and R. Etxebarria and B. Sierra (2000) "Feature Subset Selection by Bayesian networks based optimization." Artificial Intelligence, Vol. 123, Nr. 1-2, pages 157-184

  2001

• M.K. Kerr and G.A. Churchill (2001) "Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments." In Proceedings of the National Academy of Sciences, pages 8961-8965.

• R. Tibshirani and T. Hastie and B. Narasimhan and G. Chu (2001) "Diagnosis of multiple cancer types by shrunken centroids of gene expression." In Proceedings of the National Academy of Sciences, pages 6567-6572.

• R.O. Duda and P.E. Hart and D.G. Stork (2001) "Pattern Classification." Wiley, New York

• I. Inza and M. Merino and P. Larrañaga and J. Quiroga and B. Sierra and M. Girala (2001) "Feature subset selection by genetic algorithms and estimation of distribution algorithms - A case study in the survival of cirrhotic patients treated with TIPS." Artificial Intelligence in Medicine, Vol. 23, Nr. 2, pages 187-205

• E. P. Xing and M. I. Jordan and R. M. Karp (2001) "Feature selection for high-dimensional genomic microarray data." In Proceedings of the Eighteenth International Conference on Machine Learning, pages 601-608.

• J. Khan and J.S. Wei and M. Ringnér and L.H. Saal and M. Ladanyi and F. Westermann and F. Berthold and M. Schwab and C.R. Antonescu and C. Peterson and P.S. Meltzer (2001) "Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks." Nature Medicine, Vol. 7, Nr. 6, pages 673-679

• D. Pe'er and A. Regev and G. Elidan and N. Friedman (2001) "Inferring Subnetworks from Perturbed Expression Profiles." Bioinformatics, Vol. 17, pages 215-224

• M. Xiong and Z. Fang and J. Zhao (2001) "Biomarker identification by feature wrappers." Genome Research, Vol. 11, pages 1878-1887

• P. Baldi and A.D. Long (2001) "A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes." Bioinformatics, Vol. 17, Nr. 6, pages 509-516

• J.G. Thomas and J.M. Olson and S.J. Tapscott and L.P. Zhao (2001) "An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles." Genome Research, Vol. 11, pages 1227-1236

• M.A. Newton and C.M. Kendziorski and C.M. Richmond and C.S. Blattner and K.W. Tsui (2001) "On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data." Journal of Computational Biology, Vol. 8, pages 37-52

• B. Efron and R. Tibshirani and J.D. Storey and V. Tusher (2001) "Empirical Bayes analysis of a microarray experiment." Journal of the American Statistical Association, Vol. 96, Nr. 456, pages 1151-1160

• P.J. Park and M. Pagano and M. Bonetti (2001) "A nonparametric scoring algorithm for identifying informative genes from microarray data." Pacific Symposium on Biocomputing, Vol. 6, pages 52-63

• V.G. Tusher and R. Tibshirani and G. Chu (2001) "Significance analysis of microarrays applied to ionizing radiation response." In Proceedings of the National Academy of Sciences, pages 5116-5121.

• L. Li and C.R. Weinberg and T.A. Darden and L.G. Pedersen (2001) "Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method." Bioinformatics, Vol. 17, Nr. 12, pages 1131-1142

• E.R. Dougherty (2001) "Small sample issues for microarray-based classification." Comparative and Functional Genomics, Vol. 2, Nr. 1, pages 28-34

• L. Kruglyak and D. A. Nickerson (2001) "Variation is the spice of life." Nature Genetics, Vol. 27, pages 234-236

• M. Daly and J.D. Rioux and S.F. Schaffner and T.J. Hudson and E.S. Lander (2001) "High-resolution haplotype structure in the human genome." Nature Genetics, Vol. 29, pages 229-232

  2002

• W. Pan (2002) "A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments." Bioinformatics, Vol. 18, Nr. 4, pages 546-554

• V. Roth (2002) "The generalized LASSO: a wrapper approach to gene selection for microarray data." Technical Report IAI-TR-2002-8, Department of Computer Science III, University of Bonn.

• S. Degroeve and B. De Baets and Y. Van de Peer and P. Rouzé (2002) "Feature subset selection for splice site prediction." Bioinformatics, Vol. 18, pages 75-83

• I. Guyon and J. Weston and S. Barnhill and V. Vapnik (2002) "Gene Selection for Cancer Classification using Support Vector Machines." Machine Learning, Vol. 46, Nr. 1-3, pages 389-422, Kluwer Academic Publishers, Boston

• S. Keles and M. van der Laan and M.B. Eisen (2002) "Identification of regulatory elements using a feature selection method." Bioinformatics, Vol. 18, Nr. 9, pages 1167-1175

• N. Zavaljevski and F.J. Stevens and J. Reifman (2002) "Support vector machines with selective kernel scaling for protein classification and identification of key amino acid positions." Bioinformatics, Vol. 18, Nr. 5, pages 689-696

• L. Li and L.G. Pedersen and T.A. Darden and C.R. Weinberg (2002) "Computational analysis of leukemia microarray expression data using the GA/KNN method." In Methods of Microarray Data Analysis. First Conference on Critical Assessment of Microarray Data Analysis, CAMDA2000, pages 81-96.

• W. Li and Y. Yang (2002) "How many genes are needed for a discriminant microarray data analysis?" In Methods of Microarray Data Analysis. First Conference on Critical Assessment of Microarray Data Analysis, CAMDA2000, pages 137-150.

• T.H. Bø and I. Jonassen (2002) "New feature subset selection procedures for classification of expression profiles." Genome Biology, Vol. 3, Nr. 4

• L.R. Grate and C. Bhattacharyya and M.I. Jordan and I.S. Mian (2002) "Simultaneous relevant feature identification and classification in high-dimensional spaces." In Workshop on Algorithms in Bioinformatics, WABI 2002

• J.B. Tobler and M.N. Molla and J.W. Shavlik and E. Nuwaysir and R. Green (2002) "Evaluating machine learning approaches for aiding probe selection for gene-expression arrays." Bioinformatics, Vol. 18, pages S164-S171

• D.V. Nguyen and D.M. Rocke (2002) "Tumor classification by partial least squares using microarray gene expression data." Bioinformatics, Vol. 18, Nr. 1, pages 39-50

• S. Dudoit and J. Fridlyand and T.P. Speed (2002) "Comparison of discriminant methods for the classification of tumors using gene expression data." Journal of the American Statistical Association, Vol. 97, Nr. 457, pages 77-87

• L.J. van 't Veer and H. Dai and M.J. van de Vijver and Y.D. He and A.A.M. Hart and M. Mao and H.L. Peterse and K. van der Kooy and M.J. Marton and A.T. Witteveen and R.M. Schreiber and R.M. Kerkhoven and C. Roberts and P.S. Linsley and R. Bernards and S.H. Friend (2002) "Gene expression profiling predicts clinical outcome of breast cancer." Nature, Vol. 415, pages 530-535

• O.G. Troyanskaya and M.E. Garber and P.O. Brown and D. Botstein and R.B. Altman (2002) "Nonparametric methods for identifying differentially expressed genes in microarray data." Bioinformatics, Vol. 18, Nr. 11, pages 1454-1461

• J.D. Storey (2002) "A direct approach to false discovery rates." Journal of the Royal Statistical Society. Series B, Vol. 64, pages 479-498

• E.J. Yeoh and M.E. Ross and S.A. Shurtleff and W.K. Williams and D. Patel and R. Mahfouz and F.G. Behm and S.C. Raimondi and M.V. Relling and A. Patel and C. Cheng and D. Campana and D. Wilkins and X. Zhou and J. Li and H. Liu and C.H. Pui and W.E. Evans and C. Naeve and L. Wong and J.R. Downing (2002) "Classification, subtype discovery, and prediction of outcome in pediatric lymphoblastic leukemia by gene expression profiling." Cancer Cell, Vol. 1, pages 133-143

• C. Ambroise and G.J. McLachlan (2002) "Selection bias in gene extraction on the basis of microarray gene-expression data." In Proceedings of the National Academy of Sciences, pages 6562-6566.

• E.F. Petricoin and A.M. Ardekani and B.A. Hitt and P.J. Levine and V.A. Fusaro and S.M. Steinberg and G.B. Mills and C. Simone and D.A. Fishman and E.C. Kohn and L.A. Liotta (2002) "Use of proteomics patterns in serum to identify ovarian cancer." The Lancet, Vol. 359, Nr. 9306, pages 572-577

• G. Ball and S. Mian and F. Holding and R.O. Allibone and J. Lowe and S. Ali and G. Li and S. McCardle and I.O. Ellis and C. Creaser and R.C. Rees (2002) "An integrated approach utilizing artificial neural networks and SELDI mass spectrometry for the classification of human tumours and rapid identification of potential biomarkers." Bioinformatics, Vol. 18, Nr. 3, pages 395-404

• H. Liu and J. Li and L. Wong (2002) "A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns." Genome Informatics, Vol. 13, pages 51-60

• S.B. Gabriel and S.F. Schaffner and H. Nguyen and J.M. Moore and J. Roy and B. Blumenstiel and J. Higgins and M. DeFelice and A. Lochner and M. Faggart and S.N. Liu-Cordero and C. Rotimi and A. Adeyemo and R. Cooper and R. Ward and E.S. Lander and M.J. Daly and D. Altshuler (2002) "The structure of haplotype blocks in the human genome." Science, Vol. 296, pages 2225-2229

  2003

• Y. Zhao and W. Pan (2003) "Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments." Bioinformatics, Vol. 19, Nr. 9, pages 1046-1054

• S. Canu and Y. Grandvalet and A. Rakotomamonjy (2003) "SVM and Kernel Methods Matlab Toolbox." In Perception Systèmes et Information, INSA de Rouen, Rouen, France

• W. Daelemans and V. Hoste and F. De Meulder and B. Naudts (2003) "Combined Optimization of Feature Selection and Algorithm Parameter Interaction in Machine Learning of Language." In Proceedings of the 14th European Conference on Machine Learning (ECML-2003), pages 84-95.

• P.B. Dobrokhotov and C. Goutte and A.L. Veuthey and E. Gaussier (2003) "Combining NLP and probabilistic categorisation for document and term selection for Swiss-Prot medical annotation." Bioinformatics, Vol. 19, pages 91-94

• G. Forman (2003) "An Extensive Empirical Study of Feature Selection Metrics for Text Classification." Journal of Machine Learning Research, Vol. 3, pages 1289-1305

• Y. Saeys and S. Degroeve and D. Aeyels and Y. Van de Peer and P. Rouzé (2003) "Fast feature selection using a simple Estimation of Distribution Algorithm: a case study on splice site prediction." Bioinformatics, Vol. 19, Nr. 2, pages 179-188

• S. Sinha (2003) "Discriminative motifs." Journal of Computational Biology, Vol. 10, Nr. 3-4, pages 599-615

• Y. Su and T.M. Murali and V. Pavlovic and M. Schaffer and S. Kasif (2003) "RankGene: identification of diagnostic genes based on expression data." Bioinformatics, Vol. 19, Nr. 12, pages 1578-1579

• W. Pan (2003) "On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression." Bioinformatics, Vol. 19, Nr. 11, pages 1333-1340

• W. Pan and J. Lin and C. Le (2003) "A mixture model approach to detecting differentially expressed genes with microarray data." Functional and Integrative Genomics, Vol. 3, Nr. 3, pages 117-124

• S. Dudoit and J.P. Shaffer and J.C. Boldrick (2003) "Multiple hypothesis testing in microarray experiments." Statistical Science, Vol. 18, pages 71-103

• C. Ding and H. Peng (2003) "Minimum redundancy feature selection from microarray gene expression data." In Proceedings of the IEEE Conference on Computational Systems Bioinformatics, pages 523-528.

• K.Y. Yeung and R.E. Bumgarner (2003) "Multiclass classification of microarray data with repeated measurements: application to cancer." Genome Biology, Vol. 4, Nr. 12, pages R83

• C.H. Ooi and P. Tan (2003) "Genetic algorithms applied to multi-class prediction for the analysis of gene expression data." Bioinformatics, Vol. 19, Nr. 1, pages 37-44

• J.M. Deutsch (2003) "Evolution algorithms for finding optimal gene sets in microarray prediction." Bioinformatics, Vol. 19, Nr. 1, pages 45-52

• K.E. Lee and N. Sha and E.R. Dougherty and M. Vannucci and B.N. Mallick (2003) "Gene selection: a Bayesian variable selection approach." Bioinformatics, Vol. 19, Nr. 1, pages 90-97

• E.F. Petricoin and L.A. Liotta (2003) "Mass spectrometry-based diagnostic: the upcoming revolution in disease detection." Clinical Chemistry, Vol. 49, Nr. 4, pages 533-534

• R.L. Somorjai and B. Dolenko and R. Baumgartner (2003) "Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions." Bioinformatics, Vol. 19, Nr. 12, pages 1484-1491

• B. Wu and T. Abbott and D. Fishman and W. McMurray and G. Mor and K. Stone and D. Ward and K. Williams and H. Zhao (2003) "Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data." Bioinformatics, Vol. 19, Nr. 13, pages 1636-1643

  2004

• I. Inza and P. Larrañaga and R. Blanco and A.J. Cerrolaza (2004) "Filter versus wrapper gene selection approaches in DNA microarray domains." Artificial Intelligence in Medicine, Vol. 31, Nr. 2, pages 91-103

• T.K. Paul and H. Iba (2004) "Identification of informative genes for molecular classification using probabilistic model building genetic algorithms." In Proceedings of the Genetic and Evolutionary Computation Conference, pages 414-425.

• G. Weber and S. Vinterbo and L. Ohno-Machado (2004) "Multivariate selection of genetic markers in diagnostic classification." Artificial Intelligence in Medicine, Vol. 31, pages 155-167

• H. Liu and H. Han and J. Li and L. Wong (2004) "Using amino acid patterns to accurately predict translation initiation sites." In Silico Biology, Vol. 4, Nr. 3, pages 255-269

• Y. Saeys and S. Degroeve and D. Aeyels and P. Rouzé and Y. Van de Peer (2004) "Feature selection for splice site prediction: A new method using EDA-based feature ranking." BMC Bioinformatics, Vol. 5, Nr. 1, pages 64

• Y. Saeys (2004) "Feature selection for classification of nucleic acid sequences." PhD thesis, Ghent University.

• G.K. Smyth (2004) "Linear models and empirical Bayes methods for assessing differential expression in microarray experiments." Statistical Applications in Genetics and Molecular Biology, Vol. 3, Nr. 1, pages Article 3

• M.G. Tadesse and M. Vannucci and P. Lio (2004) "Identification of DNA regulatory motifs using Bayesian variable selection." Bioinformatics, Vol. 20, Nr. 16, pages 2553-2561

• L. Yu and H. Liu (2004) "Efficient feature selection via analysis of relevance and redundancy." Journal of Machine Learning Research, Vol. 5, Nr. (Oct), pages 1205-1224

• T. Li and C. Zhang and M. Ogihara (2004) "A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression." Bioinformatics, Vol. 20, Nr. 15, pages 2429-2437

• R. Breitling and P. Armengaud and A. Amtmann and P. Herzyk (2004) "Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments." FEBS Letters, Vol. 573, pages 83-92

• J. Lyons-Weiler and S. Patel and M.J. Becich and T.E. Godfrey (2004) "Tests for finding complex patterns of differential expression in cancers: towards individualized medicine." BMC Bioinformatics, Vol. 5, Nr. 110

• S. Pounds and C. Cheng (2004) "Improving false discovery rate estimation." Bioinformatics, Vol. 20, Nr. 11, pages 1737-1754

• H. Jiang and Y. Deng and H.-S. Cheng and L. Tao and Q. Sha and J. Chen and C.-J. Tsai and S. Zhang (2004) "Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes." BMC Bioinformatics, Vol. 5, Nr. 81

• R. Blanco and P. Larrañaga and I. Inza and B. Sierra (2004) "Gene selection for cancer classification using wrapper approaches." International Journal of Pattern Recognition and Artificial Intelligence, Vol. 18, Nr. 8

• U.M. Braga-Neto and E.R. Dougherty (2004) "Is cross-validation valid for small-sample microarray classification?" Bioinformatics, Vol. 20, Nr. 3, pages 374-380

• L. Li and D.M. Umbach and P. Terry and J.A. Taylor (2004) "Application of the GA/KNN method to SELDI proteomics data." Bioinformatics, Vol. 20, Nr. 10, pages 1638-1640

• R. Tibshirani and T. Hastie and B. Narasimhan and S. Soltys and G. Shi and A. Koong and Q-T. Le (2004) "Sample classification from protein mass spectrometry, by 'peak probability contrast'." Bioinformatics, Vol. 20, Nr. 17, pages 3034-3044

• J. Prados and A. Kalousis and J-C. Sánchez and L. Allard and O. Carrette and M. Hilario (2004) "Mining mass-spectra for diagnosis and biomarker discovery of cerebral accidents." Proteomics, Vol. 4, Nr. 8, pages 2320-2332

• K. Jong and E. Marchiori and M. Sebag and A. van der Vaart (2004) "Feature selection in proteomic pattern data with support vector machines." In Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pages 41-48.

• C.S. Carlson and M.A. Eberle and M.J. Rieder and Q. Yi and L. Kruglyak and D.A. Nickerson (2004) "Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium." American Journal of Human Genetics, Vol. 74, pages 106-120

• Z. Lin and R. B. Altman (2004) "Finding haplotype tagging SNPs by use of principal components analysis." American Journal of Human Genetics, Vol. 75, pages 850-861

• S. Lee and C. Kang (2004) "CHOISS for selection of single nucleotide polymorphism markers on interval regularity." Bioinformatics, Vol. 20, Nr. 4, pages 581-582

• S.C. Shah and A. Kusiak (2004) "Data mining and genetic algorithm based gene/SNP selection." Artificial Intelligence in Medicine, Vol. 31, pages 183-196

  2005

• Z. Guan and H. Zhao (2005) "A semiparametric approach for marker gene selection based on gene expression data." Bioinformatics, Vol. 21, Nr. 4, pages 529-536

• X. Yan and M. Deng and W.K. Fung and M. Qian (2005) "Detecting differentially expressed genes by relative entropy." Journal of Theoretical Biology, Vol. 234, Nr. 3, pages 395-402

• J.J. Liu and G. Cutler and W. Li and Z. Pan and S. Peng and T. Hoey and L. Chen and X.B. Ling (2005) "Multiclass cancer classification and biomarker discovery using GA-based algorithms." Bioinformatics, Vol. 21, Nr. 11, pages 2691-2697

• X. Liu and A. Krishnan and A. Mondry (2005) "An entropy-based gene selection method for cancer classification using microarray data." BMC Bioinformatics, Vol. 6, Nr. 76

• A. Al-Shahib and R. Breitling and D. Gilbert (2005) "Feature selection and the class imbalance problem in predicting protein function from sequence." Applied Bioinformatics, Vol. 4, Nr. 3, pages 195-203

• L. Buturovic (2005) "PCP: a program for supervised classification of gene expression profiles." Bioinformatics, Vol. 22, Nr. 2, pages 245-247

• A.M. Cohen and W.R. Hersh (2005) "A survey of current work in biomedical text mining." Briefings in Bioinformatics, Vol. 6, Nr. 1, pages 57-71

• P.C. Conilione and D. Wang (2005) "A Comparative Study on Feature Selection for E.coli Promoter Recognition." International Journal of Information Technology, Vol. 11, pages 54-66

• N. Dean and A.E. Raftery (2005) "Normal uniform mixture differential gene expression detection in cDNA microarrays." BMC Bioinformatics, Vol. 6, Nr. 173

• H. Liu and L. Yu (2005) "Toward Integrating Feature Selection Algorithms for Classification and Clustering." IEEE Transactions on Knowledge and Data Engineering, Vol. 17, Nr. 4, pages 491-502

• S. Scheid and R. Spang (2005) "twilight; a Bioconductor package for estimating the local false discovery rate." Bioinformatics, Vol. 21, Nr. 12, pages 2921-2922

• I.H. Witten and E. Frank (2005) "Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition." Morgan Kaufmann, San Francisco

• A. Statnikov and C.F. Aliferis and I. Tsamardinos and D. Hardin and S. Levy (2005) "A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis." Bioinformatics, Vol. 21, Nr. 5, pages 631-643

• J.W. Lee and J.B. Lee and M. Park and S.H. Song (2005) "An extensive comparison of recent classification tools applied to microarray data." Computational Statistics and Data Analysis, Vol. 48, pages 869-885

• Y. Wang and I.V. Tetko and M.A. Hall and E. Frank and A. Facius and K.F.X. Mayer and H.W. Mewes (2005) "Gene selection from microarray data for cancer classification - a machine learning approach." Computational Biology and Chemistry, Vol. 29, pages 37-46

• K.Y. Yeung and R.E. Bumgarner and A.E. Raftery (2005) "Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data." Bioinformatics, Vol. 21, Nr. 10, pages 2394-2402

• Y.H. Yang and Y. Xiao and M.R. Segal (2005) "Identifying differentially expressed genes from microarray experiments via statistic synthesis." Bioinformatics, Vol. 21, Nr. 7, pages 1084-1093

• T. Jirapech-Umpai and S. Aitken (2005) "Feature selection and classification for microarray data analysis: evolutionary methods for identifying predictive genes." BMC Bioinformatics, Vol. 6, Nr. 148

• S. Ma and J. Huang (2005) "Regularized ROC method for disease classification and biomarker selection with microarray data." Bioinformatics, Vol. 21, Nr. 24, pages 4356-4362

• C. Sima and U.M. Braga-Neto and E.R. Dougherty (2005) "Superior feature-set ranking for small samples using bolstered error estimation." Bioinformatics, Vol. 21, Nr. 7, pages 1046-1054

• A. Molinaro and R. Simon and R.M. Pfeiffer (2005) "Prediction error estimation: a comparison of resampling methods." Bioinformatics, Vol. 21, Nr. 15, pages 3301-3307

• D. Ghosh and M. Chinnaiyan (2005) "Classification and Selection of Biomarkers in Genomic Data Using LASSO." Journal of Biomedicine and Biotechnology, Vol. 2005, Nr. 2, pages 147-154

• I. Levner (2005) "Feature selection and nearest centroid classification for protein mass spectrometry." BMC Bioinformatics, Vol. 6, Nr. 68

• J.S. Yu and S. Ongarello and R. Fiedler and X.W. Chen and G. Toffolo and C. Cobelli and Z. Trajanoski (2005) "Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data." Bioinformatics, Vol. 21, Nr. 12, pages 2200-2209

• P. Geurts and M. Fillet and D. de Seny and M-A. Meuwis and M. Malaise and M-P. Merville and L. Wehenkel (2005) "Proteomic mass spectra classification using decision tree based ensemble methods." Bioinformatics, Vol. 21, Nr. 15, pages 3138-3145

• H.W. Ressom and R.S. Varghese and M. Abel-Hamid and S. Abdel-Latif Eissa and D. Saha and L. Goldman and E.F. Petricoin and T.P. Conrads and T.D. Veenstra and C.A. Loffredo and R. Goldman (2005) "Analysis of mass spectral serum profiles for biomarker selection." Bioinformatics, Vol. 21, Nr. 21, pages 4039-4045

• J.S. Yu and X.W. Chen (2005) "Bayesian neural network approaches to ovarian cancer identification from high-resolution mass spectrometry data." Bioinformatics, Vol. 21, Nr. Suppl., pages i487-i494

• X. Li and S. Rao and W. Zhang and G. Zheng and W. Jiang and L. Du (2005) "Large-scale ensemble decision analysis of sib-pair IBD profiles for identification of the relevant molecular signatures for alcoholism." In Lecture Notes in Computer Science 3614, pages 1184-1189, Springer

• B. Gong and Z. Guo and J. Li and G. Zhu and S. Lv and S. Rao and X. Li (2005) "Application of genetic algorithm -- support vector machine hybrid for prediction of clinical phenotypes based on genome-wide SNP profiles of sib pairs." In Lecture Notes in Computer Science 3614, pages 830-835, Springer

• E. Halperin and G. Kimmel and R. Shamir (2005) "Tag SNP selection in genotype data for maximizing SNP prediction accuracy." Bioinformatics, Vol. 21, Nr. suppl., pages i195-203

  2006

• K. Yang and Z. Cai and J. Li and G. Lin (2006) "A stable gene selection in microarray data analysis." BMC Bioinformatics, Vol. 7, Nr. 228

• T-C. Lin and R-S. Liu and C-Y. Chen and Y-T. Chao and S-Y. Chen (2006) "Pattern classification in DNA microarray data of multiple tumor types." Pattern Recognition, Vol. 39, Nr. 12, pages 2426-2438.

• S. Niijima and S. Kuhara (2006) "Gene subset selection in kernel-induced feature space." Pattern Recognition Letters, Vol. 27, pages 1884-1892

• S. Datta and L.M. DePadilla (2006) "Feature selection and machine learning with mass spectrometry data for distinguishing cancer and non-cancer samples." Statistical Methodology, Vol. 3, Nr. 1, pages 79-92

• B. Han and Z. Obradovic and Z.Z. Hu and C.H. Wu and S. Vucetic (2006) "Substring selection for biomedical document classification." Bioinformatics, Vol. 22, Nr. 17, pages 2136-2142

• L.J. Jensen and J. Saric and P. Bork (2006) "Literature mining for the biologist: from information retrieval to biological discovery." Nature Reviews Genetics, Vol. 7, Nr. 2, pages 119-129

• S.K. Kim and J.W. Nam and J.K. Rhee and W.J. Lee and B.T. Zhang (2006) "miTarget: microRNA target gene prediction using a support vector machine." BMC Bioinformatics, Vol. 7, Nr. 411

• J.T. Leek and E. Monsen and A.R. Dabney and J.D. Storey (2006) "EDGE: extraction and analysis of differential gene expression." Bioinformatics, Vol. 22, Nr. 4, pages 507-508

• T.M. Phuong and Z. Lin and R.B. Altman (2006) "Choosing SNPs using feature selection." Journal of Bioinformatics and Computational Biology, Vol. 4, Nr. 2, pages 241-57

• V. Trevino and F. Falciani (2006) "GALGO: an R package for multivariate variable selection using genetic algorithms." Bioinformatics, Vol. 22, Nr. 9, pages 1154-1156

• P. Jafari and F. Azuaje (2006) "An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors." BMC Medical Informatics and Decision Making, Vol. 6, Nr. 1, pages 27

• P. Pavlidis and P. Poirazi (2006) "Individualized markers optimize class prediction of microarray data." BMC Bioinformatics, Vol. 7, Nr. 1, pages 345

• R. Díaz-Uriarte and S. Alvarez de Andrés (2006) "Gene selection and classification of microarray data using random forest." BMC Bioinformatics, Vol. 7, Nr. 3

• H. Mamitsuka (2006) "Selecting features in microarray classification using ROC curves." Pattern Recognition, Vol. 39, pages 2393-2404

• O. Gevaert and F. De Smet and D. Timmerman and Y. Moreau and B. De Moor (2006) "Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks." Bioinformatics, Vol. 22, Nr. 14, pages e184-e190

• R. Ruiz and J.C. Riquelme and J.S. Aguilar-Ruiz (2006) "Incremental wrapper-based gene selection from microarray data for cancer classification." Pattern Recognition, Vol. 39, pages 2383-2392

• M.S. Pepe and T. Cai and G. Longton (2006) "Combining predictors for classification using the area under the ROC curve." Biometrics, Vol. 62, Nr. 1, pages 221-229

• R.J. Fox and M.W. Dimmic (2006) "A two-sample Bayesian t-test for microarray data." BMC Bioinformatics, Vol. 7, Nr. 1, pages 126

• C. Sima and E.R. Dougherty (2006) "What should be expected from feature selection in small-sample settings." Bioinformatics, Vol. 22, Nr. 19, pages 2430-2436

• M. Hilario and A. Kalousis and C. Pellegrini and M. Muller (2006) "Processing and classification of protein mass spectra." Mass Spectrometry Reviews, Vol. 25, Nr. 3, pages 409-449

• H. Shin and M.K. Markey (2006) "A machine learning perspective on the development of clinical decision support systems utilizing mass spectra of blood samples." Journal of Biomedical Informatics, Vol. 39, pages 227-248

• G. Bhanot and G. Alexe and B. Venkataraghavan and A.J. Levine (2006) "A robust meta-classification strategy for cancer detection from MS data." Proteomics, Vol. 6, Nr. 2, pages 592-604

• X. Zhang and X. Liu and Q. Shi and X-Q. Xu and H-C.E. Leung and L.N. Harris and J.D. Iglehart and A. Miron and J.S. Liu and W.H. Wong (2006) "Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data." BMC Bioinformatics, Vol. 7, Nr. 197

• A. Ploner and S. Calza and A. Gusnanto and Y. Pawitan (2006) "Multidimensional local false discovery rate for microarray studies." Bioinformatics, Vol. 22, Nr. 5, pages 556-565

• J. Gould and G. Getz and S. Monti and M. Reich and J.P. Mesirov (2006) "Comparative gene marker selection suite." Bioinformatics, Vol. 22, Nr. 15, pages 1924-1925

• R. Varshavsky and A. Gottlieb and M. Linial and D. Horn (2006) "Novel Unsupervised Feature Filtering of Biological Data." Bioinformatics, Vol. 22, Nr. 14, pages e507-513

• P.H. Lee and H. Shatkay (2006) "BNTagger: improved tagging SNP selection using Bayesian networks." Bioinformatics, Vol. 22, Nr. 14, pages e211-e219

• Y. Wang and F. Makedon and J. Pearlman (2006) "Tumor classification based on DNA copy number aberrations determined using SNP arrays." Oncology Reports, Vol. 5, pages 1057-1059

• J. He and A. Zelikovsky (2006) "MLR-tagging: informative SNP selection for unphased genotypes based on multiple linear regression." Bioinformatics, Vol. 22, Nr. 20, pages 2558-2561

  2007

• S. Ma and X. Song and J. Huang (2007) "Supervised group Lasso with applications to microarray data analysis." BMC Bioinformatics, Vol. 8, Nr. 60

• J.J. Chen and C-A. Tsai and S. Tzeng and C-H. Chen (2007) "Gene selection with multiple ordering criteria." BMC Bioinformatics, Vol. 8, Nr. 47

• S. Shah and A. Kusiak (2007) "Cancer gene search with data-mining and genetic algorithms." Computers in Biology and Medicine, Vol. 37, Nr. 2, pages 251-261

• I. Medina and D. Montaner and J. Tárraga and J. Dopazo (2007) "Prophet, a web-based tool for class prediction using microarray data." Bioinformatics, Vol. 23, Nr. 3, pages 390-391

• Y. Saeys and P. Rouzé and Y. Van de Peer (2007) "In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi, and protists." Bioinformatics, Vol. 23, Nr. 4, pages 414-420

• M. Xiong and X. Fang and J. Zhao (2007) "Biomarker identification by feature wrappers." Genome Research, Vol. 11, Nr. 11, pages 1878-1887

• H.W. Ressom and R.S. Varghese and S.K. Drake and G.L. Hortin and M. Abel-Hamid and C.A. Loffredo and R. Goldman (2007) "Peak selection from MALDI-TOF mass spectra using ant colony optimization." Bioinformatics, Vol. 23, Nr. 5, pages 619-626

• K.R. Coombes and K.A. Baggerly and J.S. Morris (2007) "Pre-processing mass spectrometry data." In Fundamentals of Data Mining in Genomics and Proteomics, pages 79-99, Kluwer

• P.C. Sham and S.I. Ao and J.S.H. Kwan and P. Kao and F. Cheung and P.Y. Fong and M.K. Ng (2007) "Combining functional and linkage disequilibrium information in the selection of tag SNPs." Bioinformatics, Vol. 23, Nr. 1, pages 129-131