Yvan Saeys

Postdoc
(Group member since 2001)


+32 (0) 9 33 13 695
Personal homepage

VIB / Ghent University
Bioinformatics & Systems Biology
Technologiepark 927
B-9052 Gent
BELGIUM


CV

Date of birth: March 20, 1977 (Sint-Niklaas, Belgium).

October 2007 - now: Postdoc (FWO-Vlaanderen) "An exploratory study of feature selection techniques for unsupervised learning", Bioinformatics & Evolutionary Genomics, Ghent University/VIB, Belgium.

October 2006 - September 2007: Postdoc (BOF), Bioinformatics & Evolutionary Genomics, Ghent University/VIB, Belgium.

September 2004 - October 2006: Postdoc (VIB) "Machine learning techniques for automatic genome annotation", Bioinformatics & Evolutionary Genomics, Ghent University/VIB, Belgium.

January 2001 - September 2004: PhD thesis, entitled "Feature selection for classification of nucleic acid sequences", Promotor: Prof. Dr. Yves Van de Peer, co-promotor: Prof. Dr. ir. Dirk Aeyels, Bioinformatics & Evolutionary Genomics, Ghent University/VIB, Belgium.

January 2001 - December 2001: Part-time research scientist in the bioinformatics team of Tibotec (now Johnson & Johnson).

August 2000 - January 2001: Research assistant at the Centre for Evolutionary Language Engineering (CELE, Flanders Language Valley).

October 1996 - June 2000: Master studies in Computer Science, Ghent University, Belgium.


Supervisor: I am currently supervising the PhD theses of Thomas Abeel, Michiel Van Bel, and Sofie Van Landeghem. In the past, I have co-supervised the MSc theses of about 5 students majoring in computer science and 3 students in the Master of Science in Engineering programme.

Reviewer for: Bioinformatics, BMC Bioinformatics, IEEE Transactions on Knowledge and Data Engineering (IEEE-TKDE), IEEE/ACM Transactions on Computational Biology and Bioinformatics (IEEE-TCBB), IEEE Transactions on Evolutionary Computation (IEEE-TEC).

Editorial work: I co-edited the following volumes:

  • Saeys, Y., Tsiporkova, E., De Baets, B. & Van de Peer, Y. (2006) Proceedings of the fifteenth Belgian-Dutch Machine Learning Conference (Benelearn).
  • Nowé, A., Saeys, Y., Tuyls, K., Vanschoenwinkel, K. & Westra, R. (2006) Proceedings of the First Knowledge Discovery and Emergent Complexity in Bioinformatics (KDECB) workshop.
  • Tuyls, K., Westra, R., Saeys, Y. & Nowé, A. (2007) Lecture Notes in Bioinformatics, volume 4366 (KDECB Revised selected papers)

Awards: Our paper "Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features" received the Genopole Languedoc-Roussillon prize for best publication in 2006.

Research

Fundamental research


Feature selection in machine learning

The selection of a subset of relevant features from a potentially huge initial feature set is an important topic in machine learning. Sometimes the choice of the feature subset may be even more important for achieving the best results than the choice of learning model. My research focuses on selection methods that are able to deal with both (i) large feature sets, and (ii) feature dependencies. A more recent topic of investigation is the use of feature selection for clustering, a non-trivial and challenging topic that is gaining more and more attention from the scientific community.


Modelling gene networks using different sources of data

Modelling the interactions between genes remains a difficult research topic, as the starting data is often quite noisy and the obtained results are difficult to evaluate. To minimise the amount of error in the results and arrive at more reliable models, different sources of data need to be combined (sequence data such as motifs, expression data, interaction data, ...). However, rigorous mathematical techniques to combine and reason with these different types of data are lacking, which presents a great opportunity for research.


Mathematical models for gene splicing

Gene splicing is a very intricate and tightly regulated process in the cell. However, computational models for recognizing splice sites are still far from perfect. A particularly difficult issue from a machine learning point of view is the large number of negative examples that occur in genomes. Therefore, additional submodels (e.g. a branch point model) should be designed and evaluated to increase overall performance. Another important issue in the context of splicing is the analysis of alternative splicing. Using machine learning techniques, we try to find common patterns that could increase our insight into the process of detecting alternative splice variants.

Applied research


Feature selection for classification of nucleic acid sequences

The application of feature selection to different recognition problems related to gene recognition/genome annotation can provide new biological insights into how some processes work. In addition, looking for a core set of relevant features can improve model robustness and increase classification performance.


Feature selection for promoter prediction

The computational identification of promoter regions on a genomic scale is still in its infancy. To improve the models that are used to locate promoters, one should first have some knowledge about which characteristics differentiate promoter regions from other genomic regions. The application of feature selection techniques can aid in finding new features that are important for promoter modelling.


Gene and genome annotation

Our team is involved in the genome annotation of several organisms. To do this job properly, advanced modelling techniques are needed to find and combine the different signals in the gene. We are developing software for the recognition of the most important gene features (start/stop codons and splice sites), as well as methods to identify potential protein coding regions (coding potential prediction).


Hardware-based speed up of bioinformatics algorithms

As bioinformatics databases are growing at an exponential rate, there is a need for fast implementations of very common algorithms (such as alignment). In this research project, together with the PARIS research group of the Laboratory of Electronics and Information Systems (ELIS), we are experimenting with the implementation of several common bioinformatics algorithms in parallel, using specialised hardware (FPGA).

Teaching

Papers

(55) Abeel, T., Van Parys, T., Saeys, Y., Galagan, J., Van de Peer, Y. (2012) GenomeView: a next-generation genome browser. Nucleic Acids Res. 40:e12.

(54) Van Landeghem, S., De Baets, B., Van de Peer, Y., Saeys, Y. (2011) High-precision bio-molecular event extraction from text using parallel binary classifiers. Computational Intelligence 27: 645-664.

(53) * Fostier, J., * Proost, S., Dhoedt, B., Saeys, Y., Demeester, P., Van de Peer, Y., Vandepoele, K. (2011) A Greedy, Graph-Based Algorithm for the Alignment of Multiple Homologous Gene Lists. Bioinformatics 27, 749-56. *contributed equally

(52) Armañanzas, R., Saeys, Y., Inza, I., Garcia-Torres, M., Bielza, C., Van de Peer, Y., Larrañaga, P. (2011) Peakbin selection in mass spectrometry data using a consensus approach with estimation of distribution algorithms. IEEE-ACM Trans. Comput. Biol. Bioinform. 8:760-74.

(51) Nguyen, H., Couckuyt, I., Saeys, Y., Knockaert, L., Dhaene, T., Gorissen, D. (2011) Avoiding overfitting in surrogate modeling: an alternative approach. Proceedings of the 20th Machine Learning conference of Belgium and The Netherlands 93-94.

(50) Geurts, P., Saeys, Y. (2011) Exploring signature multiplicity using ensembles of randomized trees. Proceedings of the 5th Machine Learning in Systems Biology conference (MLSB) 24-28.

(49) * Abeel, T., * Van Landeghem, S., Morante, R., Van Asch, V., Van de Peer, Y., Daelemans, W., Saeys, Y. (2010) Highlights of the BioTM 2010 workshop on advances in bio text mining. BMC Bioinformatics 11, I1. *contributed equally

(48) Middag, C., Saeys, Y., Martens, J.-P. (2010) Towards an ASR-free objective analysis of pathological speech. Proceedings of Interspeech 2010 294-297. Makuhari, Chiba, Japan.

(47) * Van Landeghem, S., * Abeel, T., Saeys, Y., Van de Peer, Y. (2010) Discriminative and informative features for biomolecular text mining with ensemble feature selection. Bioinformatics 26, 554-560. *contributed equally

(46) Saeys, Y., Van Landeghem, S., Van de Peer, Y. (2010) Event based text mining for integrated network construction. Journal of Machine Learning Research, Workshop and Conference proceedings 8, 112-121.

(45) Abeel, T., Helleputte, T., Van de Peer, Y., Dupont, C., Saeys, Y. (2010) Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26, 392-398.

(44) Nguyen, H., Knockaert, L., Dhaene, T., Demeester, P., Saeys, Y. (2010) Influence of feature selection on ensemble diversity. Proceedings of the 19th Machine Learning conference of Belgium and The Netherlands 7-8.

(43) Verbiest, N., Cornelis, C., Saeys, Y. (2009) Valued constraint satisfaction problems applied to functional harmony. IFSA-EUSFLAT 2009 925-30.

(42) Saeys, Y., Van Landeghem, S., Van de Peer, Y. (2009) Integrated network construction using event based text mining. Proceedings of the 3rd Machine Learning in Systems Biology workshop (MLSB) 105-114.

(41) Abeel, T., Van de Peer, Y., Saeys, Y. (2009) Towards a gold standard for promoter prediction evaluation. Bioinformatics 25, i313-i320.

(40) Abeel, T., Van de Peer, Y., Saeys, Y. (2009) Java-ML: a machine learning library. Journal of Machine Learning Research. 10, 931-4.

(39) Van Landeghem, S., Saeys, Y., De Baets, B., Van de Peer, Y. (2009) Analyzing text in search of bio-molecular events: a high-precision machine learning framework. Proceedings of Natural Language Processing in Biomedicine (BioNLP) NAACL 2009 Workshop 128-136.

(38) Bonet, I., Rodríguez, A., Grau, R., Lorenzo, M., Saeys, Y., Nowe, A. (2008) Comparing distance measures with visual methods. 7th Mexican International Conference on Artificial Intelligence (MICAI). 90-99.

(37) Van Landeghem, S., Saeys, Y., De Baets, B., Van de Peer, Y. (2008) Extracting protein-protein interactions from text using rich feature vectors and feature selection. Proceedings of Third International Symposium on Semantic Mining in Biomedicine (SMBM 08) 77-84.

(36) Saeys, Y., Abeel, T., Van de Peer, Y. (2008) Robust Feature Selection using Ensemble Feature Selection Techniques. Proceedings of ECML/PKDD 5212, 313-25.

(35) Armañanzas, R., Inza, I., Santana, R., Saeys, Y., Flores, J., Lozano, J., Van de Peer, Y., Blanco, R., Robles, P., Bielza, C., Larrañaga, P. (2008) A review of estimation of distribution algorithms in bioinformatics. BioData Mining 1, 6.

(34) Van Bel, M., Saeys, Y., Van de Peer, Y. (2008) FunSiP: A Modular and Extensible Classifier for the Prediction of Functional Sites in DNA. Bioinformatics 24, 1532-3.

(33) Abeel, T., Saeys, Y., Rouzé, P., Van de Peer, Y. (2008) ProSOM: Core promoter prediction based on unsupervised clustering of DNA physical profiles. Bioinformatics 24, i24-i31.

(32) Saeys, Y., Abeel, T., Van de Peer, Y. (2008) Towards robust feature selection techniques. Proceedings of Benelearn 45-46.

(31) Abeel, T., Saeys, Y., Bonnet, E., Rouzé, P., Van de Peer, Y. (2008) Generic eukaryotic core promoter prediction using structural features of DNA. Genome Research 18, 310-23.

(30) Abeel, T., Saeys, Y., Van de Peer, Y. (2008) ProSOM: Core promoter identification in the human genome. Proceedings of Benelearn 77-78.

(29) Michoel, T., Maere, S., Bonnet, E., Joshi, A., Saeys, Y., Van den Bulcke, T., Van Leemput, K., van Remortel, P., Kuiper, M., Marchal, K., Van de Peer, Y. (2007) Validating module network learning algorithms using simulated data. BMC Bioinformatics 8, 860-871.

(28) Saeys, Y., Inza, I., Larrañaga, P. (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507-17.

(27) Saeys, Y., Abeel, T., Degroeve, S., Van de Peer, Y. (2007) Translation initiation site prediction on a genomic scale: beauty in simplicity. Bioinformatics 23, 418-23.

(26) Bonet, I., Lorenzo, M., Saeys, Y., Van de Peer, Y., Grau, R. (2007) Predicting Human Immunodeficiency Virus (HIV) Drug Resistance using Recurrent Neural Networks. Lect. Notes in Comput. Sci. 4527, 234-43.

(25) * Michoel, T., * Maere, S., Bonnet, E., Joshi, A., Saeys, Y., Van den Bulcke, T., Van Leemput, K., van Remortel, P., Kuiper, M., Marchal, K., Van de Peer, Y. (2007) Validating module networks learning algorithms using simulated data. BMC Bioinformatics 8, S5. *contributed equally

(24) Saeys, Y., Rouzé, P., Van de Peer, Y. (2007) In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi, and protists. Bioinformatics 23, 414-20.

(23) Saeys, Y., Van de Peer, Y. (2007) Enhancing coding potential prediction for short sequences using complementary sequence features and feature selection. Lect. Notes in Bioinf. 4366, 107-118.

(22) Westra, R., Tuyls, K., Saeys, Y., Nowe, A. (2007) Editorial of the proceedings of the first workshop on Knowledge Discovery and Emergent Complexity in Bioinformatics (KDECB). Lect. Notes in Bioinf. 4366, 1-9.

(21) Saeys, Y., Van de Peer, Y. (2007) Distribution based algorithms for feature weighting, ranking, and selection. Proceedings of the 16th Dutch Belgian Machine Learning Conference (Benelearn 2007)

(20) * Derelle, E., * Ferraz, C., * Rombauts, S., * Rouzé, P., Worden, A.Z., Robbens, S., Partensky, F., Degroeve, S., Echeynie, S., Cooke, R., Saeys, Y., Wuyts, J., Panaud, O., Piegu, B., Ball, S., Ral, J.P., Bouget, F.-Y., Piganeau, G., De Baets, B., Picard, A., Delseny, M., Demaille, J., Van de Peer, Y., Moreau, H. (2006) Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc. Natl. Acad. Sci. USA 103, 11647-52. *contributed equally

(19) Faes, P., Minnaert, B., Christiaens, M., Bonnet, E., Saeys, Y., Stroobandt, D., Van de Peer, Y. (2006) A Scalable Hardware Accelerator for Comparing Protein Sequences. Proceedings of the First International Conference on Scalable Information Systems. Hong Kong, April 2006, on CD.

(18) Bonet, I., Garcia, M.M., Salazar, S., Sanchex, R., Saeys, Y., Van de Peer, Y., Grau, R. (2006) Predicting Human Immunodeficiency Virus (HIV) Drug Resistance using Recurrent Neural Networks. Proceedings of the 10th International Electronic Conference on Synthetic Organic Chemistry, 2006.

(17) Bonet, I., Garcia, M.M., Salazar, S., Sanchex, R., Saeys, Y., Van de Peer, Y., Grau, R. (2006) Feature Extraction Using Clustering of Proteins. Lect. Notes in Comput. Sci. (vol. 4225) pp. 614-623.

(16) Saeys, Y., Van de Peer, Y. (2006) Enhancing coding potential prediction for short sequences using complementary sequence features and feature selection. Proceedings of the 15th Dutch Belgian Machine Learning Conference (Benelearn 2006), pp. 105-112.

(15) Saeys, Y., Van de Peer, Y. (2006) Combining signal processing and machine learning techniques for coding potential prediction. Proceedings of the First International Workshop on Bioinformatics Cuba-Flanders 2006, Santa Clara, Cuba.

(14) Degroeve, S., Saeys, Y., De Baets, B., Rouzé, P., Van de Peer, Y. (2005) Predicting splice sites from high-dimensional local context representations. Bioinformatics 21, 1332-8.

(13) Saeys, Y., Degroeve, S., Van de Peer, Y. (2005) Feature ranking using an EDA-based wrapper approach. Invited book chapter in 'Towards a new evolutionary computation : advances in Estimation of Distribution algorithms.' Editors: Jose A. Lozano, Pedro Larranaga, Inaki Inza and Endika Bengoetxea.

(12) Florquin, K., Degroeve, S., Saeys, Y., Van de Peer, Y. (2005) Large-scale structural analysis of the core promoter in mammalian and plant genomes. Nucleic Acids Res. 33, 4255-64.

(11) Saeys, Y., Degroeve, S., Aeyels, D., Rouzé, P., Van de Peer, Y. (2004) Feature selection for splice site prediction: A new method using EDA-based feature ranking. BMC Bioinformatics 5, 64.

(10) Degroeve, S., Saeys, Y., De Baets, B., Van de Peer, Y., Rouzé, P. (2004) Splice site prediction in eukaryote genome sequences: the algorithmic issues. The New Avenues in Bioinformatics J. Seckbach (ed.) another book of the Cellular Origin and Life in Extreme Habitats Book Series. Kluwer Academic Publishers, Dordrecht, The Netherlands. (2004).

(9) Simillion, C., Vandepoele, K., Saeys, Y., Van de Peer, Y. (2004) Building genomic profiles for uncovering segmental homology in the twilight zone. Genome Res. 14, 1095-106.

(8) Saeys, Y., Degroeve, S., Aeyels, D., Rouzé, P., Van de Peer, Y. (2004) Selecting relevant features for gene structure prediction. Proceedings of Benelearn 13, 103-109.

(7) Saeys, Y., Degroeve, S., Van de Peer, Y. (2004) Digging into acceptor splice site prediction: an iterative feature selection approach. Proceedings of ECML/PKDD, Lecture Notes in Artificial Intelligence 3202, 386-397.

(6) Saeys, Y., Degroeve, S., Aeyels, D., Van de Peer, Y., Rouzé, P. (2003) Selecting Relevant Features for Splice Site Prediction by Estimation of Distribution Algorithms. Proceedings of the 12th Belgian-Dutch Conference on Machine Learning (Benelearn 2002) 64-70. Utrecht, The Netherlands.

(5) Saeys, Y., Degroeve, S., Aeyels, D., Van de Peer, Y. (2003) Fast feature selection using a simple Estimation of Distribution Algorithm: A case study on splice site prediction. Bioinformatics 19 Suppl 2, II179-II188.

(4) Raes, J., Vandepoele, K., Saeys, Y., Simillion, C., Van de Peer, Y. (2003) Investigating ancient duplication events in the Arabidopsis genome. J. Struct. Func. Genomics. 3, 117-29.

(3) Vandepoele, K., Saeys, Y., Simillion, C., Raes, J., Van de Peer, Y. (2002) The Automatic Detection of Homologous Regions (ADHoRe) and its application to microcolinearity between Arabidopsis and Rice. Genome Res. 12, 1792-801.

(2) Saeys, Y., Van Marck, H. (2000) A Study and Improvement of the Genetic Algorithm in the CAM-Brain Machine. Proceedings of CEvoLE 2/TWLT 18: "Learning to Behave" 107-118. Ieper, Belgium.

(1) Nguyen, H., Couckuyt, I., Gorissen, D., Saeys, Y., Knockaert, L., Dhaene, T. (2011) An alternative approach to avoid overfitting for surrogate models. Proceedings of the 2011 Winter Simulation Conference.













Editor of:

  • Lecture Notes in Bioinformatics, vol. 4366
  • Proceedings Benelearn 2006 Conference

Contributor to:

  • Towards a New Evolutionary Computation
  • Genome Evolution
  • The New Avenues in Bioinformatics


Contact:
VIB / UGent
Bioinformatics & Evolutionary Genomics
Technologiepark 927
B-9052 Gent
BELGIUM
+32 (0) 9 33 13807 (phone)
+32 (0) 9 33 13809 (fax)
