Herman De Beukelaer

Title: Postdoc

Publications

  1. De Beukelaer, H., Davenport, G. F., & Fack, V. (2018). Core Hunter 3: flexible core subset selection. BMC Bioinformatics, 19. https://doi.org/10.1186/s12859-018-2209-z
    Background: Core collections provide genebank curators and plant breeders with a way to reduce the size of their collections and populations, while minimizing the impact on genetic diversity and allele frequency. Many methods have been proposed to generate core collections, often using distance metrics to quantify the similarity of two accessions based on genetic marker data or phenotypic traits. Core Hunter is a multi-purpose core subset selection tool that uses local search algorithms to generate subsets relying on one or more metrics, including several distance metrics and allelic richness.
    Results: In version 3 of Core Hunter (CH3), we have incorporated two new, improved methods for summarizing distances to quantify the diversity or representativeness of the core collection. A comparison of CH3 and Core Hunter 2 (CH2) showed that these new metrics can be effectively optimized with less complex algorithms than those used in CH2. CH3 is more effective than CH2 at maximizing the improved diversity metric, still ensures a high average and minimum distance, and is faster for large datasets. Using CH3, a simple stochastic hill-climber is able to find highly diverse core collections, and the more advanced parallel tempering algorithm further increases the quality of the core and further reduces variability across independent samples. We also evaluate the ability of CH3 to simultaneously maximize diversity and either representativeness or allelic richness, and compare the results with those of the GDOpt and SimEli methods. CH3 can sample cores that are as representative as those of GDOpt, which was specifically designed for this purpose, and is able to construct cores that are simultaneously more diverse, and either more representative or higher in allelic richness, than those obtained by SimEli.
    Conclusions: In version 3, Core Hunter has been updated to include two new core subset selection metrics that construct cores for representativeness or diversity, with improved performance. It combines the strengths of other methods and outperforms them, as it can optimize a variety of metrics simultaneously. In addition, CH3 improves on CH2, with the option to use genetic marker data, phenotypic traits, or both, and improved speed. Core Hunter 3 is freely available at http://www.corehunter.org.
  2. De Beukelaer, H., Badke, Y., Fack, V., & De Meyer, G. (2017). Moving beyond managing realized genomic relationship in long-term genomic selection. Genetics, 206(2), 1127–1138. https://doi.org/10.1534/genetics.116.194449
    Long-term genomic selection (GS) requires strategies that balance genetic gain with population diversity, to sustain progress for traits under selection and to keep diversity for future breeding. In a simulation model for a recurrent selection scheme, we provide the first head-to-head comparison of two such existing strategies: genomic optimal contributions selection (GOCS), which limits realized genomic relationship among selection candidates, and weighted genomic selection (WGS), which upscales rare allele effects in GS. Compared to GS, both methods provide the same higher long-term genetic gain and a similarly lower inbreeding rate, despite some inherent limitations. GOCS does not control the component of the inbreeding rate linked to trait selection and therefore does not strike the optimal balance between genetic gain and inbreeding. This makes it less effective throughout the breeding scheme, particularly at the beginning, where genetic gain and diversity may not be competing. For WGS, truncation selection proved suboptimal for managing rare allele frequencies among the selection candidates. To overcome these limitations, we introduce two new set selection methods that maximize a weighted index balancing genetic gain with controlling expected heterozygosity (IND-HE) or maintaining rare alleles (IND-RA), and show that both outperform GOCS and WGS to a nearly identical degree. Although further testing is required, we believe that the inherent benefits of the IND-HE and IND-RA methods will transfer from our simulation framework to many practical breeding settings, and are therefore a major step forward toward efficient long-term genomic selection.
  3. De Beukelaer, H. (2017). Discrete optimization algorithms for marker-assisted plant breeding. Ghent University, Faculty of Sciences, Ghent, Belgium.
  4. De Beukelaer, H., Davenport, G. F., De Meyer, G., & Fack, V. (2017). JAMES: an object-oriented Java framework for discrete optimization using local search metaheuristics. Software: Practice and Experience, 47(6), 921–938. https://doi.org/10.1002/spe.2459
  5. De Beukelaer, H., De Meyer, G., & Fack, V. (2015). Heuristic exploitation of genetic structure in marker-assisted gene pyramiding problems. BMC Genetics, 16. https://doi.org/10.1186/s12863-014-0154-z
    Background: Over the last decade, genetic marker-based plant breeding strategies have gained increasing attention because genotyping technologies are no longer limiting. Now the challenge is to use genetic markers optimally in practical breeding schemes. For simple traits, such as some disease resistances, it is possible to target a fixed multi-locus allele configuration at a small number of causal or linked loci. Efficiently obtaining this genetic ideotype from a given set of parental genotypes is known as the marker-assisted gene pyramiding problem. Previous methods either imposed strong restrictions or used black-box integer programming solutions, whereas this paper explores the power of an explicit heuristic approach that exploits the underlying genetic structure to prune the search space.
    Results: Gene Stacker is introduced as a novel approach to marker-assisted gene pyramiding, combining an explicit directed acyclic graph model with a pruned generation algorithm inspired by a simple exhaustive search. Both exact and heuristic pruning criteria are applied to reduce the number of generated schedules. It is shown that this approach can effectively be used to obtain good solutions for stacking problems of varying complexity. For more complex problems, the heuristics make it possible to obtain valuable approximations. For smaller problems, fewer heuristics can be applied, resulting in an interesting quality-runtime tradeoff. Because of its powerful heuristics, Gene Stacker is competitive with previous methods and often finds better and/or additional solutions within reasonable time.
    Conclusions: The proposed approach, combined with heuristics, was confirmed to be feasible for realistic, complex stacking problems. Its inherent flexibility makes it easy to address important breeding constraints, so that the obtained schedules can be widely used in practice without major modifications. In addition, the ideas applied in Gene Stacker can be incorporated into, and extended to, plant breeding contexts that also address, e.g., complex quantitative traits or conservation of the genetic background. Gene Stacker is freely available as open source software at http://genestacker.ugent.be. The website also provides documentation and examples of how to use Gene Stacker.
  6. De Beukelaer, H., Davenport, G. F., De Meyer, G., & Fack, V. (2015). JAMES: a modern object-oriented Java framework for discrete optimization using local search metaheuristics. In M. Doumpos & E. Grigoroudis (Eds.), Conference proceedings: 4th international symposium & 26th national conference on operational research (pp. 134–138). Hellenic Operational Research Society.
    This paper describes JAMES, a modern object-oriented Java framework for discrete optimization using local search algorithms that exploits the generality of such metaheuristics by clearly separating search implementation and application from problem specification. A wide range of generic local searches are provided, including (stochastic) hill climbing, tabu search, variable neighbourhood search and parallel tempering. These can be applied easily to any user-defined problem by plugging in a custom neighbourhood for the corresponding solution type. The performance of several different search algorithms can be assessed and compared in order to select an appropriate optimization strategy. Also, the influence of parameter values can be studied. Implementations of specific components are included for subset selection, such as a predefined solution type, a generic problem definition and several subset neighbourhoods used to modify the set of selected items. Additional components for other types of problems (e.g. permutation problems) are provided through an extensions module. Releases of JAMES are deployed to the Maven Central Repository so that the framework can easily be included as a dependency in other Java applications. The project is fully open source and hosted on GitHub. More information can be found at http://www.jamesframework.org.
  7. De Beukelaer, H., Smýkal, P., Davenport, G. F., & Fack, V. (2012). Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search. BMC Bioinformatics, 13. https://doi.org/10.1186/1471-2105-13-312
    Background: Sampling core subsets from genetic resources while maintaining as much as possible of the genetic diversity of the original collection is an important but computationally complex task for gene bank managers. The Core Hunter computer program was developed as a tool to generate such subsets based on multiple genetic measures, including both distance measures and allelic diversity indices. First, we investigate the effect of minimum (instead of the default mean) distance measures on the performance of Core Hunter. Second, we try to gain more insight into the performance of the original Core Hunter search algorithm through comparison with several other heuristics on several realistic datasets of varying size and allelic composition. Finally, we propose a new algorithm (Mixed Replica search) for Core Hunter II, with the aim of improving both the diversity of the constructed core sets and their generation times.
    Results: Our results show that the introduction of minimum distance measures leads to core sets in which all accessions are sufficiently distant from each other, which was not always obtained when optimizing mean distance alone. Comparison of the original Core Hunter algorithm, Replica Exchange Monte Carlo (REMC), with simpler heuristics shows that the simpler algorithms often give very good results with lower runtimes than REMC. However, the performance of the simpler algorithms is slightly worse than that of REMC under lower sampling intensities, and some heuristics clearly struggle with minimum distance measures. In comparison, the new, advanced Mixed Replica search algorithm (MixRep), which uses heterogeneous replicas, was able to sample core sets with equal or higher diversity scores than REMC and the simpler heuristics, often using less computation time than REMC.
    Conclusion: The REMC search algorithm used in the original Core Hunter computer program performs well, sometimes leading to slightly better results than some of the simpler methods, although it does not always give the best results. By switching to the new Mixed Replica algorithm, overall results and runtimes can be significantly improved. Finally, we recommend including minimum distance measures in the objective function when looking for core sets in which all accessions are sufficiently distant from each other. Core Hunter II is freely available as an open source project at http://www.corehunter.org.
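The Core Hunter II, Core Hunter 3 and JAMES abstracts above describe local search over fixed-size subsets scored by distance measures such as the mean and minimum distance between selected accessions. The sketch below illustrates only that general idea; it is not Core Hunter or JAMES code and does not implement the new CH3 metrics. It assumes a precomputed symmetric distance matrix, a hypothetical objective that weights mean and minimum pairwise distance equally (`w_min`), and a stochastic hill climber that swaps one selected item for one unselected item per step. All function names are illustrative.

```python
import random
from itertools import combinations
from typing import Sequence, Set, Tuple

Matrix = Sequence[Sequence[float]]


def mean_and_min_distance(dist: Matrix, core: Set[int]) -> Tuple[float, float]:
    """Return the mean and the minimum pairwise distance among the selected items."""
    pairs = [dist[i][j] for i, j in combinations(sorted(core), 2)]
    return sum(pairs) / len(pairs), min(pairs)


def score(dist: Matrix, core: Set[int], w_min: float = 0.5) -> float:
    """Hypothetical objective: weighted sum of mean and minimum pairwise distance."""
    mean_d, min_d = mean_and_min_distance(dist, core)
    return (1 - w_min) * mean_d + w_min * min_d


def stochastic_hill_climber(dist: Matrix, size: int, steps: int = 2000,
                            w_min: float = 0.5, seed: int = 0) -> Tuple[Set[int], float]:
    """Improve a random core of `size` items with random swap moves.

    A move replaces one selected item by one unselected item and is kept
    if it does not lower the score (sideways moves are accepted).
    """
    n = len(dist)
    if not 2 <= size < n:
        raise ValueError("size must be at least 2 and smaller than the number of items")
    rng = random.Random(seed)
    core = set(rng.sample(range(n), size))
    best = score(dist, core, w_min)
    for _ in range(steps):
        out_item = rng.choice(sorted(core))
        in_item = rng.choice([i for i in range(n) if i not in core])
        candidate = (core - {out_item}) | {in_item}
        value = score(dist, candidate, w_min)
        if value >= best:
            core, best = candidate, value
    return core, best


if __name__ == "__main__":
    # Toy data: six accessions on a line, distance = absolute difference.
    positions = [0, 1, 2, 3, 10, 20]
    dist = [[abs(a - b) for b in positions] for a in positions]
    core, value = stochastic_hill_climber(dist, size=3)
    print(sorted(core))  # [0, 4, 5], i.e. positions 0, 10 and 20
```

On this toy instance every non-optimal core has at least one swap that strictly raises the score, so the climber settles on the unique best core. Putting weight on the minimum distance is what pushes it away from closely spaced accessions such as those at 1, 2 and 3, in line with the recommendation to include minimum distance measures.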
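The IND-HE method in the Genetics (2017) abstract selects a set of candidates by maximizing a weighted index that balances genetic gain against expected heterozygosity. That abstract does not give the exact formula, so the following is only a minimal sketch under stated assumptions: genotypes coded as 0/1/2 allele counts at biallelic loci, expected heterozygosity taken as the mean of 2p(1 - p) over loci, an unscaled index (1 - w) * mean breeding value + w * expected heterozygosity, and exhaustive search, which is feasible only for tiny examples. The weight `w` and all function names are hypothetical, and in practice the two terms would likely need rescaling to comparable ranges.

```python
from itertools import combinations
from typing import Sequence, Tuple


def expected_heterozygosity(genotypes: Sequence[Sequence[int]], selected: Sequence[int]) -> float:
    """Mean over biallelic loci of 2p(1 - p), where p is the allele frequency among `selected`.

    Genotypes are assumed to be coded as 0, 1 or 2 copies of one allele.
    """
    n_loci = len(genotypes[0])
    total = 0.0
    for locus in range(n_loci):
        p = sum(genotypes[i][locus] for i in selected) / (2 * len(selected))
        total += 2 * p * (1 - p)
    return total / n_loci


def weighted_index(gebv: Sequence[float], genotypes: Sequence[Sequence[int]],
                   selected: Sequence[int], w: float = 0.5) -> float:
    """Hypothetical index: (1 - w) * mean breeding value + w * expected heterozygosity."""
    gain = sum(gebv[i] for i in selected) / len(selected)
    return (1 - w) * gain + w * expected_heterozygosity(genotypes, selected)


def select_set(gebv: Sequence[float], genotypes: Sequence[Sequence[int]],
               size: int, w: float = 0.5) -> Tuple[int, ...]:
    """Exhaustively pick the `size` candidates with the highest index (tiny examples only)."""
    return max(combinations(range(len(gebv)), size),
               key=lambda s: weighted_index(gebv, genotypes, s, w))


if __name__ == "__main__":
    gebv = [1.0, 0.9, 0.8, 0.1]                   # toy breeding values
    genotypes = [[2, 2], [2, 2], [0, 0], [1, 1]]  # toy genotypes at two loci
    print(select_set(gebv, genotypes, 2, w=0.0))  # (0, 1)
    print(select_set(gebv, genotypes, 2, w=0.5))  # (0, 2)
```

With `w = 0` the index reduces to truncation selection on breeding value and picks candidates 0 and 1, which carry identical genotypes and therefore have zero expected heterozygosity. With `w = 0.5` it gives up a little mean breeding value to pick candidates 0 and 2, whose genotypes are complementary.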