Sofie Van Landeghem
The systematic development and application of high-throughput techniques in molecular biology generate massive amounts of data. For instance, advances in DNA sequencing have made the study of complete genomes routine. Online databases have been created to store such molecular data, but much information is still captured only in scientific articles, published in online literature resources such as PubMed.
The ever-increasing dimensionality of these datasets and their constant growth have made manual analysis practically impossible. Yet progress in molecular biology depends largely on combining different sources of knowledge (data integration). Combining heterogeneous data sources lets a researcher learn more about a given process, and can reduce ambiguities and uncertainties when the sources supply complementary information.
In my PhD, I want to study the feasibility of developing automated data mining and text mining techniques to help scientists gather as much information as possible about a certain biological process or a selected group of genes, by automatically generating biologically relevant summaries from online databases and literature.
Molecular biology requires text mining algorithms developed specifically for the domain. For example, named entity recognition (NER) of genes and proteins in English text is in itself a challenging task because gene names are ambiguous. In 2007-2008, I mainly worked on machine learning techniques to automatically extract protein-protein interactions from text. With the introduction of the BioNLP Shared Task in 2009, this research broadened its scope to a more complex event-style representation of text mining results.
Since 2010, I have been focusing more on the applicability of text mining techniques in real-world cases, finding solutions for data abundance and ambiguity. In collaboration with the Turku BioNLP group in Finland, we maintain the EVEX Dataset, which is publicly available for download as a MySQL database. The EVEX Dataset contains the result of running the Turku Event Extraction System together with BANNER and the McClosky-Charniak Parser at PubMed scale. In 2010, the system was applied to all abstracts in the 2009 distribution of PubMed. Furthermore, gene symbols are linked to well-defined gene families from Ensembl and HomoloGene, enabling integration of text mining results with external databases such as Entrez Gene and UniProt. More up-to-date information can be found at the website of the Turku BioNLP group.
Papers
(26) Kaewphan, S., Van Landeghem, S., Ohta, T., Van de Peer, Y., Ginter, F., Pyysalo, S. (2016) Cell line name recognition in support of the identification of synthetic lethality in cancer from text. Bioinformatics 32(2):276-282.
(25) Van Landeghem, S., Van Parys, T., Dubois, M., Inzé, D., Van de Peer, Y. (2016) Diffany: an ontology-driven framework to infer, visualise and analyse differential molecular networks. BMC Bioinformatics 17(1):18.
(24) Hakala, K., Van Landeghem, S., Salakoski, T., Van de Peer, Y., Ginter, F. (2015) Application of the EVEX resource to event extraction and network construction: Shared Task entry and result analysis. BMC Bioinformatics 16(Suppl 16):S3.
(23) Szakonyi, D., Van Landeghem, S., Baerenfaller, K., Baeyens, L., Blomme, J., Casanova-Sáez, R., De Bodt, S., Esteve-Bruna, D., Fiorani, F., Gonzalez, A., Grønlund, J., Immink, R.G.H., Jover-Gil, S., Kuwabara, A., Muñoz-Nortes, T., van Dijk, A.D.J., Wilson-Sánchez, D., Buchanan-Wollaston, V., Angenent, G.C., Van de Peer, Y., Inzé, D., Micol, J.L., Gruissem, W., Walsh, S., Hilson, P. (2015) The KnownLeaf literature curation system captures knowledge about Arabidopsis leaf growth and development and facilitates integrated data mining. Current Plant Biology 2:1-11.
(22) Claeys, M., Van Landeghem, S., Dubois, M., Maleux, K., Inzé, D. (2014) What Is Stress? Dose-Response Effects in Commonly Used in Vitro Stress Assays. Plant Physiol. 165(2):519-527.
(21) Hakala, K., Van Landeghem, S., Salakoski, T., Van de Peer, Y., Ginter, F. (2014) Application of the EVEX resource to event extraction and network construction: Shared Task entry and result analysis. BMC Bioinformatics special issue on BioNLP Shared Task 2013.
(20) Hakala, K., Van Landeghem, S., Salakoski, T., Van de Peer, Y., Ginter, F. (2013) EVEX in ST'13: Application of a large-scale text mining resource to event extraction and network construction. Proceedings of the BioNLP Shared Task 2013 Workshop 26-34.
(19) Van Landeghem, S., Kaewphan, S., Ginter, F., Van de Peer, Y. (2013) Evaluating large-scale text mining applications beyond the traditional numeric performance measures. Proceedings of the BioNLP 2013 Workshop 63-71.
(18) Vandepoele, K., Van Bel, M., Richard, G., Van Landeghem, S., Verhelst, B., Moreau, H., Van de Peer, Y., Grimsley, N., Piganeau, G. (2013) pico-PLAZA, a genome database of microbial photosynthetic eukaryotes. Environ. Microbiol. 15(8):2147-53.
(17) Van Landeghem, S., Björne, J., Wei, C.-H., Hakala, K., Pyysalo, S., Ananiadou, S., Kao, H.-Y., Lu, Z., Salakoski, T., Van de Peer, Y., Ginter, F. (2013) Large-scale event extraction from literature with multi-level gene normalization. PLoS One 8(4):e55814.
(16) Van Landeghem, S., De Bodt, S., Drebert, Z. J., Inzé, D., Van de Peer, Y. (2013) The potential of text mining in data integration and network biology for plant research: a case study on Arabidopsis. The Plant Cell 25(3):794-807.
(15) Hakala, K., Van Landeghem, S., Kaewphan, S., Salakoski, T., Van de Peer, Y., Ginter, F. (2012) CyEVEX: Literature-scale network integration and visualization through Cytoscape. Proceedings of the 5th International Symposium on Semantic Mining in Biomedicine (SMBM) p. 91-96.
(14) Van Landeghem, S., Hakala, K., Rönnqvist, S., Salakoski, T., Van de Peer, Y., Ginter, F. (2012) Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology and Indirect Associations. Advances in Bioinformatics 2012:582765.
(13) Van Landeghem, S., Björne, J., Abeel, T., De Baets, B., Salakoski, T., Van de Peer, Y. (2012) Semantically linking molecular entities in literature through entity relationships. BMC Bioinformatics 13 Suppl 11:S6.
(12) Kaewphan, S., Peltonen, S., Van Landeghem, S., Van de Peer, Y., Jones, P., Ginter, F. (2012) Integrating large-scale text mining and co-expression networks: Targeting NADP(H) metabolism in E. coli with event extraction. Proceedings of the LREC workshop on Building and Evaluating Resources for Biomedical Text Mining. Istanbul, Turkey.
(11) Björne, J., Van Landeghem, S., Pyysalo, S., Ohta, T., Ginter, F., Van de Peer, Y., Ananiadou, S., Salakoski, T. (2012) PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations. Proceedings of the 2012 Workshop on Biomedical Natural Language Processing (BioNLP) NAACL Workshop. Montreal, Canada. pp. 82-90.
(10) Kano, Y., Björne, J., Ginter, F., Salakoski, T., Buyko, E., Hahn, U., Cohen, K. B., Verspoor, K., Roeder, C., Hunter, L.E., Kilicoglu, H., Bergler, S., Van Landeghem, S., Van Parys, T., Van de Peer, Y., Miwa, S., Ananiadou, S., Neves, M., Pascual-Montano, A., Ozgur, A., Radev, D.R., Riedel, S., Saetre, R., Chun, H.W., Kim, J.R., Pyysalo, S., Ohta, T., Tsujii, J. (2011) U-Compare bio-event meta-service: compatible BioNLP event extraction services. BMC Bioinformatics 12:481.
(9) Van Landeghem, S., De Baets, B., Van de Peer, Y., Saeys, Y. (2011) High-precision bio-molecular event extraction from text using parallel binary classifiers. Comput. Intell. 27(4):645-664.
(8) Van Landeghem, S., Ginter, F., Van de Peer, Y., Salakoski, T. (2011) EVEX: A PubMed-Scale Resource for Homology-Based Generalization of Text Mining Predictions. Proceedings of the 2011 Workshop on Biomedical Natural Language Processing, ACL-HLT 2011 28-37, Portland, Oregon, USA.
(7) * Abeel, T., * Van Landeghem, S., Morante, R., Van Asch, V., Van de Peer, Y., Daelemans, W., Saeys, Y. (2010) Highlights of the BioTM 2010 workshop on advances in bio text mining. BMC Bioinformatics 11, I1. *contributed equally
(6) * Van Landeghem, S., * Abeel, T., Saeys, Y., Van de Peer, Y. (2010) Discriminative and informative features for biomolecular text mining with ensemble feature selection. Bioinformatics 26(18):i554-60. *contributed equally
(5) Van Landeghem, S., Pyysalo, S., Ohta, T., Van de Peer, Y. (2010) Integration of Static Relations to Enhance Event Extraction from Text. Proceedings of the 2010 Workshop on Biomedical Natural Language Processing 144-152. Uppsala, Sweden.
(4) Saeys, Y., Van Landeghem, S., Van de Peer, Y. (2010) Event based text mining for integrated network construction. Journal of Machine Learning Research, Workshop and Conference proceedings 8:112-121.
(3) Saeys, Y., Van Landeghem, S., Van de Peer, Y. (2009) Integrated network construction using event based text mining. Proceedings of the 3rd Machine Learning in Systems Biology workshop (MLSB) 105-114.
(2) Van Landeghem, S., Saeys, Y., De Baets, B., Van de Peer, Y. (2009) Analyzing text in search of bio-molecular events: a high-precision machine learning framework. Proceedings of Natural Language Processing in Biomedicine (BioNLP) NAACL 2009 Workshop 128-136.
(1) Van Landeghem, S., Saeys, Y., De Baets, B., Van de Peer, Y. (2008) Extracting protein-protein interactions from text using rich feature vectors and feature selection. Proceedings of Third International Symposium on Semantic Mining in Biomedicine (SMBM 08) 77-84.
VIB / UGent
Bioinformatics & Evolutionary Genomics
+32 (0) 9 33 13807 (phone)
+32 (0) 9 33 13809 (fax)