PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations
Recent efforts in biomolecular event extraction have mainly focused on core event types involving genes and proteins, such as gene expression, protein-protein interactions, and protein catabolism. The BioNLP’11 Shared Task extended the event extraction approach to sub-protein events and relations in the Epigenetics and Post-translational Modifications (EPI) and Protein Relations (REL) tasks. In this study, we apply the Turku Event Extraction System, the best-performing system for these tasks, to all PubMed abstracts and all available PMC full-text articles, extracting 1.4M EPI events and 2.2M REL relations from 21M abstracts and 372K articles. We introduce several entity normalization algorithms for genes, proteins, protein complexes and protein components, aiming to uniquely identify these biological entities. This normalization effort allows direct mapping of the extracted events and relations with posttranslational modifications from UniProt, epigenetics from PubMeth, functional domains from InterPro and macromolecular structures from PDB. The extraction of such detailed protein information provides a unique text mining dataset, offering the opportunity to further deepen the information provided by existing PubMed-scale event extraction efforts. The methods and data introduced in this study are freely available from bionlp.utu.fi.
Björne, J., Van Landeghem, S., Pyysalo, S., Ohta, T., Ginter, F., Van de Peer, Y., Ananiadou, S., Salakoski, T. (2012) PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations. Proceedings of the 2012 Workshop on Biomedical Natural Language Processing(BioNLP) NAACL Workshop. Montreal, Canada. pp. 82-90 .
VIB / UGent
Bioinformatics & Evolutionary Genomics
+32 (0) 9 33 13807 (phone)
+32 (0) 9 33 13809 (fax)