Evaluating large-scale text mining applications beyond the traditional numeric performance measures

Sofie Van Landeghem1,2, Suwisa Kaewphan3, Filip Ginter3, Yves Van de Peer1,2,*

1 Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Ghent, Belgium
2 Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Ghent, Belgium
3 Dept. of Information Technology, University of Turku, Finland, Joukahaisenkatu 3-5, 20520 Turku, Finland
* Corresponding author, E-mail: yves.vandepeer@psb.vib-ugent.be

Abstract

Text mining methods for the biomedical domain have matured substantially and are currently being applied on a large scale to support a variety of applications in systems biology, pathway curation, data integration and gene summarization. Community-wide challenges in the BioNLP research field provide gold-standard datasets and rigorous evaluation criteria, allowing for a meaningful comparison between techniques as well as measuring progress within the field. However, such evaluations are typically conducted on relatively small training and test datasets. On a larger scale, systematic erratic behaviour may occur that severely influences hundreds of thousands of predictions. In this work, we perform a critical assessment of a large-scale text mining resource, identifying systematic errors and determining their underlying causes through semi-automated analyses and manual evaluations.

Supplementary data

Filtering lists
In the first file, the term *Any_numeric_value refers to any number, such as -9 or 342, which can easily be checked within any programming language or with a regular expression.

Log files
The first file shows how 3 false-positive predictions could be detected in the winning submission of the BioNLP ST'13 GE challenge.
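
As a note on the filtering lists: the *Any_numeric_value term stands for any number, and the check it implies can be written with a regular expression. The following is a minimal Python sketch of such a check, not the authors' implementation. The pattern, the function name `is_numeric_value` and the acceptance of decimal values are assumptions made for illustration; the examples given here (-9, 342) are plain integers.

```python
import re

# Hypothetical pattern for the *Any_numeric_value term:
# an optional sign, one or more digits, and an optional decimal part.
# Accepting a decimal part is an assumption; the examples given in the
# text are integers only.
NUMERIC_RE = re.compile(r"[+-]?\d+(?:\.\d+)?")


def is_numeric_value(token: str) -> bool:
    """Return True if the whole token is a number such as -9 or 342."""
    # fullmatch requires the entire string to match, so tokens that
    # merely contain digits (e.g. "p53") are rejected.
    return NUMERIC_RE.fullmatch(token.strip()) is not None


def filter_numeric(tokens: list[str]) -> list[str]:
    """Keep only the tokens that would match *Any_numeric_value."""
    return [t for t in tokens if is_numeric_value(t)]
```

Using `fullmatch` rather than `search` matters here: names that contain digits, such as "p53" or "IL-2", should not be treated as numeric values.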
Contact:
VIB / UGent
Bioinformatics & Evolutionary Genomics
Technologiepark 927
B-9052 Gent
BELGIUM
+32 (0) 9 33 13807 (phone)
+32 (0) 9 33 13809 (fax)