Evaluating large-scale text mining applications beyond the traditional numeric performance measures

Sofie Van Landeghem¹,², Suwisa Kaewphan³, Filip Ginter³, Yves Van de Peer¹,²,*

¹ Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Ghent, Belgium

² Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Ghent, Belgium

³ Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland

* Corresponding author. E-mail: yves.vandepeer@psb.vib-ugent.be

Abstract

Text mining methods for the biomedical domain have matured substantially and are currently being applied on a large scale to support a variety of applications in systems biology, pathway curation, data integration and gene summarization. Community-wide challenges in the BioNLP research field provide gold-standard datasets and rigorous evaluation criteria, allowing for a meaningful comparison between techniques as well as measuring progress within the field. However, such evaluations are typically conducted on relatively small training and test datasets. On a larger scale, systematic erratic behaviour may occur that severely influences hundreds of thousands of predictions. In this work, we perform a critical assessment of a large-scale text mining resource, identifying systematic errors and determining their underlying causes through semi-automated analyses and manual evaluations.

Supplementary data

Filtering lists

Words that should never be event triggers: trigger_blacklist.tab
Trigger words that could automatically be translated to the correct event type: trigger_changelist.tab

In the first file, the term *Any_numeric_value refers to any number, such as -9 or 342, which can easily be checked within any programming language or with a regular expression.
In the second file, *All refers to all possible event types, while *All_non_reg excludes the regulatory events, i.e. Regulation, Positive regulation, Negative regulation and Catalysis.
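
As an illustration of the two special terms described above, here is a minimal Python sketch. It is not the code used to produce the log files. The regular expression, the function names, the case-insensitive matching, and the in-memory representation of the lists are all assumptions, because the exact layout of the .tab files is not described here. The full set of event types is passed in by the caller rather than hard-coded.

```python
import re

# Assumed pattern for *Any_numeric_value: optional sign, digits,
# optional decimal part (matches e.g. "-9" or "342").
NUMERIC_RE = re.compile(r"[+-]?\d+(?:\.\d+)?")

# The regulatory event types that *All_non_reg excludes, as listed above.
REGULATORY_TYPES = {
    "Regulation",
    "Positive regulation",
    "Negative regulation",
    "Catalysis",
}


def is_numeric_value(trigger):
    """Return True if the trigger word is a plain number."""
    return NUMERIC_RE.fullmatch(trigger.strip()) is not None


def is_blacklisted(trigger, blacklist):
    """Check a trigger word against a set of blacklist entries.

    The entry "*Any_numeric_value" is treated as a wildcard for numbers.
    Case-insensitive matching is an assumption of this sketch.
    """
    word = trigger.strip()
    if "*Any_numeric_value" in blacklist and is_numeric_value(word):
        return True
    return word.lower() in {entry.lower() for entry in blacklist}


def expand_event_types(spec, all_types):
    """Expand *All and *All_non_reg into concrete sets of event types."""
    if spec == "*All":
        return set(all_types)
    if spec == "*All_non_reg":
        return set(all_types) - REGULATORY_TYPES
    return {spec}
```

For example, with a hypothetical blacklist containing only *Any_numeric_value, is_blacklisted("342", {"*Any_numeric_value"}) flags the trigger, while a word such as "binding" passes through.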

Log files

Applying the above lists to the winning system of the BioNLP ST'13 GE challenge: correction_log_st13.txt
Applying the above lists to the large-scale EVEX text mining resource: correction_log_evex.txt

The first file shows how 3 false-positive predictions could be detected in the winning submission of the BioNLP ST'13 GE challenge.
The second file contains the log of all errors that could automatically be detected and resolved within the EVEX resource, covering around 1.2% of the 40 million events in this dataset (roughly 480,000 events).


Contact:
VIB / UGent
Bioinformatics & Evolutionary Genomics
Technologiepark 927
B-9052 Gent
BELGIUM
+32 (0) 9 33 13807 (phone)
+32 (0) 9 33 13809 (fax)
