This page lists the third-party resources and tools used within TRAPID.

They are organized into general categories, according to the part of TRAPID that uses them, and sorted alphabetically. Versions and parameters are given for each entry, together with links to the official website or to the relevant publication where applicable.

Reference databases

The table below provides an overview of TRAPID 2.0's reference databases. Gene counts include protein-coding genes only. The available reference databases encompass protein sequences, functional annotation, and gene family (GF) information for 115 archaea, 1,678 bacteria, and 326 eukaryotes (88 of which are found exclusively in PLAZA).

For extensive details about the reference databases (e.g. further information on gene family construction, list of included clades/species, ...), please refer to their own documentation.


| Database | PLAZA 4.5 dicots | PLAZA 4.5 monocots | Pico PLAZA 3.0 | PLAZA diatoms 1.0 | EggNOG 4.5.1 |
|---|---|---|---|---|---|
| # Species | 55 | 39 | 39 | 26 | 2,031 |
| # Genes | 1,833,029 | 1,497,121 | 572,836 | 503,959 | 9,646,196 |
| Taxonomic focus | Dicot plants | Monocot plants | Microbial photosynthetic eukaryotes | Diatoms | Archaea, Bacteria, Eukaryotes |
| Functional annotation | GO, InterPro | GO, InterPro | GO, InterPro | GO, InterPro | GO, KO |
| Gene family construction | Tribe-MCL, integrative orthologs | Tribe-MCL, integrative orthologs | Tribe-MCL, integrative orthologs | Tribe-MCL, integrative orthologs | eggNOG |

Initial processing

DIAMOND

Fast and sensitive protein alignment using DIAMOND
Buchfink, B., Xie, C., & Huson, D. H.
Nature methods 12.1 (2015): 59

Source: GitHub
Website: Official website

  • Version: 0.9.18
  • Parameters/command-line: --evalue 1e-5 [default], --more-sensitive
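
As a rough illustration, the two documented settings could be combined into a DIAMOND call as sketched below. Only `--evalue 1e-5` and `--more-sensitive` come from this page; the `blastx` subcommand (a translated search of transcripts against proteins), the tabular output format, the thread count, and all file names are assumptions made for the example.

```python
from typing import List


def diamond_command(query_fasta: str, db_path: str, out_path: str,
                    evalue: str = "1e-5", threads: int = 1) -> List[str]:
    """Assemble a DIAMOND argument list that uses the documented settings."""
    return [
        "diamond", "blastx",        # assumption: translated search of transcripts
        "--query", query_fasta,     # hypothetical input file
        "--db", db_path,            # hypothetical DIAMOND database file
        "--out", out_path,          # hypothetical output file
        "--outfmt", "6",            # assumption: BLAST tabular output
        "--threads", str(threads),  # assumption: not stated on this page
        "--evalue", evalue,         # documented: 1e-5
        "--more-sensitive",         # documented
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no DIAMOND install is needed.
    print(" ".join(diamond_command("transcripts.fasta", "reference.dmnd", "hits.tsv")))
```

The list can be passed unchanged to `subprocess.run` on a machine where DIAMOND is installed.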

Infernal

Infernal 1.1: 100-fold faster RNA homology searches
Nawrocki, E. P., & Eddy, S. R.
Bioinformatics 29.22 (2013): 2933-2935

Source: GitHub
Website: Official website

  • Version: 1.1.2
  • Parameters/command-line: cmsearch using user-selected clans, --nohmmonly, --rfam, --cut_ga
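
The sketch below shows one way the listed `cmsearch` options could be put together. The three flags are the ones documented above; the covariance model file (assumed here to contain only the families of the user-selected clans), the `--tblout` and `--cpu` options, and the file names are illustrative assumptions.

```python
from typing import List


def cmsearch_command(cm_file: str, query_fasta: str, tblout_path: str,
                     cpu: int = 1) -> List[str]:
    """Assemble an Infernal cmsearch argument list with the documented flags."""
    return [
        "cmsearch",
        "--nohmmonly",            # documented: always use the CM, never the HMM-only mode
        "--rfam",                 # documented: Rfam-style (fast) filter settings
        "--cut_ga",               # documented: use each model's gathering thresholds
        "--cpu", str(cpu),        # assumption: not stated on this page
        "--tblout", tblout_path,  # assumption: tabular hit summary, hypothetical path
        cm_file,                  # hypothetical CM file for the selected clans
        query_fasta,              # hypothetical transcript FASTA file
    ]


if __name__ == "__main__":
    print(" ".join(cmsearch_command("selected_clans.cm", "transcripts.fasta", "hits.tbl")))
```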

Kaiju

Fast and sensitive taxonomic classification for metagenomics with Kaiju
Menzel, P., Ng, K. L., & Krogh, A.
Nature communications 7.1 (2016)

Source: GitHub
Website: Official website

  • Version: 1.7.3
  • Parameters/command-line: MEM mode, minimum match length 11, filter low complexity sequences
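
For orientation, the three settings above map onto Kaiju's `-a`, `-m`, and `-x` options. The sketch below assembles a matching call; the taxonomy, index, input, and output paths and the thread count are hypothetical and not taken from this page.

```python
from typing import List


def kaiju_command(nodes_dmp: str, db_fmi: str, query_fasta: str,
                  out_path: str, threads: int = 1) -> List[str]:
    """Assemble a Kaiju argument list reflecting the documented settings."""
    return [
        "kaiju",
        "-t", nodes_dmp,      # hypothetical path to the NCBI taxonomy nodes.dmp
        "-f", db_fmi,         # hypothetical path to the Kaiju index (.fmi)
        "-i", query_fasta,    # hypothetical transcript FASTA file
        "-o", out_path,       # hypothetical output file
        "-z", str(threads),   # assumption: not stated on this page
        "-a", "mem",          # documented: MEM mode
        "-m", "11",           # documented: minimum match length 11
        "-x",                 # documented: filter low-complexity sequences
    ]


if __name__ == "__main__":
    print(" ".join(kaiju_command("nodes.dmp", "kaiju_db_nr.fmi",
                                 "transcripts.fasta", "kaiju.out")))
```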

eggNOG-mapper

Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper
Huerta-Cepas, J., Forslund, K., Coelho, L. P., Szklarczyk, D., Jensen, L. J., Von Mering, C., & Bork, P.
Molecular biology and evolution 34.8 (2017)

Source: GitHub
Website: Web server

  • Version: 1.0.3-3-g3e22728
  • Parameters/command-line: user-selected taxonomic scope, skipping DIAMOND search (run prior to calling eggNOG-mapper)

NCBI BLAST non-redundant protein database

Website: NCBI FTP

  • Version: 2019-09-05
  • Used as database for Kaiju.
    Contains 218,930,306 sequences from eukaryotes, bacteria, archaea, and viruses.

NCBI taxonomy

The NCBI Taxonomy database
Federhen, S.
Nucleic acids research 40.D1 (2012): D136-D143

Website: NCBI FTP

  • Version: 2019-09-05

Rfam

Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families
Kalvari, I., Argasinska, J., Quinones-Olvera, N., Nawrocki, E. P., Rivas, E., Eddy, S. R., ... & Petrov, A. I.
Nucleic acids research 46.D1 (2018): D335-D342

Website: Official website

  • Version: 14.1 (January 2019)
  • Transcript sequences assigned to an RNA family are subsequently annotated with the manually curated GO terms associated with that family, as defined by Rfam curators.

MSA and phylogeny

MAFFT

MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
Katoh, K., & Standley, D. M.
Molecular biology and evolution 30.4 (2013): 772-780

Source: Official website

  • Version: 7.187
  • Parameters/command-line: mafft --maxiterate 1000 --auto $fasta_file_path > $msa_file_path 2> $msa_file_path.log
  • When creating a phylogenetic tree, the alignment can optionally be edited using one of the available editing modes: removal of poorly conserved positions by trimming the sequences, filtering of partial sequences, or a combination of both
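
To make the editing options more concrete, here is a minimal sketch of column trimming and partial-sequence filtering on an alignment held as a name-to-sequence dictionary. It is not TRAPID's implementation: the gap fraction is used as a simple stand-in for conservation, and both thresholds (`max_gap_fraction`, `min_coverage`) are hypothetical.

```python
from typing import Dict

GAP = "-"


def trim_columns(msa: Dict[str, str], max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction."""
    if not msa:
        return {}
    names = list(msa)
    length = len(msa[names[0]])
    keep = [
        i for i in range(length)
        if sum(msa[n][i] == GAP for n in names) / len(names) <= max_gap_fraction
    ]
    return {n: "".join(msa[n][i] for i in keep) for n in names}


def filter_partial(msa: Dict[str, str], min_coverage: float = 0.5) -> Dict[str, str]:
    """Remove sequences whose fraction of non-gap positions is below min_coverage."""
    return {
        n: s for n, s in msa.items()
        if s and (len(s) - s.count(GAP)) / len(s) >= min_coverage
    }


if __name__ == "__main__":
    msa = {"s1": "MKT-A", "s2": "MK--A", "s3": "M---A"}
    trimmed = trim_columns(msa)              # trimming only
    both = filter_partial(trimmed, 0.7)      # trimming followed by filtering
    print(trimmed)
    print(both)
```

Applying `trim_columns` and then `filter_partial` corresponds to the combined mode; either function alone corresponds to one of the single modes.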

MUSCLE

MUSCLE: multiple sequence alignment with high accuracy and high throughput
Edgar, R. C.
Nucleic acids research 32.5 (2004): 1792-1797

Source: Official website

  • Version: 3.8.31
  • Parameters/command-line: muscle -in $fasta_file_path -out $msa_file_path
  • When creating a phylogenetic tree, the alignment can optionally be edited using one of the available editing modes: removal of poorly conserved positions by trimming the sequences, filtering of partial sequences, or a combination of both

FastTree2

FastTree 2 - approximately maximum-likelihood trees for large alignments
Price, M. N., Dehal, P. S., & Arkin, A. P.
PloS one 5.3 (2010)

Source: FastTree website

  • Version: 2.1.7
  • Parameters/command-line: FastTree -wag -gamma $MSA_FILE_PATH > $NEWICK_FILE_PATH
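
In the command line above, FastTree prints the tree to standard output and the `>` redirection saves it as a Newick file. When the call is made from a script rather than a shell, the same effect can be obtained as sketched below; the helper names and file names are hypothetical.

```python
import subprocess
from typing import List


def fasttree_command(msa_file_path: str) -> List[str]:
    """Argument list matching the documented FastTree options."""
    return ["FastTree", "-wag", "-gamma", msa_file_path]


def run_to_file(argv: List[str], out_path: str) -> None:
    """Run argv and write its standard output to out_path (the shell's `>`)."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if the tool exits with an error.
        subprocess.run(argv, stdout=out, check=True)
```

With FastTree installed, `run_to_file(fasttree_command("family.msa.fasta"), "family.newick")` would write the tree for a hypothetical alignment file.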

IQ-TREE

IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
Nguyen, L. T., Schmidt, H. A., Von Haeseler, A., & Minh, B. Q.
Molecular biology and evolution 32.1 (2015): 268-274

Source: GitHub
Website: Official website

  • Version: 1.7.0b7
  • Parameters/command-line: iqtree -st AA -s $msa_stripped_file_path -pre $tmp_tree_file_path -nt 1 -bb 1000 -mset JTT,LG,WAG,Blosum62,VT,Dayhoff -mfreq F -mrate R > $tmp_tree_file_path.log

RAxML

RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
Stamatakis, A.
Bioinformatics 30.9 (2014): 1312-1313

Source: GitHub
Website: Official website

  • Version: 8.2.8
  • Parameters/command-line: raxmlHPC-PTHREADS-SSE3 -T 1 -f a -x 12345677 -p 12345677 -N 100 -m PROTGAMMAWAG -s $MSA_FILE_PATH -w $OUTPUT_DIR -n $NEWICK_FILE_NAME -k

PhyML

New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0
Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W., & Gascuel, O.
Systematic biology 59.3 (2010): 307-321

Source: ATGC Montpellier website

  • Version: 2015-02-19
  • Parameters/command-line: phyml -i $msa_phylip_file_path -d aa -n 1 -b 100 -m WAG -f e -c 4 -a e -o n
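
The variable name `$msa_phylip_file_path` indicates that PhyML is given the alignment in PHYLIP format, whereas the alignment tools listed earlier are typically run with FASTA output. How TRAPID performs this conversion is not described here; the sketch below shows one simple way to write a FASTA alignment as relaxed sequential PHYLIP, with all function names being hypothetical.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {name: sequence}, keeping the first word of each header."""
    seqs: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def to_phylip(msa: Dict[str, str]) -> str:
    """Format an alignment as relaxed sequential PHYLIP text."""
    lengths = {len(s) for s in msa.values()}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty alignment with equal-length sequences")
    header = f"{len(msa)} {lengths.pop()}"  # number of sequences, alignment length
    rows = [f"{name} {seq}" for name, seq in msa.items()]
    return "\n".join([header] + rows) + "\n"


if __name__ == "__main__":
    example = [">s1 first sequence", "MK-", "LV", ">s2", "MKT", "LV"]
    print(to_phylip(read_fasta(example)), end="")
```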

Data visualization

Unipept standalone visualizations

Source: GitHub
Website: Unipept website

  • Version: 1.7.2

KronaTools

Interactive metagenomic visualization in a Web browser
Ondov, B. D., Bergman, N. H., & Phillippy, A. M.
BMC bioinformatics 12.1 (2011): 385

Source: GitHub
Website: GitHub wiki

  • Version: 2.7

MSAViewer

MSAViewer: interactive JavaScript visualization of multiple sequence alignments
Yachdav, G., Wilzbach, S., Rauscher, B., Sheridan, R., Sillitoe, I., Procter, J., ... & Goldberg, T.
Bioinformatics 32.22 (2016): 3501-3503

Source: GitHub
Website: BioJS page

  • Version: 1.0.0

PhyD3

PhyD3: a phylogenetic tree viewer with extended phyloXML support for functional genomics data visualization
Kreft, L., Botzki, A., Coppens, F., Vandepoele, K., & Van Bel, M.
Bioinformatics 33.18 (2017): 2946-2947

Source: GitHub
Website: Official website

  • Version: 1.3