------------------------------
Spidermite small RNAs analysis
------------------------------

Eric Bonnet
erbon@psb.vib-ugent.be

Files:
------

tRNA_coord: Coordinates of tRNAs (predicted with tRNAscan-SE). 139 sequences.

rRNA_coord: Coordinates of rRNAs (predicted with RNAmmer, using a specific cutoff). 56 sequences.

infernal_ncRNA_coord: Coordinates of tRNAs, rRNAs, snoRNAs, snRNAs and U-RNAs, predicted with Infernal and parsed with a cutoff score of 28 bits. 357 sequences.

uniq_coord: Coordinates of unique small RNA sequences after filtering. 676,266 sequences.

Filtering steps:
----------------

- select reads with length >= 18 nt
- filter out sequences with more than 3 Ns
- filter out sequences with long single-nucleotide repeats (a run of 10 A, T, G or C)
- filter out sequences with fewer than 3 counts
- map sequences to the spidermite genome ("old" assembly), requiring perfect alignments (BLASTN with appropriate parameters)
- filter out sequences overlapping with tRNAs, rRNAs, snoRNAs, snRNAs and U-RNAs (predicted using tRNAscan-SE, RNAmmer and Infernal, with appropriate cutoff scores)
- group unique sequences from each GES pool (and keep a table of correspondences)
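
Example sketch of the filters:
------------------------------

The read-level filters and the ncRNA overlap filter listed above could be sketched in Python as follows. This is only an illustration, not the script used to produce the files. All names (filter_reads, overlaps_any, the constants) are hypothetical. It assumes that a "long single-nucleotide repeat" means a run of 10 or more identical A, T, G or C, that "counts" means the number of times an identical read was sequenced, and that coordinates are inclusive (scaffold, start, end) triples. The actual layout of the *_coord files is not described here, so parsing them is left out, as is the BLASTN mapping step.

```python
import re
from collections import Counter

# Thresholds taken from the filtering steps listed in this README.
MIN_LENGTH = 18       # keep reads with length >= 18 nt
MAX_N = 3             # discard reads with more than 3 Ns
HOMOPOLYMER_RUN = 10  # assumption: a run of 10 or more identical bases
MIN_COUNT = 3         # discard sequences seen fewer than 3 times

# One base followed by at least (HOMOPOLYMER_RUN - 1) copies of itself.
_HOMOPOLYMER = re.compile(r"([ACGT])\1{%d,}" % (HOMOPOLYMER_RUN - 1))


def passes_sequence_filters(seq):
    """Return True if one sequence passes the length, N and repeat filters."""
    seq = seq.upper()
    if len(seq) < MIN_LENGTH:
        return False
    if seq.count("N") > MAX_N:
        return False
    if _HOMOPOLYMER.search(seq):
        return False
    return True


def filter_reads(reads):
    """Collapse reads to unique sequences and apply all read-level filters.

    Returns a dict mapping each retained unique sequence to its count.
    """
    counts = Counter(read.upper() for read in reads)
    return {
        seq: n
        for seq, n in counts.items()
        if n >= MIN_COUNT and passes_sequence_filters(seq)
    }


def overlaps_any(hit, annotations):
    """Return True if a mapped read overlaps any annotated ncRNA.

    hit and each annotation are (scaffold, start, end) triples with
    inclusive coordinates (an assumption about the coordinate files).
    """
    scaffold, start, end = hit
    return any(
        scaffold == a_scaffold and start <= a_end and a_start <= end
        for a_scaffold, a_start, a_end in annotations
    )
```

A mapped read would then be kept only if overlaps_any() returns False for the combined tRNA, rRNA and Infernal annotations.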