util.io
Class FileParser

java.lang.Object
  extended by util.io.FileParser

public class FileParser
extends java.lang.Object

Utility class for parsing and extracting data from files

Author:
Michiel Van Bel

Constructor Summary
FileParser()
           
 
Method Summary
 FastaFileData extractDataFromFastaEmblFile(java.io.File file)
           
 FastaFileData extractDataFromFastaFile(java.io.File file)
          Reads all the FASTA data from a FASTA file.
 SecondaryStructureData extractRandomSecondaryStructuresFromFile(java.lang.String fileName, int number)
          Extracts a given number of randomly chosen secondary structures from a file.
 java.util.List<java.lang.String> extractRandomSequencesFromFile(java.lang.String fileName, int number)
          Extracts a given number of unique lines, chosen at random, from a file.
 SecondaryStructureData extractSecondaryStructuresFromFile(java.lang.String fileName)
          Extracts the secondary structure data from a file in the format written by RNAfold.
 java.util.List<java.lang.String> extractSequencesFromFile(java.lang.String fileName)
          Extracts all DNA sequences from a file in a simple format with one sequence per line. The sequences should all have their (pseudo-)splice sites aligned at the same position, which is defined in main.Jaspr.SPLICESITE.
 java.io.BufferedReader getReader(java.lang.String fileName)
          Returns a BufferedReader for reading the file with the given name.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FileParser

public FileParser()
Method Detail

getReader

public java.io.BufferedReader getReader(java.lang.String fileName)
                                 throws java.io.IOException
Returns a BufferedReader for reading the file with the given name.

Parameters:
fileName - The name of the file
Returns:
A BufferedReader over the contents of the file.
Throws:
java.io.IOException - If the file cannot be opened for reading.

extractSecondaryStructuresFromFile

public SecondaryStructureData extractSecondaryStructuresFromFile(java.lang.String fileName)
                                                          throws java.io.IOException
Extracts the secondary structure data from a file in the format written by RNAfold. In addition to the secondary structures themselves, the associated energies are extracted, which gives more options for using the secondary structures as a starting point for feature extraction.

The file is expected to hold two lines per entry: the sequence, followed by a line with the secondary structure (the same length as the sequence) and its energy in parentheses. Only the second line of each entry is used. If the file does not exist, null is returned.

Parameters:
fileName - The name of the file
Returns:
Data object which contains a list with all the secondary structures, and a list with all the associated energies.
Throws:
java.io.IOException - If reading from the file fails.

Implementation:

    import java.io.*;
    import java.util.*;

    public SecondaryStructureData extractSecondaryStructuresFromFile(String fileName) throws IOException {
        List<String> structures = new ArrayList<String>();
        List<Double> energies = new ArrayList<Double>();
        File file = new File(fileName);
        if (!file.exists()) {
            return null;
        }
        BufferedReader reader = new BufferedReader(new FileReader(file));
        try {
            // Each RNAfold entry has 2 lines: the sequence, then "structure (energy)".
            String sequence = reader.readLine();
            while (sequence != null) {
                int length = sequence.length();
                String structureLine = reader.readLine();
                if (structureLine == null) {
                    break; // incomplete last entry: no structure line
                }
                // The structure is as long as the sequence; the energy follows it.
                structures.add(structureLine.substring(0, length));
                String energy = structureLine.substring(length);
                energy = energy.substring(energy.indexOf('(') + 1, energy.indexOf(')'));
                energies.add(Double.parseDouble(energy)); // surrounding spaces are ignored
                sequence = reader.readLine();
            }
        } finally {
            reader.close(); // release the file handle even if parsing fails
        }
        SecondaryStructureData res = new SecondaryStructureData();
        res.setEnergies(energies);
        res.setStructures(structures);
        return res;
    }

extractRandomSecondaryStructuresFromFile

public SecondaryStructureData extractRandomSecondaryStructuresFromFile(java.lang.String fileName,
                                                                       int number)
                                                                throws java.io.IOException,
                                                                       java.lang.Exception
Extracts a given number of randomly chosen secondary structures from a file. If the requested number is close to the number of lines in the file, the random generator may need many attempts to find the last few indices that are still free.

Parameters:
fileName - The name of the file
number - The number of secondary structures to extract
Returns:
An object which contains the structures and the energies
Throws:
java.io.IOException - If reading the file fails
java.lang.Exception - If number is greater than the number of lines in the file
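The caveat above is typical of rejection sampling, where random indices are drawn and any index that was already taken is discarded. The sketch below assumes the selection works roughly that way (the actual implementation is not shown in this documentation), and its class and method names are hypothetical. It only counts how many draws are needed to collect a given number of distinct indices, which makes the growth near the total visible.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class RejectionSamplingSketch {

    /**
     * Draws random indices in [0, total) until `number` distinct ones have been
     * collected and returns how many draws that took. Repeated indices are discarded.
     */
    static int drawsNeeded(int total, int number, Random random) {
        if (number > total) {
            throw new IllegalArgumentException("number > total");
        }
        Set<Integer> chosen = new HashSet<>();
        int draws = 0;
        while (chosen.size() < number) {
            chosen.add(random.nextInt(total)); // a repeated index leaves the set unchanged
            draws++;
        }
        return draws;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        // Picking 100 of 1000 indices needs only slightly more than 100 draws on average...
        System.out.println(drawsNeeded(1000, 100, random));
        // ...but picking all 1000 needs about 1000 * H(1000), roughly 7500 draws on average.
        System.out.println(drawsNeeded(1000, 1000, random));
    }
}
```

Collecting all n indices takes on average n * H(n) draws, where H(n) is the n-th harmonic number: roughly 7,500 for n = 1000, compared with about 105 draws for 100 of the same 1000 indices.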

extractSequencesFromFile

public java.util.List<java.lang.String> extractSequencesFromFile(java.lang.String fileName)
                                                          throws java.io.IOException
Extracts all DNA sequences from a file in a simple format with one sequence per line. The sequences should all have their (pseudo-)splice sites aligned at the same position, which is defined in main.Jaspr.SPLICESITE.

Parameters:
fileName - The name of the file
Returns:
A list of sequences.
Throws:
java.io.IOException - If reading the file fails
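A minimal sketch of reading this one-sequence-per-line format, assuming the input arrives as a BufferedReader (for example one obtained from getReader) and that surrounding whitespace and blank lines should be ignored; the class and method names are hypothetical. Because the sequences are expected to be aligned on their (pseudo-)splice sites, the character at a fixed index then refers to the same position relative to the splice site in every returned sequence.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SequenceReaderSketch {

    /** Reads one sequence per line, trimming whitespace and skipping blank lines (an assumption). */
    static List<String> readSequences(BufferedReader reader) throws IOException {
        List<String> sequences = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) {
                sequences.add(line);
            }
        }
        return sequences;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("ACGTGTAAG\n\nTTCAGGTAC\n"));
        System.out.println(readSequences(reader)); // [ACGTGTAAG, TTCAGGTAC]
    }
}
```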

extractRandomSequencesFromFile

public java.util.List<java.lang.String> extractRandomSequencesFromFile(java.lang.String fileName,
                                                                       int number)
                                                                throws java.io.IOException,
                                                                       java.lang.Exception
Extracts a given number of unique lines, chosen at random, from a file. If the requested number is close to the total number of lines, the random generator may need many attempts to find the last few free slots.

Parameters:
fileName - The name of the file
number - The number of lines to extract
Returns:
A list with the lines
Throws:
java.io.IOException - If reading the file fails
java.lang.Exception - If number is greater than the number of lines in the file
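One way to avoid the slowdown described above is to swap in a different technique: a partial Fisher-Yates shuffle, which needs exactly one random draw per selected line regardless of how close the request is to the total. This is a sketch, not the documented implementation; the class and method names are hypothetical, and it assumes "unique" means lines taken from distinct positions in the file (two lines with identical content can both be selected).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomLineSampler {

    /**
     * Returns `number` lines taken from distinct positions of `lines`, in random order,
     * using a partial Fisher-Yates shuffle (one random draw per selected line).
     */
    static List<String> sampleLines(List<String> lines, int number, Random random) {
        if (number < 0 || number > lines.size()) {
            throw new IllegalArgumentException("number must be between 0 and " + lines.size());
        }
        List<String> pool = new ArrayList<>(lines); // copy so the caller's list is untouched
        for (int i = 0; i < number; i++) {
            // Pick one of the not-yet-chosen positions i..size-1 and move it to slot i.
            int j = i + random.nextInt(pool.size() - i);
            Collections.swap(pool, i, j);
        }
        return new ArrayList<>(pool.subList(0, number));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("AAGT", "CCGT", "GGGT", "TTGT");
        // Requesting every line costs the same number of draws as the size of the request.
        System.out.println(sampleLines(lines, 4, new Random(7)));
    }
}
```

The lines themselves could be loaded with java.nio.file.Files.readAllLines(java.nio.file.Paths.get(fileName)) before sampling.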

extractDataFromFastaFile

public FastaFileData extractDataFromFastaFile(java.io.File file)
                                       throws java.io.IOException
Reads all the FASTA data from a FASTA file and stores it in a FastaFileData object, so that as much of the FASTA data as possible is retained.

Parameters:
file - The FASTA file
Returns:
Object which contains the parsed data from the FASTA file
Throws:
java.io.IOException - If reading the file fails
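The internal layout of FastaFileData is not documented here, so the sketch below uses a LinkedHashMap from header to sequence as a hypothetical stand-in; the class and method names are also hypothetical. It shows the usual FASTA convention: a line starting with '>' opens a new record, and the lines that follow are concatenated into that record's sequence. In this sketch a repeated header overwrites the earlier record.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class FastaSketch {

    /** Parses FASTA text into a map from header (without '>') to the full sequence. */
    static Map<String, String> parseFasta(BufferedReader reader) throws IOException {
        Map<String, String> records = new LinkedHashMap<>(); // keeps the file order
        String header = null;
        StringBuilder sequence = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    records.put(header, sequence.toString()); // close the previous record
                }
                header = line.substring(1).trim();
                sequence.setLength(0);
            } else if (header != null) {
                sequence.append(line); // continuation of the current record
            }
            // Sequence lines before the first header are skipped.
        }
        if (header != null) {
            records.put(header, sequence.toString());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String text = ">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n";
        System.out.println(parseFasta(new BufferedReader(new StringReader(text))));
        // {seq1 example=ACGTTTGA, seq2=GGCC}
    }
}
```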

extractDataFromFastaEmblFile

public FastaFileData extractDataFromFastaEmblFile(java.io.File file)
                                           throws java.io.IOException
Throws:
java.io.IOException