classifier
Class LibSvm

java.lang.Object
  extended by classifier.LibSvm
All Implemented Interfaces:
Classifier

public class LibSvm
extends java.lang.Object
implements Classifier

Implementation of the SvmInterface interface for the libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) implementation of the SMO algorithm. This class simply translates the necessary commands into method calls that the Java implementation of libsvm understands.

Author:
Michiel Van Bel

Nested Class Summary
 
Nested classes/interfaces inherited from interface classifier.Classifier
Classifier.DATA_TYPE
 
Constructor Summary
LibSvm(org.apache.log4j.Logger logger, ClassificationAction ca)
           
 
Method Summary
 void applyAttributeFilter(java.util.List<java.lang.Integer> attributeFilter, int featureNumberFilter, java.io.File toBeFilteredFile)
          After feature selection has produced a filter, this filter can be applied to the feature files in order to optimize the SVMs.
 boolean buildClassifier()
          This method builds an SVM model file from a file with training examples.
 java.lang.Double classify_single_instance_fast(double[] features)
           
 java.lang.String classify_single_instance(java.lang.String instance)
           
 void classify(java.lang.String testFile, java.lang.String outputClassification)
          Uses the trained SVM (or an untrained one, provided the model file has been set beforehand) to classify data, and writes the output to an output file.
 CrossValidationResult crossValidate(int n, int maxPosTrain, int maxNegTrain)
          Performs a cross-validation of the training file.
 java.lang.String generateFeatureString(java.util.List<java.lang.Double> data, Classifier.DATA_TYPE dataType)
          Creates a string of features for a single feature vector.
 java.lang.String getFileExtension()
           
 java.lang.String getModelFile()
           
 int[] getPosNegExamplesInFile(java.io.File file)
          Returns the number of positive and negative examples in a training file.
 double getSigmoid_A()
           
 double getSigmoid_B()
           
 weka.core.Instances getTrainingFileInstances()
           
 boolean loadClassifier()
          Sets the model file; depending on the implementation, there may be an attempt to build the SVM from this model file.
 boolean loadClassifier(java.lang.String svmFile)
           
 java.io.File mergeFeatureFiles(java.io.File tempFilePositive, java.io.File tempFileNegative)
          Merges the feature files (one with positive training features, one with negative training features) to create the actual training file.
 java.util.List<ValPosCombination> performAttributeEvaluation(boolean sort, weka.attributeSelection.AttributeEvaluator evaluator)
          Performs feature selection by evaluating different attributes.
 java.lang.String[] prepareCrossvalidationCommand(int fold, java.lang.String fileIn, java.lang.String fileOut)
          Creates an array with string values, to be parsed by svm_train from the libsvm package.
 java.lang.String[] prepareTrainingCommand(java.lang.String fileIn, java.lang.String fileOut)
          Creates an array with string values, to be parsed by svm_train from the libsvm package.
 void setModelFile(java.lang.String svmModelFile)
           
 void setOptions(ClassifierOptions options)
           
 void setSigmoid_A(double sigmoid_A)
           
 void setSigmoid_B(double sigmoid_B)
           
 void setSVMOptions(SVMOptions svm_options)
          Changes the options for the SVM itself (mainly the kernel function)
 java.lang.String to_genomeview_output(int id, java.lang.Double distance, int funsite_start, int funsite_stop, java.lang.String classification_name)
           
 java.lang.String to_splice_machine_output(java.lang.Double distance, int splicesite, int increase, java.lang.String classification_name)
           
 java.lang.String to_splice_machine_output(java.lang.String classification_result, int splicesite, int increase, java.lang.String classification_name)
           
 java.io.File writeTemporaryFeatureData(java.lang.String tempFileName, boolean forward_strand, java.util.List<java.util.List<java.lang.Double>> data, Classifier.DATA_TYPE dataType)
          This method writes the temporary feature data (all the features extracted from one sequence, each feature in a different list) to a file.
 java.io.File writeTemporaryFeatureData(java.lang.String tempFileName, java.util.List<java.util.List<java.lang.Double>> data, Classifier.DATA_TYPE dataType)
          This method writes the temporary feature data (all the features extracted from one sequence, each feature in a different list) to a file.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LibSvm

public LibSvm(org.apache.log4j.Logger logger,
              ClassificationAction ca)
Method Detail

getFileExtension

public java.lang.String getFileExtension()
Specified by:
getFileExtension in interface Classifier

crossValidate

public CrossValidationResult crossValidate(int n,
                                           int maxPosTrain,
                                           int maxNegTrain)
Performs a cross-validation of the training file. The results of this cross-validation (the number of true positives, false positives, true negatives and false negatives, plus the statistics derived from them) are then returned. The normal cross-validation procedure can be followed. However, the notion has been extended so that different numbers of positive and negative examples can be used during the training phase of each cross-validation step. The number of positive and negative examples for the testing phase of each cross-validation step remains the same (being (n-1)*(total_amount)). For further information, see the Crossvalidation.txt document in the /doc subdirectory.

Specified by:
crossValidate in interface Classifier
Parameters:
n - The number of folds of the cross-validation. Common values are 2, 5 and 10.
maxPosTrain - The maximum number of positive training examples during the training phase of the cross-validation.
maxNegTrain - The maximum number of negative training examples during the training phase of the cross-validation.
Returns:
The CrossValidationResult object, which contains the results of the cross-validation and the derived statistics.
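
The capped training phase described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used by LibSvm: the class CappedFoldSketch, its methods, the boolean label array and the assignment of example i to fold i % n are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: n-fold split with a cap on positive/negative training examples. */
class CappedFoldSketch {

    /** Returns the indices used for training in the given fold (0-based). */
    static List<Integer> trainingIndices(boolean[] isPositive, int n, int fold,
                                         int maxPosTrain, int maxNegTrain) {
        List<Integer> train = new ArrayList<>();
        int pos = 0;
        int neg = 0;
        for (int i = 0; i < isPositive.length; i++) {
            if (i % n == fold) {
                continue; // assumption: example i is held out for testing in fold i % n
            }
            if (isPositive[i] && pos < maxPosTrain) {
                train.add(i);
                pos++;
            } else if (!isPositive[i] && neg < maxNegTrain) {
                train.add(i);
                neg++;
            }
        }
        return train;
    }

    /** Returns the held-out indices of the given fold; the caps do not apply here. */
    static List<Integer> testIndices(int total, int n, int fold) {
        List<Integer> test = new ArrayList<>();
        for (int i = fold; i < total; i += n) {
            test.add(i);
        }
        return test;
    }

    public static void main(String[] args) {
        boolean[] labels = {true, false, true, false, true, false, true, false, true, false};
        System.out.println(trainingIndices(labels, 5, 0, 2, 3));
        System.out.println(testIndices(labels.length, 5, 0));
    }
}
```

Only the training part is capped in this sketch; the held-out part keeps the same size in every step, in line with the description above.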

prepareCrossvalidationCommand

public java.lang.String[] prepareCrossvalidationCommand(int fold,
                                                        java.lang.String fileIn,
                                                        java.lang.String fileOut)
Creates an array with string values, to be parsed by svm_train from the libsvm package.

Specified by:
prepareCrossvalidationCommand in interface Classifier
Parameters:
fold - The number of folds of the cross-validation
Returns:
Command-line array with options
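
As an illustration of what such an array might look like, the sketch below builds arguments for libsvm's svm_train using its -v option (n-fold cross-validation mode). The class and method names are hypothetical, and the actual array produced by LibSvm may contain further options.

```java
import java.util.Arrays;

/** Hypothetical sketch of an argument array for svm_train in cross-validation mode. */
class CrossvalidationCommandSketch {

    static String[] crossvalidationCommand(int fold, String fileIn, String fileOut) {
        return new String[] {
            "-v", Integer.toString(fold), // svm_train option: n-fold cross-validation mode
            fileIn,                        // training set file
            fileOut                        // model file name (assumption: passed along as given)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(crossvalidationCommand(5, "train.svm", "train.model")));
    }
}
```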

prepareTrainingCommand

public java.lang.String[] prepareTrainingCommand(java.lang.String fileIn,
                                                 java.lang.String fileOut)
Creates an array with string values, to be parsed by svm_train from the libsvm package.

Specified by:
prepareTrainingCommand in interface Classifier
Parameters:
fileIn - The input file
fileOut - The output file
Returns:
Command-line array with options
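
A training command could be assembled along the following lines. This is a sketch under assumptions: the kernelType and cost parameters are hypothetical stand-ins for whatever the configured SVM options provide, while -t (kernel type) and -c (cost) are standard svm_train options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of an argument array for svm_train in training mode. */
class TrainingCommandSketch {

    static String[] trainingCommand(int kernelType, double cost, String fileIn, String fileOut) {
        List<String> argv = new ArrayList<>();
        argv.add("-t");
        argv.add(Integer.toString(kernelType)); // kernel type, e.g. 2 for RBF
        argv.add("-c");
        argv.add(Double.toString(cost));        // cost parameter C
        argv.add(fileIn);                       // training set file
        argv.add(fileOut);                      // model file to be written
        return argv.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(trainingCommand(2, 1.0, "train.svm", "train.model")));
    }
}
```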

buildClassifier

public boolean buildClassifier()
This method builds an SVM model file from a file with training examples.

Specified by:
buildClassifier in interface Classifier

mergeFeatureFiles

public java.io.File mergeFeatureFiles(java.io.File tempFilePositive,
                                      java.io.File tempFileNegative)
Merges the feature files (one with positive training features, one with negative training features) to create the actual training file. The type (donor/acceptor) is set by the constructor of the SVMs, so the necessary information can be taken from there.

Specified by:
mergeFeatureFiles in interface Classifier
Parameters:
tempFilePositive - The name of the file with features for positive training
tempFileNegative - The name of the file with features for negative training
Returns:
The resulting training file.
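
A merge of this kind can be sketched as a plain concatenation of the two files. The class name, the explicit target path and the choice to copy line by line are assumptions made for illustration; the real method derives the target file itself.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical sketch: concatenate a positive and a negative feature file. */
class MergeFeatureFilesSketch {

    static Path merge(Path positive, Path negative, Path target) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (Path source : new Path[] {positive, negative}) {
                try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        out.write(line); // copy one example at a time, nothing is buffered in a list
                        out.newLine();
                    }
                }
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path pos = Files.createTempFile("pos", ".txt");
        Path neg = Files.createTempFile("neg", ".txt");
        Path out = Files.createTempFile("train", ".txt");
        Files.write(pos, Arrays.asList("+1 1:0.5"), StandardCharsets.UTF_8);
        Files.write(neg, Arrays.asList("-1 1:0.1"), StandardCharsets.UTF_8);
        System.out.println(Files.readAllLines(merge(pos, neg, out), StandardCharsets.UTF_8));
    }
}
```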

writeTemporaryFeatureData

public java.io.File writeTemporaryFeatureData(java.lang.String tempFileName,
                                              boolean forward_strand,
                                              java.util.List<java.util.List<java.lang.Double>> data,
                                              Classifier.DATA_TYPE dataType)
This method writes the temporary feature data (all the features extracted from one sequence, each feature in a different list) to a file. The type of data is also important, since the data must be labelled (positive, negative, unknown) according to it. At first all data (all features of all sequences) was kept in memory, but this very quickly led to heap overflow problems.

Specified by:
writeTemporaryFeatureData in interface Classifier
Parameters:
tempFileName - The name of the file to which the data should be written.
data - The feature data
dataType - The type of data (see the enum in this interface)
Returns:
The file with the temporary data.
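
The idea of writing feature data to disk instead of holding it in memory can be sketched as below. Everything here is an assumption for illustration: the class name, the label passed as a plain string, one output line per inner list, and the libsvm-style "label index:value" line layout.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: stream feature vectors to a temporary file, one line each. */
class TemporaryFeatureWriterSketch {

    static File write(String tempFileName, List<List<Double>> data, String label) throws IOException {
        File file = new File(tempFileName);
        try (BufferedWriter out = Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8)) {
            for (List<Double> vector : data) {
                StringBuilder line = new StringBuilder(label);
                for (int i = 0; i < vector.size(); i++) {
                    line.append(' ').append(i + 1).append(':').append(vector.get(i));
                }
                out.write(line.toString()); // each vector goes to disk as soon as it is formatted
                out.newLine();
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("features", ".tmp");
        List<List<Double>> data = Arrays.asList(Arrays.asList(0.5, 1.0), Arrays.asList(0.25, 0.0));
        write(tmp.getPath(), data, "+1");
        System.out.println(Files.readAllLines(tmp.toPath(), StandardCharsets.UTF_8));
    }
}
```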

generateFeatureString

public java.lang.String generateFeatureString(java.util.List<java.lang.Double> data,
                                              Classifier.DATA_TYPE dataType)
Creates a string of features for a single feature vector.

Specified by:
generateFeatureString in interface Classifier
Parameters:
data - The feature data
dataType - The data type (positive, negative, unclassified)
Returns:
The feature string
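
For orientation, a feature string in the sparse text format read by libsvm ("label index:value ...", with 1-based indices) could be produced as in the sketch below. The DataType enum is a hypothetical stand-in for Classifier.DATA_TYPE, and the label values +1, -1 and 0 are assumptions.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: format one feature vector as a libsvm-style line. */
class FeatureStringSketch {

    /** Stand-in for Classifier.DATA_TYPE; the real constants are not shown in this page. */
    enum DataType { POSITIVE, NEGATIVE, UNCLASSIFIED }

    static String featureString(List<Double> data, DataType dataType) {
        StringBuilder sb = new StringBuilder();
        switch (dataType) {
            case POSITIVE: sb.append("+1"); break;
            case NEGATIVE: sb.append("-1"); break;
            default:       sb.append("0");  break; // assumption: 0 marks unclassified data
        }
        for (int i = 0; i < data.size(); i++) {
            sb.append(' ').append(i + 1).append(':').append(data.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(featureString(Arrays.asList(0.5, 0.0, 1.25), DataType.POSITIVE));
    }
}
```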

writeTemporaryFeatureData

public java.io.File writeTemporaryFeatureData(java.lang.String tempFileName,
                                              java.util.List<java.util.List<java.lang.Double>> data,
                                              Classifier.DATA_TYPE dataType)
This method writes the temporary feature data (all the features extracted from one sequence, each feature in a different list) to a file. The type of data is also important, since the data must be labelled (positive, negative, unknown) according to it. At first all data (all features of all sequences) was kept in memory, but this very quickly led to heap overflow problems.

Specified by:
writeTemporaryFeatureData in interface Classifier
Parameters:
tempFileName - The name of the file to which the data should be written.
data - The feature data
dataType - The type of data (see the enum in this interface)
Returns:
The file with the temporary data.

loadClassifier

public boolean loadClassifier()
Sets the model file; depending on the implementation, there may be an attempt to build the SVM from this model file.

Specified by:
loadClassifier in interface Classifier
Parameters:
modelFile - The name of the modelfile

loadClassifier

public boolean loadClassifier(java.lang.String svmFile)
Specified by:
loadClassifier in interface Classifier

classify

public void classify(java.lang.String testFile,
                     java.lang.String outputClassification)
Uses the trained SVM (or an untrained one, provided the model file has been set beforehand) to classify data, and writes the output to an output file.

Specified by:
classify in interface Classifier
Parameters:
testFile - The name of the file that contains the extracted features; the output directory is expected to be part of the filename.
outputClassification - The name of the output file; the output directory is expected to be part of the filename.

classify_single_instance

public java.lang.String classify_single_instance(java.lang.String instance)
Specified by:
classify_single_instance in interface Classifier

classify_single_instance_fast

public java.lang.Double classify_single_instance_fast(double[] features)
Specified by:
classify_single_instance_fast in interface Classifier

to_splice_machine_output

public java.lang.String to_splice_machine_output(java.lang.String classification_result,
                                                 int splicesite,
                                                 int increase,
                                                 java.lang.String classification_name)
Specified by:
to_splice_machine_output in interface Classifier

to_splice_machine_output

public java.lang.String to_splice_machine_output(java.lang.Double distance,
                                                 int splicesite,
                                                 int increase,
                                                 java.lang.String classification_name)
Specified by:
to_splice_machine_output in interface Classifier

to_genomeview_output

public java.lang.String to_genomeview_output(int id,
                                             java.lang.Double distance,
                                             int funsite_start,
                                             int funsite_stop,
                                             java.lang.String classification_name)
Specified by:
to_genomeview_output in interface Classifier

performAttributeEvaluation

public java.util.List<ValPosCombination> performAttributeEvaluation(boolean sort,
                                                                    weka.attributeSelection.AttributeEvaluator evaluator)
Performs feature selection by evaluating different attributes.

Specified by:
performAttributeEvaluation in interface Classifier
Parameters:
sort - Whether to sort the resulting ValPosCombination objects according to their values
evaluator - The evaluator used for performing the evaluation of the attributes
Returns:
A list with the values and the original positions of those values, so that the classification feature each attribute belonged to can be located.
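
The pairing of a value with its original position can be sketched as follows. The ValPos class is a hypothetical stand-in for ValPosCombination, the scores are passed in as a plain array rather than computed by a Weka evaluator, and the descending sort order is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: keep each attribute score together with its original position. */
class ValPosSketch {

    static final class ValPos {
        final double value;
        final int position;

        ValPos(double value, int position) {
            this.value = value;
            this.position = position;
        }
    }

    static List<ValPos> rank(double[] scores, boolean sort) {
        List<ValPos> result = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            result.add(new ValPos(scores[i], i)); // remember where the score came from
        }
        if (sort) {
            // assumption: highest score first
            result.sort(Comparator.comparingDouble((ValPos vp) -> vp.value).reversed());
        }
        return result;
    }

    public static void main(String[] args) {
        for (ValPos vp : rank(new double[] {0.2, 0.9, 0.5}, true)) {
            System.out.println(vp.position + " -> " + vp.value);
        }
    }
}
```

Because the position travels with the value, sorting does not lose the link back to the original attribute.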

getTrainingFileInstances

public weka.core.Instances getTrainingFileInstances()
Specified by:
getTrainingFileInstances in interface Classifier

applyAttributeFilter

public void applyAttributeFilter(java.util.List<java.lang.Integer> attributeFilter,
                                 int featureNumberFilter,
                                 java.io.File toBeFilteredFile)
After feature selection has produced a filter, this filter can be applied to the feature files in order to optimize the SVMs.

Specified by:
applyAttributeFilter in interface Classifier
Parameters:
attributeFilter - The filter: a list with the numbers of the attributes that MUST be preserved.
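
Applying such a filter to a single line of a feature file might look like the sketch below. It assumes libsvm-style "label index:value" lines and 1-based attribute numbers that are kept unchanged; the class and method names are hypothetical, and the role of featureNumberFilter is not modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: keep only the attributes named in the filter. */
class AttributeFilterSketch {

    static String filterLine(String line, List<Integer> attributeFilter) {
        Set<Integer> keep = new HashSet<>(attributeFilter);
        String[] tokens = line.trim().split("\\s+");
        StringBuilder sb = new StringBuilder(tokens[0]); // the label is always preserved
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            if (keep.contains(index)) {
                sb.append(' ').append(tokens[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(filterLine("+1 1:0.5 2:0.0 3:1.25", Arrays.asList(1, 3)));
    }
}
```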

setSVMOptions

public void setSVMOptions(SVMOptions svm_options)
Changes the options for the SVM itself (mainly the kernel function)

Parameters:
svm_options - The new options for the kernel function

setOptions

public void setOptions(ClassifierOptions options)
Specified by:
setOptions in interface Classifier

getPosNegExamplesInFile

public int[] getPosNegExamplesInFile(java.io.File file)
Returns the number of positive and negative examples in a training file.

Specified by:
getPosNegExamplesInFile in interface Classifier
Parameters:
file - The training file
Returns:
An array of size 2, with the first number being the number of positive training examples and the second number the number of negative training examples.
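
Counting the examples can be sketched by reading the label at the start of each line. The sketch assumes libsvm-style lines whose first token is a numeric label, with values above zero counted as positive and values below zero as negative; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical sketch: count positive and negative examples in a training file. */
class PosNegCountSketch {

    /** Returns {positives, negatives}. */
    static int[] countPosNeg(Path file) throws IOException {
        int[] counts = new int[2];
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                double label = Double.parseDouble(line.split("\\s+")[0]);
                if (label > 0) {
                    counts[0]++;
                } else if (label < 0) {
                    counts[1]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("train", ".txt");
        Files.write(tmp, Arrays.asList("+1 1:0.5", "-1 1:0.1", "+1 1:0.9"), StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(countPosNeg(tmp)));
    }
}
```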

getModelFile

public java.lang.String getModelFile()
Specified by:
getModelFile in interface Classifier

setModelFile

public void setModelFile(java.lang.String svmModelFile)
Specified by:
setModelFile in interface Classifier

getSigmoid_A

public double getSigmoid_A()
Specified by:
getSigmoid_A in interface Classifier

setSigmoid_A

public void setSigmoid_A(double sigmoid_A)
Specified by:
setSigmoid_A in interface Classifier

getSigmoid_B

public double getSigmoid_B()
Specified by:
getSigmoid_B in interface Classifier

setSigmoid_B

public void setSigmoid_B(double sigmoid_B)
Specified by:
setSigmoid_B in interface Classifier