TREECON for Windows user manual
INTRODUCTION
General Information
TREECON is a software package developed primarily for the construction
and drawing of phylogenetic trees based on evolutionary distances computed
from nucleic and amino acid sequences. In distance methods, the evolutionary
distance is computed for all pairs of sequences and a phylogenetic tree
is inferred by considering the relationship between these distance values.
Different algorithms are available to construct a phylogenetic tree starting
from these evolutionary distances and a number of them are implemented
in TREECON. In estimating the evolutionary distances between sequences,
it is preferable to correct for superimposed mutations (multiple substitutions
at the same site), and several equations for this are implemented in the package.
Programs for rooting unrooted evolutionary trees, for drawing the tree on the screen,
and for saving the tree are also included, as well as several other tools.
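To show what "correcting for superimposed mutations" means in practice, here is a minimal Python sketch of the Jukes and Cantor correction, d = -3/4 ln(1 - 4p/3), where p is the observed proportion of differing sites. It only illustrates the general idea and is not TREECON's code. The function names are hypothetical. Leaving out every position where either sequence has a gap is an assumption of this sketch, and it does not describe how TREECON itself treats gaps.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Observed proportion of differing sites between two aligned sequences.

    Assumption of this sketch: any position with a gap ('-') in either
    sequence is left out of the comparison.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped positions
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return differences / compared


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor corrected distance d = -3/4 ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        # p >= 0.75: the correction is undefined (sequences look unrelated)
        raise ValueError("distance undefined: too many differences (p >= 0.75)")
    return -0.75 * math.log(arg)
```

Applied to the Homo sapiens and Pan paniscus sequences from the input file examples, 3 of the 45 gap-free positions differ. This gives p ≈ 0.067 and a corrected distance d ≈ 0.070. The corrected value is slightly larger because it allows for changes that are hidden by later substitutions at the same site.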
TREECON is simple to use, and the computer knowledge it requires is kept
to an absolute minimum. Therefore, the package should be particularly suited
for molecular biologists and evolutionists who want to build evolutionary
trees based on their own molecular data. Starting from a simple ASCII text
file, containing nucleic or amino acid sequences with gaps required for
mutual alignment, one can produce publishable trees in a user-friendly
and straightforward way.
TREECON for Windows has a standard MS-Windows interface including pull-down
and pop-up menus, dialog boxes and scrollable lists. It is therefore assumed
that users are familiar with the basic interface elements of MS-Windows.
The program runs on IBM-compatible computers (80486 and higher) and
requires the Microsoft® Windows™ 3.x, Windows 95 or Windows NT operating
system, a hard disk, a mouse and at least 8 Mbytes of RAM. The software
package consists of several executables which are managed through a principal
menu. As dynamic memory allocation is used throughout the program, the
size of the data is constrained only by the available memory.
The main advantages of TREECON for Windows over the older DOS version
of the program are device-independence, the multitasking environment, and
the possibility of displaying large trees containing hundreds of sequences.
Furthermore, thanks to the standard Windows interface, the software package
is more user-friendly.
TREECON for Windows costs US$75 or a comparable amount in local
currency. This fee mainly supports my work and the purchase of new computer
hardware and software, and additionally defrays the costs of diskettes and
mailing. The fee is paid only once and includes updates
and new releases of the package.
Of course, there are still some deficiencies in the current version,
but developing TREECON is only part of my interests and responsibilities.
However, I will try to further improve the package and to add interesting
features. All suggestions are very welcome. Because TREECON is improved continuously,
it is impossible for me to inform users of every improvement.
Please check the TREECON web-site at URL http://bioc-www.uia.ac.be/u/yvdp/treeconw.html
for the latest information about TREECON for Windows.
References and citation
A paper has been written presenting TREECON for Windows. If you have
used the program for the construction and/or drawing of the evolutionary
trees in a paper that you have written, please cite one of the following
references:
-
Van de Peer, Y., De Wachter, R. (1994) TREECON for Windows: a software
package for the construction and drawing of evolutionary trees for the
Microsoft Windows environment. Comput. Applic. Biosci. 10,
569-570.
A second paper describes the implementation of our ‘substitution rate
calibration’ method. This method takes the substitution rates of the individual
nucleotide sites in a sequence alignment (described later) into account
when computing evolutionary distances:
-
Van de Peer, Y., De Wachter, R. (1997) Construction of evolutionary
distance trees with TREECON for Windows: accounting for variation in nucleotide
substitution rate among sites. Comput. Applic. Biosci. 13,
227-230.
A more detailed paper announcing the DOS version of the program was previously
published:
-
Van de Peer, Y., De Wachter, R. (1993) TREECON: a software package
for the construction and drawing of evolutionary trees. Comput. Applic.
Biosci. 9, 177-182.
A reprint of any article in which TREECON is mentioned would be highly appreciated!
Acknowledgements
I want to thank all the people in our research group for using and
testing TREECON in their analyses and for their encouragement and stimulating
conversations. Furthermore, I am especially grateful to Stefan Rensing
of the University of Freiburg in Germany, who tested TREECON extensively,
made many helpful suggestions and reported several bugs. Special thanks
also to Peter De Rijk for his help with some programming problems
and to Gert Van der Auwera for his help in drawing unrooted trees.
James S. Farris is gratefully acknowledged for sharing his code to
convert trees saved in the New Hampshire bracket format. I also want to
thank all other individuals who made helpful suggestions, reported bugs,
and stimulated me in this work. Last but not least, I would like to thank
all the people who have purchased TREECON for Windows.
Yves Van de Peer
Note
Of course, a user manual is never complete. Some inaccuracies may exist
while a few sections may be too brief or missing. Nevertheless, with some
good will and a little bit of trial and error, it should be no problem
to discover what the TREECON program does and what it does not do.
INSTALLATION OF TREECON FOR WINDOWS
At the moment, two different installation procedures are available.
One is for IBM-compatible computers running Windows 95 or Windows
NT (up to Windows XP); the other is for computers running the older Windows 3.1.
Windows 7
It is not possible to run TREECON on Windows 7, either natively or in any compatibility or administrator mode.
It will be necessary to install and run TREECON in a virtual Windows environment.
Windows 95, Windows NT and Windows XP
When Windows 95, Windows NT or Windows XP is used as the operating system, installation
is very simple. Just insert the floppy disk in drive A:, choose Add/Remove Programs
from the Control Panel and select a:\setup. The Control Panel can be found
via the Windows task bar Start|Settings|Control Panel. Installation of
TREECON for Windows will then proceed automatically.
-
By default, the TREECON program will be placed in the directory c:\treeconw\programs\
-
The test files will be automatically placed in the directory c:\treeconw\data\
-
Start TREECON for Windows by double-clicking the TREECON icon.
Windows 3.x
Since TREECON for Windows is compiled as a 32-bit application, it is necessary
to install the Win32 extension first. Win32 is an operating-system extension
to Windows 3.x that provides support for developing and running 32-bit
Windows executables. 32-bit executables run faster, make use of all available
RAM, and will run on both 16- and 32-bit versions of Windows and on future
processors hosting Windows.
Therefore, if Win32 is not installed on your computer, put the diskette
labeled Win32 in the floppy drive and copy its contents (a file named PW1118.EXE)
to a directory on the hard disk. Run this self-extracting executable (e.g. by
double-clicking it). Several new files will be created, among them the file
named W32S125.EXE. Executing this file creates several more files, among them
the file setup.exe. Run setup.exe, and installation of the Win32 extension will proceed automatically.
When this is done, your computer is ready to run 32-bit applications.
Installation of TREECON for Windows is then very simple. Just insert
the floppy disk labeled TREECON for Windows in drive A:, choose File|Run from
the Program Manager and type a:\install. Installation of TREECON
for Windows will then proceed automatically.
-
By default, the TREECON program will be placed in the directory c:\treeconw\programs\
-
The test files will be automatically placed in the directory c:\treeconw\data\
-
Start TREECON for Windows by double-clicking the TREECON icon.
GETTING STARTED
When the TREECON program is started, the principal menu appears.
In this main menu of TREECON, the following options are available; they are usually
performed one after the other:
-
Distance estimation
-
Infer tree topology
-
Root unrooted trees
-
Draw phylogenetic trees
Additionally, buttons are available for the following items:
-
Tools
-
Help
-
About (TREECON)
-
Quit (return to Windows)
In the next chapters, the different steps involved in the construction
of pairwise distance trees will be discussed. For the impatient, the
next section briefly describes how to construct a first tree using the
default options of the program.
Making a first tree using the default options
The next example shows how to create a neighbor-joining tree:
-
Start TREECON by double-clicking the TREECON icon
-
Choose the ‘Distance estimation’ option from the principal menu
-
Choose ‘Start distance estimation’ from the distance estimation menu
-
Select and open the file ‘test.seq’ under directory c:\treeconw\data\ (if
TREECON was installed in c:\treeconw\programs\)
-
Press OK in the ‘Sequence type’ menu: ‘Nucleic acid sequences’ and ‘TREECON’
sequence format are the default values
-
Press the ‘select all’ button in the ‘Select sequences’ menu. The names
of 20 small ribosomal subunit RNA sequences will now be highlighted. Press
OK
A subset of sequences can be selected by clicking their names
with the left mouse button while holding down the Ctrl key (as in the Windows file
manager when selecting multiple files).
-
Press the OK button in the ‘Options’ menu. Distances will be computed by
the Jukes and Cantor equation (described later)
-
Press the OK button in the ‘Job status’ menu, when it says ‘finished’.
All evolutionary distances have now been computed and we are ready to infer
a tree topology
-
Select ‘Infer tree topology’ from the principal menu
-
Choose ‘Start inferring tree topology’ from the ‘inferring tree topology’
menu
-
Press the OK button in the ‘Options’ menu. A neighbor-joining tree will
be inferred
-
Press the OK button in the ‘Job status’ menu, when it says ‘finished’.
A tree topology has been inferred by neighbor-joining
-
Select ‘Root unrooted trees’ from the principal menu. Since neighbor-joining
infers an unrooted tree topology, it is necessary to root the unrooted
tree before we can display it on the screen
When a tree topology is inferred with clustering
methods such as UPGMA or WPGMA, the rooting procedure should be skipped
because these methods infer rooted tree topologies.
-
Choose ‘Start rooting unrooted trees’ from the ‘root unrooted trees’ menu
-
Press OK in the ‘Options’ menu. By default, the tree is rooted with a single sequence
that serves as an outgroup to the other sequences.
-
Select the last sequence in the list. This is the red alga Palmaria palmata
-
Press the OK button in the ‘Job status’ menu, when it says ‘finished’.
The tree is now rooted with Palmaria palmata
-
Select ‘Draw Phylogenetic tree’ from the principal menu
-
In the TREECON drawing program, select File|Open|(new) tree. The tree should
now be displayed on the screen.
To construct a tree with bootstrap values,
repeat this procedure but select ‘bootstrap analysis’ in the ‘Options’
menu at each step of the process.
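In the walkthrough above, pressing OK in the ‘Options’ menu produces a neighbor-joining tree. For readers who want to see what that involves, here is a minimal textbook version of Saitou and Nei's neighbor-joining algorithm in Python. It is a sketch, not TREECON's implementation. The function name, the input layout (a list of names plus a symmetric distance matrix) and the Newick output are assumptions of this example, and ties are broken by keeping the first pair found. The result is unrooted because the last three nodes meet at one central node, which is why the rooting step is needed before drawing.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Textbook neighbor-joining (Saitou & Nei, 1987); returns unrooted Newick.

    `names` and the symmetric matrix `dist` (zero diagonal) are assumed inputs.
    Labels are not quoted, so names should avoid spaces, commas and brackets.
    """
    if len(names) < 3:
        raise ValueError("neighbor-joining needs at least three sequences")
    # Distances keyed by node label; internal nodes are labelled by their Newick text.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # net divergence
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new common ancestor u.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # The last three nodes meet at one central node: the tree stays unrooted.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    return f"({a}:{la:.4f},{b}:{d[a][b] - la:.4f},{c}:{d[a][c] - la:.4f});"
```

For an additive distance matrix, this procedure recovers the generating tree and its branch lengths exactly. With real data, some branch lengths may come out slightly negative, and this sketch leaves them as they are.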
INPUT FILE FORMATS
Since TREECON, like most other tree construction programs, starts from
a set of aligned sequences, the input file format is discussed first.
The input file for the ‘distance estimation’ module of TREECON is the
file containing the aligned sequences in the case of nucleic and amino
acid sequences, or the gel results in the case of RFLP/AFLP data. In the
case of RFLP/AFLP data, the ‘sequence’ is a row of ‘1’s and
‘0’s representing the presence or absence of a band on the gel for that
particular sample (described later).
TREECON can handle several file formats:
TREECON input file format
The first line of the TREECON input file should always contain
the number of characters, i.e. the number of aligned positions. From
the second line on, the organisms and their sequences (or restriction fragment
results) are listed one after the other. The name of each sequence should be
written on a separate line and may contain no more than 40 characters.
The format in which the sequence has to be written is very flexible (see
examples). The sequences may be written in one long stretch, divided over
several lines, or in blocks separated by blanks. Each sequence may also
be written in a different layout.
A sequence may only contain sequence symbols (letters, or ‘1’s and ‘0’s for
band data), hyphens (representing gaps) and blanks. Every sequence must contain
exactly the number of symbols given in the first line of the input file
(hyphens are counted, blanks are not). Characters may be written in upper-
or lowercase. Blanks are not allowed at the end of a sequence!
The input file may contain as many as 2000 sequences. However, the maximum
number of sequences that can be used to construct a tree is set to
1000. If this is not enough, please contact the author. As dynamic memory
allocation is used, there is no limit to the size of the sequences, and
the number of sequences that can be compared depends on the available memory.
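The rules above are strict enough that a file can be checked before it is loaded. The following Python sketch reads TREECON-format text into name/sequence pairs and reports length errors. It is a hypothetical helper, not part of TREECON, and it is deliberately more lenient than the program. It removes all blanks from sequence lines, including trailing ones, and it skips empty lines, which is an assumption about how such lines should be treated.

```python
def read_treecon(text: str) -> dict[str, str]:
    """Parse TREECON-format text into {name: sequence} (hypothetical helper)."""
    # Assumption: empty lines carry no meaning and are skipped.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    length = int(lines[0].strip())  # first line: number of aligned positions
    records: dict[str, str] = {}
    i = 1
    while i < len(lines):
        name = lines[i].strip()
        if len(name) > 40:
            raise ValueError(f"name longer than 40 characters: {name!r}")
        if name in records:
            raise ValueError(f"duplicate name: {name!r}")
        i += 1
        seq = ""
        # Sequence lines follow the name until `length` symbols are collected;
        # blanks between blocks are separators, not symbols.
        while len(seq) < length and i < len(lines):
            seq += "".join(lines[i].split())
            i += 1
        if len(seq) != length:
            raise ValueError(f"{name!r}: expected {length} symbols, found {len(seq)}")
        records[name] = seq.upper()  # upper- and lowercase are equivalent
    return records
```

Because the sketch relies on the count in the first line to know where each sequence ends, a wrong count or a sequence of the wrong length shows up as a length error. This is consistent with the advice that badly listed sequence names usually point to a format error.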
Examples of the TREECON format
example 1
50
Homo sapiens
AGUCGAGUC---GCAGAAACGCAUGAC-GACCACAUUUU-CCUUGCAAAG
Pan paniscus
AGUCGCGUCG--GCAGAAACGCAUGACGGACCACAUCAU-CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG--GCAGAUACGCAUCACGGAC-ACAUCAUCCCUCGCAGAG
Pongo pigmaeus
AGUCGCGUCGAAGCAGA--CGCAUGACGGACCACAUCAUCCCUUGCAGAG
example 2
50
Homo sapiens
AGUCGAGUC-- -GCAGAAAC GCAUGAC-GA CCACAUUUU-
CCUUGCAAAG
Pan paniscus
AGUCGCGUCG- -GCAGAAAC GCAUGACGGA CCACAUCAU-
CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG- -GCAGAUAC GCAUCACGGA C-ACAUCAUC
CCUCGCAGAG
Pongo pigmaeus
AGUCGCGUCGA AGCAGA--C GCAUGACGGA CCACAUCAUC
CCUUGCAGAG
example 3
50
Homo sapiens
AGUCGAGUC---GCAGAAAC
GCAUGAC-GACCACAUUUU-
CCUUGCAAAG
Pan paniscus
AGUCGCGUCG--GCAGAAAC
GCAUGACGGACCACAUCAU-
CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG--GCAGAUAC
GCAUCACGGAC-ACAUCAUC
CCUCGCAGAG
Pongo pigmaeus
AGUCGCGUCGAAGCAGA--C
GCAUGACGGACCACAUCAUC
CCUUGCAGAG
example 4 (RFLP/AFLP/RAPD data)
54
sample1
111101001110110110110111111111010111110111111111110111
sample2
111101000110111110110111111110010111100111111111110110
sample3
111101111110110110111111111101010111110111111111010111
sample4
101101001110110110110111111001010111100111111111010111
PHYLIP
input file format
PHYLIP (PHYLogeny Inference Package) is the well-known software
package of Joe Felsenstein (Department of Genetics, University of Washington,
Box 357360, Seattle, Washington 98195-7360, USA) for inferring phylogenies.
In this package, two different file formats can be used, viz. interleaved
(sequences are written in the form of a sequence alignment) and non-interleaved
or sequential (sequences are written one after the other).
In both file formats, the first line contains two numbers, namely the
number of sequences and the number of alignment positions (see examples).
The sequence names may not exceed 10 characters; shorter names must be
padded with blanks to 10 characters. Names may include punctuation marks and blanks.
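The fixed ten-character name field is what lets a program tell a name from sequence data. As an illustration, here is a minimal Python sketch of a reader for the interleaved layout. It assumes strict PHYLIP conventions: names occupy the first 10 columns of the first block only, later blocks carry no names, and blanks and empty lines inside the data are ignored. The function name is hypothetical, and the sketch does not handle the sequential layout.

```python
def read_phylip_interleaved(text: str) -> list[tuple[str, str]]:
    """Read a strict PHYLIP interleaved alignment (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]  # skip empty lines
    n_seq, n_pos = (int(x) for x in lines[0].split()[:2])
    names: list[str] = []
    seqs: list[str] = []
    # First block: columns 1-10 hold the (blank-padded) name, the rest is data.
    for ln in lines[1:1 + n_seq]:
        names.append(ln[:10].strip())
        seqs.append("".join(ln[10:].split()))
    # Later blocks carry no names: the k-th line belongs to sequence k mod n_seq.
    for k, ln in enumerate(lines[1 + n_seq:]):
        seqs[k % n_seq] += "".join(ln.split())
    for name, seq in zip(names, seqs):
        if len(seq) != n_pos:
            raise ValueError(f"{name!r}: expected {n_pos} positions, found {len(seq)}")
    return list(zip(names, seqs))
```

A name such as "Pan pani" works because the reader cuts at column 10 rather than at the first blank. The same rule is why a name written without its padding runs into the sequence data.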
Examples of the PHYLIP interleaved format
example 1
4 50
Homo      AGUCGAGUC---GCAGAAACGCAUGAC
Pan pani  AGUCGCGUCG--GCAGAAACGCAUGAC
Gorilla   AGUCGCGUCG--GCAGAUACGCAUCAC
Pongo     AGUCGCGUCGAAGCAGA--CGCAUGAC
-GACCACAUUUU-CCUUGCAAAG
GGACCACAUCAU-CCUUGCAAAG
GGAC-ACAUCAUCCCUCGCAGAG
GGACCACAUCAUCCCUUGCAGAG
Examples of the PHYLIP non-interleaved (sequential) format
example 1
4 50
Homo      AGUCGAGUC---GCAGAAACGCAUGAC
-GACCACAUUUU-CCUUGCAAAG
Pan pani  AGUCGCGUCG--GCAGAAACGCAUGAC
GGACCACAUCAU-CCUUGCAAAG
Gorilla   AGUCGCGUCG--GCAGAUACGCAUCAC
GGAC-ACAUCAUCCCUCGCAGAG
Pongo     AGUCGCGUCGAAGCAGA--CGCAUGAC
GGACCACAUCAUCCCUUGCAGAG
example 2
4 50
Homo      (! all ten characters of the name field MUST be present; pad shorter names with blanks)
AGUCGAGUC---GCAGAAACGCAUGAC-
GACCACAUUUU-CCUUGCAAAG
Pan pani
AGUCGCGUCG--GCAGAAACGCAUGACG
GACCACAUCAU-CCUUGCAAAG
Gorilla
AGUCGCGUCG--GCAGAUACGCAUCACG
GAC-ACAUCAUCCCUCGCAGAG
Pongo
AGUCGCGUCGAAGCAGA--CGCAUGACG
GACCACAUCAUCCCUUGCAGAG
-
When the input file is selected, you can make
a selection of the organisms (sequences) you want to construct a tree with.
If you want to select several sequences, but not all, hold down the Ctrl key
while selecting organisms, just as you would when selecting several
files in the file manager of Windows. It is also possible to save a selection
of sequences to a file. This selection can then be retrieved afterwards.
If the sequence names are not listed properly when selecting an input file,
there is most probably a format error in the input file.