LVB Manual – LVB phylogeny program, version 2.2
This manual was last updated on 10 November 2005.
COPYRIGHT
Part of this document is based on PHYLIP documentation (see ACKNOWLEDGEMENTS).
The PHYLIP component of this document: © Copyright 1986-2000 by the University of Washington. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.
The remainder of this document: © Copyright 2003, 2004, 2005 by Daniel Barker. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.
lvb seeks parsimonious trees from an aligned nucleotide data matrix. It uses a simulated annealing heuristic search. In contrast to the more usual heuristic searches (stepwise addition and/or hill-climbing), simulated annealing can 'jump out' of local optima. Especially with large, complex data matrices, the simulated annealing heuristic may run faster and/or find a shorter tree.
CITING LVB
Please cite the following paper if you use LVB:
Barker, D. 2004. LVB: Parsimony and simulated annealing in the search for phylogenetic trees. Bioinformatics, 20, 274-275.
The following may also be relevant:
Barker, D. 1997. LVB 1.0: Reconstructing Evolution with Parsimony and Simulated Annealing (Edinburgh: Daniel Barker).
Barker, D. 1999. Simulated annealing in the Search for Phylogenetic Trees. PhD Thesis, University of Edinburgh.
RUNNING LVB
lvb is a command-line program. lvb reads the alignment file from the current directory (folder) and writes its main output to a file in the current directory. The user is prompted for the matrix format, the approximate time to run, the interpretation of gaps in the alignment and whether bootstrap replicates are required. Answers are entered using the keyboard. lvb logs progress information and errors to the screen.
Windows
After downloading, copy
Do not launch lvb for Windows by double-clicking it. It will run, but the window will vanish on completion. You will not see the final tree length, which is output to the screen just before lvb exits. You will also fail to see any error messages.
MacOS X
After downloading, extract
If
Linux
After downloading, extract
lvb may be launched from a command-line, for example
Unix
After downloading, compile lvb from the source code (see COMPILING LVB). Once this is done, it may be launched as for Linux.
INPUT
Keyboard (standard input)
Keyboard input is case-independent. So, for example, where the instructions below suggest you type
Matrix format
lvb can read matrices in PHYLIP 3.6 interleaved or PHYLIP 3.6 sequential format. These are described in the section on infile.
When prompted for the data matrix format, type
Treatment of gaps
See the table under Bases for a list of base codes allowed by lvb.
A gap represented by the letter ' ' '
When prompted for the treatment of '
'Fifth state' may give excessive weight to multi-site gaps, since each affected base position will be counted as one event.
Random number seed
When prompted for the random number seed, press
The default value is taken from the system clock and hence will vary from one analysis to the next, changing every second. The default is usually appropriate.
Duration of the analysis
When prompted for the duration of the analysis, type
A slow analysis examines more trees and might find a shorter tree than a fast analysis, which cuts corners to get a result more quickly. The difference in quality of results is most marked for large data matrices.
LVB uses a stochastic search for the most parsimonious tree and might not always give exactly the same results for the same input. If results differ markedly between repeat analyses, the search is probably too fast: if you used a fast search, switch to a slow search if time permits. The overall effect of such variation may be less if you seek a bootstrap sample of trees (see Bootstrapping) rather than the most parsimonious tree(s) for the original data.
The duration of the analysis is only approximate. Also, all analyses become slower for larger data matrices.
Bootstrapping
When prompted for the number of bootstrap replicates, enter the number of replicates required. If bootstrapping is not required, enter the number 0 or just press
lvb allows any number of replicates from 1 to 1000000 inclusive.
For each replicate, a bootstrap sample of sites in the alignment is generated and analyzed. For an alignment matrix of m sites, each bootstrap replicate contains m sites, randomly sampled with replacement from the originals. Compared to the original alignment, it is likely that some sites are left out, some are present once, and others are present twice or more. In lvb the probability of including a site is equal for all sites, irrespective of whether the site varies or is constant.
The most parsimonious tree(s) for each replicate are output. There will be at least one tree for each replicate. If the search for any replicate found more than one equally parsimonious tree, all are output and the number of trees will exceed the number of replicates.
infile
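The resampling scheme described under Bootstrapping (m sites drawn with replacement from an alignment of m sites, every site equally likely) can be sketched in a few lines of Python. This is an illustration only, not lvb's own code: the function names and the toy alignment are assumptions made for the example.

```python
import random


def bootstrap_columns(num_sites, rng):
    """Draw num_sites column indices uniformly, with replacement."""
    return [rng.randrange(num_sites) for _ in range(num_sites)]


def resample_alignment(sequences, rng):
    """Build one bootstrap replicate from equal-length aligned sequences.

    The same resampled columns are used for every sequence, so each
    column of the replicate is a whole column of the original alignment.
    """
    num_sites = len(sequences[0])
    if any(len(seq) != num_sites for seq in sequences):
        raise ValueError("all sequences must have the same length")
    columns = bootstrap_columns(num_sites, rng)
    return ["".join(seq[i] for i in columns) for seq in sequences]


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the demonstration is repeatable
    toy_alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]  # hypothetical data
    for row in resample_alignment(toy_alignment, rng):
        print(row)
```

Because the draw is with replacement, a given replicate will typically omit some original columns and repeat others, while always keeping the original number of sites.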
The data matrix must be in a file called
Layout
The simplest type of data matrix file looks something like this:
The first line of the input file contains the number of sequences and the number of characters (sites). These are in free format, separated by blanks. The information for each sequence follows, starting with a ten-character sequence name (which can include blanks and some punctuation marks), and continuing with the characters for that sequence.
The name should come right at the start of the line, without any preceding blanks or tabs. It should be ten characters in length, filled out to the full ten characters by trailing blanks if shorter. Any printable ASCII/ISO character is allowed in the name, except for parentheses '(' and ')', square brackets '[' and ']', colon ':', semicolon ';' and comma ','. If you forget to extend the names to ten characters in length by blanks, an error message will result.
The biological characters (bases or gaps) are each a single ASCII character, sometimes separated by blanks.
The sequences can continue over multiple lines. When this is done the sequences must be either in interleaved format or sequential format. In sequential format all of one sequence is given, possibly on multiple lines, before the next starts. In interleaved format the first part of the file should contain the first part of each of the sequences, then possibly a line containing nothing but a carriage-return character, then the second part of each sequence, and so on. Only the first parts of the sequences should be preceded by names. The name must be on the same line as the first character of the data for that sequence. Here is a hypothetical example of interleaved format:
while in sequential format the same sequences would be:
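Both layouts follow mechanical rules, so they can also be illustrated with a short Python script. The sketch below is not part of lvb: the function names, the made-up taxa and the 20-character line width are all assumptions chosen for the example. It pads each name to ten characters, rejects the forbidden punctuation listed above, and writes the same small matrix first in interleaved and then in sequential format.

```python
FORBIDDEN = "()[]:;,"  # characters not allowed in sequence names


def phylip_name(name):
    """Pad a sequence name to exactly ten characters with trailing blanks."""
    if len(name) > 10:
        raise ValueError(f"name longer than ten characters: {name!r}")
    if any(ch in FORBIDDEN for ch in name):
        raise ValueError(f"forbidden character in name: {name!r}")
    return name.ljust(10)


def _check(records):
    """Return the common sequence length, or raise if the matrix is ragged."""
    nsites = len(records[0][1])
    if nsites == 0 or any(len(seq) != nsites for _, seq in records):
        raise ValueError("sequences must be non-empty and of equal length")
    return nsites


def to_sequential(records, width=20):
    """All of one sequence, possibly over several lines, before the next."""
    nsites = _check(records)
    lines = [f"{len(records)} {nsites}"]
    for name, seq in records:
        chunks = [seq[i:i + width] for i in range(0, nsites, width)]
        lines.append(phylip_name(name) + chunks[0])
        lines.extend(chunks[1:])
    return "\n".join(lines) + "\n"


def to_interleaved(records, width=20):
    """First part of every sequence, an empty line, then the next parts."""
    nsites = _check(records)
    lines = [f"{len(records)} {nsites}"]
    for start in range(0, nsites, width):
        if start:
            lines.append("")  # truly empty: no blank characters on the line
        for name, seq in records:
            prefix = phylip_name(name) if start == 0 else ""
            lines.append(prefix + seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical taxa and sequences, for illustration only.
    matrix = [
        ("Taxon one", "ACGTACGTACGTACGTACGTACGTAC"),
        ("Taxon two", "ACGTTCGTACGAACGTACGTACGTAC"),
    ]
    print(to_interleaved(matrix))
    print(to_sequential(matrix))
```

The writer never emits trailing blanks, and the separator lines between interleaved blocks are completely empty.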
If each sequence only occupies one line in the matrix file, there is no difference between sequential and interleaved format and lvb can read the file in either way. Other than this special case, it is important not to read an interleaved matrix as sequential or a sequential matrix as interleaved. A
Note that a portion of a sequence like this:
is perfectly legal, assuming that the sequence name has gone before and is filled out to full length by blanks. The above digits and blanks will be ignored, the sequence being taken as starting at the first base symbol (in this case an A). This should enable you to use output from many multiple-sequence alignment programs with only minimal editing.
lvb may have difficulties with spaces at the end of lines. The symptoms of this problem are that lvb complains about a
In interleaved format the present version of lvb may sometimes have difficulties with the blank lines between groups of lines. If so, you might want to retype those lines, making sure that they have only a carriage-return and no blank characters on them, or you may have to eliminate them. The symptoms of this problem are that lvb complains that the sequences are not properly aligned, and you can find no other cause for this complaint.
Bases
The sequences may contain A's, G's, C's and T's (or U's, which lvb treats as equivalent to T's). Each ASCII character in the sequence must be one of the letters
These characters can be either upper or lower case, because the algorithms convert all input characters to upper case (which is how they are treated). The characters constitute the IUPAC (IUB) nucleic acid code plus some slight extensions. They enable input of nucleic acid sequences taking full account of any ambiguities in the sequence.
For further information on '
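As a rough guide to how ambiguity codes work, the standard IUPAC nucleotide symbols can be written as a lookup table in Python. This sketch covers only the standard IUPAC letters plus U; it does not reproduce lvb's extensions or its gap symbols, so a symbol flagged here is not necessarily rejected by lvb. The function names are assumptions made for the example.

```python
# Standard IUPAC nucleotide codes: each symbol maps to the bases it may denote.
# U is listed as T, matching lvb's treatment of U as equivalent to T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def possible_bases(symbol):
    """Return the set of bases a symbol may stand for (case-independent)."""
    return set(IUPAC[symbol.upper()])


def outside_standard_iupac(sequence):
    """List symbols in a sequence that are not standard IUPAC letters."""
    return sorted({ch for ch in sequence.upper() if ch not in IUPAC})


if __name__ == "__main__":
    print(sorted(possible_bases("r")))
    print(outside_standard_iupac("acgtRYn-?"))
```

A check like `outside_standard_iupac` is only a first pass before consulting the table of codes that lvb itself accepts.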
OUTPUT
Screen (standard output)
lvb logs its version, details of the analysis, indication of progress and any errors encountered to the standard output, which is usually the screen.
Without bootstrapping, the rearrangement number (iteration) of the search and the current tree length are logged every 10000 trees and every time the tree length changes. During simulated annealing, the tree length can go up as well as down. LVB keeps and outputs the shortest trees encountered at any point during the search. The length of this tree or these trees is logged to the screen near the end of the analysis.
With bootstrapping, the replicate number is logged, along with the number of rearrangements tried, the number of trees found and the length of the trees found for that replicate.
outtree
Without bootstrapping, the file
With bootstrapping,
Trees use a subset of the 'Newick standard' tree format. This is accepted by many other programs.
Trees may be converted to graphics files using the
Without bootstrapping, if more than one equally parsimonious tree is found, these may be combined in various ways using
Output trees are unrooted and branch lengths are not given. Trees may be rooted with the
COMPILING LVB
Compiling lvb is rarely necessary, because lvb is available at the LVB Web page as ready-to-run software for the following platforms:
However, for other platforms, or if you wish to modify the source code, you will have to compile lvb.
Assuming your system is UNIX-like, uses GNU
Unpacking the source code
Assuming
This gives you a main directory
Compiler flags
First, edit the file
Compilation
Now, assuming you begin in the
Results of the above commands are:
After changing the source code or
Documentation
The main documentation (i.e. this file) is
Internal documentation will be of interest to people who wish to modify or re-use the source code of LVB. During a successful build, documentation in
Documentation of PHYLIP code within LVB is given separately, in
BIOINFORMATICS APPLICATIONS
For automated use of lvb, a 'wrapper' in the Perl language may be used. This is
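The general idea of such a wrapper (supplying the keyboard answers on standard input and capturing what would otherwise go to the screen) can also be sketched in Python. This is not the Perl wrapper mentioned above, and it makes no claim about that wrapper's interface: the function name is hypothetical, and the answers must be supplied by the caller in the order of lvb's prompts, using the values given under INPUT. The demonstration uses the Python interpreter as a stand-in program so that it runs without lvb installed.

```python
import subprocess
import sys


def run_with_answers(command, answers, cwd=None, timeout=None):
    """Run a prompt-driven program, sending one answer per line on stdin.

    command is an argument list, e.g. ["lvb"] if the executable is on the
    search path (an assumption). cwd should be the directory holding the
    input matrix, since lvb reads and writes in the current directory.
    Returns (exit status, captured standard output).
    """
    stdin_text = "".join(answer + "\n" for answer in answers)
    result = subprocess.run(
        command,
        input=stdin_text,
        capture_output=True,
        text=True,
        cwd=cwd,
        timeout=timeout,
        check=False,  # leave interpretation of the exit status to the caller
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in program: echoes its standard input in upper case.
    stand_in = [
        sys.executable,
        "-c",
        "import sys; print(sys.stdin.read().upper(), end='')",
    ]
    status, log = run_with_answers(stand_in, ["first answer", ""])
    print(status)
    print(log, end="")
```

An empty string in the answer list sends a bare newline, which corresponds to pressing the key on its own at a prompt. The captured log can then be searched for the final tree length or for error messages.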
SUPPORT AND REGISTRATION
Please send questions and bug reports to: db60@st-and.ac.uk
To be placed on an email list to receive information on new versions, please email
ACKNOWLEDGEMENTS
lvb contains portions of PHYLIP 3.6a. This allows lvb to read PHYLIP-format matrix files. Also, most of the above documentation for infile is taken from the PHYLIP 3.6a manual. I wish to thank Joe Felsenstein for making PHYLIP freely available, and for advising on how to re-use it in lvb.
SEE ALSO
http://biology.st-andrews.ac.uk/cegg/lvb.htm
http://evolution.genetics.washington.edu/phylip.html
http://phylogeny.arizona.edu/macclade/macclade.html
http://mesquiteproject.org/mesquite/mesquite.html
http://taxonomy.zoology.gla.ac.uk/software.html