glam2-purge - Removes redundant sequences from a FASTA file
glam2-purge file score [options]
glam2-purge is a modified version of Andrew Neuwald's purge program that
removes redundant sequences from a FASTA file. This is recommended in order
to prevent highly similar sequences from distorting the search
for motifs. Purge works with either DNA or protein sequences and creates an
output file such that no two sequences have a (gapless) local alignment score
greater than a threshold specified by the user. The output file is named
<file>.<score>. The alignment score is based on the BLOSUM62
matrix for proteins, and on a +5/-1 scoring scheme for DNA. Purge can also be
used to mask tandem repeats. It uses the XNU program for this purpose.
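The redundancy criterion for DNA can be illustrated with a minimal Python sketch. It is not Purge's actual implementation: the function names gapless_local_score and greedy_purge are hypothetical, sequences are assumed to be uppercase strings, and the greedy keep-in-input-order strategy is an assumption made for illustration only. The +5/-1 scores and the "no pair scores above the threshold" rule are the only parts taken from the description above.

```python
def gapless_local_score(a, b, match=5, mismatch=-1):
    """Best score of any gapless local alignment between strings a and b.

    Uses the +5/-1 DNA scheme by default. Each diagonal (a fixed offset
    between the two sequences) is scanned for its best-scoring segment.
    """
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        i = max(offset, 0)   # start position in a
        j = i - offset       # matching start position in b
        running = 0
        while i < len(a) and j < len(b):
            running += match if a[i] == b[j] else mismatch
            if running < 0:
                running = 0  # a local alignment may restart anywhere
            best = max(best, running)
            i += 1
            j += 1
    return best


def greedy_purge(seqs, threshold):
    """Keep each (name, sequence) pair only if it scores at most
    `threshold` against every sequence already kept.

    ASSUMPTION: the greedy, input-order strategy is illustrative and is
    not claimed to be how Purge chooses which sequences to retain.
    """
    kept = []
    for name, seq in seqs:
        if all(gapless_local_score(seq, k) <= threshold for _, k in kept):
            kept.append((name, seq))
    return kept


if __name__ == "__main__":
    example = [("s1", "ACGTACGT"), ("s2", "ACGTACGT"), ("s3", "TTTTTTTT")]
    for name, seq in greedy_purge(example, threshold=20):
        print(name, seq)
```

In this sketch the output satisfies the stated property: no two retained sequences have a gapless local alignment score greater than the threshold.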
-n
Sequences are DNA (default: protein).
-b
Use the BLAST heuristic method (default for protein).
-e
Use an exhaustive method (default for DNA).
-q
Keep first sequence in the set.
-x
Use xnu to mask protein tandem repeats.
glam2(1),
glam2format(1),
glam2mask(1),
glam2scan(1),
xnu(1)
The full hypertext documentation of GLAM2 is available online at
http://bioinformatics.org.au/glam2/ or on this computer in
/usr/share/doc/glam2/.
Purge was written by Andy Neuwald and is described in more detail in Neuwald et
al., "Gibbs motif sampling: detection of bacterial outer membrane protein
repeats", Protein Science, 4:1618–1632, 1995. Please cite it if
you use Purge.
If you use GLAM2, please cite: MC Frith, NFW Saunders, B Kobe, TL Bailey (2008)
Discovering sequence motifs with arbitrary insertions and deletions, PLoS
Computational Biology (in press).
Andrew Neuwald
Author of purge, renamed glam2-purge in Debian.
Martin Frith
Modified purge to be ANSI standard C and improved the user interface.
Timothy Bailey
Modified purge to be ANSI standard C and improved the user interface.
Charles Plessy <plessy@debian.org>
Formatted this manpage in DocBook XML for the Debian distribution.
The source code and the documentation of Purge and GLAM2 are released in the
public domain.