boxshade - Pretty-printing of multiple sequence alignments
boxshade is a program for pretty-printing multiple alignment output. The
program itself doesn't do any alignment; you have to use a multiple alignment
program like ClustalW or Pileup and use the output of these programs as input
for boxshade.
Show the help.
Show the help and extend command line.
Use defaults, no unnecessary questions.
Use default numbering.
Assume DNA sequences, use box_dna.par.
Create separate files for multiple pages.
Shade according to a master sequence, given by its number.
Input file name.
Output file name.
Parameter file name.
File name for the definition of similar residues.
File name for the definition of residue groups.
Fraction of sequences that must agree for a consensus.
Output device class (see below).
Input file format (see below).
Print ruler line.
Create consensus line.
Consensus symbols.
Quoted form of the consensus symbols; try this if the one above does not work.
Output file lines are terminated with LF only.
Output file lines are terminated with CR only.
Output file lines are terminated with CR/LF.
This manual page was written for the Debian(TM) distribution because the
original program does not have a manual page. The information presented here
comes from the documentation of the web service for version 3.21, which is not
available as a Debian package.
BOXSHADE is a program for creating good-looking printouts from multiple-aligned
protein or DNA sequences. The program does no alignment by itself; it has to
take as input a file preprocessed by a multiple alignment program or a
multiple file editor. See below for a list of supported input formats and
output devices. In the standard BOXSHADE output, identical and similar
residues in the multiple-alignment chart are represented by different colors
or shadings. Further options control the kind of shading to be applied,
sequence numbering, consensus output and so on. The user interface is a bit
clumsy at the moment: one has to answer a lot of questions in order to get the
desired output. It is, however, possible to use default parameters from a
standard parameter file or to supply the program with parameters from the
command line. At the moment, the VMS and DOS versions of BOXSHADE have
identical user interfaces.
BOXSHADE 3.2 knows about the following input file formats (some of them are
generally used only on MSDOS or VMS systems):
+ CLUSTAL and CLUSTALV, multiple alignment program; DOS/VMS/MAC default
  extension .ALN
+ ESEE, multiple sequence editor; DOS default extension .ESE
+ PHYLIP, phylogenetic analysis package; DOS, VMS, UNIX default extension .PHY
+ PILEUP and PRETTY of the GCG sequence analysis package; VMS/UNIX default
  extensions .MSF and .PRE. NB: you are strongly encouraged NOT to use the
  PRETTY format as input, as it may be incompatible with the revised version of
  .MSF input. We can't actually think why anyone would use this format now;
  .MSF files are more generally useful.
+ MALIGNED, multiple sequence editor; VMS only, default extension .MAL
BOXSHADE tries to determine the file type from the extension but will also work
if different extensions are used.
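The extension-based guess can be pictured as a simple lookup table. The Python sketch below is only an illustration under assumptions: the function name `guess_format` and its `fallback` argument are hypothetical, the mapping contains just the default extensions listed in this manual page, and it does not reproduce BOXSHADE's actual detection code.

```python
import os.path

# Hypothetical mapping built from the default extensions named in this manual.
DEFAULT_EXTENSIONS = {
    ".aln": "CLUSTAL",
    ".ese": "ESEE",
    ".phy": "PHYLIP",
    ".msf": "PILEUP/MSF",
    ".pre": "PRETTY",
    ".mal": "MALIGNED",
}


def guess_format(filename, fallback=None):
    """Guess the input format from the file extension, ignoring case.

    Unknown extensions return `fallback` instead of failing, since the
    program is said to work with other extensions too.
    """
    extension = os.path.splitext(filename)[1].lower()
    return DEFAULT_EXTENSIONS.get(extension, fallback)
```

For a file with an unusual extension, a caller could pass a format name obtained some other way (for example from the /type parameter) as `fallback`.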
BOXSHADE supports the following output devices:
+ POSTSCRIPT/EPS creates POSTSCRIPT(TM) files for printing on a laser printer
  or for further conversion with a POSTSCRIPT interpreter (like GHOSTSCRIPT).
+ HPGL for export to various graphics programs or for conversion/printing with
  the shareware program PRINTGL. Plotting BOXSHADE output on a plotter is
  generally not recommended.
+ RTF for export to various word-processing and graphics programs.
+ CRT uses direct screen writes to the PC monitor. Possible options depend on
  the graphics adapter used. This output device is supported only in the MSDOS
  version.
+ ANSI. On a PC, this option uses an ANSI device driver (ANSI.SYS) that has to
  be loaded in CONFIG.SYS beforehand. Possible character renditions are
  reverse, bold, underlined, blinking etc. On non-DOS systems, this option
  behaves more or less like the VT100 output mode.
+ VT100 for display on a VT100-compatible terminal or emulator.
+ ReGISterm for display on a ReGIS-compatible graphics terminal or emulator.
+ ReGISfile for later conversion by the program RETOS (copyright DEC) in order
  to print on DIGITAL's printer series.
+ LJ250 for printing on DIGITAL's LJ250 color printer.
+ ASCII output showing either the conserved residues or the varying ones (the
  others as '-').
+ FIG file for xfig 2.1.
+ PICT files for import into Mac and PC graphics programs.
Some of the formats above offer the possibility of scaling the characters and
of rotating the plot. Character size has to be entered in 'point' units.
Normal output orientation is portrait mode (PS/EPS/HPGL/PICT only); to obtain
landscape output, 'rotate plot = y' has to be chosen. When creating multi-page
output, all pages are contained in a single output file. If one page per file
is desired, the command line parameter /SPLIT has to be used. This is enforced
when requesting EPSF or PICT output, as multi-page EPSFs contradict the purpose
of an EPSF and large PICT files would probably be too big for most personal
computers. When the terminal is used as output device, the 'RETURN' key has to
be pressed to obtain the next page of output.
Starting with version 2.2, it is possible to add numbering to the output files.
The numbers are printed between the sequence names and the sequence itself.
Since most input files either use no numbering or always number the first
position in the alignment with "1" (which does not necessarily reflect the
numbering within the original sequence), the user is asked to enter the
starting position for each sequence. The command line flag /NUMDEF suppresses
that question; a starting position of 1 is then assumed for all sequences.
BOXSHADE starts with the value entered for the leftmost position and continues
numbering every valid symbol, skipping blanks, '-', '.' and similar gap
symbols.
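The numbering rule can be sketched as follows. This is a hypothetical Python illustration, not BOXSHADE code: `residue_numbers` is an invented name, and the set of skipped symbols (blank, '-' and '.') is an assumption, since the manual does not give a complete list.

```python
GAP_SYMBOLS = set(" -.")  # assumption: the symbols that are skipped


def residue_numbers(aligned_row, start=1):
    """Number every valid symbol of one aligned sequence.

    Returns one entry per alignment column: the residue number, or None
    where the column holds a gap symbol (the counter is not advanced).
    """
    numbers = []
    current = start
    for symbol in aligned_row:
        if symbol in GAP_SYMBOLS:
            numbers.append(None)
        else:
            numbers.append(current)
            current += 1
    return numbers
```

For example, `residue_numbers("MK--L.A", start=10)` gives `[10, 11, None, None, 12, None, 13]`: the starting value belongs to the leftmost residue and gaps do not consume numbers.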
Several people using previous releases of BOXSHADE pointed out the need for
default parameters for the various questions asked by the program. They argued
that most sites use only one type of input file, one output device and one
choice of colors for the output. I therefore added management of default
parameters, allowing two levels of assistance to the user:
1) All default parameters are contained in an ASCII file that can easily be
modified to accommodate the user's taste. The format is roughly documented
within the file header; it resembles the keyboard input one has to give when
using the program interactively. Two such files are supplied with this release
of BOXSHADE, BOX_DNA.PAR and BOX_PEP.PAR, holding example parameters for DNA
and peptide comparisons. There are no big differences between these two; the
major one is that when shading DNA comparisons one doesn't care about "similar"
residues.
2) To run the program with minimal user interaction, I have added the
possibility to use command line parameters. At the moment, you can use:
/check : lists all allowed command line parameters (this list) and allows
         parameters to be added
/def : program runs without questions, BOX_PEP.PAR is used as default
/dna : makes the program use BOX_DNA.PAR as parameter file
/pep : makes the program use BOX_PEP.PAR as parameter file
/in=xxx : makes the program take xxx as input file
/out=yyy : makes the program take yyy as output file (note1)
/par=zzz : makes the program use zzz as default parameter file
/type=1 : makes the program assume an input file of type 1 (PRETTY/MSF)
/dev=1 : makes the program assume an output device of type 1 (CRT)
/numdef : use default numbering (all sequences starting with "1")
/thr=x : threshold fraction x of residues that must agree for a consensus
/split : forces one page per file output, creating multiple output files
/cons : makes the program create an additional consensus line (see below)
/symbcons=xyz : influences the way the consensus line is displayed (see below)
/unix : writes output files in unix style (LF only) (note2)
/dos : writes output files in DOS style (CR/LF) (note2)
note1: on unix machines, use out=OUTPUT for terminal output; on DOS machines,
use out=con:; on VMS machines, use out=tt:
note2: if no mode is specified, the native style of the machine is used.
On unix systems, the dash (-) instead of the slash (/) has to be used as the
separation character for command line parameters. For example, a valid unix
command line is: boxshade -def -numdef -cons -symbcons=" .*"
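The parameter syntax above (a leading '/' or '-', a name, and an optional '=value') can be illustrated with a small parser. This Python sketch is hypothetical and does not mirror BOXSHADE's own argument handling; `parse_args` is an invented name, and lower-casing the names assumes case-insensitive matching, which the manual only suggests by writing both /CONS and /cons.

```python
def parse_args(argv, prefix_chars="-/"):
    """Turn BOXSHADE-style parameters into a dict.

    "-cons" -> {"cons": True}; "/thr=0.5" -> {"thr": "0.5"}.
    Pass prefix_chars="-" to accept only the dash, as on unix.
    """
    options = {}
    for arg in argv:
        if not arg or arg[0] not in prefix_chars:
            raise ValueError(f"not a parameter: {arg!r}")
        # Split at the first '=' only, so values may themselves contain '='.
        name, sep, value = arg[1:].partition("=")
        # Assumption: parameter names are matched case-insensitively.
        options[name.lower()] = value if sep else True
    return options
```

The shell removes the quotes from the unix example, so the program receives `-symbcons= .*` and the sketch yields `{"def": True, "numdef": True, "cons": True, "symbcons": " .*"}`, keeping the leading blank. Values are returned as strings; converting `thr` to a number is left to the caller.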
Starting with version 3, BOXSHADE has a new shading system. The first
difference is the introduction of a threshold fraction of residues that must
agree for there to be a consensus. Previously, the program assumed that SOME
residue was always the consensus: if no two residues were the same, the first
sequence provided the consensus residue. This threshold fraction can be any
number between 0.0 and 1.0. The number of sequences that must agree for there
to be a consensus is, as you might expect, this fraction times the total number
of sequences in the alignment (fractions of a sequence count as one, e.g. 3.2
becomes 4).
The second difference is the idea of 'consensus by similarity'. This tries to
take account of situations where all the sequences may have (for example) R or
K at a position, but neither in a majority. It would not be logical to shade
one type of residue as 'identical' and the other as 'similar'; the threshold
function might also eliminate both as being present in too small numbers.
Therefore, if there is no single residue that is conserved (greater than the
threshold) at a position, the program looks for a 'group' of amino acids that
fulfills the requirements. 'Groups' are defined in the .grp files. Users can
tailor these to their personal prejudices. Any amino acid not listed is assumed
not to be in a group. All members of a group are considered to be mutually
similar, unlike in the .sim files described below. If consensus by similarity
is found, all the residues in the consensus are shaded using the 'similar'
shading defined by the user. If the user does not select 'shading by
similarity', only identity-type consensus is looked at.
If an identity-type consensus is found, and similarity shading is in
operation, the program looks to see if the remaining residues are similar to
the consensus residue. Here the box_xxx.sim files are used. The main difference
between relationships in these files and those in the .grp files is this: in a
.grp file, the line STA means that all three amino acids are mutually similar.
In a .sim file, S TA means that both T and A are considered similar to S where
there is a conserved S residue in more than the threshold number of sequences.
However, it does NOT mean that T and A are similar to each other. Note that
cases where two residues, or groups of residues, fulfill the threshold
requirements (as could happen with values of the threshold fraction less than
or equal to 0.5) are treated as having no consensus.
This describes the main shading model, 'shading according to a consensus'. The
alternative model is called 'shading according to a master sequence'. In this
case the user is prompted for a sequence of the alignment, and that sequence is
then taken to be the 'consensus'. Only those residues that are identical or
similar to the chosen sequence become shaded. Output obtained with this option
tends to be less shaded and neglects similarities between the other
(non-chosen) sequences. Starting in V2.7, this 'master sequence' can be hidden.
Thus, it only influences the shading of the other sequences without being
shown itself.
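The consensus rules of version 3 can be summarized in code. The Python sketch below is an illustration under stated assumptions, not the BOXSHADE implementation: the function names are hypothetical, gap symbols never count as a consensus residue, "agreeing" is taken to mean reaching the rounded-up count (the text also speaks of "more than" the threshold, so the exact comparison is an assumption), groups are given as sets (.grp style) and similarities as a residue-to-residues mapping (.sim style).

```python
import math
from collections import Counter

GAP_SYMBOLS = set("-. ")  # assumption: symbols that never form a consensus


def required_count(threshold, n_sequences):
    """Sequences that must agree: threshold * n, with fractions rounded up.

    The small epsilon is a defensive guard against floating-point error
    pushing an exact product just above an integer.
    """
    return math.ceil(threshold * n_sequences - 1e-9)


def column_consensus(column, threshold, groups=()):
    """Consensus of one alignment column.

    Returns ("identity", residue), ("group", group) or None. `groups` holds
    .grp-style sets of mutually similar residues; pass an empty sequence
    when similarity shading is switched off.
    """
    need = required_count(threshold, len(column))
    counts = Counter(r for r in column if r not in GAP_SYMBOLS)
    # Identity consensus: a single residue type reaching the threshold.
    winners = [r for r, c in counts.items() if c >= need]
    if len(winners) == 1:
        return ("identity", winners[0])
    if len(winners) > 1:
        return None  # two residues fulfil the threshold: no consensus
    # Consensus by similarity: the members of one group together reach it.
    hits = [g for g in groups if sum(counts[r] for r in g) >= need]
    if len(hits) == 1:
        return ("group", hits[0])
    return None  # no group, or several groups, fulfil the threshold


def similar_to_consensus(residue, consensus, sim):
    """.sim-style test: sim maps a consensus residue to residues similar to it.

    The relation is one-directional: {"S": "TA"} makes T and A similar to a
    conserved S, but does not make T and A similar to each other.
    """
    return residue in sim.get(consensus, "")
```

For example, with threshold 0.5 the column "RKRKE" needs 3 of 5 sequences; neither R nor K reaches that alone, but with `groups=[frozenset("RK")]` the two together occur 4 times, so the sketch reports a group consensus. The column "AAGG" at 0.5 has two residues meeting the threshold and therefore no consensus.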
Starting with version 2.5, BOXSHADE offers the possibility to create an
additional line holding a consensus symbol. This line can be obtained either by
using the command line qualifier /CONS or interactively by answering the
question ' create consensus? '. The way this consensus line is displayed can be
modified by the command line parameter SYMBCONS=xyz, by editing the respective
entry in the .PAR file, or interactively. Since the SYMBCONS syntax is not
intuitive, here is a brief description. The SYMBCONS parameter consists of
exactly three symbols:
+ the first one stands for 'normal' sequence residues that are not involved in
  any similar/identical relationship;
+ the second symbol represents positions that are similar in all sequences of
  the alignment (see the files BOX_PEP.SIM and BOX_DNA.SIM for which residues
  are considered similar);
+ the third symbol represents positions that are identical in all sequences of
  the alignment.
A SYMBCONS parameter string " .*" (blank/point/asterisk) means: label all
positions in the alignment with totally identical residues by an asterisk, all
positions with all similar residues by a point, and do not mark the other
positions. The letter 'B' can be used instead of the blank; this is necessary,
e.g., when using the command line option /SYMBCONS=B.*, which gives the same
result as the above example. The option /SYMBCONS= .* would result in
unexpected behaviour because MSDOS squeezes blanks out of the command line.
Besides points, asterisks and other symbols, there are two characters that are
special when they appear in the SYMBCONS string: 'L' and 'U'. An 'L' means that
a lowercase representation of the most abundant residue at that position is to
be used instead of a fixed consensus symbol, while a 'U' means an uppercase
representation of that residue. A possible application would be the SYMBCONS
string " LU", where similar residues are represented by lowercase characters
and identical ones by uppercase characters.
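The three-symbol rule, the 'B' substitute and the special letters 'L' and 'U' can be sketched as follows. This is a hypothetical Python illustration, not BOXSHADE source: the function names are invented, classifying a position as "other", "similar" or "identical" is assumed to happen elsewhere, ties between equally abundant residues are broken by first appearance (the documented behaviour of `Counter.most_common`), gap symbols are assumed not to count, and every 'B' is treated as a blank.

```python
from collections import Counter

GAP_SYMBOLS = set("-. ")  # assumption: gaps never count as residues
POSITION_KINDS = {"other": 0, "similar": 1, "identical": 2}


def decode_symbcons(symbcons):
    """Check the length and expand the letter 'B' to a blank."""
    if len(symbcons) != 3:
        raise ValueError("SYMBCONS must consist of exactly three symbols")
    return symbcons.replace("B", " ")


def consensus_symbol(symbcons, kind, column):
    """Consensus-line character for one alignment position.

    kind is "other", "similar" or "identical" (how the position was
    classified); column holds the residues at that position.
    """
    symbol = decode_symbcons(symbcons)[POSITION_KINDS[kind]]
    if symbol in ("L", "U"):
        residues = [r.upper() for r in column if r not in GAP_SYMBOLS]
        if not residues:
            return " "  # assumption: nothing to show for an all-gap column
        most_abundant = Counter(residues).most_common(1)[0][0]
        return most_abundant.lower() if symbol == "L" else most_abundant
    return symbol
```

With " LU", a similar position holding K, K, R, K is labelled 'k', while with "B.*" the same kind of position would simply get a point.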
Multiple alignment files to be used by BOXSHADE can be created, amongst
others, by the following PD/freeware programs:
+ PHYLIP by Joe Felsenstein, available by ftp from anthro.utah.edu
+ ESEE by Eric Cabot, available from the same sources as BOXSHADE (see above)
+ CLUSTAL by Des Higgins, ditto
For preview/conversion of POSTSCRIPT files, the program GHOSTSCRIPT from the
GNU software foundation is highly recommended. It is available from all major
MSDOS ftp sites (e.g. SIMTEL or ftp.uni-koeln.de). There is also a version
tested for use with BOXSHADE available at vax0.biomed.uni-koeln.de, although
this might not be the most recent release. For Mac users, there is
MacGhostscript, also available from the main archives (info-mac, umich and
their mirrors). A *very* good tool for putting a preview image into an EPSF
file, often a prerequisite for incorporating it into a drawing package, is
PS2EPS by Peter Lerup. This can be found on info-mac.
For preview/conversion of HPGL files, the shareware program PRINTGL 1.18 by
Cary Ravitz is highly recommended. It is available from many MSDOS ftp sites
and from email@example.com.
- output on dot printers -
Since PRINTGL offers a broad choice of printer types and is a nice program, I
recommend its use for printing BOXSHADE output on non-POSTSCRIPT printers. To
create an HPGL file (let's call it TEST.PLT), use HPGL output with the options
  0F1N for normal residues
  2F1N for identical residues
  3F1N for similar residues
  2F4N for conserved residues
  8 for character size
  not rotated
(these are the standard parameters in BOX_PEP.PAR). Now use PRINTGL either
interactively by calling PMI, or with a command line like:
PRINTGL /Fx/S0340/Waaac/Ptest.plt
where test.plt is to be replaced by the filename to convert and the x in the
expression /Fx is to be replaced by the letter of the printer you use. (See the
PRINTGL documentation for further details.)
The RTF output and PHYLIP input implementations are still experimental. Please
tell me about your experiences with the program.
+ The current DOS version supports only 13 sequences with 2000 residues each.
  These parameters can easily be changed in the source code. If you cannot
  compile the sources because you lack a Pascal compiler, contact the author
  for precompiled versions.
There is no publication on BOXSHADE and none is planned. Most people just use it
for figures in publications and don't mention anything; this is OK with the
authors of BOXSHADE. If you really feel like mentioning BOXSHADE, you could
either acknowledge it in the figure legend or in the Mat&Meth part on
ISREC, Bioinformatics Group,
Epalinges s/Lausanne Switzerland
BBSRC Institute for Animal Health,
C port of Boxshade. (Don't send Kay or Michael
any questions concerning the 'C' version of boxshade.)
Wrote the manpage.
Updated the manpage.
Copyright © 1997 Kay Hofmann, Michael Baron and Hartmut Schirmer
Copyright © 2003, 2007 Steffen Moeller, Charles Plessy
The above copyright notices refer to the program and its manpage respectively.
BOXSHADE is completely public-domain and may be passed around and modified
without any notice to the authors.
This manual page was written for the Debian(TM) system but may be used by
others. Permission is granted to copy, distribute and/or modify this document
under the same terms as boxshade itself.