NAME
boxshade - Pretty-printing of multiple sequence alignments
SYNOPSIS
boxshade
DESCRIPTION
BOXSHADE is a program for pretty-printing multiple alignment output. The program itself
does not perform any alignment; you have to run a multiple alignment program such as
ClustalW or Pileup and use its output as input for BOXSHADE.
OPTIONS
-help
Show the help.
-check
Show the help and allow further command line parameters to be added.
-def
Use defaults, no unnecessary questions.
-numdef
Use default numbering.
-dna
Assume DNA sequences, use box_dna.par.
-split
Create separate files for multiple pages.
-toseq=xxx
Shading according to sequence No. xxx.
-in=xxxxx
xxxxx is input file name.
-out=xxxxx
xxxxx is output file name.
-par=xxxxx
xxxxx is parameter file name.
-sim=xxxxx
xxxxx is file name for similar residues def.
-grp=xxxxx
xxxxx is file name for grouping residues def.
-thr=x
x is the fraction of sequences that must agree for a consensus.
-dev=x
x is output device class (see below).
-type=x
x is input file format (see below).
-ruler
Print ruler line.
-cons
Create consensus line.
-symbcons=xyz
xyz are consensus symbols.
-symbcons="xyz"
If the one above does not work, try this one.
-unix
Output file lines are terminated with LF only.
-mac
Output file lines are terminated with CR only.
-dos
Output file lines are terminated with CRLF.
This manual page was written for the Debian(TM) distribution because the original program
does not have a manual page. The presented information comes from the documentation of the
Web Service of the 3.21 version that is not available as a Debian package.
BOXSHADE is a program for creating good-looking printouts from multiple-aligned protein or
DNA sequences. The program does no alignment by itself; it takes as input a file
preprocessed by a multiple alignment program or a multiple sequence editor. See below for a
list of supported input formats and output devices. In the standard BOXSHADE output,
identical and similar residues in the multiple-alignment chart are represented by
different colors or shadings. Further options control the kind of shading to be applied,
sequence numbering, consensus output and so on. The user interface is a bit clumsy at the
moment: one has to answer a lot of questions in order to get the desired output. It is,
however, possible to use default parameters from a standard parameter file or to supply
the program with parameters from the command line. At the moment, the VMS and DOS versions
of BOXSHADE have identical user interfaces.
Input formats
BOXSHADE 3.2 knows about the following input file formats (some of them are generally used
only on MSDOS or VMS systems):
+ CLUSTAL and CLUSTALV, multiple alignment program, DOS/VMS/MAC default extension .ALN
+ ESEE, multiple sequence editor, DOS default extension .ESE
+ PHYLIP, phylogenetic analysis package, DOS, VMS, UNIX default extension .PHY
+ PILEUP and PRETTY of the GCG sequence analysis package, VMS/UNIX default extensions .MSF
and .PRE. NB: you are strongly encouraged NOT to use the PRETTY format as input, as it may
be incompatible with the revised version of .MSF input. We can't actually think why anyone
would use this format now; .MSF files are generally more useful.
+ MALIGNED, multiple sequence editor, VMS only, default extension .MAL
BOXSHADE tries to determine the file type from the extension but will also work if
different extensions are used.
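The extension-based guess described above can be pictured with a small Python sketch. The
table only restates the default extensions listed in this section; the function name
guess_format, the case-insensitive comparison and the None fallback are assumptions made
for illustration and do not describe BOXSHADE's actual source code.

```python
import os

# Default extensions as listed in this section. Comparing them
# case-insensitively is an assumption of this sketch.
DEFAULT_EXTENSIONS = {
    ".aln": "CLUSTAL",
    ".ese": "ESEE",
    ".phy": "PHYLIP",
    ".msf": "PILEUP (MSF)",
    ".pre": "PRETTY",
    ".mal": "MALIGNED",
}


def guess_format(filename):
    """Return the format suggested by the file extension, or None if unknown."""
    _, extension = os.path.splitext(filename)
    return DEFAULT_EXTENSIONS.get(extension.lower())
```

When the extension is not in the table the sketch returns None; in that situation the
format has to be supplied some other way, for instance with the -type=x parameter.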
Output devices
+ POSTSCRIPT/EPS creates POSTSCRIPT(TM) files for printing on a laser printer or for
further conversion with a POSTSCRIPT interpreter (like GHOSTSCRIPT).
+ HPGL for export to various graphics programs or for conversion/printing with the
shareware program PRINTGL. Plotting BOXSHADE output on a plotter is generally not
recommended.
+ RTF for export to various word-processing and graphics programs.
+ CRT uses direct screen writes to the PC monitor. Possible options depend on the graphics
adapter used. This output device is supported only in the MSDOS version.
+ ANSI. On a PC, this option uses an ANSI device driver (ANSI.SYS) that has to be loaded
in CONFIG.SYS beforehand. Possible character renditions are reverse, bold, underlined,
blinking etc. On non-DOS systems, this option behaves more or less like the VT100 output
mode.
+ VT100 for display on a VT100 compatible terminal or emulator.
+ ReGISterm for display on a ReGIS compatible graphics terminal or emulator.
+ ReGISfile for later conversion by the program RETOS (copyright DEC) in order to print on
DIGITAL's printer series.
+ LJ250 for printing on DIGITAL's LJ250 color printer.
+ ASCII output showing either the conserved residues or the varying ones (others as '-').
+ FIG file for xfig 2.1.
+ PICT files for import to Mac and PC graphics programs.
Some of the formats above offer the possibility of scaling the characters and of rotating
the plot. Character size has to be entered in 'point' units. Normal output orientation is
portrait mode (PS/EPS/HPGL/PICT only); to obtain output in landscape orientation, 'rotate
plot = y' has to be chosen. When creating multi-page output, all pages are contained in a
single output file. If one page per file is desired, one has to use the command line
parameter /SPLIT. This is enforced when requesting EPSF or PICT file output, as multi-page
EPSFs contradict the purpose of an EPSF and large PICT files would probably be too big for
most personal computers. When the terminal is used as output device, the 'RETURN' key has
to be pressed to obtain the next page of output.
Sequence numbering
Starting with version 2.2, numbering can be added to the output files. The numbers are
printed between the sequence names and the sequence itself. Since most input files either
use no numbering or always number the first position in the alignment with a "1" (which
does not necessarily reflect the numbers within the original sequence), the user is asked
to enter the starting position for each sequence. The command line flag /NUMDEF suppresses
that question; a starting position of 1 is then assumed for all sequences. BOXSHADE starts
with the value entered for the leftmost position and continues numbering every valid
symbol, skipping blanks, '-', '.' and similar non-residue symbols.
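The numbering rule can be sketched in Python as follows. This is only an illustration of
the description above, not BOXSHADE's own code: the function name number_residues is
hypothetical, only blank, '-' and '.' are treated as skipped symbols, and the first valid
symbol is assumed to receive the starting value.

```python
# Symbols that do not advance the counter. The text names blanks, '-' and '.';
# any further skipped symbols are not modelled in this sketch.
SKIPPED_SYMBOLS = set(" -.")


def number_residues(aligned_sequence, start=1):
    """Return one entry per alignment column: the residue number, or None for a gap."""
    numbers = []
    current = start
    for symbol in aligned_sequence:
        if symbol in SKIPPED_SYMBOLS:
            numbers.append(None)   # skipped symbols get no number
        else:
            numbers.append(current)
            current += 1           # only valid symbols advance the counter
    return numbers
```

With a starting position of 10, the aligned string "MK--LV" is numbered 10, 11 for M and K
and 12, 13 for L and V; the two gap columns carry no number.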
Default parameters
Several people using previous releases of BOXSHADE pointed out the need for default
parameters for the various questions asked by the program. They argued that most sites
only use one type of input file, one output device and one choice of colors for the
output. I therefore added a management of default parameters allowing two levels of
assistance to the user.
1) All default parameters are contained in an ASCII file that can be modified easily to
accommodate the user's taste. The format is roughly documented within the file header; it
resembles the keyboard input one has to make when using the program interactively. Two
such files are supplied with this release of BOXSHADE, BOX_DNA.PAR and BOX_PEP.PAR,
holding some example parameters for DNA and peptide comparisons. There are no big
differences between these two; the major one is that when shading DNA comparisons one
doesn't care about "similar" residues.
2) To run the program with minimal user interaction, I have added the possibility to use
command line parameters. At the moment, you can use:
/check : lists all allowed command line parameters (this list) and allows parameters to be
added
/def : program runs without questions, BOX_PEP.PAR is used as default
/dna : makes the program use BOX_DNA.PAR as parameter file
/pep : makes the program use BOX_PEP.PAR as parameter file
/in=xxx : makes the program take xxx as input file
/out=yyy : makes the program take yyy as output file (note1)
/par=zzz : makes the program use zzz as a default parameter file
/type=1 : makes the program assume an input file of type 1 (PRETTY/MSF)
/dev=1 : makes the program assume an output device of type 1 (CRT)
/numdef : use default numbering (all sequences starting with "1")
/thr=x : threshold fraction of residues that must agree for a consensus
/split : forces one page per file output, creates multiple output files
/cons : makes the program create an additional consensus line (see below)
/symbcons=xyz : influences the way the consensus line is displayed (see below)
/unix : writes output files in unix style (LF only) (note2)
/dos : writes output files in DOS style (CR/LF) (note2)
note1: for terminal output, use out=OUTPUT on unix machines, out=con: on DOS machines and
out=tt: on VMS machines.
note2: if no mode is specified, the native style of the machine is used.
ATTENTION
On unix systems, the dash (-) rather than the slash (/) has to be used as the separator
character for command line parameters. For example, a valid unix command line is:
boxshade -def -numdef -cons -symbcons=" .*"
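The slash-to-dash rule above can be illustrated with a short Python sketch that rewrites
DOS/VMS-style parameters into their unix form. The helper name to_unix_parameters is
hypothetical; only the leading separator of each parameter is changed and the parameter
names themselves are taken from the list in this manual page.

```python
def to_unix_parameters(parameters):
    """Replace a leading '/' with '-' in each command line parameter."""
    return ["-" + p[1:] if p.startswith("/") else p for p in parameters]


# Example: the parameters of the command line shown above, in DOS/VMS style.
unix_args = to_unix_parameters(["/def", "/numdef", "/cons", "/symbcons= .*"])
```

If the resulting list is appended to ["boxshade"] and passed to Python's subprocess.run as
a list, no shell is involved, so the blank inside the symbcons value is handed over
unchanged without extra quoting.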
Shading strategies (similarity to consensus or single sequence)
Starting with version 3, BOXSHADE has a new shading system. The first difference is the
introduction of a threshold fraction of residues that must agree for there to be a
consensus. Previously, the program assumed that SOME residue was always the consensus: if
no two residues were the same, the first sequence provided the consensus residue. This
threshold fraction can be any number between 0.0 and 1.0. The number of sequences that
must agree for there to be a consensus is, as you might expect, this fraction times the
total number of sequences in the alignment (fractions of a sequence count as one, e.g. 3.2
becomes 4).

The second difference is the idea of 'consensus by similarity'; this tries to take account
of situations where all the sequences may have (for example) R or K at a position, but
neither in a majority. It would not be logical to shade one type of residue as 'identical'
and the other as 'similar'; the threshold function might also eliminate both as being too
few in number. Therefore, if no single residue is conserved (greater than the threshold)
at a position, the program looks for a 'group' of amino acids that fulfills the
requirements. 'Groups' are defined in the .grp files. Users can tailor these to their
personal prejudices. Any amino acid not listed is assumed not to be in a group. All
members of a group are considered to be mutually similar, unlike the .sim files, described
below. If consensus by similarity is found, all the residues in the consensus are shaded
using the 'similar' shading defined by the user. If the user does not select 'shading by
similarity', only identity-type consensus is looked at.

If an identity-type consensus is found, and similarity shading is in operation, the
program looks to see if the remaining residues are similar to the consensus residue. Here
the box_xxx.sim files are used. The main difference between relationships in these files
and those in the .grp files is that, e.g., in a .grp file the line STA means that all
three amino acids are mutually similar. In a .sim file S TA means that both T and A are
considered similar to S, where there is a conserved S residue in more than the threshold
number of sequences. However, it does NOT mean that T and A are similar to each other.
Note that cases where two residues, or groups of residues, fulfill the threshold
requirements (as could happen with values of the threshold fraction less than or equal to
0.5) are treated as having no consensus.

This describes the main shading model, 'shading according to a consensus'. The alternative
model is called 'shading according to a master sequence'. In this case the user is
prompted for a sequence of the alignment and that sequence is consequently taken to be the
'consensus'. Only those residues that are identical or similar to the chosen sequence
become shaded. Output obtained with this option tends to be less shaded and neglects
similarities between the other (non-chosen) sequences. Starting with V2.7, this 'master
sequence' can be hidden. Thus, it only influences the shading of the other sequences
without being shown itself.
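The 'shading according to a consensus' model can be summarised in a Python sketch. It is a
reading of the description above under stated assumptions, not the program's actual code:
the function names are hypothetical, gap symbols are ignored when counting, "agree" is
taken to mean "at least the required number of sequences", and the .grp and .sim contents
are passed in as plain Python values rather than read from files.

```python
import math
from collections import Counter

GAPS = set(" -.")  # assumption: these symbols never take part in a consensus


def required_count(threshold, n_sequences):
    """Sequences that must agree; fractions of a sequence count as one (3.2 -> 4)."""
    # The small epsilon guards against floating point noise such as 0.7 * 10.
    return max(1, math.ceil(threshold * n_sequences - 1e-9))


def column_consensus(column, threshold, groups=()):
    """Classify one column: ("identical", residue), ("similar", group) or (None, None)."""
    needed = required_count(threshold, len(column))
    counts = Counter(r for r in column if r not in GAPS)
    winners = [r for r, n in counts.items() if n >= needed]
    if len(winners) == 1:
        return ("identical", winners[0])
    if len(winners) > 1:
        return (None, None)  # two residues meet the threshold: no consensus
    # No single conserved residue: look for one group (as in a .grp file).
    group_winners = [g for g in groups if sum(counts[r] for r in g) >= needed]
    if len(group_winners) == 1:
        return ("similar", group_winners[0])
    return (None, None)


def similar_to_consensus(residue, consensus, sim_table):
    """.sim-style lookup: is residue listed as similar to the conserved consensus?"""
    return residue in sim_table.get(consensus, "")
```

For the column "RRKR" with a threshold of 0.8, four sequences must agree, so R alone does
not qualify but the group "RK" does. With a table {"S": "TA"}, T is similar to a conserved
S, while A is not similar to a conserved T, which mirrors the one-directional meaning of
the .sim entries. The master sequence model is not covered by this sketch.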
Consensus display
Starting with version 2.5, BOXSHADE offers the possibility to create an additional line
holding a consensus symbol. This line can be obtained either by using the command line
qualifier /CONS or interactively by answering the question ' create consensus? '. The way
this consensus line is displayed can be modified by the command line parameter
/SYMBCONS=xyz, by editing the respective entry in the .PAR file or interactively. Since
the SYMBCONS syntax is not intuitive, here is a brief description. The SYMBCONS parameter
consists of exactly three symbols:
+ the first one stands for 'normal' sequence residues that are not involved in any
similar/identical relationship.
+ the second symbol represents positions that are similar in all sequences of the
alignment. See the files BOX_PEP.SIM and BOX_DNA.SIM to see what residues are considered
similar.
+ the third symbol represents positions that are identical in all sequences of the
alignment.
A SYMBCONS parameter string " .*" (blank/point/asterisk) means: label all positions in the
alignment with totally identical residues by an asterisk, all positions with all similar
residues by a point and do not mark the other positions. The letter 'B' can be used
instead of the blank; this is necessary e.g. when using the command line option
/SYMBCONS=B.* which gives the same result as the above example. The option /SYMBCONS= .*
would result in unexpected behaviour because MSDOS squeezes blanks out of the command
line. Besides points, asterisks and other symbols, two characters have a special meaning
when they appear in the SYMBCONS string: 'L' and 'U'. An 'L' means that a lowercase
representation of the most abundant residue at that position is to be used instead of a
fixed consensus symbol, while a 'U' means an uppercase representation of that residue. A
possible application would be the SYMBCONS string " LU", where similar residues are
represented by lowercase characters and identical residues by uppercase characters.
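The interpretation of the three SYMBCONS symbols can be sketched in Python. The function
name consensus_symbol and the status labels 'normal', 'similar' and 'identical' are
assumptions chosen for this illustration; the handling of 'B', 'L' and 'U' follows the
description above.

```python
STATUS_ORDER = ("normal", "similar", "identical")


def consensus_symbol(symbcons, status, residue):
    """Return the consensus-line character for one alignment position.

    symbcons -- the three-symbol SYMBCONS string, e.g. "B.*" or " LU"
    status   -- 'normal', 'similar' or 'identical' (labels assumed by this sketch)
    residue  -- the most abundant residue at that position
    """
    if len(symbcons) != 3:
        raise ValueError("SYMBCONS must consist of exactly three symbols")
    symbol = symbcons[STATUS_ORDER.index(status)]
    if symbol == "B":
        return " "              # 'B' stands for a blank
    if symbol == "L":
        return residue.lower()  # lowercase form of the most abundant residue
    if symbol == "U":
        return residue.upper()  # uppercase form of the most abundant residue
    return symbol               # any other symbol is used as given
```

With the string "B.*", identical positions are marked with an asterisk and normal
positions are left blank; with " LU", a similar position whose most abundant residue is K
is shown as a lowercase k.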
Shareware/PD programs useful in conjunction with BOXSHADE
Multiple alignment files to be used by BOXSHADE can be created, amongst others, by the
following PD/freeware programs:
+ PHYLIP by Joe Felsenstein, available by ftp from anthro.utah.edu
+ ESEE by Eric Cabot, available from the same sources as BOXSHADE (see above)
+ CLUSTAL by Des Higgins, ditto
For preview/conversion of POSTSCRIPT files, the program GHOSTSCRIPT from GNU software
foundation is highly recommended. It is available from all major MSDOS ftp sites (e.g.
SIMTEL or ftp.uni-koeln.de). There is also a version tested for use with BOXSHADE
available at vax0.biomed.uni-koeln.de, although this might not be the most recent release.
For Mac users, there is MacGhostscript, also available from the main archives (info-mac,
umich and their mirrors). A *very* good tool for putting a preview image into an EPSF
file, often a prerequisite for incorporating into a drawing package, is PS2EPS, by Peter
Lerup. This can be found on info-mac.
For preview/conversion of HPGL files, the shareware program PRINTGL 1.18 by Cary Ravitz is
highly recommended. It is available from many MSDOS ftp sites and from
netserv@embl-heidelberg.de.
Output on dot printers: since PRINTGL offers a broad choice of printer types and is a nice
program, I recommend its use for printing BOXSHADE output on non-POSTSCRIPT printers. Use
HPGL output with the options
0F1N for normal residues
2F1N for identical residues
3F1N for similar residues
2F4N for conserved residues
8 for character size
not rotated
(these are the standard parameters in BOX_PEP.PAR) to create an HPGL file (let's call it
TEST.PLT). Now use PRINTGL either interactively by calling PMI or use a command line like:
PRINTGL /Fx/S0340/Waaac/Ptest.plt
where test.plt is to be replaced by the name of the file to convert and the x in the
expression /Fx is to be replaced by the letter of the printer you use. (See the PRINTGL
documentation for further details.)
RESTRICTIONS
+ The RTF output and PHYLIP input implementations are still experimental. Please tell me
of your experiences with the program.
+ The current DOS version supports only 13 sequences with 2000 residues each. These
parameters can easily be changed in the source code. If you cannot compile the sources
because you lack a Pascal compiler, contact the author for precompiled versions.
CITING BOXSHADE
There is no publication on BOXSHADE and none is planned. Most people just use it for
figures in publications and don't mention anything; this is OK with the authors of
BOXSHADE. If you really feel like mentioning BOXSHADE, you could acknowledge it either in
the figure legend or in the Mat&Meth part on sequence analysis.