NAME


sim4db - batch spliced alignment of cDNA sequences to a target genome

SYNOPSIS


A simple command line invocation:

sim4db -genomic g.fasta -cdna c.fasta -script script -output o.sim4db

where:
- 'c.fasta' and 'g.fasta' are the multi-FASTA cDNA and genome sequence files
- 'script' is a script file indicating individual alignments to be computed
- output in sim4db format will be sent to the file 'o.sim4db' ('-' for standard output)

A more complex invocation:

sim4db -genomic g.fasta -cdna c.fasta -output o.sim4db [options]

DESCRIPTION


sim4db performs fast batch alignment of large cDNA (EST, mRNA) sequence sets to a set of
eukaryotic genomic regions. It uses the sim4 and sim4cc algorithms to determine the
alignments, but incorporates a fast sequence indexing and retrieval mechanism, implemented
in the sister package leaff(1), to process large volumes of sequences quickly.

While sim4db produces alignments in the same way as sim4 or sim4cc, it has additional
features to make it more amenable for use with whole-genome annotation pipelines. A script
file can be used to group pairings between cDNAs and their corresponding genomic regions,
to be aligned in one run with the same set of parameters. sim4db also optionally
reports more than one alignment for the same cDNA within a genomic region, as long as they
meet user-defined criteria such as minimum length, percentage sequence identity or
coverage. This feature is instrumental in finding all alignments of a gene family at one
locus. Lastly, the output is presented either as custom sim4db alignments or as GFF3 gene
features.

OPTIONS


Salient options:
-cdna           use these cDNA sequences (multi-FASTA file)
-genomic        use these genomic sequences (multi-FASTA file)
-script         use this script file
-pairwise       sequentially align pairs of sequences

If neither '-script' nor '-pairwise' is specified, sim4db
performs all-against-all alignments between the cDNA and
genomic sequences.

-output         write output to this file
-gff3           report output in GFF3 format
-interspecies   use sim4cc for inter-species alignments (default sim4)
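The difference between the default all-against-all mode and '-pairwise' can be sketched in Python as a choice of which (cDNA, genomic) pairs get aligned. This is a hypothetical illustration, not sim4db code: the function name alignment_pairs is made up, and reading '-pairwise' as "align the i-th cDNA with the i-th genomic sequence" is an assumption based only on the one-line description of that option.

```python
from itertools import product


def alignment_pairs(cdnas, genomics, pairwise=False):
    """Return the (cDNA, genomic) pairs that would be aligned.

    pairwise=False: all-against-all, every cDNA with every genomic sequence.
    pairwise=True:  hypothetical reading of '-pairwise', pairing the i-th
                    cDNA with the i-th genomic sequence.
    """
    if pairwise:
        # zip() stops at the shorter list; how sim4db handles unequal
        # sequence counts is not described in this manual page.
        return list(zip(cdnas, genomics))
    # Cartesian product: len(cdnas) * len(genomics) pairs.
    return list(product(cdnas, genomics))


cdnas = ["est1", "est2"]
genomics = ["chr1", "chr2"]
print(alignment_pairs(cdnas, genomics))
# [('est1', 'chr1'), ('est1', 'chr2'), ('est2', 'chr1'), ('est2', 'chr2')]
print(alignment_pairs(cdnas, genomics, pairwise=True))
# [('est1', 'chr1'), ('est2', 'chr2')]
```

With two cDNAs and two genomic sequences the all-against-all mode yields four pairs and the pairwise reading yields two, which is why all-against-all runs grow quickly with input size and a script file is usually preferable for large sets.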

Filter options:
-mincoverage    iteratively find all exon models with the specified
                minimum PERCENT COVERAGE
-minidentity    iteratively find all exon models with the specified
                minimum PERCENT EXON IDENTITY
-minlength      iteratively find all exon models with the specified
                minimum ABSOLUTE COVERAGE (number of bp matched)
                (default 0)
-alwaysreport   always report <number> exon models, even if they
                are below the quality thresholds

If none of mincoverage, minidentity or minlength is given, only
the best exon model is returned. This is the DEFAULT operation.

You will probably want to specify ALL THREE of mincoverage,
minidentity and minlength! Don't assume the default values
are what you want!

You will DEFINITELY want to specify at least one of mincoverage,
minidentity and minlength with alwaysreport! If you don't,
mincoverage will be set to 90 and minidentity to 95, to reduce
the number of spurious matches when a good match is found.
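The reporting rules above can be summarized as a small decision procedure. The Python sketch below is a hypothetical model, not sim4db's implementation: the ExonModel fields, the ranking by coverage then identity, treating an unset threshold as 0, and letting the top <number> models bypass the thresholds are all assumptions. sim4db finds exon models iteratively during alignment rather than filtering a precomputed list.

```python
from dataclasses import dataclass


@dataclass
class ExonModel:
    """Hypothetical summary of one exon model (spliced alignment)."""
    name: str
    coverage: float  # percent of the cDNA covered by the model
    identity: float  # percent identity over the aligned exons
    length: int      # absolute coverage: number of bp matched


def select_models(models, mincoverage=None, minidentity=None,
                  minlength=None, alwaysreport=0):
    """Hypothetical model of the reporting rules; not sim4db's code."""
    # Assumed ranking: higher coverage first, ties broken by identity.
    ranked = sorted(models, key=lambda m: (m.coverage, m.identity),
                    reverse=True)
    no_thresholds = (mincoverage is None and minidentity is None
                     and minlength is None)
    if no_thresholds and alwaysreport == 0:
        # DEFAULT operation: only the best exon model is returned.
        return ranked[:1]
    if no_thresholds:
        # Documented fallback when -alwaysreport is used without thresholds.
        mincoverage, minidentity = 90, 95
    # Assumption: a threshold left unset does not filter anything.
    mincoverage = 0 if mincoverage is None else mincoverage
    minidentity = 0 if minidentity is None else minidentity
    minlength = 0 if minlength is None else minlength  # documented default 0
    selected = []
    for rank, m in enumerate(ranked):
        passes = (m.coverage >= mincoverage and m.identity >= minidentity
                  and m.length >= minlength)
        # Assumption: the top `alwaysreport` models are kept even if they fail.
        if passes or rank < alwaysreport:
            selected.append(m)
    return selected


models = [ExonModel("a", 98.0, 99.0, 1500),
          ExonModel("b", 60.0, 97.0, 900),
          ExonModel("c", 95.0, 96.0, 1400)]
print([m.name for m in select_models(models)])  # ['a']
print([m.name for m in select_models(models, mincoverage=90,
                                     minidentity=95, minlength=1000)])  # ['a', 'c']
print([m.name for m in select_models(models, alwaysreport=3)])  # ['a', 'c', 'b']
```

The last call shows why the manual insists on pairing -alwaysreport with explicit thresholds: model "b" fails the implied 90/95 cutoffs but is still reported because it falls within the top three.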

Auxiliary options:
-nodeflines     don't include the defline in the sim4db output
-alignments     print alignments

-polytails      DON'T mask poly-A and poly-T tails
-cut            trim marginal exons if A/T % > x (poly-AT tails)

-noncanonical   don't force canonical splice sites
-splicemodel    use the following splice model: 0 - original sim4;
                1 - GeneSplicer; 2 - Glimmer; options 1 and 2 are
                only available with '-interspecies'.
                Default for sim4 is 0, and for sim4cc is 1.

-forcestrand    force the strand prediction to always be
                one of 'forward' or 'reverse'
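The '-cut' option can be illustrated with a sketch that computes the A/T percentage of the first and last exons and drops those above x. This is an assumption-laden reading of the one-line description: counting A and T together, expressing x as a percentage, trimming repeatedly from each end and always keeping one exon are choices made for illustration, and at_percent and trim_marginal_exons are hypothetical names.

```python
def at_percent(seq):
    """Percentage of A and T bases in seq (case-insensitive); 0 for ''."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("A") + s.count("T")) / len(s)


def trim_marginal_exons(exons, x):
    """Hypothetical '-cut'-style trimming of exon sequences.

    Assumptions: exons are given 5' to 3' as cDNA substrings, x is a
    percentage, trimming repeats from each end while the current marginal
    exon exceeds x, and at least one exon is always kept.
    """
    kept = list(exons)
    while len(kept) > 1 and at_percent(kept[0]) > x:
        kept.pop(0)
    while len(kept) > 1 and at_percent(kept[-1]) > x:
        kept.pop()
    return kept


exons = ["GCGTACCGTA", "ATGCGCGCTAGC", "AAAAAAAATA"]
print(trim_marginal_exons(exons, 80.0))
# ['GCGTACCGTA', 'ATGCGCGCTAGC']
```

Here the last exon is 100% A/T, which is typical of an unmasked poly-A tail aligning spuriously to the genome, so it is dropped; the first exon is 40% A/T and is kept.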

Execution options:
-threads        use n threads
-touch          create this file when the program finishes execution

Debugging options:
-v              print status to stderr while running
-V              print script lines to stderr as they are being processed

Developer options:
-Z              set the spaced seed pattern
-H              set the relink weight factor (H=1000 recommended for mRNAs)
-K              set the first MSP (maximal segment pair) threshold
-C              set the second MSP threshold
-Ma             set the maximum number of MSPs allowed
-Mp             same limit, expressed as a percentage of the bases in the cDNA
NOTE: If used, both -Ma and -Mp must be specified!
