variantCaller - Online in the Cloud

PROGRAM:

NAME


variantCaller - variant-calling algorithms for PacBio sequencing data

SYNOPSIS


variantCaller.py is invoked from the command line. For example, a simple invocation is:

variantCaller.py -j8 --algorithm=quiver \
-r lambdaNEB.fa \
-o variants.gff \
aligned_reads.cmp.h5

which requests that variant calling proceed,

· using 8 worker processes,

· employing the quiver algorithm,

· taking input from the file aligned_reads.cmp.h5,

· using the FASTA file lambdaNEB.fa as the reference,

· and writing output to variants.gff (see pbgff(5)).

A particularly useful option is --referenceWindow/-w: this option allows the user to
direct the tool to perform variant calling exclusively on a window of the reference
genome, where the

OPTIONS


variantCaller.py --help

will provide a help message explaining all available options.

NOTES


Input and output
variantCaller.py requires two input files:

· A file of reference-aligned reads in PacBio's standard cmp.h5 format;

· A FASTA file that has been processed by ReferenceUploader.

The tool's output is written in the GFF format (see pbgff(5)). External tools can be used
to convert the GFF file to a VCF or BED file, two other standard interchange formats for
variant calling.

NOTE:
Input cmp.h5 file requirements

variantCaller.py requires its input cmp.h5 file to be sorted. An unsorted file can
be sorted using the tool cmpH5Sort.py.

The quiver(1) algorithm in variantCaller requires its input cmp.h5 file to have the
following pulse features:

· InsQV,

· SubsQV,

· DelQV,

· DelTag,

· MergeQV.

The plurality(1) algorithm can be run on cmp.h5 files that lack these features.
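
The feature requirement above can be expressed as a simple set check. The following is
only a sketch, not code from variantCaller.py: the function names are hypothetical, and
it assumes the caller has already obtained the list of pulse feature names present in
the cmp.h5 file by some other means.

```python
# Pulse features the text lists as required by quiver.
QUIVER_REQUIRED_FEATURES = frozenset(
    ["InsQV", "SubsQV", "DelQV", "DelTag", "MergeQV"]
)


def missing_quiver_features(available):
    """Return the sorted list of required features absent from `available`."""
    return sorted(QUIVER_REQUIRED_FEATURES - set(available))


def usable_algorithms(available):
    """plurality needs none of these features; quiver needs all of them."""
    algorithms = ["plurality"]
    if not missing_quiver_features(available):
        algorithms.append("quiver")
    return algorithms
```

A file that lacks any of the five features would, under this sketch, be reported as
usable with plurality only.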

The input file is the main argument to variantCaller.py, while the output file is provided
as an argument to the -o flag. For example,

variantCaller.py aligned_reads.cmp.h5 -r lambda.fa -o variants.gff

will read input from aligned_reads.cmp.h5, using the reference lambda.fa, and send output
to the file variants.gff. The extension of the filename provided to the -o flag is
meaningful, as it determines the output file format. The file formats presently
supported, by extension, are

.gff   GFFv3 format

.txt   a simplified human-readable format used primarily by the developers

If the -o flag is not provided, the default behavior is to output to a variants.gff in the
current directory.
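
As an illustration of the extension rule, the sketch below maps an output filename to a
format and falls back to variants.gff when no name is given. It is not the tool's own
code: the function name, the format labels, and the choice to raise an error for an
unlisted extension are all assumptions made for the example.

```python
import os

# Extensions the text lists as supported, mapped to a format label.
# The labels are illustrative, not identifiers taken from the tool.
FORMATS_BY_EXTENSION = {".gff": "GFFv3", ".txt": "text"}
DEFAULT_OUTPUT = "variants.gff"


def resolve_output(path=None):
    """Return (path, format label) for an -o argument, or for the default."""
    if path is None:
        path = DEFAULT_OUTPUT
    extension = os.path.splitext(path)[1]
    if extension not in FORMATS_BY_EXTENSION:
        # Assumption: how the real tool reacts to other extensions is not stated.
        raise ValueError("unsupported output extension: %r" % extension)
    return path, FORMATS_BY_EXTENSION[extension]
```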

NOTE:
variantCaller.py does not modify its input cmp.h5 file in any way. This is in contrast
to previous variant callers in use at PacBio, which would write a consensus dataset to
the input cmp.h5 file.

Available algorithms
At this time there are two algorithms available for variant calling: plurality and quiver.

Plurality is a simple and very fast procedure that merely tallies the most frequent read
base or bases found in alignment with each reference base, and reports deviations from the
reference as potential variants.
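
The tallying idea can be sketched in a few lines. This is a simplified illustration, not
the plurality implementation itself: it assumes the aligned reads have already been piled
up into one list of read bases per reference position, and it uses "-" as a hypothetical
marker for a deleted base.

```python
from collections import Counter


def plurality_calls(reference, columns):
    """Tally read bases per reference position and report deviations.

    `reference` is a string; `columns[i]` is the list of read bases aligned
    to reference position i (a hypothetical, pre-piled-up input).
    Returns (position, reference base, plurality bases, count) tuples.
    """
    variants = []
    for position, (ref_base, bases) in enumerate(zip(reference, columns)):
        if not bases:
            continue  # no coverage, nothing to tally
        tally = Counter(bases)
        top_count = max(tally.values())
        # Keep every base that reaches the top count, so ties are reported.
        winners = sorted(b for b, n in tally.items() if n == top_count)
        if winners != [ref_base]:
            variants.append((position, ref_base, winners, top_count))
    return variants
```

In this sketch a tie between the reference base and another base is still reported as a
potential variant; whether the real tool does the same is not stated here.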

Quiver is a more complex procedure based on algorithms originally developed for CCS.
Quiver leverages the quality values (QVs) provided by upstream processing tools, which
provide insight into whether insertions/deletions/substitutions were deemed likely at a
given read position. Use of quiver requires the ConsensusCore library as well as a trained
parameter set, which will be loaded from a standard location (TBD). Quiver can be thought
of as a QV-aware local-realignment procedure.

Both algorithms are expected to converge to zero errors (miscalled variants) as coverage
increases; however, quiver should converge much faster (i.e., with fewer errors at low
coverage) and should provide greater variant detection power at a given error level.

Confidence values
Both quiver and plurality make a confidence metric available for every position of the
consensus sequence. The confidence should be interpreted as a phred-transformed posterior
probability that the consensus call is incorrect; i.e.

QV = -10 \log_{10}(p_{err})

variantCaller.py clips reported QV values at 93 because larger values cannot be encoded
in a standard FASTQ file.
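
A small worked sketch of the transform and the clipping follows. It is not taken from
variantCaller.py: the function names are hypothetical, rounding to the nearest integer
is an assumption, and the encoding shown is the common Phred+33 FASTQ convention, in
which the highest printable character (ASCII 126) corresponds to a QV of 93.

```python
import math

MAX_QV = 93  # clip value stated in the text


def confidence_qv(p_err):
    """Phred-transform an error probability and clip to the range 0..MAX_QV."""
    if p_err <= 0.0:
        return MAX_QV  # log10(0) is undefined; treat as maximal confidence
    qv = -10.0 * math.log10(p_err)
    # Assumption: round to the nearest integer before clipping.
    return int(min(MAX_QV, max(0, round(qv))))


def fastq_char(qv):
    """Encode a QV as a Phred+33 FASTQ quality character."""
    return chr(qv + 33)
```

For example, an error probability of 0.001 gives a QV of 30, while 1e-12 would give 120
and is therefore clipped to 93.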

Chemistry specificity
The Quiver algorithm parameters are trained per-chemistry. SMRTanalysis software loads
metadata into the cmp.h5 to indicate the chemistry used per movie. Quiver sees this table
and automatically chooses the appropriate parameter set to use. This selection can be
overridden by a command line flag.

When multiple chemistries are represented in the reads in a cmp.h5, Quiver will model each
read appropriately using the parameter set for its chemistry, thus yielding optimal
results.
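
The per-read selection described above amounts to a lookup from movie to chemistry to
parameter set, with an optional override. The sketch below is purely illustrative: every
name in it is hypothetical, and it does not reflect the actual layout of the cmp.h5
metadata or the real command line flag.

```python
def parameter_set_for_read(movie, chemistry_by_movie, parameter_sets, override=None):
    """Pick the parameter set for a read from its movie's chemistry.

    All names here are hypothetical: `chemistry_by_movie` stands in for the
    per-movie metadata table, `parameter_sets` for the trained sets, and
    `override` for a chemistry forced on the command line.
    """
    chemistry = override if override is not None else chemistry_by_movie[movie]
    return parameter_sets[chemistry]
```

With two movies of different chemistries in one file, each read would be modeled with
the set matching its own movie unless an override is supplied.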

Performance Requirements
variantCaller.py performs variant calling in parallel using multiple processes. Work
splitting and inter-process communication are handled using the Python multiprocessing
module. Work can be split among an arbitrary number of processes (using the -j
command-line flag), but for best performance one should use no more worker processes than
there are CPUs in the host computer.
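
The recommendation about worker counts can be applied before launching the tool. The
helper below is a minimal sketch with a hypothetical name; variantCaller.py itself
accepts any -j value, so the cap is something the caller would choose to apply.

```python
import multiprocessing


def effective_workers(requested, available=None):
    """Cap a requested worker count at the number of CPUs (at least 1)."""
    if available is None:
        available = multiprocessing.cpu_count()
    return max(1, min(requested, available))
```

On a 4-CPU host, a request for 8 workers would be reduced to 4 by this helper, and the
result could then be passed to the -j flag.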

The running time of the plurality algorithm should not exceed the runtime of the BLASR
process that produced the cmp.h5. The running time of the quiver algorithm should not
exceed 4x the runtime of BLASR.

The total amount of core memory (RAM) used by all the Python processes launched by a
variantCaller.py run should not exceed the size of the uncompressed input .cmp.h5 file.
