NAME
cutadapt - manual page for cutadapt 1.8.3
DESCRIPTION
cutadapt version 1.8.3 Copyright © 2010-2015 Marcel Martin
cutadapt removes adapter sequences from high-throughput sequencing reads.
Usage:
cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq
For paired-end reads:
cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq
Replace "ADAPTER" with the actual sequence of your 3' adapter. IUPAC wildcard characters
are supported. The reverse complement is *not* automatically searched. All reads from
input.fastq will be written to output.fastq with the adapter sequence removed. Adapter
matching is error-tolerant. Multiple adapter sequences can be given (use further -a
options), but only the best-matching adapter will be removed.
Input may also be in FASTA format. Compressed input and output are supported, and the
compression format is auto-detected from the file name (.gz, .xz, .bz2). Use the file
name '-' for standard input/output. Without the -o option, output is sent to standard
output.
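A minimal Python sketch of choosing a (de)compressor from the file name, assuming only the three suffixes listed above plus '-' for the standard streams. The helper open_by_extension is hypothetical and is not cutadapt's own I/O code.

```python
import bz2
import gzip
import lzma
import sys

# Hypothetical mapping from file-name suffix to an opener function.
_OPENERS = {".gz": gzip.open, ".xz": lzma.open, ".bz2": bz2.open}


def open_by_extension(path, mode="rt"):
    """Open `path`, compressing or decompressing based on its suffix.

    '-' selects standard input (read modes) or standard output (write modes);
    those streams are returned as-is and should not be closed by the caller.
    """
    if path == "-":
        return sys.stdout if ("w" in mode or "a" in mode) else sys.stdin
    for suffix, opener in _OPENERS.items():
        if path.endswith(suffix):
            return opener(path, mode)
    # No recognized suffix: plain, uncompressed file.
    return open(path, mode)
```

Dispatching on the suffix alone keeps the sketch simple; a more robust reader could also sniff magic bytes (gzip files start with 0x1f 0x8b).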
Some other available features are:
* Various other adapter types (5' adapters, "mixed" 5'/3' adapters, etc.)
* Trimming a fixed number of bases
* Quality trimming
* Trimming colorspace reads
* Filtering reads by various criteria
Use "cutadapt --help" to see all command-line options. See
http://cutadapt.readthedocs.org/ for full documentation.
OPTIONS
--version
show program's version number and exit
-h, --help
show this help message and exit
-f FORMAT, --format=FORMAT
Input file format; can be 'fasta', 'fastq', or 'sra-fastq'. Ignored when reading
csfasta/qual files (default: auto-detect from the file name extension).
Options that influence how the adapters are found:
Each of the following three parameters (-a, -b, -g) can be used multiple times and
in any combination to search for an entire set of adapters of possibly different
types. Only the best matching adapter is trimmed from each read (but see the
--times option). Instead of giving an adapter directly, you can also write
file:FILE and the adapter sequences will be read from the given FILE (which must be
in FASTA format).
-a ADAPTER, --adapter=ADAPTER
Sequence of an adapter that was ligated to the 3' end. The adapter itself and
anything that follows is trimmed. If the adapter sequence ends with the '$'
character, the adapter is anchored to the end of the read and only found if it is a
suffix of the read.
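The 3' trimming rule can be sketched in Python as follows. This is a simplified illustration, not cutadapt's algorithm: it uses exact matching only (cutadapt's alignment is error-tolerant and picks the best-scoring match). The function trim_3prime and its min_overlap parameter, modeled on -O, are hypothetical.

```python
def trim_3prime(read, adapter, min_overlap=3):
    """Exact-match sketch of 3' adapter removal (-a).

    Removes the adapter and everything after it. A trailing '$' anchors the
    adapter: it is then removed only if the read ends with the full adapter.
    """
    if adapter.endswith("$"):
        core = adapter[:-1]
        return read[: len(read) - len(core)] if read.endswith(core) else read
    # Full adapter anywhere in the read: cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Partial adapter at the 3' end: the longest read suffix that equals an
    # adapter prefix of at least min_overlap bases.
    lowest = max(min_overlap, 1)
    for k in range(min(len(adapter) - 1, len(read)), lowest - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

For example, with adapter AGATCGGAAGAGC, the read ACGTACGTAGATC loses its last five bases, while ACGTACGTAG is left alone because its two-base overlap (AG) is shorter than the minimum of 3.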
-g ADAPTER, --front=ADAPTER
Sequence of an adapter that was ligated to the 5' end. If the adapter sequence
starts with the character '^', the adapter is 'anchored'. An anchored adapter must
appear in its entirety at the 5' end of the read (it is a prefix of the read). A
non-anchored adapter may appear partially at the 5' end, or it may occur within the
read. If it is found within a read, the sequence preceding the adapter is also
trimmed. In all cases, the adapter itself is trimmed.
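A matching exact-match sketch for 5' adapters, under the same simplifying assumptions (no mismatches or indels; hypothetical function name and min_overlap parameter). An anchored '^' adapter must be a full prefix; otherwise the adapter may sit inside the read or overlap the 5' end partially.

```python
def trim_5prime(read, adapter, min_overlap=3):
    """Exact-match sketch of 5' adapter removal (-g)."""
    if adapter.startswith("^"):
        # Anchored: the whole adapter must be a prefix of the read.
        core = adapter[1:]
        return read[len(core):] if read.startswith(core) else read
    # Full adapter inside the read: drop it and everything before it.
    pos = read.find(adapter)
    if pos != -1:
        return read[pos + len(adapter):]
    # Partial adapter at the 5' end: the longest read prefix that equals an
    # adapter suffix of at least min_overlap bases.
    lowest = max(min_overlap, 1)
    for k in range(min(len(adapter) - 1, len(read)), lowest - 1, -1):
        if read.startswith(adapter[-k:]):
            return read[k:]
    return read
```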
-b ADAPTER, --anywhere=ADAPTER
Sequence of an adapter that was ligated to the 5' or 3' end. If the adapter is
found within the read or overlapping the 3' end of the read, the behavior is the
same as for the -a option. If the adapter overlaps the 5' end (beginning of the
read), the initial portion of the read matching the adapter is trimmed, but
anything that follows is kept.
-e ERROR_RATE, --error-rate=ERROR_RATE
Maximum allowed error rate (number of errors divided by the length of the matching
region) (default: 0.1).
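As a worked example of the ratio: at the default rate of 0.1, a 13-base match may contain at most 1 error (13 × 0.1 = 1.3), and a 20-base match at most 2. The hypothetical helper below simply restates that test in Python.

```python
def within_error_rate(errors, match_length, max_error_rate=0.1):
    """True if errors / match_length does not exceed the allowed rate (-e)."""
    if match_length <= 0:
        return False
    return errors / match_length <= max_error_rate
```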
--no-indels
Do not allow indels in the alignments (allow only mismatches). Currently only
supported for anchored adapters. (default: allow both mismatches and indels)
-n COUNT, --times=COUNT
Try to remove adapters at most COUNT times. Useful when an adapter gets appended
multiple times (default: 1).
-O LENGTH, --overlap=LENGTH
Minimum overlap length. If the overlap between the read and the adapter is shorter
than LENGTH, the read is not modified. This reduces the number of bases trimmed purely
due to short random adapter matches (default: 3).
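To see why a minimum overlap helps, assume read bases are independent and uniformly distributed (an idealization). The chance that the last k bases of a read happen to equal the first k adapter bases is then (1/4)^k, which the hypothetical function below computes.

```python
def random_suffix_match_probability(k):
    """Chance that a random read's last k bases equal the adapter's first k.

    Assumes independent bases, each A/C/G/T with probability 1/4.
    """
    return 0.25 ** k
```

At k = 1 this is 25%; at the default minimum overlap of 3 it drops to 1/64, about 1.6%.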
--match-read-wildcards
Allow IUPAC wildcards in reads (default: False).
-N, --no-match-adapter-wildcards
Do not interpret IUPAC wildcards in adapters.
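IUPAC wildcard matching can be sketched by expanding each code into the set of bases it stands for and treating two characters as matching when their sets overlap. The two flags are hypothetical stand-ins for the default adapter-wildcard behavior (turned off by -N) and for --match-read-wildcards; cutadapt's internal comparison may differ in detail.

```python
# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def bases_compatible(adapter_base, read_base,
                     match_adapter_wildcards=True, match_read_wildcards=False):
    """True if an adapter character matches a read character."""
    a = adapter_base.upper()
    r = read_base.upper()
    # Expand a character into its base set only if wildcards are enabled
    # for that side; otherwise compare it literally.
    a_set = set(IUPAC.get(a, a)) if match_adapter_wildcards else {a}
    r_set = set(IUPAC.get(r, r)) if match_read_wildcards else {r}
    return bool(a_set & r_set)
```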
Options for filtering of processed reads:
--discard-trimmed, --discard
Discard reads that contain the adapter instead of trimming them. Also use -O in
order to avoid throwing away too many randomly matching reads!
--discard-untrimmed, --trimmed-only
Discard reads that do not contain the adapter.
-m LENGTH, --minimum-length=LENGTH
Discard trimmed reads that are shorter than LENGTH. Reads that are too short even
before adapter removal are also discarded. In colorspace, an initial primer is not
counted (default: 0).
-M LENGTH, --maximum-length=LENGTH
Discard trimmed reads that are longer than LENGTH. Reads that are too long even
before adapter removal are also discarded. In colorspace, an initial primer is not
counted (default: no limit).
--no-trim
Match and redirect reads to output/untrimmed-output as usual, but do not remove
adapters.
--max-n=LENGTH
Maximum number or proportion of N's allowed in a read. A value below 1 is treated as
a proportion of the read length, while a value above 1 is treated as the maximum
number of N's a read may contain.
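A minimal sketch of the --max-n test in Python. The function name is hypothetical, and treating a value of exactly 1 as a count is an assumption, since the description above does not cover that case.

```python
def too_many_ns(sequence, max_n):
    """Return True if the read should be discarded under --max-n.

    max_n < 1 is read as a proportion of the read length; otherwise it is
    read as an absolute count (assumption for max_n == 1).
    """
    n_count = sequence.upper().count("N")
    if max_n < 1:
        return len(sequence) > 0 and n_count / len(sequence) > max_n
    return n_count > max_n
```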
--mask-adapter
Mask adapters with 'N' characters instead of trimming them.
Options that influence what gets output to where:
--quiet
Do not print a report at the end.
-o FILE, --output=FILE
Write modified reads to FILE. FASTQ or FASTA format is chosen depending on input.
The summary report is sent to standard output. Use '{name}' in FILE to demultiplex
reads into multiple files. (default: trimmed reads are written to standard output)
--info-file=FILE
Write information about each read and its adapter matches into FILE. See the
documentation for the file format.
-r FILE, --rest-file=FILE
When the adapter matches in the middle of a read, write the rest (after the
adapter) into FILE.
--wildcard-file=FILE
When the adapter has wildcard bases ('N's), write adapter bases matching wildcard
positions to FILE. When there are indels in the alignment, this will often not be
accurate.
--too-short-output=FILE
Write reads that are too short (according to length specified by -m) to FILE.
(default: discard reads)
--too-long-output=FILE
Write reads that are too long (according to length specified by -M) to FILE.
(default: discard reads)
--untrimmed-output=FILE
Write reads that do not contain the adapter to FILE. (default: output to same file
as trimmed reads)
Additional modifications to the reads:
-u LENGTH, --cut=LENGTH
Remove LENGTH bases from the beginning or end of each read. If LENGTH is positive,
the bases are removed from the beginning of each read. If LENGTH is negative, the
bases are removed from the end of each read. This option can be specified twice if
the LENGTHs have different signs.
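The sign convention of -u maps directly onto Python slicing. The sketch below uses hypothetical function names and applies each cut in turn, which is order-independent when the two values have different signs; a real trimmer would cut the quality string in the same way.

```python
def cut_bases(sequence, length):
    """Sketch of -u/--cut: positive trims the 5' end, negative the 3' end."""
    if length >= 0:
        return sequence[length:]
    return sequence[:length]


def apply_cuts(sequence, cuts):
    """Apply one or two -u values (of different signs) in sequence."""
    for length in cuts:
        sequence = cut_bases(sequence, length)
    return sequence
```

For example, apply_cuts("ACGTACGTAC", [2, -3]) removes two bases from the start and three from the end, leaving GTACG.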
-q [5'CUTOFF,]3'CUTOFF, --quality-cutoff=[5'CUTOFF,]3'CUTOFF
Trim low-quality bases from 5' and/or 3' ends of reads before adapter removal. If
one value is given, only the 3' end is trimmed. If two comma-separated cutoffs are
given, the 5' end is trimmed with the first cutoff, the 3' end with the second. The
algorithm is the same as the one used by BWA (see documentation). (default: no
trimming)
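The BWA-style rule can be sketched as follows, assuming qualities are already decoded to integers. Scanning from the 3' end, the sketch accumulates (cutoff - quality) and cuts where that running sum is largest, stopping as soon as the sum turns negative; the 5' cutoff is handled by running the same scan over the reversed read. Function names are hypothetical, and this is an illustration rather than cutadapt's source.

```python
def _trim_point(qualities, cutoff):
    """Index to cut at when scanning `qualities` from the end (BWA-style)."""
    running = best = 0
    cut = len(qualities)
    for i in reversed(range(len(qualities))):
        running += cutoff - qualities[i]
        if running < 0:
            break  # the tail is now of acceptable quality
        if running > best:
            best, cut = running, i
    return cut


def quality_trim(qualities, cutoff_back, cutoff_front=0):
    """Return (start, stop) such that qualities[start:stop] is kept."""
    stop = _trim_point(qualities, cutoff_back)
    # 5' end: the same scan on the reversed read, mapped back to an index.
    start = len(qualities) - _trim_point(qualities[::-1], cutoff_front)
    return start, max(start, stop)
```

For example, with a 3' cutoff of 15, the qualities 40, 40, 40, 10, 5, 20, 2 keep only the first three bases: the running sum peaks at base 4 before the high-quality bases make it negative.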
--quality-base=QUALITY_BASE
Assume that quality values are encoded as ascii(quality + QUALITY_BASE). The
default (33) is usually correct, except for reads produced by some versions of the
Illumina pipeline, where this should be set to 64. (Default: 33)
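Decoding follows directly from the ascii(quality + QUALITY_BASE) rule: subtract the base from each character code. The helper name is hypothetical.

```python
def decode_qualities(quality_string, base=33):
    """Convert a FASTQ quality string to integer scores: ord(char) - base."""
    return [ord(c) - base for c in quality_string]
```

The same score of 40 is written as 'I' with base 33 and as 'h' with base 64, which is why a wrong --quality-base setting silently shifts every score by 31.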
--trim-n
Trim N's on ends of reads.
-x PREFIX, --prefix=PREFIX
Add this prefix to read names.
-y SUFFIX, --suffix=SUFFIX
Add this suffix to read names.
--strip-suffix=STRIP_SUFFIX
Remove this suffix from read names if present. Can be given multiple times.
-c, --colorspace
Colorspace mode: Also trim the color that is adjacent to the found adapter.
-d, --double-encode
When in colorspace, double-encode colors (map 0,1,2,3,4 to A,C,G,T,N).
-t, --trim-primer
When in colorspace, trim the primer base and the first color (which is the
transition to the first nucleotide).
--strip-f3
For colorspace: strip the _F3 suffix of read names.
--maq, --bwa
MAQ- and BWA-compatible colorspace output. This enables -c, -d, -t, --strip-f3 and
-y '/1'.
--length-tag=TAG
Search for TAG followed by a decimal number in the description field of the read.
Replace the decimal number with the correct length of the trimmed read. For
example, use --length-tag 'length=' to correct fields like 'length=123'.
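A sketch of the length-tag rewrite using a regular expression. The function name is hypothetical, and rewriting only the first occurrence of TAG is an assumption of this sketch.

```python
import re


def update_length_tag(description, tag, new_length):
    """Replace the decimal number after `tag` with the trimmed read length."""
    pattern = re.compile(re.escape(tag) + r"[0-9]+")
    # A function replacement avoids backslash escapes inside `tag`.
    return pattern.sub(lambda m: tag + str(new_length), description, count=1)
```

For example, update_length_tag("read1 length=123", "length=", 100) returns "read1 length=100".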
--no-zero-cap
Do not change negative quality values to zero. Colorspace quality values of -1
would appear as spaces in the output FASTQ file. Since many tools have problems
with that, negative qualities are converted to zero when trimming colorspace data.
Use this option to keep negative qualities.
-z, --zero-cap
Change negative quality values to zero. This is enabled by default when
-c/--colorspace is also enabled. Use --no-zero-cap to disable it.
Paired-end options:
The -A/-G/-B/-U options work like their -a/-g/-b/-u counterparts, but apply to the
second read in each pair.
-A ADAPTER
3' adapter to be removed from the second read in a pair.
-G ADAPTER
5' adapter to be removed from the second read in a pair.
-B ADAPTER
5'/3' adapter to be removed from the second read in a pair.
-U LENGTH
Remove LENGTH bases from the beginning or end of the second read in each pair (see
--cut).
-p FILE, --paired-output=FILE
Write second read in a pair to FILE.
--untrimmed-paired-output=FILE
Write the second read in a pair to this FILE when no adapter was found in the first
read. Use this option together with --untrimmed-output when trimming paired-end
reads. (Default: output to the same file as trimmed reads.)