NAME


sfetch - get a sequence from a flatfile database.

SYNOPSIS


sfetch [options] seqname

DESCRIPTION


sfetch retrieves the sequence named seqname from a sequence database.

The database is selected with the -d option (for "little databases") or the -D option
(for "big databases"). The directory location of a "big database" is given by an
environment variable, such as $SWDIR for Swissprot or $GBDIR for Genbank (see -D for the
complete list). A complete file path must be given for a "little database". If neither
option is specified and the name looks like a Swissprot identifier (e.g. it contains a _
character), sfetch uses the $SWDIR environment variable and attempts to retrieve the
sequence seqname from Swissprot.
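
The selection rules above can be sketched in a few lines of Python. This is an
illustration only, not sfetch's actual code: the function name choose_source and its
parameters are hypothetical, and checking -d before -D when both are given is an
assumption, since the text does not state a precedence. The database codes and
environment variable names are the ones listed under the -D option.

```python
import os

# Database codes and their environment variables, as listed under -D.
DB_ENV_VARS = {
    "sw": "SWDIR",
    "pir": "PIRDIR",
    "em": "EMBLDIR",
    "gb": "GBDIR",
    "wp": "WORMDIR",
    "owl": "OWLDIR",
}


def choose_source(seqname, seqfile=None, dbcode=None, env=None):
    """Return (kind, location) describing where seqname would be looked up."""
    env = os.environ if env is None else env
    if seqfile is not None:
        # -d: a "little database", given as a complete file path.
        return ("file", seqfile)
    if dbcode is not None:
        # -D: a "big database", located through its environment variable.
        # A KeyError here means an unknown code or an unset variable.
        return ("directory", env[DB_ENV_VARS[dbcode]])
    if "_" in seqname:
        # Default: the name looks like a Swissprot identifier.
        return ("directory", env["SWDIR"])
    raise ValueError("no database given and name does not look like Swissprot")
```

For example, choose_source("HBA_HUMAN", env={"SWDIR": "/data/sw"}) falls through to the
Swissprot default, because the name contains an underscore.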

A variety of other options are available which allow retrieval of subsequences (-f,-t);
retrieval by accession number instead of by name (-a); reformatting the extracted sequence
into a variety of other formats (-F); etc.

If the database has been SSI indexed, sequence retrieval will be extremely efficient;
otherwise, retrieval may be painfully slow (the entire database may have to be read into
memory to find seqname). SSI indexing is recommended for all large or permanent
databases. The program sindex creates SSI indexes for any sequence file.
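
To see why an index helps, here is a toy Python sketch of the general idea: record
the byte offset of each record once, then seek straight to it instead of scanning the
file. It is not the SSI format and not how sindex works internally. The functions
build_index and fetch are hypothetical names, and the sketch assumes a FASTA file
opened in binary mode.

```python
import io


def build_index(handle):
    """Map each FASTA record name to the byte offset of its header line."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                index[fields[0].decode()] = offset
    return index


def fetch(handle, index, name):
    """Seek directly to a record and return its sequence as one string."""
    handle.seek(index[name])
    handle.readline()  # skip the header line
    parts = []
    for line in handle:
        if line.startswith(b">"):
            break  # reached the next record
        parts.append(line.strip().decode())
    return "".join(parts)


# A small in-memory "database" standing in for a real sequence file.
db = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2\nGGCC\n")
index = build_index(db)
print(fetch(db, index, "seq2"))
```

Building the index costs one pass over the file; every later lookup reads only the
requested record.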

sfetch was originally named getseq, and was renamed because it clashed with a GCG program
of the same name.

OPTIONS


-a Interpret seqname as an accession number, not an identifier.

-d <seqfile>
Retrieve the sequence from a sequence file named <seqfile>. If a GSI index
<seqfile>.gsi exists, it is used to speed up the retrieval.

-f <from>
Extract a subsequence starting from position <from>, rather than from 1. See -t.
If <from> is greater than <to> (as specified by the -t option), the sequence is
extracted as its reverse complement (it is assumed to be a nucleic acid sequence).

-h Print brief help; includes version number and summary of all options, including
expert options.

-o <outfile>
Direct the output to a file named <outfile>. By default, output goes to stdout.

-r <newname>
Rename the sequence to <newname> in the output after extraction. By default, the
original sequence identifier is retained. Useful, for instance, when retrieving a
sequence fragment; the coordinates of the fragment might be added to the name
(this is what Pfam does).

-t <to>
Extract a subsequence that ends at position <to>, rather than at the end of the
sequence. See -f. If <to> is less than <from> (as specified by the -f option),
the sequence is extracted as its reverse complement (it is assumed to be a
nucleic acid sequence).

-D <database>
Retrieve the sequence from the main sequence database coded <database>. For each
code, there is an environment variable that specifies the directory path to that
database. Recognized codes and their corresponding environment variables are -Dsw
(Swissprot, $SWDIR); -Dpir (PIR, $PIRDIR); -Dem (EMBL, $EMBLDIR); -Dgb (Genbank,
$GBDIR); -Dwp (Wormpep, $WORMDIR); and -Dowl (OWL, $OWLDIR). Each database is read
in its native flatfile format.

-F <format>
Reformat the extracted sequence into a different format. (By default, the sequence
is extracted from the database in the same format as the database.) Available
formats are embl, fasta, genbank, gcg, strider, zuker, ig, pir, squid, and raw.
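
The coordinate behaviour of -f and -t described above can be illustrated with a short
Python sketch. It assumes 1-based, inclusive positions and plain DNA letters; the
function name subsequence is hypothetical, and sfetch itself may treat ambiguity codes
or RNA differently.

```python
# Complement table for unambiguous DNA letters only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def subsequence(seq, start=None, end=None):
    """Extract positions start..end, counting from 1 and including both ends.

    start defaults to 1 and end to the sequence length, mirroring the
    defaults described for -f and -t. If start is greater than end, the
    same span is returned as its reverse complement.
    """
    start = 1 if start is None else start
    end = len(seq) if end is None else end
    if start <= end:
        return seq[start - 1:end]
    # start > end: take the span end..start, complement it, then reverse it.
    return seq[end - 1:start].translate(_COMPLEMENT)[::-1]
```

With this sketch, subsequence("AACCGGTT", 3, 6) gives the forward span, while swapping
the two coordinates gives the reverse complement of that same span.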

EXPERT OPTIONS


--informat <s>
Specify that the sequence file is in format <s>, rather than the default FASTA
format. Common examples include Genbank, EMBL, GCG, PIR, Stockholm, Clustal, MSF,
or PHYLIP; see the printed documentation for a complete list of accepted format
names. This option overrides the default format (FASTA) and the -B Babelfish
autodetection option.
