NAME


swath - General-purpose Thai word segmentation utility

SYNOPSIS


swath [options] < infile > outfile

DESCRIPTION


Thai script has no word delimiter. Applications need to recognize word boundaries before
they can perform operations such as line wrapping on Thai text.

Swath provides a word analysis filter that inserts word delimiters into a given text stream.
It reads text from standard input, analyzes it for word boundaries by consulting a Thai
word list, and writes the same text to standard output with the predefined word delimiters
inserted.

Currently, it can read plain text, HTML, RTF, LaTeX and Lambda (Unicode version of LaTeX
with Omega typesetter kernel) documents and inserts the common word delimiter for each
format (pipe `|' for plain text). The user can always override this with a preferred
delimiter.
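
Because plain-text output is delimited with the pipe character, it can be post-processed
with standard tools. The following is a minimal sketch, not part of swath: the sample
string is hand-written to stand in for swath output, and the helper names split_words
and count_words are hypothetical.

```shell
# Hand-written stand-in for pipe-delimited output (not actual swath output).
segmented='ภาษา|ไทย|ง่าย'

# Print one word per line by turning each pipe into a newline.
split_words() {
    printf '%s\n' "$1" | tr '|' '\n'
}

# Count the pipe-separated fields in a single line.
count_words() {
    printf '%s\n' "$1" | awk -F'|' '{ print NF }'
}

split_words "$segmented"
count_words "$segmented"
```

In real use the argument would come from swath itself, for example by piping its output
through tr in the same way.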

OPTIONS


-b [delimiter]
Define a string to be used as the word delimiter code in the output text.

-d [dict-path]
Specify an alternative dictionary location. dict-path must be either a directory
containing the swath dictionary file `swathdic.tri', or a path to the dictionary
file itself. The dictionary file must be a trie file prepared using the
trietool-0.2(1) utility from the libdatrie package.

If this option is given, swath will override the normal dictionary search and will
exit if it fails to find the given dictionary. Otherwise, if the SWATHDICT
environment variable is set, it will try to open the dictionary from the location
specified by its value. Otherwise, it will try the current working directory, and
finally the usual installed location.
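
The search order above can be sketched as a shell function. This is an illustration of
the documented order only, not swath's own code. The function names are hypothetical,
the installed location is passed in as a placeholder argument because it depends on the
installation, and two behaviours are assumptions: that SWATHDICT accepts a directory or
a file like -d does, and that a SWATHDICT miss falls through to the remaining places.

```shell
# Resolve one candidate location to a dictionary file.
# A directory must contain swathdic.tri; a regular file is used as-is.
dict_candidate() {
    if [ -d "$1" ] && [ -f "$1/swathdic.tri" ]; then
        printf '%s\n' "$1/swathdic.tri"
    elif [ -f "$1" ]; then
        printf '%s\n' "$1"
    else
        return 1
    fi
}

# Usage: find_swath_dict INSTALLED_DIR [DICT_PATH]
# INSTALLED_DIR is a placeholder for the usual installed location.
# DICT_PATH plays the role of the argument to -d.
find_swath_dict() {
    installed_dir=$1
    dict_path=${2-}

    if [ -n "$dict_path" ]; then
        # -d given: no fallback, fail if the dictionary is missing.
        dict_candidate "$dict_path"
        return
    fi

    # Assumption: a SWATHDICT miss falls through to the usual places.
    if [ -n "${SWATHDICT-}" ] && dict_candidate "$SWATHDICT"; then
        return 0
    fi

    # Current working directory first, then the installed location.
    dict_candidate . || dict_candidate "$installed_dir"
}
```

The function prints the path it would pick and returns non-zero when nothing is found,
which mirrors swath exiting when a dictionary given with -d cannot be opened.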

-f [format]
Specify the format of the input. Possible formats are: html, rtf, latex, lambda.

-m [scheme]
Choose the word matching scheme used when analyzing word boundaries. Possible schemes
are `long' (for longest or greedy matching) and `max' (for maximal matching, with the
fewest words preferred). Maximal matching is the default.
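
The difference between the two schemes can be shown with a toy segmenter. This is a
sketch of the general techniques only and makes no claim about how swath implements
them: the function name segment is hypothetical, the dictionary is passed as plain
arguments, ASCII letters stand in for Thai text, and a character that starts no
dictionary word is emitted as a one-character word.

```shell
# Usage: segment SCHEME TEXT WORD...
# SCHEME is "long" (greedy longest match) or "max" (fewest words overall).
segment() {
    scheme=$1
    text=$2
    shift 2
    awk -v scheme="$scheme" -v text="$text" -v words="$*" '
    BEGIN {
        n = split(words, w, " ")
        for (i = 1; i <= n; i++) dict[w[i]] = 1
        len = length(text)
        out = ""

        if (scheme == "long") {
            # Greedy: at each position take the longest dictionary word,
            # falling back to a single character.
            pos = 1
            sep = ""
            while (pos <= len) {
                k = len - pos + 1
                while (k > 1 && !(substr(text, pos, k) in dict)) k--
                out = out sep substr(text, pos, k)
                sep = "|"
                pos += k
            }
        } else {
            # Dynamic programming: best[i] is the fewest words covering the
            # first i characters; from[i] is the offset where the last of
            # those words begins.
            best[0] = 0
            for (i = 1; i <= len; i++) {
                best[i] = -1
                for (j = 0; j < i; j++) {
                    piece = substr(text, j + 1, i - j)
                    if (!(piece in dict) && i - j > 1) continue
                    if (best[i] < 0 || best[j] + 1 < best[i]) {
                        best[i] = best[j] + 1
                        from[i] = j
                    }
                }
            }
            # Walk back from the end to rebuild the chosen words.
            for (i = len; i > 0; i = from[i]) {
                piece = substr(text, from[i] + 1, i - from[i])
                out = (out == "") ? piece : piece "|" out
            }
        }
        print out
    }'
}

segment long abcde ab abc cde d e   # prints abc|d|e
segment max  abcde ab abc cde d e   # prints ab|cde
```

With this toy word list, greedy matching commits to `abc' and then needs two more words,
while maximal matching finds the two-word split.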

-u input-enc,output-enc
Specify the encodings of input and output. input-enc and output-enc must each be
either 'u' (for UTF-8 encoding) or 't' (for TIS-620 encoding). Swath will convert the
character encoding as necessary. If omitted, TIS-620 is assumed for both input and
output.
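
In scripts that track encodings by name, the one-letter codes can be derived with a small
helper. This is a minimal sketch: the function names enc_code and swath_enc_arg are
hypothetical, and the set of accepted spellings is an assumption made for illustration.

```shell
# Map an encoding name to the one-letter code expected by -u.
enc_code() {
    case $1 in
        UTF-8|utf-8|UTF8|utf8) printf 'u' ;;
        TIS-620|tis-620|TIS620|tis620) printf 't' ;;
        *) echo "unsupported encoding: $1" >&2; return 1 ;;
    esac
}

# Build the input-enc,output-enc argument from two encoding names.
swath_enc_arg() {
    in_code=$(enc_code "$1") || return 1
    out_code=$(enc_code "$2") || return 1
    printf '%s,%s\n' "$in_code" "$out_code"
}
```

The result can then be passed on the command line, for example as
-u "$(swath_enc_arg UTF-8 TIS-620)".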

-v, --verbose
Turn on verbose mode.

-help, --help
Show help.

ENVIRONMENT VARIABLES


SWATHDICT
If specified, swath will search for the dictionary in this location before the usual
places (current working directory and usual installed directory, respectively).
This value is overridden by the -d option.

EXAMPLES


For LaTeX (to be used with the babel-thai package):

$ swath -f latex < thaifile.tex > thaifile.ttex
$ latex thaifile.ttex

For HTML (to provide web pages to web browsers that cannot wrap Thai lines properly, but
support the <wbr> tag):

$ swath -f html < myweb.html > myweb-wbr.html

To preprocess a Thai UTF-8 encoded LaTeX file for babel-thai with tis620 inputenc:

$ swath -f latex -u u,t < thaifile.tex > thaifile.ttex
$ latex thaifile.ttex

This is equivalent to filtering with iconv(1):

$ iconv -f UTF-8 -t TIS-620 thaifile.tex | swath -f latex > thaifile.ttex
$ latex thaifile.ttex

To use the longest matching scheme with a LaTeX document:

$ swath -f latex -m long < thaifile.tex > thaifile.ttex
$ latex thaifile.ttex

To use an alternative dictionary from libthai:

$ swath -f latex -d /usr/share/libthai/thbrk.tri < thaifile.tex > thaifile.ttex
