NAME


ucto - Unicode Tokenizer

SYNOPSIS


ucto [options] [input-file] [output-file]

DESCRIPTION


ucto tokenizes text files: it separates words from punctuation, splits sentences (and
optionally paragraphs), and finds paired quotes. Ucto is preconfigured with tokenization
rules for several languages.
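
As a minimal sketch of the synopsis, the following Python snippet assembles a ucto command line from an input file and an optional output file. The helper name ucto_command and the file names are illustrative assumptions, not part of ucto; the snippet only builds and prints the command and does not run it.

```python
import shlex


def ucto_command(input_file, output_file=None, options=()):
    """Build an argv list in synopsis order: ucto [options] [input-file] [output-file].

    Hypothetical helper for illustration; it does not check that the options are valid.
    """
    argv = ["ucto", *options, input_file]
    if output_file is not None:
        argv.append(output_file)
    return argv


# Placeholder file names, assumed for the example.
cmd = ucto_command("input.txt", "output.txt")
print(shlex.join(cmd))  # prints: ucto input.txt output.txt
```

The resulting list could be handed to subprocess.run(cmd, check=True) on a system where ucto is installed.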

OPTIONS


-c configfile
read settings from a file

-d value
set debug mode to 'value'

-e value
set input encoding. (default UTF8)

-f
disable filtering of special characters

-L language
Automatically select a configuration file by language code, e.g. 'fr' selects
the file tokconfig-fr from the installation directory

-l
Convert to all lowercase

-u
Convert to all uppercase

-n
Emit one sentence per line on output

-m
Assume one sentence per line on input

--passthru
Don't tokenize, but perform input decoding and simple token role detection

-P
Disable Paragraph Detection

-Q
Enable Quote Detection. (this is experimental and may lead to unexpected results)

-S
Disable Sentence Detection

-s <string>
Set End-of-sentence marker. (Default <utt>)

-V
Show version information

-v
set Verbose mode

-F
Read a FoLiA XML document, tokenize it, and output the modified doc. (this disables
usage of most other options: -nulPQvsS)

--textclass cls
When tokenizing a FoLiA XML document, search for text nodes of class 'cls'

-X
Output FoLiA XML. (this disables usage of most other options: -nulPQvsS)

--id <DocId>
Use the specified Document ID for the FoLiA XML

-x <DocId> (obsolete)
Output FoLiA XML, use the specified Document ID. (this disables usage of most other
options: -nulPQvsS) Obsolete: use -X and --id instead.
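
The options above can be combined on one command line. The sketch below, in Python, shows one possible way to assemble such a command while respecting the note that FoLiA XML output (-X) disables most of -nulPQvsS. The function build_ucto_args, its parameters, and the choice to raise an error on a clash are assumptions made for illustration; how ucto itself reacts to such a combination is not stated here. Only argument-less flags are handled, so -s, which takes a string, is left out.

```python
import shlex

# Argument-less flags from the option list that this sketch accepts.
BOOLEAN_FLAGS = set("flunmPQSv")
# Flags listed as disabled in FoLiA mode (-nulPQvsS), minus -s, which takes a value.
FOLIA_DISABLED = set("nulPQvS")


def build_ucto_args(input_file, output_file=None, language=None,
                    flags="", folia_out=False, doc_id=None):
    """Hypothetical helper: return an argv list for ucto without running it."""
    unknown = sorted(set(flags) - BOOLEAN_FLAGS)
    if unknown:
        raise ValueError("unsupported flags in this sketch: " + "".join(unknown))
    argv = ["ucto"]
    if language:
        argv += ["-L", language]      # select the configuration by language code
    if folia_out:
        clash = sorted(FOLIA_DISABLED & set(flags))
        if clash:
            # Design choice of this sketch: refuse rather than pass them through.
            raise ValueError("disabled with -X: " + "".join(clash))
        argv.append("-X")             # output FoLiA XML
        if doc_id:
            argv += ["--id", doc_id]  # document ID for the FoLiA XML
    argv += ["-" + f for f in flags]
    argv.append(input_file)
    if output_file:
        argv.append(output_file)
    return argv


# French text, one sentence per line on output (file names are placeholders).
print(shlex.join(build_ucto_args("in.txt", "out.txt", language="fr", flags="n")))
# FoLiA XML output with an assumed document ID.
print(shlex.join(build_ucto_args("in.txt", "out.xml", folia_out=True, doc_id="doc1")))
```

Raising an error early is only one option; a caller could equally drop the clashing flags before building the command.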
