NAME
mmorph - MULTEXT morphology tool
SYNOPSIS
information:
mmorph [ -vh ]
parse only:
mmorph -y | -z [ -a addfile ]
-m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]
generate:
mmorph -c | -n [ -t trace_level ] [ -s trace_level ] [ -a addfile ]
-m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]
simple lookup:
mmorph [ -fi ] [ -b | -k ] [ -r rejectfile ]
-m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]
record/field lookup:
mmorph -C classes [ -fU ] [ -E | -O ] [ -b | [ -k ] [ -B class ]]
-m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]
dump database:
mmorph -p | -q
-m morphfile [ -d debug_map ] [ -l logfile ] [ infile [ outfile ]]
DESCRIPTION
In the simplest mode of operation, with just the -m morphfile option, mmorph operates in
lookup mode: it opens an existing database called morphfile.db and looks up all the
string segments (usually corresponding to words) in the input.
To create the database from the lexical entries specified in morphfile, use -c -m
morphfile. The file morphfile.db should not exist. When the database is complete, mmorph
looks up the segments in the input. If used interactively (input and output are both
terminals), a prompt is printed when the program expects the user to type a segment
string. No prompting occurs in record/field mode.
To test the rule applications on the lexical entries specified in morphfile, without
creating a database and without looking up segments, use -n -m morphfile. This
automatically sets the trace level to 1 if it was not specified.
In order to do the same operations as above, but on the alternate set of lexical entries
in addfile, use the extra option -a addfile. The lexical entries in morphfile will be
ignored. This is useful when making additions to a standard morphological description.
Be aware that entries added to the database morphfile.db do not replace existing ones.
How to test a morphological description
Use the -n option. In the Grammar section, specify goal rules that will match the desired
results. In the Lexicon section, specify the lexical items you want to test. When mmorph
runs, all rules are applied (recursively) to the lexical items; whenever the rule applied
is a goal rule, the result of the application is printed on the output.
Suggestion: Put the two parts mentioned above (goal rules and Lexicon section) in separate
files and reference these files with an #include directive where they should occur in the
main input file.
If you are using an existing description and want to test only new lexical entries, use
the options -n -a addfile, and put the lexical entries in addfile.
OPTIONS
-a addfile
Ignore lexical entries in morphfile, take them from addfile instead.
-B class
Specifies the record class that occurs before the beginning of a sentence.
Capitalized words occurring just after such records will also be looked up with all
their letters converted to lowercase (according to LC_CTYPE, see below).
-b Fold case before lookup. Uppercase letters are converted to lowercase letters
(according to LC_CTYPE, see below) before a word is looked up.
-C classes
Determines record/field mode. Specifies the record classes that should be looked
up. Class names should be separated by comma ",", TAB, space, bar "|" or backslash
"\".
-c Create a new database for lookup. The name of the created file is the name of
morphfile (-m option) with suffix .db. It should not exist; if it exists, the user
should remove it manually before running mmorph -c (this is a minimal protection
against accidentally overwriting a database that might have taken a long time to
create).
-d debug_map
Specify which debug options are wanted. Each bit in debug_map corresponds to an
option.
bit       decimal  hexadecimal  purpose
no bits   0        0x0          no debug option (default)
1         1        0x1          debug initialisation
2         2        0x2          debug yacc parsing
3         4        0x4          debug rule combination
4         8        0x8          debug spelling application
5         16       0x10         print statistics with -p or -q options
all bits  -1       0xffff       all debug options whatever they are
To combine options, add the decimal or hexadecimal values together. Example: -d 0x5
specifies bits (options) 1 and 3 (values 0x1 and 0x4).
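As a minimal sketch of how a debug_map value can be assembled, the shell fragment below combines two of the values listed for -d. The variable names are hypothetical labels chosen for this example; only the numeric values come from this page. The fragment prints the resulting command line instead of running mmorph.

```shell
# Values taken from the -d table: bit 1 = 0x1, bit 3 = 0x4, bit 5 = 0x10.
# The variable names are illustrative only.
DEBUG_INIT=0x1
DEBUG_RULES=0x4
DEBUG_STATS=0x10

# Combine "debug initialisation" and "debug rule combination".
# Each option is a distinct bit, so OR-ing is the same as adding.
debug_map=$(printf '0x%x' $(( $DEBUG_INIT | $DEBUG_RULES )))

# Show the command line that would be used; mmorph is not invoked here.
echo "mmorph -d $debug_map -m morphfile"
```

Adding $DEBUG_STATS to the expression in the same way would also request the statistics that -p and -q can print.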
-E In record/field mode, extend the morphology annotations if they already exist (the
default is to leave existing annotations as is).
-O In record/field mode, overwrite the morphology annotations if they already exist
(the default is to leave existing annotations as is).
-f Flush the output after each segment lookup. This is useful only if input and output
are piped from and to a program that needs to synchronize them.
-h Print help and exit.
-i Prepend the result of each lookup with the identifier of the input segment it
corresponds to. Currently input segments are identified by their sequential number,
starting at 0. With this indication, the extra newline separating the solutions
for different input segments is not printed because it is not needed. If a lookup
has no solutions, only the segment identifier is printed on the output. The segment
identifier is also prepended to rejected segments. A tab always follows the
segment identifier.
-k Fallback fold case. If a word lookup fails, convert all uppercase letters to
lowercase and try the lookup again (conversion is done according to LC_CTYPE, see
below).
-l logfile
Specify the file for writing trace and error messages. Defaults to standard error.
-m morphfile
Specify the file containing the morphology description. See mmorph(5) for a
description of the formalism's syntax.
-n No database creation or lookup (test mode).
-p Dump the typed feature structure database to outfile (or standard output). The
count of distinct tfs is given in the logfile (or standard error) if bit 5 of debug
option is set.
-q Dump the forms in the database to outfile (or standard output). Some statistics
are given in the logfile (or standard error) if bit 5 of debug option is set.
-r rejectfile
Outside record/field mode, specifies the file to which input segments that could
not be looked up are written. Defaults to standard error.
-s trace_level
Trace spelling rule application:
0 no tracing (default).
1 trace valid surface forms.
2 trace rules whose lexical part matches.
3 trace surface left context match (surface word construction).
4 trace surface right context mismatch and rule blocking.
5 trace rule non blocking.
A trace_level implies all preceding ones.
-t trace_level
Specify the level of tracing for rule application:
0 no tracing (default).
1 trace goal rules that apply.
2 trace all rules that apply, indentation indicates the recursion depth.
10 trace also rules that were tried but did not apply
A trace_level implies all preceding ones.
-U In record/field mode, unknown words (i.e. that were unsuccessfully looked up) are
annotated with ??\??.
-v Print version and exit.
-y Parse only: do not process the description other than for syntax checking. While
developing a morphology description you may use this option to catch syntax errors
quickly after each modification before running it "for real".
-z Implies -y. Parse and output the lexical descriptions in normalized form.
infile file containing the segments to look up, one per line. Defaults to the standard
input.
outfile
file in which the output of the program is written. One line per solution.
Solutions of different input segments are separated by an empty line. Defaults to
the standard output.
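When -i is used, each output line starts with the segment identifier followed by a tab, which makes the output easy to post-process. The sketch below counts solutions per segment with awk. The sample data is a placeholder in the shape described under -i (identifier, tab, solution); the solution strings are invented and do not reflect real mmorph output.

```shell
# Placeholder data: identifier, tab, solution. Segment 1 has no solution,
# so only its identifier and the tab appear on that line.
sample=$(printf '0\tsolution-a\n0\tsolution-b\n1\t\n2\tsolution-c\n')

# Count solutions per identifier, keeping identifiers in first-seen order.
# An empty second field means the lookup produced no solution.
counts=$(printf '%s\n' "$sample" | awk -F '\t' '
    { if (!($1 in n)) { n[$1] = 0; order[++k] = $1 }
      if ($2 != "") n[$1]++ }
    END { for (i = 1; i <= k; i++) print order[i] ": " n[order[i]] }')

printf '%s\n' "$counts"
```

In real use, the pipeline would read the output of mmorph -i in place of the placeholder data.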
WORD GRAMMAR AND SPELLING RULES
For a detailed account of the principles and mechanisms used in mmorph, please refer to
the documents cited in the SEE ALSO section below.
Briefly sketched, morphosyntactic descriptions written for mmorph describe how words are
constructed by the concatenation of morphemes, and how this concatenation process changes
the spelling of these morphemes. The first part, the word structure grammar, is specified
by restricted context free rewrite rules whose formalism is inspired by unification based
systems (cf. Shieber 1986). The second part, the spelling changes, is specified by
spelling rules in a formalism based on the two level model of morphology. This approach
to morphology is described in Ritchie, Russell et al. 1992 and more concisely in Pulman
and Hepple 1993.
ENVIRONMENT VARIABLES
To decide which characters are displayable on the output, mmorph uses the language
specific description that setlocale(3) sets according to the environment variable
LC_CTYPE. For the languages that are dealt with in MULTEXT it is a good idea to have that
variable set to iso_8859_1.
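The -b and -k options both fold case according to LC_CTYPE, but they differ in when the folding happens. The sketch below illustrates that difference under stated assumptions: lookup is a hypothetical stand-in for a database lookup over a two-word list, and mmorph itself is never called.

```shell
# Hypothetical stand-in for a database lookup; mmorph itself is not called.
known_forms="Paris house"
lookup() {
    for form in $known_forms; do
        if [ "$form" = "$1" ]; then return 0; fi
    done
    return 1
}

# tr uses the current locale (LC_CTYPE) for its character classes.
to_lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# -b style: fold case first, then look up once.
lookup_fold_first() { lookup "$(to_lower "$1")"; }

# -k style: look up as given, fold case only if that lookup failed.
lookup_fold_fallback() { lookup "$1" || lookup "$(to_lower "$1")"; }

for word in Paris House; do
    if lookup_fold_first "$word"; then b=found; else b=missed; fi
    if lookup_fold_fallback "$word"; then k=found; else k=missed; fi
    echo "$word: -b $b, -k $k"
done
```

In this toy word list, a form stored only with a capital letter is found by the fallback strategy but missed when case is folded first, which suggests why -k may suit lexicons that contain proper nouns.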
EXAMPLES
Here is a summary of the common usage of mmorph options:
mmorph -n -m morphfile
Test mode: reads the whole of morphfile and prints results on standard error. No database
is created, no words are looked up.
mmorph -c -m morphfile
Database creation: reads the whole of morphfile and stores the results in a database
(morphfile.db). Typed feature structures are collected in a separate file
(morphfile.tfs). Standard input is read for words to look up in the new database.
mmorph -m morphfile
Lookup mode: reads only the Alphabets, Attributes and Types sections of morphfile.
Standard input is read for words to look up according to the existing database
(morphfile.db and morphfile.tfs).
mmorph -m morphfile -a addfile
Addition mode: ignores the Lexicon section of morphfile, but addfile is consulted, and
the results are added to the database. Standard input is read for words to look up
according to the augmented database (morphfile.db and morphfile.tfs).
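The usages above can be chained into a small wrapper. The following is a sketch, not part of mmorph: build_db is a hypothetical helper that refuses to proceed if morphfile.db already exists, runs a syntax check with -y, and then creates the database with -c. It assumes that mmorph exits with a non-zero status on a syntax error, which this page does not state.

```shell
# Hypothetical helper, not part of mmorph.
build_db() {
    morphfile=$1
    # -c expects morphfile.db not to exist, so check before doing any work.
    if [ -e "$morphfile.db" ]; then
        echo "build_db: $morphfile.db already exists; remove it first" >&2
        return 1
    fi
    # Assumption: mmorph exits non-zero when the description has a syntax error.
    mmorph -y -m "$morphfile" || return 2
    # Empty standard input, so no words are looked up after creation.
    mmorph -c -m "$morphfile" </dev/null
}

# Usage (needs mmorph installed and a description file named morphfile):
# build_db morphfile
```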
DIAGNOSTICS
Error messages should be self explanatory. Please refer to mmorph(5) for a formal
description of the syntax.