msort - Online in the Cloud

NAME


msort - sort records in complex ways

SYNOPSIS


msort <options> [<input file>]

DESCRIPTION


msort is a program for sorting text files in sophisticated ways. It was developed
initially for alphabetizing dictionaries of languages in which the ordering may be quite
different from English, but it has many other uses.

msort allows you to sort blocks of text delimited in a number of ways, rather than just
lines, and to specify particular fields of a record as sort keys, either by their
position, counted from either end, or by matching regular expressions against their tags.
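The idea of a record that is a block rather than a line can be illustrated with a short Python sketch. This is not msort's code; the function name split_blocks and the sample data are hypothetical, and the sketch assumes the block convention described under the -b option (a record ends at two or more newlines).

```python
import re

def split_blocks(text):
    """Split text into records, each terminated by two or more newlines."""
    blocks = re.split(r"\n{2,}", text)
    # Trim stray newlines and drop empty pieces left by leading/trailing blank lines.
    return [block.strip("\n") for block in blocks if block.strip()]

# Hypothetical input: three multi-line dictionary entries.
sample = "cat\nnoun\n\napple\nnoun\n\n\nbird\nnoun\n"
records = split_blocks(sample)

# Each block is sorted as a unit, so its lines stay together.
sorted_records = sorted(records)
```

Sorting the blocks keeps every entry's lines together, which a line-oriented sort would not do.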

msort is capable of sorting on multiple keys, so that when two records tie on one key, the
tie may be broken on another. Any or all keys may be optional. How absent optional keys
are ordered with respect to present keys may be set separately for each key.
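Tie-breaking across keys, with a per-key rule for absent keys, might be sketched in Python as follows. The dictionary representation of records, the field names, and the function names are all assumptions made for illustration; this is not how msort is implemented.

```python
from functools import cmp_to_key

def compare_optional(a, b, absent):
    """Compare two key values, either of which may be None (absent).

    absent is '<', '=' or '>': how an absent key orders against a present one.
    """
    if a is None and b is None:
        return 0
    if a is None or b is None:
        result = {"<": -1, "=": 0, ">": 1}[absent]
        # The table gives the result when a is absent; flip it when b is absent.
        return result if a is None else -result
    return (a > b) - (a < b)

def make_key(key_specs):
    """key_specs is a list of (field name, absent-ordering) pairs, most significant first."""
    def compare(rec1, rec2):
        for field, absent in key_specs:
            result = compare_optional(rec1.get(field), rec2.get(field), absent)
            if result != 0:
                return result  # this key breaks the tie
        return 0
    return cmp_to_key(compare)

# Hypothetical records: "sense" is an optional key.
records = [
    {"head": "bank", "sense": 2},
    {"head": "apple"},
    {"head": "bank", "sense": 1},
    {"head": "bank"},
]
# Records without "sense" sort before records that have it.
ordered = sorted(records, key=make_key([("head", "="), ("sense", "<")]))
```

Here the three "bank" records tie on the first key, and the second key decides their order, with the record lacking a sense number placed first.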

msort allows you to specify arbitrary sort orders and to define virtually unlimited
numbers of multigraphs of effectively unlimited length. The sort order and multigraphs
are defined separately for each key. If your system has locale support, you can also use
locale collation rules instead of specifying your own sort order.

msort provides twelve types of key comparison: lexicographic, numeric, numeric string,
hybrid, by string length, by angle, by date, by domain name, by time, by ISO8601 date/time
stamp, by month name, and random.

Which month names are recognized depends on the -s flag and on locale support. If the -s
flag is used on the same key and its argument is the name of a file, the month names are
read from the file, which should be in the same format as a sort order definition file.
If the -s flag is used and its argument is a locale name, the month names recognized are
the month names and abbreviations associated with the specified locale. If the -s flag is
not used, the month names recognized are the month names and abbreviations associated
with the current locale. If your system does not have locale support and you do not use
the -s flag to read the month names from a file, the English month names and
abbreviations are recognized.

msort can reverse the characters in a key, allowing it to be used to generate reverse
dictionaries.
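A reverse dictionary orders words by their endings, which is what sorting on the reversed key achieves. A minimal Python sketch, with a hypothetical word list:

```python
# Hypothetical word list.
words = ["walking", "baker", "talked", "maker", "running", "walked"]

# Compare the words by their characters read from the end.
reverse_sorted = sorted(words, key=lambda word: word[::-1])
# -> ['talked', 'walked', 'walking', 'running', 'baker', 'maker']
```

Words with the same ending end up next to each other. Note that this sketch reverses individual code points, so it is only adequate for text without combining sequences.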

A choice of sorting algorithms is provided.

msort fully supports Unicode. The text to be sorted, and all specifications, should be in
UTF-8 Unicode. (If you have plain ASCII text, this is not a problem as ASCII is a subset
of Unicode.) Full Unicode case-folding is available, in Turkic and non-Turkic variants.
Unicode normalization is performed before sorting.

For usage information, execute msort with no arguments.

Full information about msort is currently to be found in the reference manual, which is
distributed as a PDF (Portable Document Format) file. If a copy is not available locally,
you can download it from msort's home page:
http://billposer.org/Software/msort.html

OPTIONS


Informational options
-h,--help
Print usage message

-v,--version
Print version message

-D,--defaults
List defaults

-F,--general-options
List general command line options

-G,--gnu-equivalences
List equivalents for GNU sort command line options.

-H,--informational-options
List informational command line options

-K,--key-specific-options
List key-specific command line options

-L,--limits
List limits

-N,--number-systems
List the supported number systems.

General options
-b,--block
A record is terminated by two or more newlines

-l,--line
A record consists of a single line

-r,--record-separator <separator>
A record is terminated by the specified separator character

-O,--fixed-size-record <bytes>
A record consists of the specified number of bytes.

-d,--field-separators <character>+
Fields are delimited by the named character(s)

-w,--whole
Sort on the entire text of the record

-a,--algorithm <algorithm>
Use the specified sort algorithm. The choices are: I(nsertionSort), M(ergeSort),
Q(uickSort), and S(hellSort). Note that InsertionSort and MergeSort are stable,
while QuickSort and ShellSort are unstable. The default is QuickSort.
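The difference matters when records tie on every key: a stable algorithm keeps tied records in their input order, while an unstable one may not. The following Python sketch shows the stable behaviour using Python's built-in sorted(), which is documented to be stable. The data are hypothetical.

```python
# Hypothetical records: (name, count). Two pairs tie on the count.
records = [("pear", 3), ("fig", 1), ("plum", 3), ("kiwi", 1)]

# Sort on the count only. Because the sort is stable, "fig" stays ahead of
# "kiwi" and "pear" stays ahead of "plum", exactly as in the input.
by_count = sorted(records, key=lambda record: record[1])
```

With an unstable algorithm the relative order of "fig" and "kiwi", or of "pear" and "plum", would not be guaranteed.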

-M,--initial-maximum-records <records>
Set initial maximum number of records

-m,--line-end-carriage-return
End-of-line in the input data is marked by Carriage Return (0x0D) as on the
Macintosh rather than by Line Feed (0x0A) as on Unix systems.

-I,--invert-globally
Invert sense of comparisons globally

-B,--BMP
Assume that no characters fall outside the Basic Multilingual Plane (that is, none
have values greater than 0xFFFF).

-Z,--skip-first-record
Copy the first record in the input to the output without sorting it. This is useful
for sorting files with a header.

-p,--reserve-private-use-area
Do not make internal use of the Private Use areas. By default, multigraphs are
assigned internally to codepoints in the Supplementary Private Use areas if full
Unicode is in use or to codepoints in the Private Use area if input is restricted
to the Basic Multilingual Plane by means of the -B option. If your input makes use
of the Private Use areas, this option prevents interference with your input. In
this case, multigraphs will be assigned to the Low and High Surrogate areas
(0xD800-0xDFFF). Note that this limits the number of multigraphs to 2,048.
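The figure of 2,048 follows directly from the size of the surrogate range, as this small Python calculation shows. The size of the Basic Multilingual Plane Private Use area (U+E000 through U+F8FF) is included for comparison; that range comes from the Unicode standard, not from this page.

```python
# Surrogate range given above.
SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF
surrogate_slots = SURROGATE_LAST - SURROGATE_FIRST + 1   # 2048

# BMP Private Use area, per the Unicode standard.
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF
pua_slots = PUA_LAST - PUA_FIRST + 1                     # 6400
```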

-P,--random-seed <seed>
Set the seed for the random number generator. If not set here, it is set to a value
determined by the time. The seed used is reported in the log. This option allows
runs to be replicated.

-Q,--check-only
Check whether the input is already sorted. Do not generate any output. Exit status
is 0 if input is already sorted, 11 if not sorted.
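The check amounts to verifying that no record compares less than its predecessor. A minimal Python sketch, using the exit statuses stated above, is given here. The function name and the use of a Python key function are assumptions for illustration.

```python
def check_sorted(records, key=lambda record: record):
    """Return 0 if records are already in order under key, 11 otherwise."""
    keys = [key(record) for record in records]
    in_order = all(a <= b for a, b in zip(keys, keys[1:]))
    return 0 if in_order else 11

status_sorted = check_sorted(["ant", "bee", "cat"])
status_unsorted = check_sorted(["bee", "ant", "cat"])
# A different key can change the verdict: these are in order by length.
status_by_length = check_sorted(["bee", "wasp", "hornet"], key=len)
```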

-1,--in <input file name>

-2,--out <output file name>
If the output file is the same as the input file, the input file will be
overwritten. The input file will not be overwritten if the run is unsuccessful.

-j,--suppress-log
Suppress output to the log. If this flag is given before there is any output to the
log from a command line flag, nothing will be written to the log and the log file
will not be created. If a command line flag generates a log message before this
flag is processed, the log file will be created but no log messages will be written
to it once this flag is processed. To guarantee that no attempt will be made to
open a log file, give this flag first.

-q,--quiet
Be quiet - do not chat while working

-u,--unicode-normalization <mode>
Select Unicode normalization mode. The choices of mode are: c for normalization
form C (NFC), d for normalization form D (NFD), C for normalization form KC (NFKC),
D for normalization form KD (NFKD), and n for no normalization. The default is NFC.
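The effect of the modes can be seen with Python's standard unicodedata module. The mapping from the mode letters to normalization forms follows the list above; the function name is hypothetical, and the sketch says nothing about how msort itself performs normalization.

```python
import unicodedata

# Mode letters as listed above; None means no normalization.
MODES = {"c": "NFC", "d": "NFD", "C": "NFKC", "D": "NFKD", "n": None}

def normalize(text, mode="c"):
    """Normalize text according to a mode letter (default NFC)."""
    form = MODES[mode]
    return text if form is None else unicodedata.normalize(form, text)

composed = "\u00e9"      # e with acute accent as one codepoint
decomposed = "e\u0301"   # e followed by a combining acute accent

# Without normalization the two spellings are different strings.
same_raw = normalize(composed, "n") == normalize(decomposed, "n")   # False
# Under NFC (or NFD) they become identical, so they sort together.
same_nfc = normalize(composed, "c") == normalize(decomposed, "c")   # True

# The compatibility forms also replace characters such as the "fi" ligature.
ligature = normalize("\ufb01", "C")   # "fi"
```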

Key specific options
-e,--character-range <m,n>
Sort on characters m through n. Positive indices start from one. Negative indices
indicate position with respect to the end of the record. For example, the range
3,-2 consists of the third character through the next-to-last character.
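The index arithmetic can be sketched in Python as follows. The function name is hypothetical, and the treatment of an index of zero (rejected here) is an assumption, since the text does not say what msort does in that case.

```python
def character_range(text, m, n):
    """Return characters m through n of text, inclusive.

    Positive indices count from one; negative indices count from the end.
    """
    def to_offset(index):
        if index == 0:
            raise ValueError("indices start from 1 or -1")  # assumption
        # 1 -> first character, -1 -> last character.
        return index - 1 if index > 0 else len(text) + index

    start, end = to_offset(m), to_offset(n)
    return text[start:end + 1]

# The range 3,-2 from the example above: third through next-to-last.
key = character_range("abcdefg", 3, -2)   # "cdef"
```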

-n,--position <POS>(,<POS>)
Sort on the specified POS or contiguous range of POSs, where a POS is of the form
<field number>(.<character number>). Both counts begin at one. Field numbers but
not character numbers may be negative, in which case they are counted from the
right. Thus, 1.2 is the second character of the first field; -2.1 is the first
character of the next to last field.
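A single POS can be resolved as in the following Python sketch. It assumes that the record has already been split into fields. The function names are hypothetical, and only a single POS is handled, not a range.

```python
def parse_pos(pos):
    """Split '<field number>(.<character number>)' into two integers.

    The character number is None when it is omitted.
    """
    field, _, char = pos.partition(".")
    return int(field), (int(char) if char else None)

def select(fields, pos):
    """Return the field, or the single character, that pos designates."""
    field_no, char_no = parse_pos(pos)
    # Field numbers count from one; negative ones count from the right.
    index = field_no - 1 if field_no > 0 else len(fields) + field_no
    field = fields[index]
    return field if char_no is None else field[char_no - 1]

fields = ["alpha", "beta", "gamma"]   # hypothetical record
second_char_of_first = select(fields, "1.2")          # "l"
first_char_of_next_to_last = select(fields, "-2.1")   # "b"
```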

-t,--tag <tag regexp>
Sort on the field with the specified tag

-o,--optional <comparison>
The key is optional. The argument (<, = or >) specifies how a record in which the
key is absent compares to records in which it is present.

-C,--fold-case
Fold case

-z,--fold-case-turkic
Fold case with additional Turkic conversions.

-c,--comparison-type <comparison type>
a(ngle), l(exicographic), i(so8601 date/time), t(ime), D(omain name/email address),
d(ate), m(onth name), n(umeric), N(umeric string), s(ize), h(ybrid), r(andom)

-y,--number-system <number system>
Specifies the number system expected for this key. This affects only numeric and
numeric string keys. There are two special values. If the number system is "all",
records may contain any number system that msort can interpret. Different records
may contain different number systems. If the number system is "any", records may
contain any number system that msort can interpret, but all records must make use
of the same number system. msort sets the number system on the basis of the first
record.

-f,--date-format <date format>
Permutation of ymd with separators, e.g. y-m-d for international date format, m/d/y
for American date format, or a permutation of yd with separators, e.g. y-d, for
day-of-year dates. All three components may be numbers in any available number
system. The month field may also be a month name, determined by the same devices as
independent month name fields.
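One way to see how such a format string determines a sort key is the Python sketch below, which turns the format into a regular expression and returns the components in year, month, day order. It is an illustration under stated assumptions, not msort's parser: the function name is hypothetical, only numeric components are handled (month names are not), and the format is assumed to use each letter at most once.

```python
import re

def date_key(text, fmt):
    """Parse text according to fmt (e.g. 'y-m-d', 'm/d/y', 'y-d').

    Returns (year, month, day); month is 0 for day-of-year formats.
    """
    # Each of y, m, d becomes a named group of digits; anything else
    # in the format is a literal separator.
    pattern = "".join(
        r"(?P<%s>\d+)" % ch if ch in "ymd" else re.escape(ch) for ch in fmt
    )
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError("%r does not match format %r" % (text, fmt))
    parts = match.groupdict()
    return (int(parts["y"]), int(parts.get("m", 0)), int(parts["d"]))

# Hypothetical American-format dates.
dates = ["12/25/2020", "01/15/2021", "03/04/2020"]
chronological = sorted(dates, key=lambda d: date_key(d, "m/d/y"))
```

Sorting on the tuple puts the dates in chronological order regardless of the order in which the components are written.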

-W,--sort-order-file-separators <file name>
Read from the named file the list of characters to be treated as separators in the
sort order definition file.

-S,--substitutions <file name>
Read substitutions from named file

-s,--sort-order <file name>|<locale name>|"locale"
If the argument is a file name, it is taken to be a sort order file and the sort
order for the key is read from the file. If the argument is a locale name, the
collation rules for that locale are used. If the argument is "locale", the
collation rules for the current locale are used.

-T,--transformations <(d)(e)(s)>
Apply the specified transformations. d specifies that diacritics are to be
stripped. Separately encoded combining diacritics are removed. Characters with
diacritics represented by single codepoints are replaced with the corresponding
ASCII character without the diacritics, if there is one. e specifies that enclosed
characters, that is, characters within circles or parentheses, are to be replaced
with the corresponding plain ASCII character if there is one. s specifies that
characters in special styles are to be replaced with the corresponding plain ASCII
character if there is one. Stylistic equivalents include: small capitals (e.g.
U+1D04), script forms (e.g. U+212C), black letter forms (e.g. U+212D), Arabic
presentation forms (e.g. U+FE81), Hebrew presentation forms (e.g. U+FB1D),
fullwidth forms (e.g. U+FF01), halfwidth forms (e.g. U+FF7B), and the mathematical
alphanumeric symbols (e.g. U+1D400).
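The d transformation can be approximated in Python by decomposing the text and discarding the combining marks, as in the sketch below. This is an approximation for illustration only: the function name is hypothetical, and characters that have no canonical decomposition (for example U+00F8, o with stroke) are left unchanged here, whatever msort itself does with them.

```python
import unicodedata

def strip_diacritics(text):
    """Remove diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

plain_precomposed = strip_diacritics("caf\u00e9")    # "cafe"
plain_combining = strip_diacritics("cafe\u0301")     # "cafe"
```

Both the precomposed and the separately encoded spellings reduce to the same plain string, so they compare as equal after the transformation.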

-x,--exclusion-file <file name>
Read exclusions from named file

-X,--exclude-characters <exclusions>
Exclude specified characters

-i,--invert-locally
Invert sense of comparisons

-R,--reverse-key
Reverse characters of key

-A,--first-character-only
Ignore all but the first character of the field, after substitutions, exclusions,
etc.

Note: long options may not be available on your system.
