
NAME


mmseg - segment Chinese text using maximum matching

SYNOPSIS


mmseg -d dict_file [option]... [corpus_file]...

DESCRIPTION


mmseg is a tool for segmenting Chinese text into words using the maximum matching
algorithm. mmseg segments each corpus_file, or standard input if no file is specified, and
writes the segmented result to standard output.
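This man page does not say whether mmseg scans forward or backward, or how it bounds the candidate word length. The sketch below assumes plain forward maximum matching: at each position, take the longest lexicon word starting there, and fall back to a single character when nothing matches. The function name and the toy lexicon (standing in for dict_file) are hypothetical.

```python
def forward_max_match(text, lexicon):
    """Segment text by greedy forward maximum matching (illustrative sketch).

    lexicon: a set of known words, a hypothetical stand-in for dict_file.
    Characters not covered by any lexicon word are emitted one at a time.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, known or not.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


LEXICON = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", LEXICON))
```

With this toy lexicon the greedy choice prefers 研究生 over 研究 followed by 生命, which is the kind of segmentation ambiguity the -a option below is concerned with.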

OPTIONS


-d dict_file
Use dict_file as the lexicon. A default lexicon can be found at
/usr/share/sunpinyin-slm/dict.utf8.

-f, --format (text|bin)
Output format, either 'text' or 'bin' (default 'bin'). In text mode, the segmented
words are written as text; in binary mode, the word ids are written to stdout as
binary short integers.

-s, --stok STOK_ID
Sentence token id (default 10). In binary mode, it is written to the output after
every sentence.

-i, --show-id
Show id info. In text output mode, attach the id after each known word; in binary
mode, print the id(s) as text.

-a, --ambiguious-id AMBI-ID
An ambiguity means that a sequence ABC can be segmented either as A BC or as AB C. If
AMBI-ID is specified and nonzero, such a sequence ABC is not segmented: in binary mode,
AMBI-ID is written out; in text mode, "<ambi>ABC</ambi>" is output. Default is 0.
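The man page defines the ambiguity only by example, not by the rule mmseg uses to detect it. As a hypothetical sketch, the check below treats a span as ambiguous when it can be cut into two lexicon words at two different points (A BC and AB C). The function names, the toy lexicon, and the AMBI-ID value 9 are illustrative assumptions, and only text mode is shown; in binary mode the tool writes AMBI-ID instead.

```python
def two_word_splits(s, lexicon):
    """Cut points j where s[:j] and s[j:] are both lexicon words (hypothetical rule)."""
    return [j for j in range(1, len(s)) if s[:j] in lexicon and s[j:] in lexicon]


def render_text(s, lexicon, ambi_id=0):
    """Text-mode rendering of one span, following the -a description.

    With ambi_id != 0, a span that splits two ways (A BC and AB C) is left
    unsegmented and wrapped in <ambi>...</ambi>. Otherwise the cut giving the
    longest first word is kept, in the spirit of maximum matching.
    """
    cuts = two_word_splits(s, lexicon)
    if ambi_id != 0 and len(cuts) >= 2:
        return f"<ambi>{s}</ambi>"
    if not cuts:
        return s  # no two-word split: leave the span as-is
    j = cuts[-1]  # longest first word
    return f"{s[:j]} {s[j:]}"


LEXICON = {"结", "结合", "合成", "成"}
print(render_text("结合成", LEXICON))            # ambiguity not marked
print(render_text("结合成", LEXICON, ambi_id=9))  # ambiguity marked
```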

NOTES


In binary mode, consecutive ids of 0 are merged into a single 0. In text mode, no space
is inserted between consecutive unknown words.
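The output rules for binary mode (merged 0 ids, the sentence token after every sentence) and for text mode can be illustrated as a post-processing step. This is a minimal sketch, assuming the segmenter has already produced per-sentence word ids with 0 marking an unknown word; the function names and the word ids 3, 4, 5, 7, and 12 are hypothetical, while the default sentence token 10 comes from the -s option above. It does not reproduce the actual binary encoding of the ids.

```python
def to_id_stream(sentences, stok_id=10):
    """Flatten per-sentence word ids as described for binary mode.

    sentences: list of lists of word ids, where 0 marks an unknown word.
    Consecutive 0 ids are merged into one 0, and stok_id follows each sentence.
    """
    out = []
    for sentence in sentences:
        for wid in sentence:
            if wid == 0 and out and out[-1] == 0:
                continue  # merge a run of unknown-word ids
            out.append(wid)
        out.append(stok_id)
    return out


def to_text(tokens):
    """Join (word, id) pairs as in text mode; id 0 marks an unknown word.

    Words are separated by spaces, except that adjacent unknown words are
    concatenated without a space.
    """
    parts = []
    prev_unknown = False
    for word, wid in tokens:
        unknown = wid == 0
        if unknown and prev_unknown:
            parts[-1] += word  # no space between consecutive unknown words
        else:
            parts.append(word)
        prev_unknown = unknown
    return " ".join(parts)


print(to_id_stream([[5, 0, 0, 7, 0], [0, 0, 3]]))
print(to_text([("我们", 12), ("彧", 0), ("珏", 0), ("好", 4)]))
```

Because the sentence token is appended before the next sentence starts, a run of unknown words at the start of one sentence is never merged with a run at the end of the previous one.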
