
NAME


ids2ngram - generate n-gram data file from ids file

SYNOPSIS


ids2ngram [option]... ids_file...

DESCRIPTION


ids2ngram generates an idngram file, which is a sorted array of [id1,...,idN,freq] records,
from binary id stream files. Here, the id stream files are always generated by mmseg or
slmseg. It finds all occurrences of n-word tuples (i.e. tuples of the form (id1,...,idN)),
sorts these tuples in the lexicographic order of the ids that make them up, and then writes
them to the specified output file.
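The counting and sorting described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the ids2ngram implementation: the function name count_ngrams is hypothetical, and the sketch assumes every id in the stream participates in n-grams (any special handling of sentence boundaries or reserved ids by the real tool is not modeled).

```python
from collections import Counter
from typing import List, Tuple


def count_ngrams(ids: List[int], n: int) -> List[Tuple[Tuple[int, ...], int]]:
    """Return (tuple, freq) pairs for every n-word tuple in ids, sorted lexicographically.

    Hypothetical helper for illustration only.
    """
    if not 1 <= n <= 3:
        # Mirrors the documented restriction of -n to the range 1..3.
        raise ValueError("n must be in the range 1..3")
    # Slide a window of width n over the stream and count each tuple.
    counts = Counter(tuple(ids[i:i + n]) for i in range(len(ids) - n + 1))
    # Python compares tuples element by element, i.e. lexicographically.
    return sorted(counts.items())


if __name__ == "__main__":
    stream = [3, 1, 2, 3, 1, 2, 1]
    for gram, freq in count_ngrams(stream, 2):
        print(list(gram) + [freq])  # one [id1, id2, freq] record per line
```

For the sample stream, the bigrams (1,2) and (3,1) each occur twice, so the output is [1, 2, 2], [2, 1, 1], [2, 3, 1], [3, 1, 2].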

INPUT


The input file is a binary id stream, which looks like:
[id0,...,idX]
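This page does not state the on-disk width or byte order of each id. The following sketch of a reader therefore assumes, without confirmation from this page, that the stream is a headerless sequence of 32-bit unsigned integers in the host's native byte order. The names ID_FORMAT, parse_id_stream, and read_id_stream are hypothetical; adjust ID_FORMAT if your mmseg or slmseg output differs.

```python
import struct
from typing import List

# Assumption: each id is a 32-bit unsigned integer, native byte order, no file header.
ID_FORMAT = "=I"
ID_SIZE = struct.calcsize(ID_FORMAT)  # 4 bytes with the "=" (standard size) prefix


def parse_id_stream(data: bytes) -> List[int]:
    """Decode a raw [id0,...,idX] byte stream into a list of ints (hypothetical helper)."""
    if len(data) % ID_SIZE:
        raise ValueError("stream length is not a multiple of the id size")
    return [value for (value,) in struct.iter_unpack(ID_FORMAT, data)]


def read_id_stream(path: str) -> List[int]:
    """Read an entire id stream file into memory (hypothetical helper)."""
    with open(path, "rb") as f:
        return parse_id_stream(f.read())


if __name__ == "__main__":
    raw = struct.pack("=3I", 7, 42, 7)
    print(parse_id_stream(raw))
```

Packing the ids 7, 42, 7 and decoding them again yields [7, 42, 7].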

OPTIONS


All the following options are mandatory.

-n,--NMax N
Generate N-gram results. ids2ngram supports only unigrams, bigrams, and trigrams, so any
value outside the range 1..3 is invalid.

-s,--swap swap-file
Specify the temporary intermediate file.

-o, --out output-file
Specify the resulting idngram file, i.e. the array of [id1, ..., idN, freq].

-p, --para N
Specify the maximum number of n-gram items per paragraph. ids2ngram writes to the
temporary file one paragraph at a time, and after writing a paragraph out it frees the
memory allocated for it. If your system has enough memory, a larger N is recommended,
since fewer paragraphs mean less I/O and faster processing.
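The per-paragraph behavior of -p resembles the run-generation phase of an external sort. The following is a minimal sketch under stated assumptions: it treats the paragraph limit as the number of distinct n-grams held in memory, and it yields each sorted run instead of writing it to a swap file. The name paragraph_runs is hypothetical, and the exact flushing rule used by ids2ngram may differ.

```python
from collections import Counter
from typing import Iterator, List, Tuple

Run = List[Tuple[Tuple[int, ...], int]]


def paragraph_runs(ids: List[int], n: int, para: int) -> Iterator[Run]:
    """Yield sorted runs, each holding at most `para` distinct n-grams (hypothetical helper)."""
    if para < 1:
        raise ValueError("para must be at least 1")
    counts: Counter = Counter()
    for i in range(len(ids) - n + 1):
        gram = tuple(ids[i:i + n])
        if gram not in counts and len(counts) >= para:
            # Paragraph is full: emit it in sorted order (the real tool writes it
            # to the swap file), then drop it so its memory can be reclaimed.
            yield sorted(counts.items())
            counts = Counter()
        counts[gram] += 1
    if counts:
        # Flush the final, partially filled paragraph.
        yield sorted(counts.items())


if __name__ == "__main__":
    for run in paragraph_runs([1, 2, 1, 2, 3], n=1, para=2):
        print(run)
```

Because the same n-gram can show up in more than one run, the runs must still be merged, with frequencies summed, to obtain the final idngram array. A larger para produces fewer, longer runs, which is the I/O saving the option description refers to.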

EXAMPLE


The following example uses three input id stream files, idsfile1, idsfile2, and idsfile3,
to generate the idngram file all.id3gram. Each paragraph (the internal map or hash size)
holds up to 1024000 items, and /tmp/swap is used as the swap file for temporary results.
All temporary paragraph results are eventually merged to produce the final result.

ids2ngram -n 3 -s /tmp/swap -o all.id3gram -p 1024000 idsfile1 idsfile2 idsfile3
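The final merge step mentioned in the example can be illustrated as a k-way merge of sorted runs in which equal n-grams have their frequencies summed. This sketch uses the standard library's heapq.merge and itertools.groupby and keeps the runs in memory; ids2ngram reads them from the swap file instead, and the name merge_runs is hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple

Record = Tuple[Tuple[int, ...], int]


def merge_runs(runs: Iterable[Iterable[Record]]) -> Iterator[Record]:
    """Merge sorted (ngram, freq) runs, summing frequencies of equal n-grams (hypothetical helper)."""
    # heapq.merge lazily interleaves already-sorted inputs, ordering by the n-gram tuple.
    merged = heapq.merge(*runs, key=itemgetter(0))
    # Equal n-grams are now adjacent, so each group collapses into one record.
    for gram, group in groupby(merged, key=itemgetter(0)):
        yield gram, sum(freq for _, freq in group)


if __name__ == "__main__":
    run_a = [((1, 2, 3), 2), ((2, 3, 4), 1)]
    run_b = [((1, 2, 3), 1), ((3, 4, 5), 4)]
    for gram, freq in merge_runs([run_a, run_b]):
        print(list(gram) + [freq])
```

Here the trigram (1, 2, 3) appears in both runs, so the merged output contains it once with frequency 3. Because the merge only needs one record from each run at a time, memory use stays bounded regardless of the total output size.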
