
Corpus redundancy manager to run in Linux online download

Free download of Corpus redundancy manager, a Linux app that can be run online in Ubuntu online, Fedora online or Debian online

This is the Linux app named Corpus redundancy manager, whose latest release can be downloaded as collocations.zip. It can be run online on OnWorks, a free hosting provider for workstations.

Download this app and run it online with OnWorks for free.

Follow these instructions in order to run this app:

- 1. Download this application to your PC.

- 2. Go to our file manager https://www.onworks.net/myfiles.php?username=XXXXX and enter the username that you want.

- 3. Upload this application to that file manager.

- 4. Start the OnWorks Linux online emulator, Windows online emulator or macOS online emulator from this website.

- 5. From the OnWorks Linux OS you have just started, go to our file manager https://www.onworks.net/myfiles.php?username=XXXXX with the username that you want.

- 6. Download the application, install it and run it.





DESCRIPTION

Redundancy caused by cut-and-paste operations in text creates bias in machine learning for NLP.
This module takes a directory and produces a list of a subset of the files in that directory, chosen so that the similarity between any two listed files stays under an upper bound.

Features

  • Identify copy-paste redundancy in a document corpus
  • Input: a folder with text documents and similarity threshold
  • Output (a) a list of non-redundant documents (a non-redundant subset of the corpus)
  • Output (b) list of document pairs found to be redundant with the amount of redundancy for the pair
  • Python script (2.6) - tested on various Linux flavours + Windows XP/7
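
The input and outputs listed above can be illustrated with a minimal sketch. It is not the project's actual code: the similarity measure (overlap of word 5-grams), the greedy selection order, and the names `shingles`, `overlap`, `filter_corpus` and `load_folder` are all assumptions made for illustration, and it is written for a current Python 3 rather than the 2.6 the script targets.

```python
from itertools import combinations
from pathlib import Path


def shingles(text, n=5):
    """Return the set of word n-grams in text (assumed similarity unit)."""
    words = text.split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a, b):
    """Fraction of the smaller n-gram set that also occurs in the other set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def filter_corpus(docs, threshold):
    """docs maps a file name to its text.

    Returns (kept, pairs): kept is a list of names in which no two
    documents exceed the threshold; pairs lists every (name, name, score)
    whose score exceeds it.
    """
    sets = {name: shingles(text) for name, text in docs.items()}
    names = sorted(sets)

    # Output (b): all pairs above the threshold, with their score.
    pairs = []
    for x, y in combinations(names, 2):
        score = overlap(sets[x], sets[y])
        if score > threshold:
            pairs.append((x, y, score))

    # Output (a): greedily keep a document only if it is not too
    # similar to any document already kept.
    kept = []
    for name in names:
        if all(overlap(sets[name], sets[k]) <= threshold for k in kept):
            kept.append(name)
    return kept, pairs


def load_folder(folder):
    """Read every *.txt file in folder (hypothetical input convention)."""
    return {p.name: p.read_text(encoding="utf-8", errors="replace")
            for p in Path(folder).glob("*.txt")}
```

A call such as `filter_corpus(load_folder("notes"), 0.5)` would return both outputs for a hypothetical folder named `notes`. The greedy pass depends on file order, so a different ordering can keep a different, equally valid subset.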


Audience

Science/Research


User interface

Console/Terminal


Programming Language

Python



This application can also be fetched from https://sourceforge.net/projects/corpusredundanc/. It is hosted on OnWorks so that it can be run online more easily from one of our free operating systems.

