NAME


ra-index - index files for use with remembrance agent software

SYNOPSIS


ra-index [--version] [-v] [-d] [-s] <base-dir> <source1> [<source2>] [...] [-e
<excludee1> [<excludee2>] [...]]

DESCRIPTION


ra-index and ra-retrieve make up the Savant search engine, an information retrieval engine
designed as a back-end for the Remembrance Agent (RA). Given a collection of the user's
accumulated email, Usenet news articles, papers, saved HTML files and other text notes,
the RA attempts to find those documents which are most relevant to the user's current
context. That is, it searches this collection of text for the documents which bear the
highest word-for-word similarity to the text the user is currently editing, in the hope
that they will also bear high conceptual similarity and thus be useful to the user's
current work. With the Emacs front-end, these suggestions are continuously displayed in a
small buffer at the bottom of the user's window. If a suggestion looks useful, the full
text can be retrieved with a single command.

The Remembrance Agent works in two stages. First, the user's collection of text documents
is indexed into a database saved in a vector format. After the database is created, the
second stage of the Remembrance Agent is run from Emacs, where it periodically takes a
sample of text from the working buffer and finds those documents from the collection that
are most similar. It summarizes the top documents in a small Emacs window and allows you
to retrieve the entire text of any one with a keystroke. See the README file for
information on using the Emacs front-end.

At its core, Savant is a text-retrieval search engine that uses a standard TF-IDF
algorithm, but it also uses a template system to recognize different kinds of documents
and extract various field information. For example, ra-index can recognize subject lines
and address information from email files and file this information separately. It can
also pull apart file archives into separate documents, e.g. RMAIL files are indexed as
separate email documents. Finally, there are filters defined for many document types to
remove extraneous information like HTML tags that might otherwise cause problems in
retrieval. These are all precompiled in a template structure. The template system is not
currently well documented, but anyone who wants to experiment with it will find it all
defined in the source file templates/conftemplates.c.
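
To make the retrieval idea concrete, here is a minimal Python sketch of textbook TF-IDF
ranking with cosine similarity. It is not Savant's implementation: the tokenizer, the
tf * log(N/df) weighting, and the function names (tokenize, tfidf_vectors, cosine, rank)
are all assumptions chosen for illustration, and Savant's templates and filters are not
modeled.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a stand-in for real filters)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return ({term: weight} per document, idf table) using tf * log(N / df)."""
    token_lists = [tokenize(d) for d in docs]
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # count each term once per document
    idf = {t: math.log(len(docs) / df[t]) for t in df}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    q_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    return sorted(((cosine(q_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
```

In this sketch the query plays the role of the text sampled from the working buffer, and
the ranked list corresponds to the suggestions the front-end would display.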

The RA is primarily designed as a proactive information provider that continually gives
you information that might be relevant to your current environment, but Savant can also be
used as a standard text and information retrieval search engine.

USAGE
To index, you must have a set of source text files and a directory in which Savant can
put its database files. The <source> arguments may be files or directories. If a
directory is in the list, Savant will use all its contents, recursing into all
subdirectories. Non-text files and backup files (those whose names end with ~ or begin
with #) are ignored. Savant also ignores dot-files (those starting with .) and, unless
the -s flag is given, symbolic links. Any files or directories specified after the
optional -e flag will be excluded. Savant will use any files it finds to create a
database in the specified base directory, which must already exist. The optional -v
argument (verbose) will direct Savant to keep you updated on its progress. For example,

ra-index -v ~/RA-indexes/mail ~/RMAIL ~/Rmail-files -e ~/Rmail-files/Old-files

will build a database in the ~/RA-indexes/mail directory, made up of emails from my RMAIL
file plus all files and subdirectories of ~/Rmail-files, excluding files and directories
in ~/Rmail-files/Old-files.
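
The file-selection rules described above can be approximated in a few lines of Python.
This is only a sketch of the documented behavior, not code taken from ra-index: the
function name select_sources and its parameters are hypothetical, the check for non-text
files is omitted, and applying the skip rules to the top-level sources themselves is an
assumption.

```python
import os


def select_sources(sources, excludes=(), follow_symlinks=False):
    """Yield the files an indexer following the USAGE rules would consider."""
    excluded = {os.path.abspath(e) for e in excludes}

    def skip(path):
        name = os.path.basename(path)
        if os.path.abspath(path) in excluded:
            return True  # listed after -e
        if name.startswith(".") or name.startswith("#") or name.endswith("~"):
            return True  # dot-files and backup files
        if os.path.islink(path) and not follow_symlinks:
            return True  # symbolic links are ignored unless -s is given
        return False

    def walk(path):
        if skip(path):
            return
        if os.path.isdir(path):
            # Recurse into every subdirectory (no protection against link loops).
            for entry in sorted(os.listdir(path)):
                yield from walk(os.path.join(path, entry))
        elif os.path.isfile(path):
            yield path

    for source in sources:
        yield from walk(source)
```

Calling list(select_sources(["Rmail-files"], excludes=["Rmail-files/Old-files"])) would
list the files left after the exclusions, which can be a handy way to preview what an
indexing run is likely to pick up.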

ra-index can build databases in any directory you like, but the Emacs interface for the
Remembrance Agent expects a particular structure. For each database you want to make, you
should create a directory, and all these directories should live in the same parent
directory. For example, for my own use I have a directory ~/RA-indexes/, and within that
are the directories ~/RA-indexes/mail/, ~/RA-indexes/papers/, etc., which actually contain
the database files.
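
If you script your indexing, the layout above is easy to set up programmatically. The
following Python sketch creates one database directory under a shared parent and
assembles an argument list that follows the SYNOPSIS. The helper name
build_index_command and its parameters are hypothetical; only the ra-index flags shown
in this page are used.

```python
import os


def build_index_command(index_root, name, sources, excludes=(), verbose=False):
    """Create <index_root>/<name> and return an argv list for ra-index."""
    base_dir = os.path.join(index_root, name)
    # ra-index requires the base directory to exist before it runs.
    os.makedirs(base_dir, exist_ok=True)

    argv = ["ra-index"]
    if verbose:
        argv.append("-v")
    argv.append(base_dir)
    argv.extend(sources)
    if excludes:
        argv.append("-e")  # everything after -e is excluded
        argv.extend(excludes)
    return argv
```

The returned list can be passed to subprocess.run(argv, check=True) on a machine where
ra-index is installed. Note that ~ is expanded by the shell, not by subprocess, so apply
os.path.expanduser to such paths before passing them in.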

OPTIONS
-v Verbose mode. Print useful information.

-d Debug mode. Print not-so-useful information.

-e Exclude all files and directories named after this flag.

-s Follow symbolic links when indexing.

--version
Print version information.
