NAME


pymvpa2-crossval - cross-validation of a learner's performance

SYNOPSIS


pymvpa2 crossval [--version] [-h] -i DATASET [DATASET ...] --learner LEARNER
[--learner-space LEARNER_SPACE] --partitioner PARTITIONER [--errorfx ERRORFX]
[--avg-datafold-results] [--balance-training BALANCE_TRAINING]
[--sampling-repetitions SAMPLING_REPETITIONS] [--permutations PERMUTATIONS]
[--prob-tail {left,right}] -o OUTPUT [--hdf5-compression TYPE]

DESCRIPTION


Cross-validation of a learner's performance

A learner is repeatedly trained and tested on partitions of an input dataset that are
generated by a configurable partitioning scheme. Each partition usually consists of a
training and a testing portion. The learner is trained on the training portion of the
dataset, and its generalization is then tested by comparing its predictions on the
testing portion with the corresponding targets.

A summary of the learner's performance is written to STDOUT. Depending on the particular setup
of the cross-validation analysis, either the learner's raw predictions or summary
statistics are returned in an output dataset.

If Monte-Carlo permutation testing is enabled (see --permutations), a second output dataset
with the corresponding p-values is stored as well (filename suffix '_nullprob').

OPTIONS


--version
show program's version and license information and exit

-h, --help, --help-np
show this help message and exit. --help-np forcefully disables the use of a pager
for displaying the help.

-i DATASET [DATASET ...], --input DATASET [DATASET ...]
path(s) to one or more PyMVPA dataset files. All datasets will be merged into a
single dataset (vstack'ed) in order of specification. In some cases this option may
need to be specified more than once if multiple, but separate, input datasets are
required.

Options for cross-validation setup:
--learner LEARNER
select a learner (trainable node) via its description in the learner warehouse (see
'info' command for a listing), a colon-separated list of capabilities, or by a file
path to a Python script that creates a classifier instance (advanced).

--learner-space LEARNER_SPACE
name of a sample attribute that defines the model to be learned by a learner. By
default this is an attribute named 'targets'.

--partitioner PARTITIONER
select a data folding scheme. Supported arguments are: 'half' for split-half
partitioning, 'oddeven' for partitioning into odd and even chunks, 'group-X' where
X can be any positive integer for partitioning in X groups, 'n-X' where X can be
any positive integer for leave-X-chunks out partitioning. By default partitioners
operate on dataset chunks that are defined by a 'chunks' sample attribute. The name
of the "chunking" attribute can be changed by appending a colon and the name of the
attribute (e.g. 'oddeven:run'). Optionally, an argument to this option can also be
a file path to a Python script that creates a custom partitioner instance
(advanced).
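
To make the folding schemes concrete, here is a minimal pure-Python sketch of how 'n-X' and 'oddeven' could divide the unique values of a chunking attribute into training and testing sets. It is not PyMVPA's implementation: the function names, the example chunk labels, and the order in which folds are produced are assumptions made for illustration.

```python
from itertools import combinations


def leave_x_out_folds(chunks, x=1):
    """Yield (train, test) chunk tuples, leaving out every combination of x chunks.

    Illustrative only; PyMVPA's own partitioners may order folds differently.
    """
    unique = sorted(set(chunks))
    for test in combinations(unique, x):
        train = tuple(c for c in unique if c not in test)
        yield train, test


def oddeven_folds(chunks):
    """Yield two folds: test on odd chunks, then test on even chunks (assumed order)."""
    unique = sorted(set(chunks))
    odd = tuple(c for c in unique if c % 2 == 1)
    even = tuple(c for c in unique if c % 2 == 0)
    yield even, odd
    yield odd, even


# Hypothetical 'chunks' sample attribute: one label per sample, e.g. a run number
chunks = [0, 0, 1, 1, 2, 2, 3, 3]
n1 = list(leave_x_out_folds(chunks, x=1))   # four folds, one chunk held out in each
oe = list(oddeven_folds(chunks))            # two folds
```

With four unique chunks, leaving out two chunks at a time would give six folds, one for each pair of chunks.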

--errorfx ERRORFX
error function to be applied to the targets and predictions of each
cross-validation data fold. This can either be a name of any error function in
PyMVPA's mvpa2.misc.errorfx module, or a file path to a Python script that creates
a custom error function (advanced).

--avg-datafold-results
average result values across data folds generated by the partitioner, for example
to compute a mean prediction error across all folds of a cross-validation procedure.
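
The relationship between a per-fold error function and averaging across folds can be sketched as follows. This is plain Python rather than PyMVPA code: the helper mismatch_rate is a hypothetical stand-in for an error function, and the predictions and targets are made-up values.

```python
def mismatch_rate(predicted, targets):
    """Fraction of predictions that differ from the true targets (hypothetical helper)."""
    if len(predicted) != len(targets):
        raise ValueError("predicted and targets must have the same length")
    return sum(p != t for p, t in zip(predicted, targets)) / len(targets)


# Made-up (predictions, targets) pairs for three data folds
folds = [
    (["a", "b", "a", "b"], ["a", "b", "b", "b"]),  # 1 of 4 wrong
    (["a", "a", "b", "b"], ["a", "a", "b", "b"]),  # all correct
    (["b", "b", "a", "a"], ["a", "b", "a", "b"]),  # 2 of 4 wrong
]

per_fold = [mismatch_rate(p, t) for p, t in folds]  # one error value per fold
mean_error = sum(per_fold) / len(per_fold)          # single value after averaging
```

Without averaging, a result holds one value per fold; with averaging, those values collapse into a single mean.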

--balance-training BALANCE_TRAINING
If enabled, training samples are balanced within each data fold. If the keyword
'equal' is given as argument an equal number of random samples for each unique
target value is chosen. The number of samples per category is determined by the
category with the least number of samples in the respective training set. An
integer argument will cause a corresponding number of samples per category to
be randomly selected. A floating point number argument (interval [0,1]) indicates
what fraction of the available samples shall be selected.
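
The three argument types can be illustrated with a small pure-Python sketch. It is not PyMVPA's balancing code. The function name balance is hypothetical, and treating a floating point argument as a fraction of the smallest category is an assumption, since the text above does not say which count the fraction refers to.

```python
import random
from collections import defaultdict


def balance(targets, amount="equal", seed=None):
    """Return sorted sample indices with the same number of samples per target value.

    amount: 'equal' (size of the smallest category), an int (samples per
    category), or a float in [0, 1] (ASSUMED here to be a fraction of the
    smallest category).
    """
    by_target = defaultdict(list)
    for idx, target in enumerate(targets):
        by_target[target].append(idx)
    smallest = min(len(indices) for indices in by_target.values())

    if amount == "equal":
        n = smallest
    elif isinstance(amount, float):
        n = int(smallest * amount)
    else:
        n = int(amount)

    rng = random.Random(seed)
    selected = []
    for indices in by_target.values():
        # random.sample raises ValueError if n exceeds the category size
        selected.extend(rng.sample(indices, n))
    return sorted(selected)


# Made-up training targets: five 'face' samples and three 'house' samples
targets = ["face"] * 5 + ["house"] * 3
picked = balance(targets, "equal", seed=0)  # three samples from each category
```

Because the selection is random, repeating it with different seeds yields different balanced subsets of the same training set.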

--sampling-repetitions SAMPLING_REPETITIONS
If training set balancing is enabled, how often random sample selection should be
performed for each data fold. Default: 1

--permutations PERMUTATIONS
Number of Monte-Carlo permutation runs to be computed for estimating an H0
distribution for all cross-validation results. Enabling this option will make
reports of corresponding p-values available in the result summary and output.

--prob-tail {left,right}
which tail of the probability distribution to report p-values from when evaluating
permutation test results. For example, a cross-validation computing mean prediction
error could report a left-tail p-value for a single-sided test.
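
The choice of tail can be illustrated with a minimal sketch of a Monte-Carlo p-value: the fraction of values from the permutation (H0) distribution that are at least as extreme as the observed result. This is an assumed, simplified estimator written in plain Python. PyMVPA's own computation may differ in detail, and the numbers are made up.

```python
def tail_pvalue(observed, null_values, tail="left"):
    """Fraction of null-distribution values at least as extreme as `observed`."""
    if tail == "left":
        extreme = sum(v <= observed for v in null_values)
    elif tail == "right":
        extreme = sum(v >= observed for v in null_values)
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return extreme / len(null_values)


# Made-up mean prediction errors from 10 permutation runs
null_errors = [0.48, 0.52, 0.55, 0.45, 0.50, 0.47, 0.53, 0.51, 0.49, 0.30]
observed_error = 0.30

# For an error measure, small values are the interesting ones: use the left tail
p_left = tail_pvalue(observed_error, null_errors, tail="left")
p_right = tail_pvalue(observed_error, null_errors, tail="right")
```

For an accuracy measure, where large values indicate good performance, the right tail would be the natural choice instead.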

Output options:
-o OUTPUT, --output OUTPUT
output filename ('.hdf5' extension is added automatically if necessary). NOTE: The
output format is suitable for data exchange between PyMVPA commands, but is not
recommended for long-term storage or exchange as its specific content may vary
depending on the actual software environment. For long-term storage consider
conversion into other data formats (see 'dump' command).

--hdf5-compression TYPE
compression type for HDF5 storage. Available values depend on the specific HDF5
installation. Typical values are: 'gzip', 'lzf', 'szip', or integers from 1 to 9
indicating gzip compression levels.
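
As a usage illustration, the following sketch assembles one possible command line from the options documented above without running it. The file names are hypothetical, the learner is left as a placeholder to be replaced with a description listed by the 'info' command, and 'mean_mismatch_error' is assumed to be available in mvpa2.misc.errorfx.

```python
import shlex

# Placeholder: substitute a learner description listed by the 'info' command
learner = "LEARNER"

argv = [
    "pymvpa2", "crossval",
    "-i", "input.hdf5",                  # hypothetical input dataset file
    "--learner", learner,
    "--partitioner", "n-1",              # leave-one-chunk-out partitioning
    "--errorfx", "mean_mismatch_error",  # assumed name in mvpa2.misc.errorfx
    "--avg-datafold-results",
    "--permutations", "100",
    "--prob-tail", "left",
    "-o", "crossval_result",             # hypothetical name; '.hdf5' is added automatically
]

cmd = shlex.join(argv)  # shell-safe string form of the argument list
print(cmd)
```

The same argument list could be passed to subprocess.run(argv, check=True) once PyMVPA is installed and the placeholder has been replaced.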
