NAME
datapacker - Tool to pack files into the minimum number of bins
SYNOPSIS
datapacker [ -0 ] [ -a ACTION ] [ -b FORMAT ] [ -d ] [ -D ] [ -p ] [ -S SIZE ] [ --sort ] -s SIZE FILE ...
datapacker -h | --help
DESCRIPTION
datapacker is a tool to group files by size. It is designed to group files such that they
fill fixed-size containers (called "bins") using the minimum number of containers. This
is useful, for instance, if you want to archive a number of files to CD or DVD, and want
to organize them such that you use the minimum possible number of CDs or DVDs.
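To make the grouping concrete, here is a minimal sketch of one classic bin-packing
heuristic, first-fit decreasing. It is an illustration only: this page does not say which
algorithm datapacker uses, and the function name pack_ffd and its "size<TAB>name" input
format are assumptions made for the example.

```sh
# pack_ffd BINSIZE: read "size<TAB>name" lines on stdin and print
# "bin<TAB>name" lines. Hypothetical helper, not part of datapacker.
pack_ffd() {
    # Largest files first, so big items are placed before small ones.
    sort -t "$(printf '\t')" -k1,1nr |
    awk -F '\t' -v cap="$1" '
    {
        if ($1 > cap) { print "too large: " $2 > "/dev/stderr"; exit 1 }
        # Put the file into the first bin that still has room.
        for (b = 1; b <= nbins; b++)
            if (used[b] + $1 <= cap) break
        if (b > nbins) nbins = b      # no bin had room: open a new one
        used[b] += $1
        printf "%03d\t%s\n", b, $2
    }'
}

# Five files with sizes 6, 5, 4, 3 and 2 packed into bins of 10 units each.
printf '6\ta\n5\tb\n4\tc\n3\td\n2\te\n' | pack_ffd 10
```

With this input the sketch needs two bins: a and c share bin 001, and b, d and e share
bin 002.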
In many cases, datapacker executes almost instantaneously. Of particular note, the
hardlink action (see OPTIONS below) can be used to effectively copy data into bins without
having to actually copy the data at all.
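Why hardlinking avoids copying can be seen with plain ln(1), without datapacker at all.
The sketch below works in a scratch directory created by mktemp(1); the directory name
001 and the file name photo.jpg are made up for the example.

```sh
# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
mkdir "$demo/001"
echo "photo data" > "$demo/photo.jpg"

# A hard link is a second name for the same data; nothing is copied.
ln "$demo/photo.jpg" "$demo/001/photo.jpg"

# Removing the original name leaves the data reachable through the bin.
rm "$demo/photo.jpg"
kept=$(cat "$demo/001/photo.jpg")
echo "$kept"

rm -rf "$demo"
```

The same holds in the other direction: deleting the link inside the bin leaves the
original file untouched.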
datapacker is a tool in the traditional Unix style; it can be used in pipes and can call
other tools.
OPTIONS
Here are the command-line options you may set for datapacker. Please note that -s and at
least one file (see FILE SPECIFICATION below) are mandatory.
-0
--null When reading a list of files from standard input (see FILE SPECIFICATION below),
expect the input to be separated by NULL (ASCII 0) characters instead of one per
line. Especially useful with find -print0.
-a ACTION
--action=ACTION
Defines what action to take with the matches. Please note that, with any action,
the output will be sorted by bin, with bin 1 first. Possible actions include:
print Print one human-readable line per file. Each line contains the bin number
(in the format given by -b), an ASCII tab character, then the filename.
printfull
Print one semi-human-readable line per bin. Each line contains the bin
number, then a list of filenames to place in that bin, with an ASCII tab
character after the bin number and between each filename.
print0 For each file, output the bin number (according to the format given by -b),
an ASCII NULL character, the filename, and another ASCII NULL character.
Ideal for use with xargs -0 -L 2.
exec:COMMAND
For each bin, execute the specified COMMAND via the shell. The program
COMMAND will be passed information on its command line as indicated below.
It is an error if the generated command line for a given bin is too large
for the system.
A nonzero exit code from any COMMAND will cause datapacker to terminate. If
COMMAND contains quotes, don't forget to quote the entire command, as in:
datapacker '--action=exec:echo "Bin: $1"; shift; ls "$@"'
The arguments to the given command will be:
· argv[0] ($0 in shell) will be the name of the shell used to invoke the
command -- $SHELL or /bin/sh.
· argv[1] ($1 in shell) will be the bin number, formatted according to -b.
· argv[2] and on ($2 and on in shell) will be the files to place in that bin.
hardlink
For each file, create a hardlink at bin/filename pointing to the original
input filename. Creates the directory bin as necessary. Alternative
locations and formats for bin can be specified with -b. All bin directories
and all input must reside on the same filesystem.
After you are done processing the results of the bin, you may safely delete
the bins without deleting original data. Alternatively, you could leave the
bins and delete the original data. Either approach will be workable.
It is an error to attempt to make a hard link across filesystems, or to have
two input files with the same filename in different paths. datapacker will
exit on either of these situations.
See also --deep-links.
symlink
Like hardlink, but create symlinks instead. Symlinks can span filesystems,
but you will lose information if you remove the original (pre-bin) data.
Like hardlink, it is an error to have a single filename occur in multiple
input directories with this option.
See also --deep-links.
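The argument convention described for the exec action can be tried by invoking the shell
by hand. This is a sketch under the assumption that datapacker runs something equivalent
to "sh -c COMMAND" with the bin number and file names appended; the bin number 001 and the
file names are invented for the example.

```sh
# The command string, exactly as it would follow "exec:".
cmd='echo "Bin: $1"; shift; printf "  %s\n" "$@"'

# sh -c CMD NAME ARGS... sets $0 to NAME and $1, $2, ... to ARGS.
sh -c "$cmd" sh 001 "first photo.jpg" second.jpg
```

Because the file names arrive as separate arguments, quoting "$@" keeps names that
contain spaces intact.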
-b FORMAT
--binfmt=FORMAT
Defines the output format for the bin name. The format is interpreted as printf(3)
would interpret it, with the bin number supplied as its one integer (%d) argument.
This can be useful to define both the name and the location of your bins: with the
hardlink and symlink actions, the bin name is the directory in which the files in
that bin are linked. The default is %03d, which outputs integers with leading
zeros to make all bin names at least three characters wide.
Other useful variants could include destdir/%d to put the string "destdir/" in
front of the bin number, which is rendered without leading zeros.
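The effect of a given FORMAT can be previewed with the shell's printf(1), which handles
%d conversions the same way as printf(3). The bin numbers 7 and 12 are arbitrary values
chosen for the example.

```sh
# The default format pads the bin number to three digits.
printf '%03d\n' 7            # prints 007

# A directory prefix with an unpadded number.
printf 'destdir/%d\n' 7      # prints destdir/7

# Prefix and padding can be combined.
printf 'destdir/%03d\n' 12   # prints destdir/012
```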
-d
--debug
Enable debug mode. This is here for future expansion and does not currently have
any effect.
-D
--deep-links
When used with the symlink or hardlink action, instead of making all links in a
single flat directory under the bin, mimic the source directory structure under the
bin. Makes most sense when used with -p, but could also be useful without it if
there are files with the same name in different source directories.
-h
--help Display brief usage information and exit.
-p
--preserve-order
Normally, datapacker uses an efficient algorithm that tries to rearrange files such
that the number of bins required is minimized. Sometimes you may instead wish to
preserve the ordering of files at the expense of potentially using more bins. In
these cases, you would want to use this option.
As an example of such a situation: perhaps you have taken one photo a day for
several years. You would like to archive these photos to CD, but you want them to
be stored in chronological order. You have named the files such that the names
indicate order, so you can pass the file list to datapacker using -p to preserve
the ordering in your bins. Thus, bin 1 will contain the oldest files, bin 2 the
second-oldest, and so on. If -p wasn't used, you might use fewer CDs, but the
photos would be spread out across all CDs without preserving your chronological
order.
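The trade-off can be illustrated with a small sketch of order-preserving packing. This
page does not specify exactly how -p fills bins, so the sketch assumes the simplest rule:
keep the input order and start a new bin whenever the next file would not fit. The
function name pack_in_order and its "size<TAB>name" input format are hypothetical.

```sh
# pack_in_order BINSIZE: read "size<TAB>name" lines in their given order
# and start a new bin whenever the next file would overflow the current one.
# Hypothetical helper, not part of datapacker.
pack_in_order() {
    awk -F '\t' -v cap="$1" '
    BEGIN { bin = 1 }
    {
        if ($1 > cap) { print "too large: " $2 > "/dev/stderr"; exit 1 }
        if (used + $1 > cap) { bin++; used = 0 }
        used += $1
        printf "%03d\t%s\n", bin, $2
    }'
}

# Five daily photos with sizes 6, 5, 4, 3 and 2, bins of 10 units each.
printf '6\tday1\n5\tday2\n4\tday3\n3\tday4\n2\tday5\n' | pack_in_order 10
```

Here the files stay in order but occupy three bins, whereas the same sizes would fit
into two bins (6+4 and 5+3+2) if reordering were allowed.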
-s SIZE
--size=SIZE
Gives the size of each bin in bytes. Suffixes such as "k", "m", "g", etc. may be
used to indicate kilobytes, megabytes, gigabytes, and so forth. Numbers such as
1.5g are valid, and if needed, will be rounded to the nearest possible integer
value.
The size of the first bin may be overridden with -S.
Here are the sizes of some commonly-used bins. For each item, I have provided you
with both the underlying recording capacity of the disc and a suggested value for
-s. The suggested value for -s is lower than the underlying capacity because there
is overhead imposed by the filesystem stored on the disc. You will perhaps find
that the suggested value for -s is lower than optimal for discs that contain few
large files, and higher than desired for discs that contain vast amounts of small
files.
· CD-ROM, 74-minute (standard): 650m / 600m
· CD-ROM, 80-minute: 703m / 650m
· CD-ROM, 90-minute: 790m / 740m
· CD-ROM, 99-minute: 870m / 820m
· DVD+-R: 4.377g / 4g
· DVD+R, dual layer: 8.5g / 8g
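The suffix arithmetic can be sketched as a small shell function. The text does not state
whether the suffixes are decimal or binary; binary multiples (k = 1024) are assumed here
because the 4.377g figure for a 4.7-billion-byte DVD only works out that way. The
function name size_to_bytes is hypothetical and handles only the k, m and g suffixes.

```sh
# size_to_bytes SIZE: convert values such as 600m or 1.5g to bytes.
# Assumes binary multiples (k = 1024); hypothetical helper for illustration.
size_to_bytes() {
    awk -v s="$1" 'BEGIN {
        n = s + 0                            # leading numeric part
        u = tolower(substr(s, length(s)))    # last character
        m = 1
        if (u == "k") m = 1024
        else if (u == "m") m = 1024 ^ 2
        else if (u == "g") m = 1024 ^ 3
        printf "%.0f\n", n * m               # round to a whole byte
    }'
}

size_to_bytes 650m
size_to_bytes 1.5g
```

Under that assumption, 650m comes to 681574400 bytes and 1.5g to 1610612736 bytes.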
-S SIZE
--size-first=SIZE
The size of the first bin. If not given, defaults to the value given with -s.
This may be useful if you will be using a mechanism outside datapacker to add
additional information to the first bin: perhaps an index of which bin has which
file, the information necessary to make a CD bootable, etc. You may use the same
suffixes as with -s with this option.
--sort Sorts the list of files to process before acting upon them. When combined with -p,
causes the output to be sorted. Without -p, this option has no effect other than
increasing CPU usage.
FILE SPECIFICATION
After the options, you must supply one or more files to consider for packing into bins.
Alternatively, instead of listing files on the command line, you may list a single hyphen
(-), which tells datapacker to read the list of files from standard input (stdin).
datapacker never recurses into subdirectories. If you want a recursive search -- finding
all files in a given directory and all its subdirectories -- see the second example in the
EXAMPLES section below. datapacker is designed to integrate with find(1) in this
situation to let you take advantage of find's built-in powerful recursion and filtering
features.
When reading files from standard input, it is assumed that the list contains one distinct
filename per line. Seasoned POSIX veterans will recognize the inherent limitations in
this format. For that reason, when given -0 in conjunction with the single file -,
datapacker will instead expect, on standard input, a list of files, each one terminated by
an ASCII NULL character. Such a list can be easily generated with find(1) using its
-print0 option.
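The limitation of the one-name-per-line format is easy to reproduce. The sketch below
creates, in a scratch directory, one ordinary file and one file whose name contains a
newline, then compares the two list formats. The file names are made up for the example.

```sh
# Scratch directory holding one file whose name contains a newline.
demo=$(mktemp -d)
touch "$demo/plain.jpg" "$demo/two
lines.jpg"

# Line-based listing: the odd name is split in two, so three lines appear.
lines=$(find "$demo" -type f -print | wc -l)

# NUL-terminated listing: counting the NUL bytes gives the true total, two.
nuls=$(find "$demo" -type f -print0 | tr -cd '\0' | wc -c)

echo "line count: $((lines)), NUL count: $((nuls))"
rm -rf "$demo"
```

A reader of the line-based list would see three names, none of which matches the second
file, while the NUL-terminated list describes both files unambiguously.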
EXAMPLES
· Put all JPEG images in ~/Pictures into bins (using hardlinks) under the pre-existing
directory ~/bins, no more than 600MB per bin:
datapacker -b ~/bins/%03d -s 600m -a hardlink ~/Pictures/*.jpg
· Put all files in ~/Pictures or any subdirectory thereof into 600MB bins under ~/bins,
using hardlinking. This is a simple example to follow if you simply want a recursive
search of all files.
find ~/Pictures -type f -print0 | \
datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -
· Find all JPEG images in ~/Pictures or any subdirectory thereof, put them into bins
(using hardlinks) under the pre-existing directory ~/bins, no more than 600MB per bin:
find ~/Pictures -name "*.jpg" -print0 | \
datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -
· Find all JPEG images as above, put them in 4GB bins, but instead of putting them
anywhere, calculate the size of each bin and display it.
find ~/Pictures -name "*.jpg" -print0 | \
datapacker -0 -b ~/bins/%03d -s 4g \
'--action=exec:echo -n "$1: "; shift; du -ch "$@" | grep total' \
-
This will display output like so:
/home/jgoerzen/bins/001: 4.0G total
/home/jgoerzen/bins/002: 4.0G total
/home/jgoerzen/bins/003: 4.0G total
/home/jgoerzen/bins/004: 992M total
Note: the grep pattern in this example is simple, but will cause unexpected results if
the name of any matching file contains the word "total".
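The caveat can be reproduced, and one possible way around it tried, in a scratch
directory. Taking the last line of du's output with tail(1) is a suggestion made here,
not part of the original example, and the file names are invented.

```sh
demo=$(mktemp -d)
echo x > "$demo/total.jpg"      # a file name that contains "total"
echo y > "$demo/other.jpg"

# grep matches the file as well as the summary line: two lines.
with_grep=$(du -ch "$demo"/*.jpg | grep -c total)

# du -c prints the grand total last, so tail picks exactly one line.
with_tail=$(du -ch "$demo"/*.jpg | tail -n 1 | wc -l)

echo "grep: $((with_grep)) line(s), tail: $((with_tail)) line(s)"
rm -rf "$demo"
```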
· Find all JPEG images as above, and generate 4GB ISO images of them in ~/bins. This
will generate the ISO images directly without ever hardlinking files into ~/bins.
find ~/Pictures -name "*.jpg" -print0 | \
datapacker -0 -b ~/bins/%03d.iso -s 4g \
'--action=exec:BIN="$1"; shift; mkisofs -r -J -o "$BIN" "$@"' \
-
You could, if you so desired, pipe this result directly into a DVD-burning application.
Or, you could use growisofs to burn a DVD+R in a single step.
ERRORS
It is an error if any specified file exceeds the value given with -s or -S.
It is also an error if any specified files disappear while datapacker is running.