DATAPACKER(1)    datapacker Manual    DATAPACKER(1)
datapacker - Tool to pack files into the minimum number of bins
datapacker [ -0 ] [ -a ACTION ] [ -b FORMAT ] [ -d ] [ -p ] [ -S SIZE ] -s SIZE FILE ...
datapacker -h | --help
datapacker is a tool to group files by size. It is designed to group files such that they fill fixed-size containers (called "bins") using the minimum number of containers. This is useful, for instance, if you want to archive a number of files to CD or DVD, and want to organize them such that you use the minimum possible number of CDs or DVDs.
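To give a feel for the packing problem described above, here is a minimal sketch of the well-known first-fit-decreasing heuristic. It is an illustration only: this manual does not state which algorithm datapacker uses, and the function name ffd_pack and its "SIZE NAME" input format are hypothetical.

```shell
# ffd_pack BINSIZE: read "SIZE NAME" lines on stdin and print "BIN NAME"
# lines. Hypothetical helper for illustration; not part of datapacker.
ffd_pack() {
    # Consider the largest files first.
    sort -k1,1nr | awk -v cap="$1" '
        {
            size = $1
            name = $0
            sub(/^[0-9]+ /, "", name)
            if (size > cap) {
                print "too large: " name > "/dev/stderr"
                exit 1
            }
            # Place the file in the first bin that still has room.
            placed = 0
            for (b = 1; b <= nbins; b++) {
                if (used[b] + size <= cap) { placed = b; break }
            }
            # No existing bin fits, so open a new one.
            if (!placed) { placed = ++nbins }
            used[placed] += size
            print placed, name
        }'
}
```

For example, `printf '7 a\n5 b\n4 c\n3 d\n1 e\n' | ffd_pack 10` assigns the five files to two bins.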
In many cases, datapacker executes almost instantaneously. Of particular note, the hardlink action (see OPTIONS below) can be used to effectively copy data into bins without having to actually copy the data at all.
datapacker is a tool in the traditional Unix style; it can be used in pipes and call other tools.
Here are the command-line options you may set for datapacker. Please note that -s and at least one file (see FILE SPECIFICATION below) are mandatory.
It is an error if the generated command line for a given bin is too large for the system.
A nonzero exit code from any COMMAND will cause datapacker to terminate. If COMMAND contains quotes, don't forget to quote the entire command, as in:
datapacker '--action=exec:echo "Bin: $1"; shift; ls "$@"'
The arguments to the given command will be the name of the bin, formatted per -b, followed by all the files to place in that bin.
After you are done processing the results of the bin, you may safely delete the bins without deleting original data. Alternatively, you could leave the bins and delete the original data. Either approach will be workable.
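Deleting either copy is safe because of how hard links work: each link is another name for the same data, which persists until the last name is removed. A minimal sketch in a temporary directory (the directory and file names are made up for illustration):

```shell
# Demonstrate that removing a hard link leaves the original intact.
workdir=$(mktemp -d)
mkdir "$workdir/orig" "$workdir/bin001"
echo "photo data" > "$workdir/orig/a.jpg"

# Link the file into a "bin" without copying its contents.
ln "$workdir/orig/a.jpg" "$workdir/bin001/a.jpg"

# Deleting the bin removes only the extra name, not the data.
rm -r "$workdir/bin001"
cat "$workdir/orig/a.jpg"
```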
It is an error to attempt to make a hard link across filesystems, or to have two input files with the same filename in different paths. datapacker will exit on either of these situations.
See also --deep-links.
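Since duplicate filenames in different paths are an error when hard-linking, it may help to look for them before running datapacker. The following is a rough sketch; dup_basenames is a hypothetical helper, and it assumes no filename contains a newline.

```shell
# dup_basenames DIR: print each file name that occurs more than once
# under DIR. Hypothetical helper; assumes no newlines in file names.
dup_basenames() {
    # Strip the directory part, then report names seen more than once.
    find "$1" -type f | sed 's|.*/||' | sort | uniq -d
}
```

An empty result suggests that no two files under the directory share a name.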
Other useful variants could include destdir/%d to put the string "destdir/" in front of the bin number, which is rendered without leading zeros.
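The effect of a format string can be previewed with the shell's own printf. This assumes that -b follows ordinary printf-style integer formatting, which the examples in this manual suggest but which is not verified here.

```shell
# Preview how bin number 7 would render under two format strings.
printf '%03d\n' 7          # zero-padded to three digits
printf 'destdir/%d\n' 7    # directory prefix, no leading zeros
```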
As an example of such a situation: perhaps you have taken one photo a day for several years. You would like to archive these photos to CD, but you want them to be stored in chronological order. You have named the files such that the names indicate order, so you can pass the file list to datapacker using -p to preserve the ordering in your bins. Thus, bin 1 will contain the oldest files, bin 2 the second-oldest, and so on. If -p wasn't used, you might use fewer CDs, but the photos would be spread out across all CDs without preserving your chronological order.
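The trade-off can be sketched with a simple order-preserving packer that opens a new bin whenever the next file no longer fits. This is an illustration of the idea only, not datapacker's actual code; pack_in_order and its "SIZE NAME" input format are hypothetical.

```shell
# pack_in_order BINSIZE: read "SIZE NAME" lines on stdin in the desired
# order and print "BIN NAME" lines, never reordering the input.
# Hypothetical helper for illustration; not part of datapacker.
pack_in_order() {
    awk -v cap="$1" '
        BEGIN { bin = 1 }
        {
            size = $1
            name = $0
            sub(/^[0-9]+ /, "", name)
            if (size > cap) {
                print "too large: " name > "/dev/stderr"
                exit 1
            }
            # Start a new bin when the next file no longer fits.
            if (used + size > cap) { bin++; used = 0 }
            used += size
            print bin, name
        }'
}
```

With a bin size of 10 and files of sizes 7, 5, 4, 3 and 1 in that order, this sketch needs three bins, although the same files would fit in two if reordering were allowed.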
The size of the first bin may be overridden with -S.
Here are the sizes of some commonly-used bins. For each item, I have provided you with both the underlying recording capacity of the disc and a suggested value for -s. The suggested value for -s is lower than the underlying capacity because there is overhead imposed by the filesystem stored on the disc. You will perhaps find that the suggested value for -s is lower than optimal for discs that contain a few large files, and higher than desired for discs that contain a very large number of small files.
After the options, you must supply one or more files to consider for packing into bins. Alternatively, instead of listing files on the command line, you may list a single hyphen (-), which tells datapacker to read the list of files from standard input (stdin).
datapacker never recurses into subdirectories. If you want a recursive search -- finding all files in a given directory and all its subdirectories -- see the second example in the EXAMPLES section below. datapacker is designed to integrate with find(1) in this situation to let you take advantage of find's built-in powerful recursion and filtering features.
When reading files from standard input, it is assumed that the list contains one distinct filename per line. Seasoned POSIX veterans will recognize the inherent limitations in this format. For that reason, when given -0 in conjunction with the single file -, datapacker will instead expect, on standard input, a list of files, each one terminated by an ASCII NUL character. Such a list can be easily generated with find(1) using its -print0 option.
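The limitation of the line-based format shows up as soon as a filename contains a newline. A small sketch in a temporary directory (the file names are made up for illustration):

```shell
# Show why newline-delimited lists are ambiguous and NUL-delimited
# lists are not, using a file whose name contains a newline.
listdir=$(mktemp -d)
touch "$listdir/plain.jpg" "$listdir/odd
name.jpg"

# Counting lines overcounts: the odd name spans two lines.
lines=$(find "$listdir" -type f -print | wc -l)

# Counting NUL terminators gives the true number of files.
nuls=$(find "$listdir" -type f -print0 | tr -cd '\000' | wc -c)

echo "lines=$((lines)) files=$((nuls))"
```

The line count comes out one higher than the number of files, while the NUL count matches it.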
datapacker -b ~/bins/%03d -s 600m -a hardlink ~/Pictures/*.jpg
find ~/Pictures -type f -print0 | \
  datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -
find ~/Pictures -name "*.jpg" -print0 | \
  datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -
find ~/Pictures -name "*.jpg" -print0 | \
  datapacker -0 -b ~/bins/%03d -s 4g \
  '--action=exec:echo -n "$1: "; shift; du -ch "$@" | grep total' \
  -
This will display output like so:
/home/jgoerzen/bins/001: 4.0G total
/home/jgoerzen/bins/002: 4.0G total
/home/jgoerzen/bins/003: 4.0G total
/home/jgoerzen/bins/004: 992M total
Note: the grep pattern in this example is simple, but will cause unexpected results if any matching file contains the word "total".
find ~/Pictures -name "*.jpg" -print0 | \
  datapacker -0 -b ~/bins/%03d.iso -s 4g \
  '--action=exec:BIN="$1"; shift; mkisofs -r -J -o "$BIN" "$@"' \
  -
You could, if you so desired, pipe this result directly into a DVD-burning application. Or, you could use growisofs to burn a DVD+R in a single step.
It is an error if any specified file exceeds the value given with -s or -S.
It is also an error if any specified files disappear while datapacker is running.
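To avoid the size error, oversized files can be located ahead of time with find(1). In this sketch, oversized is a hypothetical helper that takes the limit in bytes; how a value such as 600m maps to bytes is not stated in this section, so the conversion is left to the reader.

```shell
# oversized DIR BYTES: print files under DIR larger than BYTES bytes.
# Hypothetical helper; the byte limit must be worked out by hand from
# the value intended for -s.
oversized() {
    # The "c" suffix makes find compare sizes in bytes.
    find "$1" -type f -size +"$2"c
}
```

Any file it prints would need to be excluded or split before packing.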
Bugs should be reported online at the datapacker homepage. Debian users are encouraged to instead use the Debian bug-tracking system.
datapacker, and this manual, are Copyright (C) 2008 John Goerzen.
All code, documentation, and build scripts are under the following license unless otherwise noted:
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <URL:http://www.gnu.org/licenses/>.
The GNU General Public License is available in the file COPYING in the source distribution. Debian GNU/Linux users may find this in /usr/share/common-licenses/GPL-3.
If the GPL is unacceptable for your uses, please e-mail me; alternative terms can be negotiated for your project.
datapacker, its libraries, documentation, and all included files, except where noted, were written by John Goerzen <jgoerzen@complete.org> and copyright is held as stated in the COPYRIGHT section.
datapacker may be downloaded, and information found, from its homepage <URL:http://software.complete.org/datapacker>.
mkisofs(1), genisoimage(1)
19 May 2023    John Goerzen