STORAGE.CONF(5) | InterNetNews Documentation | STORAGE.CONF(5) |
storage.conf - Configuration file for storage manager
The file <pathetc>/storage.conf contains the rules to be used in assigning articles to different storage methods. These rules determine where incoming articles will be stored.
The storage manager is a unified interface between INN and a variety of different storage methods, allowing the news administrator to choose between different storage methods with different trade-offs (or even use several at the same time for different newsgroups, or articles of different sizes). The rest of INN need not care what type of storage method was used for a given article; the storage manager will figure this out automatically when that article is retrieved via the storage API. Note that you may also want to see the options provided in inn.conf(5) regarding article storage.
The storage.conf file consists of a series of storage method entries. Blank lines and lines beginning with a number sign ("#") are ignored. The maximum number of characters in each line is 255. The order of entries in this file is important; see below.
Each entry specifies a storage method and a set of rules. Articles which match all of the rules of a storage method entry will be stored using that storage method; if an article matches multiple storage method entries, the first one will be used. Each entry is formatted as follows:
    method <methodname> {
        newsgroups: <wildmat>
        class: <storage_class>
        size: <minsize>[,<maxsize>]
        expires: <mintime>[,<maxtime>]
        options: <options>
        exactmatch: <bool>
        filtered: <bool>
    }
If spaces or tabs are included in a value, that value must be enclosed in double quotes (""). If either a number sign ("#") or a double quote is meant to be included verbatim in a value, it should be escaped with "\".
<methodname> is the name of a storage method to use for articles which match the rules of this entry. The currently available storage methods are:
    cnfs
    timecaf
    timehash
    tradspool
    trash
See the "STORAGE METHODS" section below for more details.
The meanings of the keys in each storage method entry are as follows:
There is no default newsgroups pattern; if an entry should match all newsgroups, use an explicit "newsgroups: *".
The assignment of a particular number to a storage class is arbitrary but permanent (since it is used in storage tokens). As a matter of fact, an article is assigned a storage class depending on the storage rules in effect at the time of its arrival. This identifier will not change even if storage.conf is modified afterwards and the same article would have been assigned another storage class, had it been received after that change. The article is still perfectly valid and retrievable. The only difference will be for expiration with groupbaseexpiry set to false in inn.conf: the rules in expire.ctl apply to the storage class assigned to articles at their arrival.
The format of <mintime> and <maxtime> is "0d0h0m0s" (days, hours, minutes, and seconds into the future). If <maxtime> is "0s" or is not specified, there is no upper bound on expire times falling into this entry (note that the expires key has no effect on when the article will actually be expired, but only on whether or not the article will be stored using this storage method). This field is also optional and may be omitted entirely if you do not want to store articles according to their Expires header field, if any.
A <mintime> value greater than "0s" implies that this storage method won't match any article without an Expires header field.
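As an illustration of the "0d0h0m0s" format and of the range rule described above, here is a minimal Python sketch. It is not INN's own parser: the function names parse_interval and expires_matches are hypothetical, and the sketch assumes that each component is optional (as the "0s" form above suggests), that components appear in d, h, m, s order, and that both bounds are inclusive.

```python
import re

# Assumption: components are optional but appear in d, h, m, s order.
_INTERVAL_RE = re.compile(r"^(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def parse_interval(text):
    """Convert a string such as "1d12h" or "0s" into a number of seconds."""
    match = _INTERVAL_RE.match(text)
    if not text or match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    days, hours, minutes, seconds = (int(part) if part else 0
                                     for part in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def expires_matches(seconds_until_expiry, mintime="0s", maxtime="0s"):
    """Tell whether an article falls into the expires range of an entry.

    seconds_until_expiry is None for an article without an Expires
    header field.  Inclusive bounds are an assumption of this sketch.
    """
    low = parse_interval(mintime)
    high = parse_interval(maxtime)
    if seconds_until_expiry is None:
        # A <mintime> greater than "0s" never matches such an article.
        return low == 0
    if seconds_until_expiry < low:
        return False
    # A <maxtime> of "0s" means that there is no upper bound.
    return high == 0 or seconds_until_expiry <= high
```

With these assumptions, an entry with "expires: 1d,7d" would take an article whose Expires header field lies three days ahead, but not an article without that header field.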
If all the storage classes have the filtered key set to false, filtered articles are stored in the same storage class as accepted articles. It is only when at least one storage class has this key set to true that filtered articles and accepted articles are no longer stored mixed together in any storage class.
If an article matches all of the constraints of an entry, it is stored via that storage method and is associated with that <storage_class>. This file is scanned in order and the first matching entry is used to store the article.
If an article does not match any entry, either by being posted to a newsgroup which does not match any of the <wildmat> patterns or by being outside the size and expires ranges of all entries whose newsgroups pattern it does match, the article is not stored and is rejected by innd. When this happens, the error message:
cant store article: no matching entry in storage.conf
is logged to syslog. If you want to silently drop articles matching certain newsgroup patterns or size or expires ranges, assign them to the "trash" storage method rather than having them not match any storage method entry.
Currently, there are five storage methods available. Each method has its pros and cons; you can choose any mixture of them as is suitable for your environment. Note that each method has an attribute EXPENSIVESTAT which indicates whether checking the existence of an article is expensive or not. This is used to run expireover(8).
CNFS has its own configuration file, cycbuff.conf, which describes some subtleties to the basic description given above. Storage method entries for the "cnfs" storage method must have an options: field specifying the metacycbuff into which articles matching that entry should be stored; see cycbuff.conf(5) for details on metacycbuffs.
Advantages: By far the fastest of all storage methods (except for "trash"), since it eliminates the overhead of dealing with a file system and creating new files. Unlike all other storage methods, it does not require manual article expiration. With CNFS, the server will never throttle itself due to a full spool disk, and groups are restricted to just the buffer files given so that they can never use more than the amount of disk space allocated to them.
Disadvantages: Article retention times are more difficult to control because old articles are overwritten automatically. Attacks on Usenet, such as flooding or massive amounts of spam, can result in wanted articles expiring much faster than intended (with no warning).
<patharticles>/timecaf-nn/bb/aacc.CF
where "nn" is the hexadecimal value of <storage_class>, "bb" and "aacc" are the hexadecimal components of the arrival time, and "CF" is a hardcoded extension. (The arrival time, in seconds since the epoch, is converted to hexadecimal and interpreted as "0xaabbccdd", with "aa", "bb", and "cc" used to build the path.) This method does not have self-expire functionality (meaning expire has to run periodically to delete old articles, as well as cancelled articles if immediatecancel is not set to true in inn.conf). EXPENSIVESTAT is false for this method.
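The mapping from arrival time to file name can be sketched in Python as follows. The function name timecaf_path is hypothetical, the result is relative to <patharticles>, and the use of zero-padded lowercase hexadecimal is an assumption of this sketch rather than something stated above.

```python
def timecaf_path(storage_class, arrival):
    """Build timecaf-nn/bb/aacc.CF from a class and an arrival time.

    arrival is the arrival time in seconds since the epoch, read as
    0xaabbccdd.  Zero-padded lowercase hex is an assumption.
    """
    aa = (arrival >> 24) & 0xFF
    bb = (arrival >> 16) & 0xFF
    cc = (arrival >> 8) & 0xFF
    # The low byte "dd" is not used, so every run of 256 consecutive
    # seconds maps to the same CAF file.
    return f"timecaf-{storage_class:02x}/{bb:02x}/{aa:02x}{cc:02x}.CF"
```

For instance, an article of storage class 4 arriving at time 0x66094a10 would go into timecaf-04/09/664a.CF under these assumptions, as would any other article of that class arriving between 0x66094a00 and 0x66094aff.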
A given CAF file contains all the articles received during a time frame of about 4 minutes (256 seconds), and is limited to 262,144 articles and about 3.5 GB. This is enough for normal operations. The only caveat applies when you feed large batches of articles between two servers at high speed: you'll then want to keep the feed within that number of articles during the time frame in which a single CAF file stores newly arrived articles.
Advantages: It is roughly four times faster than "timehash" for article writes, since much of the file system overhead is bypassed, while still retaining the same fine control over article retention time.
Disadvantages: Using this method means giving up all but the most careful manual fiddling with the article spool; in this respect, it resembles "cnfs". As one of the newer and least widely used storage types, "timecaf" has not been as thoroughly tested as the other methods.
<patharticles>/time-nn/bb/cc/yyyy-aadd
where "nn" is the hexadecimal value of <storage_class>, "yyyy" is a hexadecimal sequence number, and "bb", "cc", and "aadd" are components of the arrival time in hexadecimal (the arrival time is interpreted as documented above under "timecaf"). This method does not have self-expire functionality. Cancelled articles are removed immediately. EXPENSIVESTAT is true for this method.
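A minimal Python sketch of this naming scheme follows. The function name timehash_path is hypothetical, the result is relative to <patharticles>, and zero-padded lowercase hexadecimal is an assumption of the sketch.

```python
def timehash_path(storage_class, arrival, seqnum):
    """Build time-nn/bb/cc/yyyy-aadd for one article.

    arrival is the arrival time in seconds since the epoch, read as
    0xaabbccdd; seqnum is the sequence number "yyyy".  Zero-padded
    lowercase hex is an assumption.
    """
    aa = (arrival >> 24) & 0xFF
    bb = (arrival >> 16) & 0xFF
    cc = (arrival >> 8) & 0xFF
    dd = arrival & 0xFF
    return (f"time-{storage_class:02x}/{bb:02x}/{cc:02x}/"
            f"{seqnum:04x}-{aa:02x}{dd:02x}")
```

Under these assumptions, an article of storage class 4 arriving at time 0x66094a10 with sequence number 1 would be stored as time-04/09/4a/0001-6610. Unlike "timecaf", every article gets a file of its own.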
Advantages: Heavy traffic groups do not cause bottlenecks, and a fine control of article retention time is still possible.
Disadvantages: The ability to easily find all articles in a given newsgroup and manually fiddle with the article spool is lost, and INN still suffers from speed degradation due to file system overhead (creating and deleting individual files is a slow operation).
<patharticles>/news/group/name/nnnnn
where "news/group/name" is the name of the newsgroup to which the article was posted with each period changed to a slash, and "nnnnn" is the sequence number of the article in that newsgroup. For crossposted articles, the article is linked into each newsgroup to which it is crossposted (using either hard or symbolic links). This is the way versions of INN prior to 2.0 stored all articles, as well as being the article storage format used by C News and earlier news systems. This method does not have self-expire functionality. Cancelled articles are removed immediately. EXPENSIVESTAT is true for this method.
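The name mapping can be sketched in a few lines of Python. The function name tradspool_path is hypothetical and the result is relative to <patharticles>; how the extra names of a crossposted article are created (hard or symbolic links) is left out of this sketch.

```python
def tradspool_path(newsgroup, number):
    """Build news/group/name/nnnnn for one newsgroup and article number."""
    # Each period in the newsgroup name becomes a directory separator.
    return newsgroup.replace(".", "/") + "/" + str(number)


def crosspost_paths(locations):
    """Return one path per (newsgroup, number) pair of a crossposted article."""
    return [tradspool_path(group, number) for group, number in locations]
```

For example, article 1234 of comp.lang.c maps to comp/lang/c/1234, and an article crossposted to two newsgroups gets one name in each of the two directories.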
Advantages: It is widely used and well-understood; it can read article spools written by older versions of INN and it is compatible with all third-party INN add-ons. This storage mechanism provides easy and direct access to the articles stored on the server, makes writing programs that fiddle with the news spool very easy, gives fine control over article retention times, and comes with the scanspool support utility to perform sanity checks.
Disadvantages: It takes a very fast file system and I/O system to keep up with current Usenet traffic volumes due to file system overhead. Groups with heavy traffic tend to create a bottleneck because of inefficiencies in storing large numbers of article files in a single directory. It requires a nightly expire program to delete old articles out of the news spool, a process that can slow down the server for several hours or more.
The following sample storage.conf file would store all articles posted to alt.binaries.* in the "BINARIES" CNFS metacycbuff, all articles over roughly 50 KB in any other hierarchy in the "LARGE" CNFS metacycbuff, all other articles in alt.* in one timehash class, and all other articles in any newsgroups in a second timehash class, except for the internal.* hierarchy which is stored in traditional spool format.
    method tradspool {
        class: 1
        newsgroups: internal.*
    }

    method cnfs {
        class: 2
        newsgroups: alt.binaries.*
        options: BINARIES
    }

    method cnfs {
        class: 3
        newsgroups: *
        size: 50000
        options: LARGE
    }

    method timehash {
        class: 4
        newsgroups: alt.*
    }

    method timehash {
        class: 5
        newsgroups: *
    }
Notice that the last storage method entry will catch everything. This is a good habit to get into; make sure that you have at least one catch-all entry just in case something you did not expect falls through the cracks. Notice also that the special rule for the internal.* hierarchy is first, so it will catch even articles crossposted to alt.binaries.* or over 50 KB in size.
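The first-match behaviour of this sample can be traced with a short Python sketch. It is a simplified model, not INN's implementation: the names SAMPLE_ENTRIES and choose_entry are hypothetical, Python's fnmatch stands in for uwildmat(3), an article is taken to match a pattern when any of its newsgroups does, and the minimum size is assumed to be inclusive.

```python
from fnmatch import fnmatchcase

# (method, class, newsgroups pattern, minimum size in bytes),
# transcribed in file order from the sample above.
SAMPLE_ENTRIES = [
    ("tradspool", 1, "internal.*", 0),
    ("cnfs", 2, "alt.binaries.*", 0),
    ("cnfs", 3, "*", 50000),
    ("timehash", 4, "alt.*", 0),
    ("timehash", 5, "*", 0),
]


def choose_entry(newsgroups, size, entries=SAMPLE_ENTRIES):
    """Return (method, class) of the first matching entry, or None."""
    for method, storage_class, pattern, minsize in entries:
        # Assumption: one matching newsgroup is enough for the entry.
        group_ok = any(fnmatchcase(group, pattern) for group in newsgroups)
        if group_ok and size >= minsize:
            return method, storage_class
    # No entry matched: innd would reject such an article.
    return None
```

Calling choose_entry(["internal.test", "alt.binaries.misc"], 80000) returns ("tradspool", 1), because the internal.* entry comes first, while choose_entry(["comp.lang.c"], 1000) falls through to the catch-all entry and returns ("timehash", 5).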
As for poison wildmat expressions, if you have for instance an article crossposted between misc.foo and misc.bar, the pattern:
misc.*,!misc.bar
will match that article whereas the pattern:
misc.*,@misc.bar
will not match that article. An article posted only to misc.bar will fail to match either pattern.
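The difference between "!" and "@" can be modelled with the following Python sketch. It is a simplification and not the real uwildmat(3) code: the names group_verdict and article_matches are hypothetical, Python's fnmatch stands in for the wildmat matcher, and the sketch assumes that the last matching sub-pattern decides the verdict for a given newsgroup and that one matching newsgroup is enough for the article.

```python
from fnmatch import fnmatchcase


def group_verdict(group, wildmat):
    """Return "yes", "no" or "poison" for one newsgroup."""
    verdict = "no"
    for pattern in wildmat.split(","):
        if pattern.startswith("!"):
            if fnmatchcase(group, pattern[1:]):
                verdict = "no"        # negated: this group does not match
        elif pattern.startswith("@"):
            if fnmatchcase(group, pattern[1:]):
                verdict = "poison"    # poisoned: the whole article fails
        elif fnmatchcase(group, pattern):
            verdict = "yes"
    return verdict


def article_matches(newsgroups, wildmat):
    """An article matches if some group matches and none is poisoned."""
    verdicts = [group_verdict(group, wildmat) for group in newsgroups]
    return "poison" not in verdicts and "yes" in verdicts
```

With this model, the crossposted article matches "misc.*,!misc.bar" through misc.foo, is refused by "misc.*,@misc.bar" because misc.bar poisons it, and an article posted only to misc.bar matches neither pattern.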
Usually, high-volume groups and groups whose articles do not need to be kept around very long (binaries groups, *.jobs*, news.lists.filters, etc.) are stored in CNFS buffers. Use the other methods (or CNFS buffers again) for everything else. However, it is often most convenient to keep special hierarchies in "tradspool", such as local hierarchies, hierarchies that should never expire, and hierarchies whose spool you need to go through manually.
Written by Katsuhiro Kondou <kondou@nec.co.jp> for InterNetNews. Rewritten into POD by Julien Elie.
cycbuff.conf(5), expire.ctl(5), expireover(8), inn.conf(5), innd(8), libinn_uwildmat(3), scanspool(8).
2024-03-31 | INN 2.7.2 |