bloom(1) | bloom(1) |
bloom - utility to work with Bloom filters
bloom [global options] command [command options] [arguments...]
This is the main tool to interact with Bloom filters, e.g. to create, fill and query them.
To create a new Bloom filter with a desired capacity and false positive probability, you can use `create`:

    # creates a gzipped Bloom filter with a capacity of 100,000 and a 0.1% false positive probability
    bloom --gzip create -p 0.001 -n 100000 test.bloom.gz

To insert values, you can use the `insert` command and pipe some input to it (each line is treated as one value):

    cat values | bloom --gzip insert test.bloom.gz

You can also add values to the filter interactively by specifying the `--interactive` option:

    bloom --gzip --interactive insert test.bloom.gz

To check whether a given value or a list of values is in the filter, you can use `check`:
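
    cat values | bloom --gzip check test.bloom.gz

This prints the input values that the filter reports as present. Because a Bloom filter can return false positives, a printed value may not actually have been inserted; a value that is not printed was definitely never inserted.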
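
To give a feel for what the `-n` and `-p` values passed to `create` imply, the following sketch estimates the size of a filter from the textbook Bloom filter formulas. The helper `bloom_params` is hypothetical and not part of bloom, and whether bloom sizes its filters exactly this way is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of bloom): estimate the number of bits m and
# the number of hash functions k for a capacity n and a false positive
# probability p, using the textbook formulas
#   m = -n * ln(p) / (ln 2)^2    and    k = (m / n) * ln 2
# Assumption: bloom may size its filters differently.
bloom_params() {
    awk -v n="$1" -v p="$2" 'BEGIN {
        ln2 = log(2)
        m = -n * log(p) / (ln2 * ln2)          # optimal bit count (real-valued)
        k = (m / n) * ln2                      # optimal hash count (real-valued)
        mi = int(m); if (mi < m) mi++          # round the bit count up
        ki = int(k + 0.5); if (ki < 1) ki = 1  # nearest integer, at least 1
        printf "bits=%d hashes=%d\n", mi, ki
    }'
}

# Same parameters as the create example: n = 100000, p = 0.001
bloom_params 100000 0.001
```

Under these formulas, a capacity of 100,000 at a 0.1% false positive probability works out to roughly 1.44 million bits (about 175 KiB before compression) and 10 hash functions.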
Sometimes it is useful to attach additional information to a string that we want to check against the Bloom filter, such as a timestamp or the original line content. To make passing along this additional information easier within a shell context, the Bloom tool provides an option for splitting the input string by a given delimiter and checking the filter against the resulting field values.

    # will check the Bloom filter for the values foo, bar and baz
    echo "foo,bar,baz" | bloom -s filter.bloom

    # uses a different delimiter (--ac5ba--)
    echo "foo--ac5ba--bar--ac5ba--baz" | bloom -d "--ac5ba--" -s filter.bloom

    # will check the Bloom filter against the second field value only
    echo "foo,bar,baz" | bloom -f 1 -s filter.bloom

    # will check the Bloom filter against the second and third field values only
    echo "foo,bar,baz" | bloom -f 1,2 -s filter.bloom

    # will print one line for each field value that matched against the filter
    echo "foo,bar,baz" | bloom -e -s filter.bloom

    # will print the last field value for each line whose fields matched against the filter
    echo "foo,bar,baz" | bloom -e -s --pf -1 filter.bloom

This functionality is especially handy when working with CSV data, as it allows you to filter CSV rows by checking individual columns against the filter without having to use external tools to split and reassemble the lines.
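
To preview which values a given delimiter and field selection would send to the filter, the splitting step can be emulated with awk. This is only a sketch: `select_fields` is a hypothetical helper, not part of bloom, and the zero-based field numbering is inferred from the examples above, where `-f 1` selects the second field.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of bloom): print the fields of each input
# line that a delimiter and a comma-separated list of zero-based field
# indexes would select. An empty index list selects every field.
# Note: awk treats a multi-character delimiter as a regular expression.
select_fields() {
    awk -v delim="$1" -v wanted="$2" '{
        n = split($0, parts, delim)        # split the line into fields
        if (wanted == "") {                # no selection: emit all fields
            for (i = 1; i <= n; i++) print parts[i]
            next
        }
        w = split(wanted, idx, ",")        # parse the index list
        for (j = 1; j <= w; j++) {
            i = idx[j] + 1                 # zero-based index to awk 1-based
            if (i >= 1 && i <= n) print parts[i]
        }
    }'
}

# Emulates the selection made by "-f 1,2" with the default "," delimiter
echo "foo,bar,baz" | select_fields "," "1,2"
```

For the input line `foo,bar,baz`, this prints `bar` and `baz` on separate lines, which are the values the `-f 1,2` example is described as checking.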
01 November 2024 | bloom 0.2.4 |