gt_mpi_gather - MPI gatherer for GenomicsDB
- --help, -h
- Print a usage message summarizing options available and exit
- --json-config=<query json file>, -j <query json file>
- The query JSON file can specify workspace, array, query_column_ranges, query_row_ranges, vid_mapping_file, callset_mapping_file, query_attributes, query_filter, reference_genome, etc. as fields, e.g.

    { "workspace" : "/tmp/ws",
      "array" : "t0_1_2",
      "query_column_ranges" : [ [ [0, 100 ], 500 ] ],
      "query_row_ranges" : [ [ [0, 2 ] ] ],
      "vid_mapping_file" : "/tests/inputs/vid.json",
      "callset_mapping_file" : "/tests/inputs/callset_mapping.json",
      "query_attributes" : [ "REF", "ALT", "BaseQRankSum", "MQ", "MQ0",
                             "ClippingRankSum", "MQRankSum", "ReadPosRankSum",
                             "DP", "GT", "GQ", "SB", "AD", "PL",
                             "DP_FORMAT", "MIN_DP" ] }
- --loader-json-config=<loader json file>, -l <loader json file>
- Optional if the vid_mapping_file and callset_mapping_file fields are specified in the query JSON file
- --workspace=<workspace dir>, -w <GenomicsDB workspace dir>
- Optional if workspace is specified in either of the JSON config files
- --array=<array dir>, -A <GenomicsDB array dir>
- Optional if array is specified in either of the JSON config files
- --print-calls
- Optional, prints VariantCalls in JSON format
- --print-csv
- Optional, outputs CSV, with the fields in each CSV line and their order determined by the query attributes
- --produce-Broad-GVCF
- Optional, produces a combined gVCF from the GenomicsDB data constrained by the query configuration
- --output-format=<output_format>, -O <output_format>
- Used with --produce-Broad-GVCF
- Output format can be one of the following strings: "z[0-9]" (compressed VCF), "b[0-9]" (compressed BCF) or "bu" (uncompressed BCF). Default is uncompressed VCF if not specified.
- --produce-histogram
- Optional
- --produce-interesting-positions
- Optional
- --version
- Print version and exit
- If none of the print/produce arguments are specified, the tool prints all the Variants constrained by the query configuration in JSON format
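When the query JSON file is generated from a script, a small helper can keep the nesting of the range fields consistent with the --json-config example. The Python sketch that follows is illustrative only: `make_query_config` and `write_query_config` are hypothetical helper names, not part of GenomicsDB. The sketch mirrors the field names and bracket nesting of that example and performs none of the validation gt_mpi_gather itself may do. Treating a bare integer such as 500 as a single position is an assumption.

```python
import json


def make_query_config(workspace, array, column_ranges, row_ranges,
                      vid_mapping_file, callset_mapping_file, query_attributes):
    """Return a dict using the field names of the example query JSON file.

    Hypothetical helper. column_ranges and row_ranges are lists whose entries
    are either [begin, end] pairs or bare integers (like 500 in the example,
    presumably a single position).
    """
    return {
        "workspace": workspace,
        "array": array,
        # The example wraps each list of ranges in one extra outer list;
        # this sketch mirrors that nesting with a single inner list.
        "query_column_ranges": [list(column_ranges)],
        "query_row_ranges": [list(row_ranges)],
        "vid_mapping_file": vid_mapping_file,
        "callset_mapping_file": callset_mapping_file,
        "query_attributes": list(query_attributes),
    }


def write_query_config(config, path):
    """Write the config as JSON to `path`, the file later passed via -j."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)


if __name__ == "__main__":
    cfg = make_query_config(
        workspace="/tmp/ws",
        array="t0_1_2",
        column_ranges=[[0, 100], 500],
        row_ranges=[[0, 2]],
        vid_mapping_file="/tests/inputs/vid.json",
        callset_mapping_file="/tests/inputs/callset_mapping.json",
        query_attributes=["REF", "ALT", "DP", "GT", "GQ", "AD", "PL"],
    )
    # Print instead of writing a file; call write_query_config(cfg, "query.json")
    # to produce the file for -j.
    print(json.dumps(cfg, indent=2))
```

Building the structure as a Python dict and serializing it with `json.dump` avoids mismatched brackets in hand-edited range lists.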
- Parallel Querying
- MPI can be used for parallel querying, e.g.

    mpirun -n <num_processes> -hostfile <hostfile> ./bin/gt_mpi_gather -j <query.json> -l <loader.json> [<other_args>]
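The mpirun invocation for parallel querying can also be assembled from a script. The Python sketch that follows is a hedged illustration. `build_gather_command` is a hypothetical helper, not part of GenomicsDB. It uses only the options documented on this page (-j, -l, --produce-Broad-GVCF, -O), and it checks -O against the listed format strings. Rejecting -O without --produce-Broad-GVCF is a conservative reading of "used with --produce-Broad-GVCF"; gt_mpi_gather itself may handle that case differently.

```python
import re
import shlex

# Output-format strings listed for -O: "z[0-9]" (compressed VCF),
# "b[0-9]" (compressed BCF), "bu" (uncompressed BCF).
_OUTPUT_FORMAT_RE = re.compile(r"z[0-9]|b[0-9]|bu")


def build_gather_command(num_processes, hostfile, query_json, loader_json=None,
                         produce_broad_gvcf=False, output_format=None,
                         binary="./bin/gt_mpi_gather"):
    """Return an argv list for an MPI run of gt_mpi_gather (hypothetical helper)."""
    if num_processes < 1:
        raise ValueError("num_processes must be >= 1")
    if output_format is not None:
        # Conservative assumption: -O only accompanies --produce-Broad-GVCF.
        if not produce_broad_gvcf:
            raise ValueError("-O is only used with --produce-Broad-GVCF")
        if not _OUTPUT_FORMAT_RE.fullmatch(output_format):
            raise ValueError(f"unsupported output format: {output_format!r}")
    argv = ["mpirun", "-n", str(num_processes), "-hostfile", hostfile,
            binary, "-j", query_json]
    if loader_json is not None:
        # -l is optional when the query JSON already names the mapping files.
        argv += ["-l", loader_json]
    if produce_broad_gvcf:
        argv.append("--produce-Broad-GVCF")
        # Omitting -O leaves the documented default (uncompressed VCF).
        if output_format is not None:
            argv += ["-O", output_format]
    return argv


if __name__ == "__main__":
    cmd = build_gather_command(4, "hosts.txt", "query.json", "loader.json",
                               produce_broad_gvcf=True, output_format="z9")
    print(shlex.join(cmd))
```

Returning an argv list rather than a single string lets the caller launch it with `subprocess.run(cmd, check=True)` without shell-quoting issues in paths.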