RUN_ABUNDANCE.PY(1) User Commands RUN_ABUNDANCE.PY(1)

run_abundance.py - helper script to estimate the abundance at a given taxonomic level

usage: run_abundance.py [-h] [-v] [-A N] [-P N] [-F N] [--distance DISTANCE]
                        [-M DIAMETER] [-S DECOMP] [-p DIR] [-rt] [-o OUTPUT]
                        [-d OUTPUT_DIR] [-c CONFIG] [-t TREE] [-r RAXML]
                        [-a ALIGN] [-f FRAG] [-m MOLECULE] [--ignore-overlap]
                        [-x N] [-cp CHCK_FILE] [-cpi N] [-seed N] [-bt N]
                        [-at N] [-pt N] [-g N] [-b N] [-no_trim] [-bin N]
                        [-D] [-C N] [-G GENES]

This script runs the SEPP algorithm on an input tree, alignment, fragment file, and RAxML info file.

-h  show this help message and exit
-v  show program's version number and exit

These options determine the alignment decomposition size and taxon insertion size. If neither is given, the default is to align and place at 10% of the total taxa. The alignment decomposition size must be less than the taxon insertion size.
-A N  max alignment subset size of N [default: 10% of the total number of taxa or the placement subset size if given]
-P N  max placement subset size of N [default: 10% of the total number of taxa or the alignment length (whichever is bigger)]
-F N  maximum fragment chunk size of N. Helps control memory use. [default: 20000]
--distance DISTANCE  minimum p-distance before stopping the decomposition [default: 1]
-M DIAMETER  maximum tree diameter before stopping the decomposition [default: None]
-S DECOMP  decomposition strategy [default: using tree branch length]
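
The defaults described for the alignment and placement subset sizes can be sketched as follows. This is only an illustration of the documented rules, not the script's actual code: the function name resolve_subset_sizes is hypothetical, rounding 10% down is an assumption, and "alignment length" in the placement default is read here as the alignment subset size.

```python
def resolve_subset_sizes(total_taxa, alignment_size=None, placement_size=None):
    """Hypothetical helper mirroring the documented defaults for -A and -P."""
    # Assumption: 10% of the taxa is rounded down, with a floor of 1.
    tenth = max(1, total_taxa // 10)

    if alignment_size is None:
        # Documented default: the placement size if given, else 10% of taxa.
        alignment_size = placement_size if placement_size is not None else tenth

    if placement_size is None:
        # Documented default: 10% of taxa or the alignment size, whichever is bigger.
        placement_size = max(tenth, alignment_size)

    # The alignment subset must not exceed the placement subset.
    if alignment_size > placement_size:
        raise ValueError(
            "alignment subset size (%d) exceeds placement subset size (%d)"
            % (alignment_size, placement_size)
        )
    return alignment_size, placement_size
```

Under these assumptions, a backbone of 1000 taxa with no sizes given resolves to 100 for both values, and giving only a placement size of 300 resolves both to 300.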

These options control output.
-p DIR  Temporary files will be written to DIR. Full path required. [default: /tmp/sepp]
-rt  Remove the temporary file directory. [default: disabled]
-o OUTPUT  Output files with prefix OUTPUT. [default: output]
-d OUTPUT_DIR  Output to the OUTPUT_DIR directory. Full path required. [default: .]

These options control input. Running SEPP requires the following: a backbone tree (in newick format); a RAxML_info file (the file RAxML generates while estimating the backbone tree, which pplacer uses to set model parameters); a backbone alignment file (in fasta format); and a fasta file of fragments. The input sequences are assumed to be DNA unless specified otherwise.
-c CONFIG  A config file, including options used to run SEPP. Options provided as command line arguments overwrite config file values for those options. [default: None]
-t TREE  Input tree file (newick format) [default: None]
-r RAXML  RAxML_info file including model parameters, generated by RAxML. [default: None]
-a ALIGN  Aligned fasta file [default: None]
-f FRAG  Fragment file [default: None]
-m MOLECULE  Molecule type of sequences. Can be amino, dna, or rna [default: dna]
--ignore-overlap  When a query sequence has the same name as a backbone sequence, ignore the query sequence and keep the backbone sequence [default: False]
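
As a rough illustration of how the four required inputs map onto the command line, the sketch below assembles an argument list from Python. The helper name build_command and all file names are hypothetical, and the pairing of each flag with its input follows the order of the usage line, which is an assumption rather than something this page states explicitly.

```python
def build_command(tree, raxml_info, alignment, fragments, molecule="dna"):
    """Hypothetical helper: assemble an argument list for run_abundance.py."""
    return [
        "run_abundance.py",
        "-t", tree,        # backbone tree, newick format
        "-r", raxml_info,  # RAxML_info file from backbone tree estimation
        "-a", alignment,   # backbone alignment, fasta format
        "-f", fragments,   # fragment sequences, fasta format
        "-m", molecule,    # amino, dna, or rna
    ]


# Placeholder file names, for illustration only.
cmd = build_command("backbone.tree", "RAxML_info.backbone",
                    "backbone.fasta", "reads.fasta")
print(" ".join(cmd))

# To actually launch the script, the list could be passed to subprocess:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.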

These options control how SEPP is run.
-x N  Use N cpus [default: number of cpus available on the machine]
-cp CHCK_FILE  Checkpoint file [default: no checkpointing]
-cpi N  Interval (in seconds) between checkpoint writes. Has an effect only when -cp is provided. [default: 3600]
-seed N  Random seed number. [default: 297834]

These arguments set settings specific to TIPP.
-bt N  Minimum query coverage for a BLAST hit to map a read to a marker. This should be a number greater than 0 [default: 50]
-at N  Enough alignment subsets are selected to reach a cumulative probability of N. This should be a number between 0 and 1 [default: 0.95]
-pt N  Enough placements are selected to reach a cumulative probability of N. This should be a number between 0 and 1 [default: 0.95]
-g N  Classify on only the specified gene.
-b N  BLAST file with fragments already binned.
-no_trim  Trim query sequence if it extends outside marker (BLAST only).
-bin N  Use blast or hmmer for binning [default: blast]
-D  Treat fragments as distribution
-C N  Placement probability requirement to count toward the distribution. This should be a number between 0 and 1 [default: 0.0]
-G GENES  Use markers or cogs genes [default: markers-v3]
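
The cumulative-probability thresholds for alignment subsets and placements can be pictured with the sketch below. It is a generic illustration of "select enough candidates to reach a cumulative probability of N", not TIPP's own implementation: the function name select_until_cumulative and the (name, probability) input format are assumptions.

```python
def select_until_cumulative(candidates, threshold=0.95):
    """Pick the most probable candidates until their total reaches threshold.

    candidates: iterable of (name, probability) pairs (hypothetical format).
    Returns the names of the selected candidates, most probable first.
    """
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")

    selected = []
    total = 0.0
    # Visit candidates from most to least probable.
    for name, prob in sorted(candidates, key=lambda item: item[1], reverse=True):
        if total >= threshold:
            break
        selected.append(name)
        total += prob
    return selected
```

With candidates a (0.5), b (0.3), and c (0.2) and a threshold of 0.75, the sketch keeps a and b, because their combined probability of 0.8 already meets the threshold. A threshold of 0 selects nothing.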

run_sepp.py(1), run_tipp.py(1)

September 2021 run_abundance.py