run_abundance.py - helper script to estimate the abundance at a
given taxonomic level
usage: run_abundance.py [-h] [-v] [-A N] [-P N] [-F N] [--distance DISTANCE]
       [-M DIAMETER] [-S DECOMP] [-p DIR] [-rt] [-o OUTPUT] [-d OUTPUT_DIR]
       [-c CONFIG] [-t TREE] [-r RAXML] [-a ALIGN] [-f FRAG] [-m MOLECULE]
       [--ignore-overlap] [-x N] [-cp CHCK_FILE] [-cpi N] [-seed N] [-bt N]
       [-at N] [-pt N] [-g N] [-b N] [-no_trim] [-bin N] [-D] [-C N] [-G GENES]
This script runs the SEPP algorithm on an input tree, alignment,
fragment file, and RAxML info file.
- These options determine the alignment decomposition size and taxon
insertion size. If none is given, the default is to align/place at
10% of total taxa. The alignment decomposition size must be less than the
taxon insertion size.
- -A N, --alignmentSize N
- max alignment subset size of N [default: 10% of the total number of taxa
or the placement subset size if given]
- -P N, --placementSize N
- max placement subset size of N [default: 10% of the total number of taxa
or the alignment length (whichever is bigger)]
- -F N, --fragmentChunkSize N
- maximum fragment chunk size of N. Helps control memory usage. [default:
20000]
- --distance DISTANCE
- minimum p-distance before stopping the decomposition [default: 1]
- -M DIAMETER, --diameter DIAMETER
- maximum tree diameter before stopping the decomposition [default: None]
- -S DECOMP, --decomp_strategy DECOMP
- decomposition strategy [default: using tree branch length]
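
The defaulting rules for -A and -P described above can be sketched in Python. This is an illustration only, not code taken from run_abundance.py: the function name resolve_subset_sizes is hypothetical, the 10% figure is rounded down with a floor of 1 as an assumption, the "alignment length" in the -P help text is read here as the -A value, and equality between the two sizes is allowed because the documented defaults can coincide.

```python
def resolve_subset_sizes(total_taxa, alignment_size=None, placement_size=None):
    """Return (alignment_size, placement_size) after applying defaults.

    Hypothetical helper mirroring the help text for -A and -P.
    """
    # Assumption: "10% of the total number of taxa" is rounded down, minimum 1.
    ten_percent = max(1, total_taxa // 10)
    placement_was_given = placement_size is not None

    if placement_size is None:
        # -P default: 10% of taxa, or the -A value if that is bigger.
        if alignment_size is None:
            placement_size = ten_percent
        else:
            placement_size = max(ten_percent, alignment_size)

    if alignment_size is None:
        # -A default: the placement subset size if given, else 10% of taxa.
        alignment_size = placement_size if placement_was_given else ten_percent

    # Assumption: equality is tolerated; only a larger alignment size is rejected.
    if alignment_size > placement_size:
        raise ValueError(
            "alignment subset size must not exceed placement subset size"
        )
    return alignment_size, placement_size


print(resolve_subset_sizes(1000))                      # (100, 100)
print(resolve_subset_sizes(1000, placement_size=250))  # (250, 250)
print(resolve_subset_sizes(1000, alignment_size=50))   # (50, 100)
```

Checking the sizes before launching a run gives an early, readable error instead of a failure partway through the decomposition.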
- These options control input. Running SEPP requires the following: a
backbone tree (in newick format); a RAxML_info file (the file generated by
RAxML during estimation of the backbone tree, which pplacer uses to set
model parameters); a backbone alignment file (in fasta format); and a fasta
file containing the fragments. The input sequences are assumed to be DNA
unless specified otherwise.
- -c CONFIG, --config CONFIG
- A config file, including options used to run SEPP. Options provided as
command line arguments override config file values for those options.
[default: None]
- -t TREE, --tree TREE
- Input tree file (newick format) [default: None]
- -r RAXML, --raxml RAXML
- RAxML_info file including model parameters, generated by RAxML. [default:
None]
- -a ALIGN, --alignment ALIGN
- Aligned fasta file [default: None]
- -f FRAG, --fragment FRAG
- fragment file [default: None]
- -m MOLECULE, --molecule MOLECULE
- Molecule type of sequences. Can be amino, dna, or rna [default: dna]
- --ignore-overlap
- When a query sequence has the same name as a backbone sequence, ignore the
query sequence and keep the backbone sequence [default: False]
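
As a minimal sketch of how the input options above fit together, the following Python helper assembles an argument list from the four input files and the molecule type. The helper name build_input_args and all file names are hypothetical; only the flags themselves (-t, -r, -a, -f, -m, --ignore-overlap) come from the option list above, and the command is built but not executed.

```python
import shlex

VALID_MOLECULES = ("amino", "dna", "rna")


def build_input_args(tree, raxml_info, alignment, fragments,
                     molecule="dna", ignore_overlap=False):
    """Build an argv list for run_abundance.py from the input options.

    Hypothetical helper; it does not check that the files exist.
    """
    if molecule not in VALID_MOLECULES:
        raise ValueError(f"molecule must be one of {VALID_MOLECULES}")
    args = [
        "run_abundance.py",
        "-t", tree,        # backbone tree (newick)
        "-r", raxml_info,  # RAxML_info file with model parameters
        "-a", alignment,   # backbone alignment (fasta)
        "-f", fragments,   # fragment file (fasta)
        "-m", molecule,    # amino, dna, or rna
    ]
    if ignore_overlap:
        args.append("--ignore-overlap")
    return args


# Placeholder file names, for illustration only.
cmd = build_input_args("backbone.tree", "RAxML_info.backbone",
                       "backbone.fasta", "reads.fasta")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the script is installed and the paths point at real files.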
- These options control settings specific to TIPP.
- -bt N, --blastThreshold N
- Minimum query coverage for a BLAST hit to map a read to a marker. This
should be a number greater than 0 [default: 50]
- -at N, --alignmentThreshold N
- Enough alignment subsets are selected to reach a cumulative probability
of N. This should be a number between 0 and 1 [default: 0.95]
- -pt N, --placementThreshold N
- Enough placements are selected to reach a cumulative probability of N.
This should be a number between 0 and 1 [default: 0.95]
- -g N, --gene N
- Classify on only the specified gene.
- -b N, --blast_file N
- BLAST file with fragments already binned.
- -no_trim, --do_not_trim_after_blast
- Do not trim the query sequence where it extends outside the marker (BLAST
only).
- -bin N, --bin_using N
- Use blast or hmmer for binning [default: blast]
- -D, --dist
- Treat fragments as a distribution
- -C N, --cutoff N
- Placement probability requirement to count toward the distribution. This
should be a number between 0 and 1 [default: 0.0]
- -G GENES, --genes GENES
- Use markers or cogs genes [default: markers-v3]
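
The cumulative-probability rule behind -at and -pt can be illustrated with a short Python sketch. It assumes candidates are ranked by probability and taken in descending order until their running total reaches the threshold; the function name and the example values are hypothetical, and the actual TIPP implementation may differ in detail.

```python
def select_by_cumulative_probability(candidates, threshold=0.95):
    """Pick the most probable candidates until their total reaches threshold.

    candidates: iterable of (name, probability) pairs.
    Hypothetical helper illustrating the -at / -pt help text.
    """
    if not 0 <= threshold <= 1:
        raise ValueError("threshold should be a number between 0 and 1")
    # Assumption: candidates are considered from most to least probable.
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    selected = []
    total = 0.0
    for name, probability in ranked:
        selected.append((name, probability))
        total += probability
        if total >= threshold:
            break
    return selected


# Made-up placement probabilities for a single fragment.
placements = [("edge_7", 0.25), ("edge_3", 0.5),
              ("edge_9", 0.125), ("edge_1", 0.125)]
print(select_by_cumulative_probability(placements, threshold=0.8))
# [('edge_3', 0.5), ('edge_7', 0.25), ('edge_9', 0.125)]
```

A higher threshold keeps more low-probability candidates, which trades speed for a fuller account of placement uncertainty.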
run_sepp.py(1), run_tipp.py(1)