Trinity - RNA-Seq De novo Assembly
Trinity represents a novel method for the efficient and robust de
novo reconstruction of transcriptomes from RNA-seq data. Trinity combines
three independent software modules: Inchworm, Chrysalis, and Butterfly,
applied sequentially to process large volumes of RNA-seq reads. Trinity
partitions the sequence data into many individual de Bruijn graphs, each
representing the transcriptional complexity at a given gene or locus, and
then processes each graph independently to extract full-length splicing
isoforms and to tease apart transcripts derived from paralogous genes.
Required:
- --seqType <string> type of reads: ( fa, or fq )
- --max_memory <string> suggested maximum memory for Trinity to use
where limiting can be enabled (jellyfish, sorting, etc.), provided in GB of
RAM, e.g. '--max_memory 10G'
If paired reads:
- --left <string> left reads, one or more (separated by
space)
- --right <string> right reads, one or more (separated by
space)
Or, if unpaired reads:
- --single <string> single reads, one or more (note, if single
file contains pairs, can use flag: --run_as_paired )
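The choice between paired and unpaired input can be sketched as a small shell helper. This is only an illustration: the function name trinity_read_args and the file names are hypothetical, the commands are printed rather than executed, and file names containing spaces are not handled.

```shell
# Hypothetical helper: print the Trinity read-input options for a sample.
# Two arguments are treated as paired reads (--left/--right);
# one argument is treated as unpaired reads (--single).
trinity_read_args() {
    if [ "$#" -eq 2 ]; then
        printf '%s\n' "--left $1 --right $2"
    elif [ "$#" -eq 1 ]; then
        printf '%s\n' "--single $1"
    else
        echo "usage: trinity_read_args <left> <right> | <single>" >&2
        return 1
    fi
}

# Dry run: print the commands instead of executing them.
echo "Trinity --seqType fq --max_memory 10G $(trinity_read_args reads_1.fq reads_2.fq)"
echo "Trinity --seqType fq --max_memory 10G $(trinity_read_args reads.fq)"
```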
Misc:
- --SS_lib_type <string> strand-specific RNA-Seq read
orientation. If paired: RF or FR; if single: F or R. (dUTP method = RF)
See web documentation.
- --CPU <int> number of CPUs to use, default: 2
- --min_contig_length <int> minimum assembled contig length to
report (def=200)
- --long_reads <string> fasta file containing error-corrected
or circular consensus (CCS) PacBio reads
- --genome_guided_bam <string> genome guided mode, provide path
to coordinate-sorted bam file. (see genome-guided param section under
--show_full_usage_info)
- --jaccard_clip option, set if you have paired reads and you expect
high gene density with UTR overlap (use FASTQ input file format for
reads). (note: jaccard_clip is an expensive operation, so avoid using it
unless necessary due to finding excessive fusion transcripts w/o it.)
- --trimmomatic run Trimmomatic to quality-trim reads; see
'--quality_trimming_params' under full usage info for tailored
settings.
- --normalize_reads run in silico normalization of reads. Defaults to
a maximum read coverage of 50. See '--normalize_max_read_cov' under full
usage info for tailored settings.
- --no_distributed_trinity_exec do not run Trinity phase 2 (assembly
of partitioned reads), and stop after generating command list.
- --output <string> name of directory for output (will be
created if it doesn't already exist). Default: your current working
directory.
- --full_cleanup only retain the Trinity fasta file, rename as
${output_dir}.Trinity.fasta
- --cite show the Trinity literature citation
- --version reports Trinity version (Trinity_v2.0.2) and exits.
- --show_full_usage_info show the many many more options available
for running Trinity (expert usage).
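As a rough illustration of how several of the options above combine, the following dry-run sketch composes a command for strand-specific paired reads with trimming and normalization. The file names, output directory, and resource values are placeholders chosen for the example, not recommendations.

```shell
# Dry-run sketch: compose a Trinity command from the options listed above.
# File names and resource values are placeholders.
LEFT=reads_1.fq
RIGHT=reads_2.fq
OUT=trinity_out

CMD="Trinity --seqType fq --max_memory 50G --CPU 6"
CMD="$CMD --left $LEFT --right $RIGHT"
CMD="$CMD --SS_lib_type RF"        # paired reads from a dUTP library
CMD="$CMD --trimmomatic"           # quality-trim reads before assembly
CMD="$CMD --normalize_reads"       # in silico normalization (max coverage 50 by default)
CMD="$CMD --min_contig_length 200" # the documented default, stated explicitly
CMD="$CMD --output $OUT"

# Print the command for inspection rather than running it.
echo "$CMD"
```

Printing the command first makes it easy to review the option set before committing to a long assembly run.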
A typical Trinity command might be:
- Trinity --seqType fq --max_memory 50G \
  --left reads_1.fq --right reads_2.fq --CPU 6
and for Genome-guided Trinity:
- Trinity --genome_guided_bam rnaseq_alignments.csorted.bam \
  --max_memory 50G --genome_guided_max_intron 10000 --CPU 6
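Genome-guided mode expects a coordinate-sorted BAM file. One way to check this before starting a run is to inspect the BAM header, where the SAM format records sort order in the SO tag of the @HD line. The helper below is a hypothetical sketch (the name is_coordinate_sorted is not part of Trinity), and the commented usage line assumes samtools is installed.

```shell
# Hypothetical helper: succeed if a SAM/BAM header read from stdin declares
# coordinate sort order (an @HD line carrying the SO:coordinate tag).
is_coordinate_sorted() {
    grep '^@HD.*SO:coordinate' >/dev/null
}

# Self-contained demonstration with a hand-written header line.
printf '@HD\tVN:1.6\tSO:coordinate\n' | is_coordinate_sorted \
    && echo "coordinate-sorted"

# Usage with a real file (assumes samtools is available):
#   samtools view -H rnaseq_alignments.csorted.bam | is_coordinate_sorted \
#       || echo "BAM does not declare coordinate sort order" >&2
```

The check only reads what the header declares; it does not verify that the records themselves are in order.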
See /usr/lib/trinityrnaseq/sample_data/test_Trinity_Assembly/ for
sample data and 'runMe.sh' for an example Trinity execution.
For more details, visit: http://trinityrnaseq.github.io