RNApvmin - manual page for RNApvmin 2.6.4
RNApvmin [options] <file.shape>
RNApvmin 2.6.4
Calculate a perturbation vector that minimizes discrepancies
between predicted and observed pairing probabilities
The program reads an RNA sequence from stdin and uses an iterative
minimization process to calculate a perturbation vector that minimizes the
discrepancies between predicted pairing probabilities and observed pairing
probabilities (deduced from given SHAPE reactivities). Experimental data is
read from a given SHAPE file and normalized to pairing probabilities. The
experimental data has to be provided in a multiline plain text file where
each line has the format '[position] [nucleotide] [absolute shape
reactivity]' (e.g. '3 A 0.7'). The objective function used for the
minimization may be weighted by choosing appropriate values for sigma and
tau.
The minimization progress will be written to stderr. Once the
minimization has terminated, the obtained perturbation vector is written to
stdout.
- -h, --help
- Print help and exit
- --detailed-help
- Print help, including all details and hidden options, and exit
- --full-help
- Print help, including hidden options, and exit
- -V, --version
- Print version and exit
- Command line options for input and output (pre-)processing
- -j,
--numThreads=INT
- Set the number of threads used for calculations.
- Select additional algorithms which should be included in the calculations.
The Minimum free energy (MFE) and a structure representative are
calculated in any case.
- --shapeConversion=STRING
- Specify the method used to convert SHAPE reactivities to pairing
probabilities.
- (default=`O')
- The following methods can be used to convert SHAPE reactivities into the
probability for a certain nucleotide to be unpaired.
- 'M': Use linear mapping according to Zarringhalam et al. 2012
- 'C': Use a cutoff-approach to divide into paired and unpaired nucleotides
(e.g. "C0.25")
- 'S': Skip the normalizing step since the input data already represents
probabilities for being unpaired rather than raw reactivity values
- 'L': Use a linear model to convert the reactivity into a probability for
being unpaired (e.g. "Ls0.68i0.2" to use a slope of 0.68 and an
intercept of 0.2)
- 'O': Use a linear model to convert the log of the reactivity into a
probability for being unpaired (e.g. "Os1.6i-2.29" to use a
slope of 1.6 and an intercept of -2.29)
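The conversion methods can be illustrated with a short Python sketch. It is not the RNAlib implementation: the function names parse_method and to_unpaired_probability are hypothetical, and the sketch assumes that the linear models are inverted, i.e. p = (x - intercept) / slope with the result clamped to [0, 1], and that the cutoff method marks reactivities above the cutoff as unpaired. Method 'M' is left out.

```python
import math
import re


def parse_method(spec):
    """Split a conversion string such as 'C0.25' or 'Os1.6i-2.29'."""
    kind, rest = spec[0], spec[1:]
    params = {}
    if kind == "C":
        params["cutoff"] = float(rest)
    elif kind in ("L", "O"):
        # Slope and intercept follow the letters 's' and 'i'.
        for key, value in re.findall(r"([si])(-?\d+(?:\.\d+)?)", rest):
            params["slope" if key == "s" else "intercept"] = float(value)
    return kind, params


def to_unpaired_probability(reactivity, spec):
    """Convert one reactivity into a probability of being unpaired."""
    kind, params = parse_method(spec)
    if kind == "S":
        # Input already is a probability of being unpaired.
        return reactivity
    if kind == "C":
        # Assumption: above the cutoff means unpaired.
        return 1.0 if reactivity > params["cutoff"] else 0.0
    if kind in ("L", "O"):
        # 'O' applies the model to the log of the reactivity.
        x = reactivity if kind == "L" else math.log(reactivity)
        # Assumption: the model is inverted and clamped to [0, 1].
        p = (x - params["intercept"]) / params["slope"]
        return min(1.0, max(0.0, p))
    raise NotImplementedError(f"method {kind!r} is not covered by this sketch")
```

Under these assumptions, to_unpaired_probability(0.54, "Ls0.68i0.2") evaluates (0.54 - 0.2) / 0.68, which is approximately 0.5.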
- --tauSigmaRatio=DOUBLE
- Ratio of the weighting factors tau and sigma. (default=`1.0')
- A high ratio will lead to a solution as close as possible to the
experimental data, while a low ratio will lead to results close to the
thermodynamic prediction without guiding pseudo energies.
- --objectiveFunction=INT
- The energies of the perturbation vector and the discrepancies between
predicted and observed pairing probabilities contribute to the objective
function. This parameter defines which function is used to process the
contributions before summing them up: 0 = square, 1 = absolute.
- (default=`0')
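How the weighting factors and the objective function option interact can be sketched in Python. The quadratic form follows the objective described by Washietl et al. (2012), with the perturbation energies weighted by 1/tau^2 and the discrepancies by 1/sigma^2. Applying the same weights in absolute mode is an assumption of this sketch, and the function name objective is hypothetical.

```python
def objective(epsilon, predicted, observed, tau, sigma, mode=0):
    """Score a perturbation vector.

    epsilon   -- perturbation energy per position
    predicted -- predicted probability per position
    observed  -- observed probability per position, None where missing
    mode      -- 0 squares each contribution, 1 takes its absolute value
    """
    contribution = (lambda v: v * v) if mode == 0 else abs
    # Penalty for moving away from the unperturbed energy model.
    penalty = sum(contribution(e) for e in epsilon) / tau ** 2
    # Penalty for disagreeing with the experimental data.
    misfit = sum(
        contribution(p - q)
        for p, q in zip(predicted, observed)
        if q is not None
    ) / sigma ** 2
    return penalty + misfit
```

Raising tau relative to sigma makes large perturbation energies cheaper, so the minimum moves toward the experimental data. Lowering it keeps the result close to the unperturbed thermodynamic prediction.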
- --sampleSize=INT
- The iterative minimization process requires evaluating the gradient of
the objective function.
- (default=`1000')
- A sample size of 0 leads to an analytical evaluation which scales as
O(N^4). Choosing a sample size >0 estimates the gradient by sampling
the given number of structures from the ensemble, which is much
faster.
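The general idea behind a sampling estimate is to replace exact ensemble probabilities with frequencies counted over sampled secondary structures. The Python sketch below shows this for per-position unpaired probabilities only. It assumes the samples are dot-bracket strings of equal length, uses the hypothetical name unpaired_frequencies, and is not RNApvmin's internal gradient code.

```python
def unpaired_frequencies(samples):
    """Fraction of sampled structures in which each position is unpaired."""
    if not samples:
        raise ValueError("need at least one sampled structure")
    length = len(samples[0])
    counts = [0] * length
    for structure in samples:
        if len(structure) != length:
            raise ValueError("all structures must have the same length")
        for i, symbol in enumerate(structure):
            # '.' marks an unpaired position in dot-bracket notation.
            if symbol == ".":
                counts[i] += 1
    return [count / len(samples) for count in counts]
```

Estimates of this kind become more accurate as the sample size grows, at the price of longer run time.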
- -N,
--nonRedundant
- Enable non-redundant sampling strategy.
- (default=off)
- --intermediatePath=STRING
- Write an output file for each iteration of the minimization process.
- Each file contains the used perturbation vector and the score of the
objective function. The number of the iteration will be appended to the
given path.
- --initialVector=DOUBLE
- Specify the vector of initial perturbations. (default=`0')
- Defines the initial perturbation vector which will be used as starting
vector for the minimization process. The value 0 results in a null vector.
Every other value x will be used to populate the initial vector with
random numbers from the interval [-x,x].
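The rule for the starting vector can be written down as a small Python sketch. The function name initial_vector and the seed parameter are hypothetical, and a uniform distribution over [-x, x] is an assumption, since the text only states the interval.

```python
import random


def initial_vector(length, x, seed=None):
    """Build a starting perturbation vector for a sequence of given length."""
    if x == 0:
        # The value 0 results in a null vector.
        return [0.0] * length
    rng = random.Random(seed)
    # Any other value x fills the vector with random numbers from [-x, x].
    return [rng.uniform(-x, x) for _ in range(length)]
```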
- --minimizer=ENUM
- Set the minimizing algorithm used for finding an appropriate perturbation
vector.
- (possible values="conjugate_fr", "conjugate_pr", "vector_bfgs",
"vector_bfgs2", "steepest_descent", "default" default=`default')
- The default option uses a custom implementation of the gradient descent
algorithm, while all other options represent various algorithms
implemented in the GNU Scientific Library. When the GNU Scientific Library
cannot be found, only the default minimizer is available.
- --initialStepSize=DOUBLE
- The initial stepsize for the minimizer methods.
- (default=`0.01')
- --minStepSize=DOUBLE
- The minimal stepsize for the minimizer methods.
- (default=`1e-15')
- --minImprovement=DOUBLE
- The minimal improvement in the default minimizer method that has to be
surpassed to consider a new result a better one.
- (default=`1e-3')
- --minimizerTolerance=DOUBLE
- The tolerance to be used in the GSL minimizer methods.
- (default=`1e-3')
- -S,
--pfScale=DOUBLE
- In the calculation of the partition function (pf), use scale*mfe as an
estimate for the ensemble free energy (used to avoid overflows).
- (default=`1.07')
- The default is 1.07, useful values are 1.0 to 1.2. Occasionally needed for
long sequences.
- Command line options to interact with the structure constraints feature of
this program
- --maxBPspan=INT
- Set the maximum base pair span.
- (default=`-1')
- Energy parameter sets can be adapted or loaded from user-provided input
files
- -T,
--temp=DOUBLE
- Rescale energy parameters to a temperature of temp C. Default is 37C.
- (default=`37.0')
- -P,
--paramFile=paramfile
- Read energy parameters from paramfile, instead of using the default
parameter set.
- Different sets of energy parameters for RNA and DNA should accompany your
distribution. See the RNAlib documentation for details on the file format.
The placeholder file name 'DNA' can be used to load DNA parameters without
the need to actually specify any input file.
- -4, --noTetra
- Do not include special tabulated stabilizing energies for tri-, tetra- and
hexaloop hairpins.
- (default=off)
- Mostly for testing.
- --salt=DOUBLE
- Set salt concentration in molar (M). Default is 1.021M.
- Tweak the energy model and pairing rules additionally using the following
parameters
- -d,
--dangles=INT
- How to treat "dangling end" energies for bases adjacent to
helices in free ends and multi-loops.
- (default=`2')
- With -d1 only unpaired bases can participate in at most one
dangling end. With -d2 this check is ignored, dangling energies
will be added for the bases adjacent to a helix on both sides in any case;
this is the default for mfe and partition function folding (-p).
The option -d0 ignores dangling ends altogether (mostly for
debugging). With -d3 mfe folding will allow coaxial stacking of
adjacent helices in multi-loops. At the moment the implementation will not
allow coaxial stacking of the two interior pairs in a loop of degree 3 and
works only for mfe folding.
- Note that with -d1 and -d3 only the MFE computations will be
using this setting while partition function uses -d2 setting, i.e.
dangling ends will be treated differently.
- --noLP
- Produce structures without lonely pairs (helices of length 1).
- (default=off)
- For partition function folding this only disallows pairs that can only
occur isolated. Other pairs may still occasionally occur as helices of
length 1.
- --noGU
- Do not allow GU pairs.
- (default=off)
- --noClosingGU
- Do not allow GU pairs at the end of helices.
- (default=off)
- --nsp=STRING
- Allow other pairs in addition to the usual AU, GC, and GU pairs.
- Its argument is a comma separated list of additionally allowed pairs. If
the first character is a "-" then AB will imply that AB and BA
are allowed pairs, e.g. --nsp="-GA" will allow GA and AG
pairs. Nonstandard pairs are given 0 stacking energy.
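The argument format can be illustrated with a short Python sketch. It assumes that a leading "-" applies to the whole list rather than to single entries, which is one reading of the description above. The function name parse_nsp is hypothetical.

```python
def parse_nsp(argument):
    """Expand an --nsp argument into the set of additionally allowed pairs."""
    # Assumption: a leading '-' makes every listed pair symmetric.
    symmetric = argument.startswith("-")
    entries = argument[1:] if symmetric else argument
    pairs = set()
    for entry in entries.split(","):
        entry = entry.strip()
        if len(entry) != 2:
            raise ValueError(f"expected a two-letter pair, got {entry!r}")
        pairs.add(entry)
        if symmetric:
            # AB also allows BA.
            pairs.add(entry[::-1])
    return pairs
```

With this reading, parse_nsp("-GA") yields the pairs GA and AG, matching the example given for --nsp="-GA".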
- -e,
--energyModel=INT
- Set energy model.
- Rarely used option to fold sequences from the artificial ABCD... alphabet,
where A pairs B, C-D etc. Use the energy parameters for GC (-e 1)
or AU (-e 2) pairs.
- --helical-rise=FLOAT
- Set the helical rise of the helix in units of Angstrom.
- (default=`2.8')
- Use with caution! This value will be re-set automatically to 3.4 in case
DNA parameters are loaded via -P DNA and no further value is
provided.
- --backbone-length=FLOAT
- Set the average backbone length for looped regions in units of
Angstrom.
- (default=`6.0')
- Use with caution! This value will be re-set automatically to 6.76 in case
DNA parameters are loaded via -P DNA and no further value is
provided.
If you use this program in your work you might want to
cite:
R. Lorenz, S.H. Bernhart, C. Hoener zu Siederdissen, H. Tafer, C.
Flamm, P.F. Stadler and I.L. Hofacker (2011), "ViennaRNA Package
2.0", Algorithms for Molecular Biology: 6:26
I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker,
P. Schuster (1994), "Fast Folding and Comparison of RNA Secondary
Structures", Monatshefte f. Chemie: 125, pp 167-188
R. Lorenz, I.L. Hofacker, P.F. Stadler (2016), "RNA folding
with hard and soft constraints", Algorithms for Molecular Biology 11:1
pp 1-13
S. Washietl, I.L. Hofacker, P.F. Stadler, M. Kellis (2012)
"RNA folding with soft constraints: reconciliation of probing data and
thermodynamics secondary structure prediction" Nucl Acids Res: 40(10),
pp 4261-4272
The energy parameters are taken from:
D.H. Mathews, M.D. Disney, J.L. Childs, S.J. Schroeder, M. Zuker,
D.H. Turner (2004), "Incorporating
chemical modification constraints into a dynamic programming algorithm for
prediction of RNA secondary structure", Proc. Natl. Acad. Sci. USA:
101, pp 7287-7292
D.H. Turner, D.H. Mathews (2009), "NNDB: The nearest neighbor
parameter database for predicting stability of nucleic acid secondary
structure", Nucleic Acids Research: 38, pp 280-282
RNApvmin accepts a SHAPE file and a corresponding nucleotide
sequence, which is read from stdin.
RNApvmin sequence.shape < sequence.fasta > sequence.pv
The normalized SHAPE reactivity data has to be stored in a text
file, where each line contains the position and the reactivity for a certain
nucleotide ([position] [nucleotide] [SHAPE reactivity]).
1 A 1.286
2 U 0.383
3 C 0.033
4 C 0.017
...
...
98 U 0.234
99 G 0.885
The nucleotide information in the SHAPE file is optional and will
be used to cross check the given input sequence if present. If SHAPE
reactivities could not be determined for every nucleotide, missing values
can simply be omitted.
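A reader for this file format might look like the following Python sketch. It is not the parser used by RNApvmin. The function name parse_shape is hypothetical, and the sketch assumes 1-based positions, whitespace-separated fields, and that lines without a numeric reactivity are skipped.

```python
def parse_shape(text, sequence=None):
    """Return {position: reactivity} from the contents of a SHAPE file."""
    reactivities = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a position without any data
        position = int(fields[0])
        try:
            value = float(fields[-1])
        except ValueError:
            continue  # nucleotide given, reactivity missing
        if len(fields) == 3 and sequence is not None:
            # The optional nucleotide column is cross-checked
            # against the input sequence.
            if (position > len(sequence)
                    or sequence[position - 1].upper() != fields[1].upper()):
                raise ValueError(f"line {line!r} does not match the sequence")
        reactivities[position] = value
    return reactivities
```

Positions that do not appear in the returned dictionary are the ones without experimental data.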
The progress of the minimization will be printed to stderr. Once a
solution has been found, the calculated perturbation vector will be printed to
stdout and can then be used to constrain RNAfold's MFE/partition
function calculation by applying the perturbation energies as soft
constraints.
RNAfold --shape=sequence.pv --shapeMethod=W < sequence.fasta
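The two commands can also be chained from a script. The Python sketch below uses only the options shown in the examples above and assumes that both programs are on the PATH. The function names pvmin_command, fold_command and run_pipeline are hypothetical.

```python
import subprocess


def pvmin_command(shape_file):
    """Argument list for the RNApvmin call."""
    return ["RNApvmin", shape_file]


def fold_command(pv_file):
    """Argument list for the RNAfold call that applies the vector."""
    return ["RNAfold", f"--shape={pv_file}", "--shapeMethod=W"]


def run_pipeline(shape_file, fasta_file, pv_file):
    """Compute the perturbation vector, then fold with it."""
    # RNApvmin reads the sequence from stdin and writes the vector to
    # stdout. Its progress messages on stderr are left untouched.
    with open(fasta_file) as seq, open(pv_file, "w") as out:
        subprocess.run(pvmin_command(shape_file), stdin=seq, stdout=out,
                       check=True)
    # RNAfold reads the same sequence and uses the vector as
    # soft constraints.
    with open(fasta_file) as seq:
        result = subprocess.run(fold_command(pv_file), stdin=seq,
                                capture_output=True, text=True, check=True)
    return result.stdout
```

A call such as run_pipeline("sequence.shape", "sequence.fasta", "sequence.pv") corresponds to running the two shell commands one after the other.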
Dominik Luntzer, Ronny Lorenz
If in doubt our program is right, nature is at fault. Comments
should be sent to rna@tbi.univie.ac.at.