JULIUS(1)
julius
- open source multi-purpose LVCSR engine
julius [-C jconffile] [options...]
julius is a high-performance, multi-purpose, open-source speech recognition engine for researchers and developers. It can perform almost real-time recognition of continuous speech with a 3-gram language model of over 60k words and a triphone HMM on most current PCs. julius can perform recognition on audio files, live microphone input, network input and feature parameter files.
The core recognition module is implemented as a C library called "JuliusLib". It can also be extended through the plug-in facility.
julius needs a language model and an acoustic model to run as a speech recognizer. julius supports the following models.
Acoustic model
Sub-word HMMs (Hidden Markov Models) in HTK ASCII format are supported. Phoneme models (monophone), context-dependent phoneme models (triphone), tied-mixture and phonetic tied-mixture models of any unit can be used. When context-dependent models are used, inter-word context dependency is also handled. Multi-stream features and MSD-HMM are also supported. You can also use the tool mkbinhmm to convert the ASCII HMM file to a compact binary format for faster loading.
Note that julius itself can only extract MFCC features from speech data. If you use an acoustic HMM trained on another feature type, you should give the input as an HTK parameter file of the same feature type.
Language model: word N-gram
Word N-gram language models, up to 10-gram, are supported. Julius uses a different N-gram for each pass: a left-to-right (LR) 2-gram on the 1st pass, and a right-to-left (RL) N-gram on the 2nd pass. It is recommended to use both an LR 2-gram and an RL N-gram for Julius. However, you can also use only a single LR N-gram or RL N-gram. In that case, an approximated LR 2-gram computed from the given N-gram will be applied on the first pass.
The standard ARPA format is supported. A binary format is also supported for efficiency. The tool mkbingram(1) can convert an ARPA format N-gram to binary format.
Language model: grammar
The grammar format is an original one, and tools to create a recognition grammar are included in the distribution. A grammar consists of two files: one is a 'grammar' file that describes sentence structures in a BNF style, using word 'category' names as terminal symbols. The other is a 'voca' file that defines the words of each category with their pronunciations (i.e. phoneme sequences). They should be converted by mkdfa(1) to a deterministic finite automaton file (.dfa) and a dictionary file (.dict), respectively. You can also use multiple grammars.
Language model: isolated word
You can perform isolated word recognition using only a word dictionary. With this model type, Julius will perform rapid one-pass recognition with static context handling. Silence models will be added at both the head and tail of each word. You can also use multiple dictionaries in a process.
The recognition algorithm of julius is based on a two-pass strategy. A word 2-gram and a reverse word 3-gram are used on the respective passes. The entire input is processed on the first pass, and then the final search is performed on the input again, using the result of the first pass to narrow the search space. Specifically, the recognition algorithm is based on a tree-trellis heuristic search combined with a left-to-right frame-synchronous beam search and a right-to-left stack decoding search.
When using context-dependent phones (triphones), inter-word contexts are taken into consideration. For tied-mixture and phonetic tied-mixture models, high-speed acoustic likelihood calculation is possible using Gaussian pruning.
For more details, see the related documents.
These options specify the models, system behaviors and various search parameters for Julius. These options can be set on the command line, but it is recommended that you write them in a text file called a "jconf file" and specify it with the "-C" option.
Applications incorporating JuliusLib also use these options to set the parameters of the core recognition engine. For example, a jconf file can be loaded into the engine by calling j_config_load_file_new() with the jconf file name as the argument.
Please note that relative paths in a jconf file should be relative to the jconf file itself, not the current working directory.
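The path rule above can be illustrated with a small sketch. The helper name resolve_jconf_path and the example paths are hypothetical; this is not Julius code, only an illustration of resolving a path against the directory of the jconf file rather than the working directory.

```python
from pathlib import PurePosixPath


def resolve_jconf_path(jconf_file: str, model_path: str) -> str:
    """Resolve a path written inside a jconf file as the text describes."""
    path = PurePosixPath(model_path)
    if path.is_absolute():
        # Absolute paths are used unchanged.
        return str(path)
    # Relative paths are anchored at the directory that holds the jconf file.
    return str(PurePosixPath(jconf_file).parent / path)


print(resolve_jconf_path("/opt/asr/conf/main.jconf", "model/hmmdefs"))
print(resolve_jconf_path("/opt/asr/conf/main.jconf", "/srv/lm/news.bingram"))
```

With these hypothetical inputs, the first call yields a path under /opt/asr/conf regardless of where julius is started from.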
Below are the details of all options, gathered by group.
These are application options of Julius, outside of JuliusLib. They contain parameters and switches for result output, character set conversion, log level, and module mode options. These options are specific to Julius, and cannot be used in applications using JuliusLib other than Julius.
-outfile
-separatescore
-callbackdebug
-charconv from to
On Linux, the arguments should be code names. You can obtain the list of available code names by invoking the command "iconv --list". On Windows, each argument should be a code name or a codepage number. A code name should be one of "ansi", "mac", "oem", "utf-7", "utf-8", "sjis", "euc". Alternatively, you can specify any codepage number supported in your environment.
-nocharconv
-module [port]
-record dir
When input rejection by -rejectshort is enabled, rejected inputs will also be recorded.
-logfile file
-nolog
-help
These are model-/search-dependent options relating to audio input, sound detection, GMM, decoding algorithm, plugin facility, and others. Global options should be placed before any instance declaration (-AM, -LM, or -SR), or just after the "-GLOBAL" option.
Audio input
-input {mic|rawfile|mfcfile|adinnet|stdin|netaudio|alsa|oss|esd}
'mic' gets audio input from the default live microphone device, and 'adinnet' means receiving waveform data via a TCP/IP network from an adinnet client. 'netaudio' is for DatLink/NetAudio input, and 'stdin' means data input from standard input.
For waveform file input, only WAV (no compression) and RAW (no header, 16bit, big endian) are supported by default. Other formats can be read when compiled with the libsndfile library. To see which formats are actually supported, see the help message shown by the -help option. For stdin input, only WAV and RAW are supported. (default: mfcfile)
On Linux, you can choose the API at run time by specifying alsa, oss or esd.
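As an illustration of the RAW layout named above (no header, 16bit, big endian), the following sketch packs integer samples into that byte layout. The function name to_julius_raw is hypothetical, and clipping out-of-range values is an assumption made for the example.

```python
import struct


def to_julius_raw(samples):
    """Pack integer samples as headerless, signed 16-bit, big-endian bytes."""
    # Clip to the signed 16-bit range (an assumption for this sketch).
    clipped = [max(-32768, min(32767, int(s))) for s in samples]
    # ">" selects big-endian byte order, "h" a signed 16-bit integer.
    return struct.pack(">%dh" % len(clipped), *clipped)


raw = to_julius_raw([0, 1, -1, 256])
print(raw.hex())  # 00000001ffff0100
```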
-chunk_size samples
-filelist filename
-notypecheck
-48
-NA devicename
-adport port_number
-nostrip
-zmean , -nozmean
This option uses a static offset for the channel. See also -zmeanframe for frame-wise offset removal.
Speech detection by level and zero-cross
-cutsilence , -nocutsilence
-lv thres
-zc thres
-headmargin msec
-tailmargin msec
Input rejection
Two simple front-end input rejection methods are implemented, based on the input length and the average power of the detected segment. The rejection by average power is experimental, and can be enabled by --enable-power-reject at compilation time. It is valid only for MFCC features with a power coefficient and real-time input.
For GMM-based input rejection see the GMM section below.
-rejectshort msec
-powerthres thres
This option is valid when --enable-power-reject is specified at compilation time.
Gaussian mixture model / GMM-VAD
GMM will be used for input rejection by accumulated score, or for front-end GMM-based VAD when --enable-gmm-vad is specified.
NOTE: You should also set the proper MFCC parameters required for the GMM by specifying the acoustic parameters described in the AM section after -AM_GMM.
When GMM-based VAD is enabled, the voice activity score will be calculated at each frame as front-end processing. The value will be computed as \[ \max_{m \in M_v} p(x|m) - \max_{m \in M_n} p(x|m) \] where $M_v$ is the set of voice GMMs and $M_n$ is the set of noise GMMs, whose names should be specified by -gmmreject. The activity score will then be averaged over the last N frames, where N is specified by -gmmmargin. Julius updates the averaged activity score at each frame, and detects a speech up-trigger when the value rises above the value specified by -gmmup, and a down-trigger when it falls below the value of -gmmdown.
-gmm hmmdefs_file
-gmmnum number
-gmmreject string
-gmmmargin frames
This option will be valid only if compiled with --enable-gmm-vad.
-gmmup value
This option will be valid only if compiled with --enable-gmm-vad.
-gmmdown value
This option will be valid only if compiled with --enable-gmm-vad.
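The GMM-based VAD procedure described in this section can be sketched as follows. This is a minimal illustration, not the Julius implementation: the function name vad_triggers and its input layout are hypothetical, and averaging over fewer than N frames at the start of the input is an assumption.

```python
from collections import deque


def vad_triggers(voice_scores, noise_scores, margin, up, down):
    """Return (frame, event) pairs from per-frame GMM scores.

    voice_scores[t] / noise_scores[t] hold the scores of every voice / noise
    GMM at frame t (hypothetical inputs for this sketch).
    """
    window = deque(maxlen=margin)  # last N activity scores (-gmmmargin)
    in_speech = False
    events = []
    for t, (v, n) in enumerate(zip(voice_scores, noise_scores)):
        window.append(max(v) - max(n))   # activity score of this frame
        avg = sum(window) / len(window)  # average over the last N frames
        if not in_speech and avg > up:   # up-trigger (-gmmup)
            in_speech = True
            events.append((t, "up"))
        elif in_speech and avg < down:   # down-trigger (-gmmdown)
            in_speech = False
            events.append((t, "down"))
    return events


voice = [[-5.0], [-1.0], [-1.0], [-6.0], [-6.0]]
noise = [[-2.0], [-3.0], [-3.0], [-2.0], [-2.0]]
print(vad_triggers(voice, noise, margin=2, up=1.0, down=-1.0))
# [(2, 'up'), (4, 'down')]
```

Averaging over several frames keeps a single noisy frame from toggling the speech state.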
Decoding option
Real-time processing means concurrent processing of MFCC computation and 1st pass decoding. By default, real-time processing on the 1st pass is on for microphone / adinnet / netaudio input, and off for others.
-realtime , -norealtime
Misc. options
-C jconffile
-version
-setting
-quiet
-debug
-check {wchmm|trellis|triphone}
-plugindir dirlist
The following arguments will create a new configuration set with default parameters, and switch the current set to it. Jconf parameters specified after the option will be set into the current set.
To do multi-model decoding, these arguments should be specified at the beginning of each model / search instance, with different names. Any options before the first instance definition will be IGNORED.
When no instance definition is found (as in older versions of Julius), all the options are assigned to a default instance named _default.
Please note that decoding with a single LM and multiple AMs is not fully supported. For example, you may want to write a jconf file in which two -SR instances pair different AMs with the same LM instance.
This type of model sharing is not supported yet, since some part of LM processing depends on the assigned AM. Instead, you can get the same result by defining a separate LM instance with the same LM specification for each AM.
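The workaround can be sketched by generating the instance declarations for each AM. All instance names (am_1, lm_1, search_1, ...) and file names below are hypothetical placeholders, and the sketch assumes that several options may share one line in a jconf file.

```python
def paired_instances(am_specs, lm_spec):
    """Return jconf lines that give every AM instance its own LM instance.

    am_specs maps a hypothetical AM instance name to its AM options;
    lm_spec is the LM option string that is repeated for each AM.
    """
    lines = [f"-AM {name} {spec}" for name, spec in am_specs.items()]
    for i in range(1, len(am_specs) + 1):
        # One LM instance per AM, all with the same LM specification.
        lines.append(f"-LM lm_{i} {lm_spec}")
    for i, name in enumerate(am_specs, start=1):
        # Each search instance pairs an AM with its own LM instance.
        lines.append(f"-SR search_{i} {name} lm_{i}")
    return lines


ams = {"am_1": "-h hmmdefs_a", "am_2": "-h hmmdefs_b"}
for line in paired_instances(ams, "-d lm.bingram"):
    print(line)
```

The unsupported layout would instead declare a single -LM instance and name it in both -SR lines.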
-AM name
-LM name
-SR name am_name lm_name
-AM_GMM
-GLOBAL
-nosectioncheck , -sectioncheck
This group contains options for the model definition of each language model type. When using multiple LMs, one instance can have only one LM.
Only one type of LM can be specified for an LM configuration. If you want to use multiple models, you should define each of them as a new LM.
N-gram
-d bingram_file
-nlr arpa_ngram_file
Since an ARPA file often gets huge and takes a long time to load, it may be better to convert it to Julius binary format with mkbingram. Note that if both forward and backward N-grams are used for recognition, they will be converted together into a single binary file.
When only a forward N-gram is specified by this option and no backward N-gram is specified by -nrl, Julius performs recognition with only the forward N-gram. The 1st pass will use the 2-gram entries in the given N-gram, and the 2nd pass will use the given N-gram, converting forward probabilities to backward probabilities by Bayes' rule. (Rev.4.0)
-nrl arpa_ngram_file
Since an ARPA file often gets huge and takes a long time to load, it may be better to convert it to Julius binary format with mkbingram. Note that if both forward and backward N-grams are used for recognition, they will be converted together into a single binary file.
When only a backward N-gram is specified by this option and no forward N-gram is specified by -nlr, Julius performs recognition with only the backward N-gram. The 1st pass will use the forward 2-gram probabilities computed from the backward 2-gram using Bayes' rule. The 2nd pass fully uses the given backward N-gram. (Rev.4.0)
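The Bayes' rule conversion mentioned for -nlr and -nrl can be written, for a 2-gram, as P(w1 | w2) = P(w2 | w1) P(w1) / P(w2). The sketch below applies it in the log10 domain. It ignores back-off weights and longer N-grams, and the function name and toy probabilities are hypothetical; how Julius performs the conversion internally is not shown here.

```python
def reverse_bigram_logprob(fwd_bigram, unigram, w1, w2):
    """Return log10 P(w1 | w2) from log10 P(w2 | w1) via Bayes' rule."""
    # log P(w1|w2) = log P(w2|w1) + log P(w1) - log P(w2)
    return fwd_bigram[(w1, w2)] + unigram[w1] - unigram[w2]


unigram = {"the": -1.0, "cat": -2.0}      # log10 P(w)
fwd_bigram = {("the", "cat"): -1.5}       # log10 P(cat | the)
print(reverse_bigram_logprob(fwd_bigram, unigram, "the", "cat"))  # -0.5
```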
-v dict_file
-silhead word_string -siltail word_string
-mapunk word_string
-iwspword
-iwspentry word_entry_string
-sepnum number
Grammar
Multiple grammars can be specified by repeating -gram and -gramlist. Note that this differs from the usual behavior of Julius options, where the last one overrides previous ones. You can use -nogram to reset the grammars specified up to that point.
-gram gramprefix1[,gramprefix2[,gramprefix3,...]]
-gramlist list_file
-dfa dfa_file -v dict_file
-nogram
Isolated word
Dictionaries can be specified by using -w and -wlist. When they are specified multiple times, all of them will be read at startup. You can use -nogram to reset the dictionaries specified up to that point.
-w dict_file
-wlist list_file
-nogram
-wsil head_sil_model_name tail_sil_model_name sil_context_name
User-defined LM
-userlm
Misc. LM options
-forcedict
This section describes options for acoustic models, feature extraction, feature normalization and spectral subtraction.
After -AM name, an acoustic model and its related specifications should be written. You can use multiple AMs trained with different MFCC types. For the GMM, the required parameter conditions should be specified after -AM_GMM in the same way as for AMs.
When using multiple AMs, the values of -smpPeriod, -smpFreq, -fsize and -fshift should be the same among all AMs.
Acoustic HMM
-h hmmdef_file
-hlist hmmlist_file
-tmix number
-spmodel name
-multipath
This function was a compile-time option in Julius 3.x, and is now a run-time option. By default (without this option), Julius checks the transition type of the specified HMMs, and enables the multi-path mode if required. You can force multi-path mode with this option. (rev.4.0)
-gprune {safe|heuristic|beam|none|default}
-iwcd1 {max|avg|best number}
max will apply the maximum likelihood of the same-context triphones. avg will apply the average likelihood of the same-context triphones. best number will apply the average of the top N-best likelihoods of the same-context triphones.
The default is best 3 for use with N-gram, and avg for grammar and word. When this AM is shared by LMs of both types, the latter will be chosen.
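The three -iwcd1 methods differ only in how they combine the likelihoods of the same-context triphones. The sketch below illustrates this with a hypothetical helper iwcd1_score and made-up log likelihoods; it is not taken from the Julius source.

```python
def iwcd1_score(likelihoods, method, n=3):
    """Combine same-context triphone likelihoods as -iwcd1 describes."""
    ranked = sorted(likelihoods, reverse=True)  # best likelihood first
    if method == "max":
        return ranked[0]
    if method == "avg":
        return sum(ranked) / len(ranked)
    if method == "best":
        top = ranked[:n]  # top N-best likelihoods
        return sum(top) / len(top)
    raise ValueError("unknown method: " + method)


scores = [-10.0, -12.0, -14.0, -20.0]
print(iwcd1_score(scores, "max"))        # -10.0
print(iwcd1_score(scores, "avg"))        # -14.0
print(iwcd1_score(scores, "best", n=3))  # -12.0
```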
-iwsppenalty float
-gshmm hmmdef_file
-gsnum number
Speech analysis
Only MFCC feature extraction is supported in the current Julius. Thus, when recognizing waveform input from a file or microphone, the AM must be trained with MFCC. The parameter conditions should also be set exactly the same as the training conditions using the options below.
When you give input as an HTK parameter file, you can use any parameter type for the AM. In this case Julius does not care about the type of the input feature and the AM; it just reads them as vector sequences and matches them against the given AM. Julius only checks whether the parameter types are the same. If this check does not work well, you can disable it with -notypecheck.
In Julius, the parameter kind and qualifiers (TARGETKIND in HTK) and the number of cepstral parameters (NUMCEPS) will be set automatically from the content of the AM header, so you need not specify them by options.
Other parameters should be set exactly the same as the training conditions. You can also give Julius the HTK Config file that you used to train the AM with -htkconf. When this option is applied, Julius will parse the Config file and set the appropriate parameters.
You can also embed those analysis parameter settings in a binary HMM file using mkbinhmm.
If options are specified in several ways, they will be evaluated in the order below. The AM-embedded parameters will be loaded first, if any. Then, the HTK config file given by -htkconf will be parsed. If a value was already set by an AM-embedded parameter, the HTK config will override it. Finally, the direct options will be loaded, which will override the settings loaded before. Note that when the same option is specified several times, the later one will override the previous ones, except that -htkconf will be evaluated first as described above.
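The evaluation order above amounts to layering three sources of settings, with later layers winning. The following sketch models this with plain dictionaries; the function name and the parameter keys are hypothetical stand-ins for the corresponding options.

```python
def effective_params(am_embedded, htkconf, direct):
    """Merge analysis parameters in the order the text describes."""
    params = dict(am_embedded)  # 1. parameters embedded in the AM, if any
    params.update(htkconf)      # 2. HTK config file overrides embedded values
    params.update(direct)       # 3. direct options override everything before
    return params


merged = effective_params(
    am_embedded={"smpFreq": 16000, "fshift": 160},
    htkconf={"fshift": 80, "preemph": 0.97},
    direct={"preemph": 0.95},
)
print(merged)  # {'smpFreq': 16000, 'fshift': 80, 'preemph': 0.95}
```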
-smpPeriod period
This option corresponds to the HTK Option SOURCERATE. The same value can be given to this option.
When using multiple AMs, this value should be the same among all AMs.
-smpFreq Hz
When using multiple AMs, this value should be the same among all AMs.
-fsize sample_num
This option corresponds to the HTK Option WINDOWSIZE, but the value should be in samples (HTK value / smpPeriod).
When using multiple AMs, this value should be the same among all AMs.
-fshift sample_num
This option corresponds to the HTK Option TARGETRATE, but the value should be in samples (HTK value / smpPeriod).
When using multiple AMs, this value should be the same among all AMs.
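The conversion from HTK time values to the sample counts expected by -fsize and -fshift is a single division, since HTK expresses SOURCERATE, WINDOWSIZE and TARGETRATE in units of 100 ns. The helper name and the example values below (a 16 kHz setup with a 25 ms window and 10 ms shift) are illustrative assumptions.

```python
def htk_to_samples(htk_value, smp_period):
    """Convert an HTK duration (100 ns units) to a number of samples."""
    return round(htk_value / smp_period)


smp_period = 625                           # SOURCERATE for 16 kHz audio
print(10_000_000 // smp_period)            # 16000 (sampling frequency in Hz)
print(htk_to_samples(250000, smp_period))  # 400 (WINDOWSIZE 25 ms)
print(htk_to_samples(100000, smp_period))  # 160 (TARGETRATE 10 ms)
```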
-preemph float
This option corresponds to the HTK Option PREEMCOEF. The same value can be given to this option.
-fbank num
This option corresponds to the HTK Option NUMCHANS. The same value can be given to this option. Be aware that the default value is not the same as in HTK (22).
-ceplif num
This option corresponds to the HTK Option CEPLIFTER. The same value can be given to this option.
-rawe , -norawe
This option corresponds to the HTK Option RAWENERGY. Be aware that the default value differs from HTK (enabled in HTK, disabled in Julius).
-enormal , -noenormal
This option corresponds to the HTK Option ENORMALISE. Be aware that the default value differs from HTK (enabled in HTK, disabled in Julius).
-escale float_scale
This option corresponds to the HTK Option ESCALE. Be aware that the default value differs from HTK (0.1).
-silfloor float
This option corresponds to the HTK Option SILFLOOR.
-delwin frame
This option corresponds to the HTK Option DELTAWINDOW. The same value can be given to this option.
-accwin frame
This option corresponds to the HTK Option ACCWINDOW. The same value can be given to this option.
-hifreq Hz
This option corresponds to the HTK Option HIFREQ. The same value can be given to this option.
-lofreq Hz
This option corresponds to the HTK Option LOFREQ. The same value can be given to this option.
-zmeanframe , -nozmeanframe
-usepower
Normalization
Julius can perform cepstral mean normalization (CMN) for inputs. CMN will be activated when the given AM was trained with CMN (i.e. has "_Z" qualifier in the header).
The cepstral mean will be estimated in different ways according to the input type. For file input, the mean will be computed from the whole input. For live input such as microphone and network input, the cepstral mean of the input is unknown at the start, so MAP-CMN will be used. With MAP-CMN, an initial mean vector will be applied at the beginning, and the mean vector will gradually shift toward the mean of the accumulated input vectors as the input proceeds. The options below control the behavior of MAP-CMN.
-cvn
-vtln alpha lowcut hicut
-cmnload file
-cmnsave file
-cmnupdate -cmnnoupdate
-cmnmapweight float
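The behavior of MAP-CMN described in this section can be sketched with a running mean that starts from an initial vector and is pulled toward the mean of the observed frames. The update rule below, mean = (weight * initial mean + sum of frames so far) / (weight + number of frames), is a common MAP form assumed for illustration; the exact rule used by Julius, and whether the current frame is included in its own mean, are not stated here. The function name map_cmn is hypothetical.

```python
def map_cmn(frames, init_mean, weight):
    """Return mean-normalized frames using a MAP-style running mean."""
    total = [0.0] * len(init_mean)
    normalized = []
    for t, x in enumerate(frames, start=1):
        # Accumulate the input vectors seen so far.
        total = [s + v for s, v in zip(total, x)]
        # Blend the initial mean (with the given weight) and the data mean.
        mean = [(weight * m0 + s) / (weight + t)
                for m0, s in zip(init_mean, total)]
        normalized.append([v - m for v, m in zip(x, mean)])
    return normalized


print(map_cmn([[2.0], [4.0]], init_mean=[0.0], weight=1.0))
# [[1.0], [2.0]]
```

A larger weight keeps the estimate close to the initial mean for longer, while a smaller one lets it follow the incoming data quickly.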
Front-end processing
Julius can perform spectral subtraction to reduce stationary noise in the audio input. Although it is not a powerful method, it may work in some situations. Julius has two ways to estimate the noise spectrum. One way is to assume that the first short segment of a speech input is a noise segment, and estimate the noise spectrum as the average of that segment. The other way is to calculate the average spectrum from noise-only input using the separate tool mkss, and load it into Julius. The former is popular for speech file input, and the latter should be used for live input. The options below switch / control the behavior.
-sscalc
-sscalclen msec
-ssload file
-ssalpha float
-ssfloor float
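A minimal sketch of spectral subtraction with a head-segment noise estimate follows. It assumes the conventional form in which a scaled noise spectrum is subtracted from each frame and the result is floored at a fraction of the original value; treating alpha and floor as the meaning of -ssalpha and -ssfloor is an assumption, and the function names and toy spectra are hypothetical.

```python
def estimate_noise(frames, head_frames):
    """Average the first frames of the input as the noise spectrum."""
    head = frames[:head_frames]
    return [sum(col) / len(head) for col in zip(*head)]


def spectral_subtract(power, noise, alpha, floor):
    """Subtract alpha * noise from one power spectrum, with flooring."""
    out = []
    for p, n in zip(power, noise):
        reduced = p - alpha * n
        # Never go below a fraction of the original power.
        out.append(max(reduced, floor * p))
    return out


frames = [[2.0, 4.0], [4.0, 2.0], [10.0, 4.0]]
noise = estimate_noise(frames, head_frames=2)
print(noise)                                                        # [3.0, 3.0]
print(spectral_subtract(frames[2], noise, alpha=1.0, floor=0.5))    # [7.0, 2.0]
```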
Misc. AM options
-htkconf file
This section contains options for search parameters on the 1st / 2nd pass such as beam width and LM weights, configurations for short-pause segmentation, switches for word lattice output and confusion network output, forced alignments, and other options relating to the recognition process and result output.
Default values for beam width and LM weights will change according to the compile-time setup of JuliusLib, the AM model type, and the LM size. Please see the startup log for the actual values.
1st pass parameters
-lmp weight penalty
-penalty1 penalty
-b width
The default value is dependent on acoustic model type: 400 (monophone), 800 (triphone), or 1000 (triphone, setup=v2.1)
-nlimit num
-progout
-proginterval msec
2nd pass parameters
-lmp2 weight penalty
-penalty2 penalty
-b2 width
-sb float
-s num
-m count
-n num
-output num
-lookuprange frame
-looktrellis
Short-pause segmentation / decoder-VAD
When compiled with --enable-decoder-vad, the short-pause segmentation will be extended to support decoder-based VAD.
-spsegment
When compiled with --enable-decoder-vad, this option enables decoder-based VAD, to skip long silence.
-spdur frame
-pausemodels string
-spmargin frame
This option will be valid only if compiled with --enable-decoder-vad.
-spdelay frame
This option will be valid only if compiled with --enable-decoder-vad.
Word lattice / confusion network output
-lattice , -nolattice
-confnet , -noconfnet
-graphrange frame
-graphcut depth
-graphboundloop count
-graphsearchdelay , -nographsearchdelay
Multi-gram / multi-dic recognition
-multigramout , -nomultigramout
Forced alignment
-walign
-palign
-salign
Misc. search options
-inactive
-1pass
-fallback1pass
-no_ccd , -force_ccd
-cmalpha float
-iwsp
-transp float
-demo
ALSADEV
AUDIODEV
LATENCY_MSEC
For examples of system usage, refer to the tutorial section in the Julius documents.
Note about jconf files: relative paths in a jconf file are interpreted as relative to the jconf file itself, not to the current directory.
julian(1), jcontrol(1), adinrec(1), adintool(1), mkbingram(1), mkbinhmm(1), mkgsmm(1), wav2mfcc(1), mkss(1)
http://julius.sourceforge.jp/en/
Julius normally returns exit status 0. If an error occurs, Julius exits abnormally with exit status 1. If an input file cannot be found or cannot be loaded for some reason, Julius will skip processing for that file.
There are some restrictions on the type and size of the models Julius can use. For a detailed explanation, refer to the Julius documentation. For bug reports, inquiries and comments, please contact julius-info at lists.sourceforge.jp.
Copyright (c) 1991-2008 Kawahara Lab., Kyoto University
Copyright (c) 1997-2000 Information-technology Promotion Agency, Japan
Copyright (c) 2000-2008 Shikano Lab., Nara Institute of Science and Technology
Copyright (c) 2005-2008 Julius project team, Nagoya Institute of Technology
Rev.1.0 (1998/02/20)
Development by Akinobu LEE (Kyoto University)
Rev.1.1 (1998/04/14), Rev.1.2 (1998/10/31), Rev.2.0 (1999/02/20), Rev.2.1 (1999/04/20), Rev.2.2 (1999/10/04), Rev.3.0 (2000/02/14), Rev.3.1 (2000/05/11)
Rev.3.2 (2001/08/15), Rev.3.3 (2002/09/11), Rev.3.4 (2003/10/01), Rev.3.4.1 (2004/02/25), Rev.3.4.2 (2004/04/30)
Rev.3.5 (2005/11/11), Rev.3.5.1 (2006/03/31), Rev.3.5.2 (2006/07/31), Rev.3.5.3 (2006/12/29), Rev.4.0 (2007/12/19), Rev.4.1 (2008/10/03)
From rev.3.2, Julius is released by the "Information Processing Society, Continuous Speech Consortium".
The Windows DLL version was developed and released by Hideki BANNO (Nagoya University).
The Windows Microsoft Speech API compatible version was developed by Takashi SUMIYOSHI (Kyoto University).
02/11/2009