RU_TTS_TRANSFER(3) Library Functions Manual RU_TTS_TRANSFER(3)

ru_tts_transfer - convert specified Russian text to speech

#include <ru_tts.h>

typedef int (*ru_tts_callback)(void *buffer, size_t size, void *user_data);

void ru_tts_transfer(ru_tts_conf_t *config, char *text, void *wave_buffer, size_t wave_buffer_size, ru_tts_callback wave_consumer, void *user_data);

void ru_tts_config_init(ru_tts_conf_t *config);

The ru_tts_transfer() function converts the text pointed to by the text argument into digitized sound in raw linear signed 8-bit format at a 10 kHz sampling rate. The source text must be a zero-terminated string containing Russian text in the KOI8-R charset. The symbols ‘+’ and ‘=’ immediately after a vowel are treated as strong and weak stress marks respectively. The resulting data are fed chunk by chunk to the callback function referenced by the wave_consumer argument, through the buffer specified by the wave_buffer and wave_buffer_size arguments. The user_data argument is passed to the callback as a pointer to any additional data supplied by the user.
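
As an illustration of the calling convention described above, the following sketch shows a consumer callback that appends each chunk to a stdio stream. The name write_chunk and the use of a FILE pointer as user_data are assumptions made for this example and are not part of the library. The ru_tts_transfer() call itself appears only in a comment, so the sketch compiles without the library installed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical consumer matching the ru_tts_callback signature.
 * user_data is assumed to be an open FILE * supplied by the caller. */
static int write_chunk(void *buffer, size_t size, void *user_data)
{
    FILE *out = user_data;

    /* Samples are 8 bits wide, so size is presumably also the sample count. */
    if (fwrite(buffer, 1, size, out) != size)
        return 1;   /* non-zero stops the transfer */
    return 0;       /* zero lets the transfer continue */
}

/*
 * Assumed usage with the library available:
 *
 *     char wave[4096];
 *     ru_tts_conf_t conf;
 *     ru_tts_config_init(&conf);
 *     // the word "zamok" in KOI8-R bytes, with '+' marking strong
 *     // stress after the first vowel
 *     ru_tts_transfer(&conf, "\xDA\xC1+\xCD\xCF\xCB",
 *                     wave, sizeof wave, write_chunk, stdout);
 */
```

The output written this way is headerless raw audio, so a player has to be told the format explicitly (signed 8-bit, 10 kHz).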

Various speech synthesis control options can be passed via the ru_tts_conf_t data structure pointed to by the config argument. It contains the following fields:


typedef struct {
    int speech_rate;
    int voice_pitch;
    int intonation;
    int general_gap_factor;
    int comma_gap_factor;
    int dot_gap_factor;
    int semicolon_gap_factor;
    int colon_gap_factor;
    int question_gap_factor;
    int exclamation_gap_factor;
    int intonational_gap_factor;
    int flags;
} ru_tts_conf_t;

This structure should be initialized by the ru_tts_config_init() function, which fills it with the default values.

All numeric values represent a percentage of the normal level of the corresponding parameter. Initially they are set to 100. Each parameter has its own reasonable value range, but out-of-range values cause no problems, since they are treated as the nearest boundary of the acceptable range.
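
The treatment of out-of-range values can be pictured as a simple clamp. The helper below is only a sketch of that documented behaviour; the name clamp_to_range is invented for this example, and the library's actual internal handling is not shown in this manual.

```c
/* Illustrative only: mimics the documented handling of out-of-range values. */
static int clamp_to_range(int value, int lowest, int highest)
{
    if (value < lowest)
        return lowest;
    if (value > highest)
        return highest;
    return value;
}
```

Under this reading, with the speech rate range of 20 to 500, a requested value of 1000 would behave like 500 and a requested value of 5 would behave like 20.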

speech_rate
    Speech rate as a percentage of the normal level. Reasonable values range from 20 to 500.
voice_pitch
    Voice pitch as a percentage of the normal level. Reasonable values range from 50 to 300.
intonation
    Voice pitch variation range. It can vary from 0 (absolutely monotonic speech) to 140 (a bit more expressive than normal).
general_gap_factor
    Percentage factor applied to all interclause gaps. The lower boundary is 0, which means no gaps at all. The maximum depends proportionally on the speech rate; at the normal rate it is approximately 312.
comma_gap_factor
    Relative duration of the gap inserted at a comma. Reasonable values range from 0 to 750.
dot_gap_factor
    Relative duration of the gap inserted at a dot. Reasonable values range from 0 to 500.
semicolon_gap_factor
    Relative duration of the gap inserted at a semicolon. Reasonable values range from 0 to 600.
colon_gap_factor
    Relative duration of the gap inserted at a colon. Reasonable values range from 0 to 600.
question_gap_factor
    Relative duration of the gap inserted at a question mark. Reasonable values range from 0 to 375.
exclamation_gap_factor
    Relative duration of the gap inserted at an exclamation mark. Reasonable values range from 0 to 300.
intonational_gap_factor
    Relative duration of purely intonational gaps that are not caused by punctuation. Reasonable values range from 0 to 1000.
flags
    Additional flags. The following flag constants may be combined here with bitwise OR:
        Treat a point inside a number as a decimal separator. This flag is set initially.
        Treat a comma inside a number as a decimal separator. This flag is set initially.
        Use the alternative (female) voice instead of the default (male) one. This flag is not set initially.
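
Combining and clearing bit flags of this kind can be sketched as follows. The constant names and bit values are placeholders assumed for this example only; the real constants are defined in ru_tts.h and should be used instead. The helper functions are likewise hypothetical.

```c
/* Placeholder names and bit values, assumed for this sketch only;
 * the real flag constants come from ru_tts.h. */
enum {
    EXAMPLE_DEC_SEP_POINT = 1 << 0,
    EXAMPLE_DEC_SEP_COMMA = 1 << 1,
    EXAMPLE_ALT_VOICE     = 1 << 2
};

/* Documented initial state: both decimal separator flags set,
 * alternative voice not set. */
static int example_default_flags(void)
{
    return EXAMPLE_DEC_SEP_POINT | EXAMPLE_DEC_SEP_COMMA;
}

/* Set one flag while leaving the others untouched. */
static int enable_flag(int flags, int flag)
{
    return flags | flag;
}

/* Clear one flag while leaving the others untouched. */
static int disable_flag(int flags, int flag)
{
    return flags & ~flag;
}

/* Report whether a flag is currently set. */
static int flag_is_set(int flags, int flag)
{
    return (flags & flag) != 0;
}
```

With the real structure, the same operations would be applied to the flags field after ru_tts_config_init() has filled in the defaults, so that flags not mentioned explicitly keep their initial state.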

The user-provided callback function is expected to take responsibility for the generated data. It may play the data immediately, store it somewhere, or do whatever else it is designed for. The function should return 0 under normal circumstances. A non-zero return value stops the transfer immediately.
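
The return value convention can be illustrated by a callback that collects samples in a fixed-size memory area and asks for the transfer to stop once the area is full. The structure sample_store and the function store_chunk are hypothetical names introduced for this sketch; only the callback signature is taken from this manual.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator handed to the callback through user_data. */
struct sample_store {
    signed char *data;   /* destination for raw signed 8-bit samples */
    size_t capacity;     /* total room in data, in bytes */
    size_t used;         /* bytes stored so far */
};

/* Copies each chunk into the store; returns non-zero when a chunk
 * no longer fits, which stops the transfer. */
static int store_chunk(void *buffer, size_t size, void *user_data)
{
    struct sample_store *store = user_data;

    if (size > store->capacity - store->used)
        return 1;   /* out of room: stop the transfer */
    memcpy(store->data + store->used, buffer, size);
    store->used += size;
    return 0;       /* keep going */
}
```

Since the output uses one byte per sample at 10 kHz, a store with a capacity of 10000 bytes holds one second of speech.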

Igor B. Poretsky <poretsky@mlbox.ru>.

January 11, 2023