Flite is a library that we expect will be embedded into other applications. Included with the distribution is a small example executable that allows synthesis of strings of text and text files from the command line.
The example flite binary may be suitable for very simple applications. Unlike Festival, its start-up time is very short (less than 25ms on a PIII 500MHz), making it practical (on larger machines) to call it each time you need to synthesize something.
flite TEXT OUTPUTTYPE
If TEXT contains a space it is treated as a string of text and converted to speech; if it does not contain a space, TEXT is treated as a file name and the contents of that file are converted to speech. The option -t specifies that TEXT is to be treated as text (not a filename) and -f forces treatment as a file.
Thus
flite -t hello
will say the word "hello" while
flite hello
will say the content of the file `hello'. Likewise
flite "hello world."
will say the words "hello world" while
flite -f "hello world"
will say the contents of a file `hello world'. If no argument is specified text is read from standard input.
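The rule above for deciding whether TEXT is literal text or a file name can be sketched in C as follows. This is only an illustration of the rule, not the code used in the flite binary; the names input_kind and classify_input are hypothetical, and the two flags stand in for the -t and -f options.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not part of the flite API. */
typedef enum { INPUT_STDIN, INPUT_TEXT, INPUT_FILE } input_kind;

/* Decide how the TEXT argument is treated.
   force_text mirrors -t, force_file mirrors -f.
   If both flags are set this sketch lets -t win (an arbitrary choice). */
input_kind classify_input(const char *arg, int force_text, int force_file)
{
    if (arg == NULL)
        return INPUT_STDIN;   /* no argument: read standard input */
    if (force_text)
        return INPUT_TEXT;    /* -t: always treat as text */
    if (force_file)
        return INPUT_FILE;    /* -f: always treat as a file name */

    /* Default rule: a space means text, otherwise a file name. */
    return strchr(arg, ' ') != NULL ? INPUT_TEXT : INPUT_FILE;
}
```

With this sketch, classify_input("hello", 0, 0) gives INPUT_FILE, while classify_input("hello", 1, 0) and classify_input("hello world.", 0, 0) both give INPUT_TEXT, matching the four command lines shown above.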
The second argument OUTPUTTYPE is the name of a file the output is written to, or if it is play then it is played to the audio device directly. If it is none then the audio is created but discarded; this is used for benchmarking. If it is stream then the audio is streamed through a call back function (though this is not particularly useful in the command line version). If OUTPUTTYPE is omitted, play is assumed. You can also explicitly set the output type with the -o flag.
flite -f doc/alice -o alice.wav
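The four ways OUTPUTTYPE is interpreted can be summarised in a small C sketch. The names output_kind and classify_output are hypothetical and are not taken from the flite sources; the sketch only mirrors the description above, including the default of play when nothing is given.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not part of the flite API. */
typedef enum { OUT_PLAY, OUT_NONE, OUT_STREAM, OUT_FILE } output_kind;

/* Map an OUTPUTTYPE string to the behaviour described in the text. */
output_kind classify_output(const char *outtype)
{
    if (outtype == NULL || strcmp(outtype, "play") == 0)
        return OUT_PLAY;      /* omitted or "play": audio device */
    if (strcmp(outtype, "none") == 0)
        return OUT_NONE;      /* synthesize but discard (benchmarking) */
    if (strcmp(outtype, "stream") == 0)
        return OUT_STREAM;    /* hand audio to a call back function */
    return OUT_FILE;          /* anything else is an output file name */
}
```

Under this sketch classify_output("alice.wav") is OUT_FILE, so a file that happened to be named play, none or stream could not be selected as an output file this way.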
All the voices in the distribution are collected into a single simple list in the global variable flite_voice_list. You can select a voice from this list from the command line
flite -voice awb -f doc/alice -o alice.wav
And list which voices are currently supported in the binary with
flite -lv
The voices which get linked together are those listed in VOICES in the `main/Makefile'. You can change that as you require.
Each voice in Flite is held in a structure, a pointer to which is returned by the voice registration function. In the standard distribution, the example diphone voice is cmu_us_kal.
Here is a simple C program that uses the flite library
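#include <stdio.h>
#include <stdlib.h>
#include "flite.h"

/* Registration function for the example diphone voice */
cst_voice *register_cmu_us_kal(const char *voxdir);

int main(int argc, char **argv)
{
    cst_voice *v;

    if (argc != 2)
    {
        fprintf(stderr,"usage: flite_test FILE\n");
        exit(-1);
    }

    flite_init();

    v = register_cmu_us_kal(NULL);

    /* Synthesize the file and play it on the audio device */
    flite_file_to_speech(argv[1],v,"play");

    return 0;
}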
Assuming the shell variable FLITEDIR is set to the flite directory, the following will compile the system (with appropriate changes for your platform if necessary).
gcc -Wall -g -o flite_test flite_test.c -I$FLITEDIR/include -L$FLITEDIR/lib -lflite_cmu_us_kal -lflite_usenglish -lflite_cmulex -lflite -lm
Although you are of course welcome to call lower level functions, there are a few key functions that will satisfy most users of flite.
void flite_init(void);
cst_wave *flite_text_to_wave(const char *text,cst_voice *voice);
float flite_file_to_speech(const char *filename, cst_voice *voice, const char *outtype);
Synthesizes the contents of the file filename with the given voice. outtype may be play or none. If the feature file_start_position is set with an integer, that point is used as the start position in the file to be synthesized.
float flite_text_to_speech(const char *text, cst_voice *voice, const char *outtype);
Synthesizes the text in the string pointed to by text, with the given voice. outtype may be a filename where the generated waveform is written to, or "play" and it will be sent to the audio device, or "none" and it will be discarded. The return value is the number of seconds of speech generated.
cst_utterance *flite_synth_text(const char *text,cst_voice *voice);
cst_utterance *flite_synth_phones(const char *phones,cst_voice *voice);
cst_voice *flite_voice_select(const char *name);
Returns a pointer to the voice named name. Will return NULL if there is no match; if name == NULL then the first voice in the voice list is returned.
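The selection rule just described (a name match, NULL for no match, and the first voice when no name is given) can be sketched as follows. This is not how flite stores or searches its voice list; demo_voice and demo_voice_select are hypothetical stand-ins, and a plain array is assumed in place of flite_voice_list so that the sketch compiles without flite.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a voice entry; the real cst_voice differs. */
typedef struct {
    const char *name;
} demo_voice;

/* Return the voice called name, the first voice if name is NULL,
   or NULL if the list is empty or nothing matches. */
const demo_voice *demo_voice_select(const demo_voice *voices, size_t count,
                                    const char *name)
{
    size_t i;

    if (count == 0)
        return NULL;
    if (name == NULL)
        return &voices[0];            /* no name: first voice in the list */

    for (i = 0; i < count; i++)
        if (strcmp(voices[i].name, name) == 0)
            return &voices[i];        /* exact name match */

    return NULL;                      /* no match */
}
```

A caller would normally check the result for NULL before passing it on to a synthesis function.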
In 1.4 support was added for streaming synthesis. Basically you may provide a call back function that will be called with waveform data as soon as it is available. This can potentially reduce the delay between sending text to the synthesizer and having audio available.
The support is through a call back function of type
int audio_stream_chunk(const cst_wave *w, int start, int size, int last, void *user)
If the utterance feature streaming_info is set (which can be set in a voice or in an utterance), the LPC or MLSA resynthesis functions will call the provided function as buffers become available. The LPC and MLSA waveform synthesis functions are used for diphone, limited domain, unit selection and clustergen voices. Note that explicit support is required for streaming, so new waveform synthesis functions may not have the functionality.
An example streaming function is provided in `src/audio/au_streaming.c' and is used by the example flite main program when stream is given as the playing option. (Though in the command line program the function isn't really useful.)
In order to use streaming you must provide a call back function in your particular thread. This is done by adding features to the voice in your thread. Suppose your function was declared as
int example_audio_stream_chunk(const cst_wave *w, int start, int size, int last, void *user)
You can add this function as the streaming function through the statement
cst_audio_streaming_info *asi;
...
asi = new_audio_streaming_info();
asi->asc = example_audio_stream_chunk;
feat_set(voice->features, "streaming_info", audio_streaming_info_val(asi));
You may also optionally include your own pointer to any information you additionally want to pass to your function. For example
typedef struct my_callback_struct_struct {
    cst_audiodev *fd;
    int count;
} my_callback_struct;

cst_audio_streaming_info *asi;
my_callback_struct *mcs;
...
/* Allocate and fill in the data to be passed to the call back */
mcs = cst_alloc(my_callback_struct,1);
mcs->fd=NULL;
mcs->count=1;

asi = new_audio_streaming_info();
asi->asc = example_audio_stream_chunk;
asi->userdata = mcs;
feat_set(voice->features, "streaming_info", audio_streaming_info_val(asi));
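As a rough sketch of what a call back of this shape might do with its arguments, the following function keeps running totals in the user pointer. To compile without flite it uses a hypothetical demo_wave type in place of cst_wave, modelling only a sample buffer. It also assumes that start is an index into the sample buffer, that size is a count of samples, that last is non-zero on the final chunk, and that returning 0 means "continue"; none of these are stated above, so check them against `src/audio/au_streaming.c' before relying on them.

```c
#include <stddef.h>

/* Hypothetical stand-in for cst_wave; only the fields used here. */
typedef struct {
    int sample_rate;
    int num_samples;
    const short *samples;
} demo_wave;

/* Hypothetical user data: totals accumulated across calls. */
typedef struct {
    long samples_seen;   /* total samples delivered so far */
    int chunks_seen;     /* number of times the call back ran */
    int peak;            /* largest absolute sample value seen */
    int finished;        /* set once the final chunk arrives */
} demo_stream_stats;

/* Same argument shape as the streaming call back, using demo_wave.
   Assumes samples[start .. start+size-1] is the newly available audio. */
int demo_audio_stream_chunk(const demo_wave *w, int start, int size,
                            int last, void *user)
{
    demo_stream_stats *stats = (demo_stream_stats *)user;
    int i;

    for (i = start; i < start + size; i++)
    {
        int v = w->samples[i];
        if (v < 0)
            v = -v;
        if (v > stats->peak)
            stats->peak = v;
    }

    stats->samples_seen += size;
    stats->chunks_seen += 1;
    if (last)
        stats->finished = 1;

    return 0;   /* assumed to mean "keep streaming" */
}
```

Keeping all state in the structure passed through the user pointer, rather than in static variables, lets each thread use its own copy, which fits the per-thread use of streaming described above.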