Howto:Customizing Flite TTS Voices
This article is a stub. You can help the wiki by expanding it.
What we'd like to have is exemplified by https://www.youtube.com/watch?v=qcvTHQxBcLw (after about 28 seconds there is a voice exchange). There's little static, but the voices sound flat and perhaps a bit metallic. I'm not sure what exactly was in the Shuttle video you saw: I don't simulate static at all. The aircraft com instruments used to produce some when the frequency matched a transmitting station that was out of range, but this should be gone now, as the Shuttle doesn't use a standard aircraft com stack.
Only a minority of the voice callouts produced by the system are intended to simulate 'real' communication with mission control; many are advisory messages, limit warnings or failure notices. Messages are filtered by type, so we know for every message what kind it is. It would be cool to get a list of real messages and record them with the right distortion, but unfortunately some of them need to be assembled dynamically (for example, when burn parameters are transmitted). Still, there is a sizable chunk of standard messages and callouts which could be pre-recorded.
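Because every message is already classified, one way to mix pre-recorded and synthesized speech is to look each standard message up in a table of recorded clips and fall back to TTS for anything else, such as dynamically assembled burn parameters. A minimal sketch, assuming a hypothetical lookup table; the callout texts, clip paths and the `choose_source` helper are illustrative placeholders, not part of the Shuttle code:

```python
# Hypothetical table of standard callouts that have a pre-recorded clip.
# Texts and paths are placeholders for illustration only.
PRERECORDED = {
    "go at throttle up": "clips/go_at_throttle_up.wav",
    "negative return": "clips/negative_return.wav",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so lookups ignore formatting."""
    return " ".join(text.lower().split())


def choose_source(text: str) -> tuple[str, str]:
    """Return ("clip", path) for a pre-recorded message, else ("tts", text)."""
    clip = PRERECORDED.get(normalize(text))
    if clip is not None:
        return ("clip", clip)
    # Anything without a recording (e.g. burn parameters) falls back to TTS.
    return ("tts", text)
```

This keeps the decision in one place, so clips can be added to the table one by one without touching the callers.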
Thorsten stated that he'd very much like to see improvements in the sound, especially for the real voice communications, but doesn't really know how to get all the way there. One rather obvious problem with TTS is that it doesn't know how the various acronyms are pronounced, so once one decides to use recorded files, one might as well start with properly recorded ones.
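For callouts that stay with TTS in the meantime, one interim workaround for the acronym problem is to rewrite known acronyms into a spelling the engine pronounces correctly before synthesis. A minimal sketch, assuming a hand-maintained lexicon; the entries and the `expand_acronyms` helper are hypothetical and should be checked against how the acronyms are actually spoken:

```python
import re

# Hypothetical pronunciation lexicon: acronym -> text the TTS engine reads
# correctly. Example entries only; verify against real Shuttle usage.
LEXICON = {
    "RCS": "R C S",   # spelled out letter by letter
    "SRB": "S R B",
    "MECO": "meeko",  # spoken as a word
}

# Two or more capital letters forming a whole word.
_ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def expand_acronyms(text: str, lexicon: dict[str, str] = LEXICON) -> str:
    """Replace known acronyms with their spoken form; leave unknown ones as-is."""
    return _ACRONYM.sub(lambda m: lexicon.get(m.group(0), m.group(0)), text)
```

Unknown acronyms pass through unchanged, so the lexicon can grow as problem words are found.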
He is planning to compile a list of real Shuttle phraseology: if we do a recording job, let's do it properly. He would very much like to improve sound immersion; let's see how this goes.
We're going to implement whichever solution is better (in this case, the one that leads to better immersion), just as has been done in practically every other area where native FG solutions were not good enough (failure modeling, interaction with navaids, thruster flames, co-orbiting objects...).
We know for each string whether it is supposed to come from mission control (or the pilot) or whether it is an advisory from the simulation software, so we can selectively suppress the different groups or dispatch them to different voices.
Right now speech is dynamic in that sentences are assembled as text and then converted to audio. So the options are really either to filter that output (an audio filter, just so we're clear), or to pre-record words or phrases which can then be manipulated and triggered by the system as needed.
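For the audio-filter route, a common way to get a radio-like sound is to band-limit the voice to roughly the telephone band and add some soft clipping. The sketch below uses one-pole high-pass and low-pass filters followed by tanh saturation; the 300 Hz / 3000 Hz corners and the drive value are assumptions to tune by ear, `radio_filter` and `filter_wav` are hypothetical helpers, and the input is assumed to be a 16-bit mono WAV such as one written by `flite -t "..." -o in.wav`:

```python
import array
import math
import sys
import wave


def radio_filter(samples, rate, low_hz=300.0, high_hz=3000.0, drive=3.0):
    """Band-limit and soft-clip mono float samples for a radio-like sound.

    A one-pole high-pass (low_hz) and one-pole low-pass (high_hz) roughly
    confine the voice to the telephone band; tanh adds mild saturation.
    Every output sample satisfies |y| <= 1.
    """
    dt = 1.0 / rate
    rc = 1.0 / (2.0 * math.pi * low_hz)
    a_hp = rc / (rc + dt)                                   # high-pass coefficient
    a_lp = 1.0 - math.exp(-2.0 * math.pi * high_hz / rate)  # low-pass coefficient

    out = []
    prev_x = hp = lp = 0.0
    for x in samples:
        hp = a_hp * (hp + x - prev_x)  # removes DC and low rumble
        prev_x = x
        lp += a_lp * (hp - lp)         # removes content above the voice band
        out.append(math.tanh(drive * lp))
    return out


def filter_wav(src, dst, **kwargs):
    """Apply radio_filter to a 16-bit mono WAV file and write the result."""
    with wave.open(src, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono WAV")
        rate = w.getframerate()
        pcm = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV PCM data is little-endian
    filtered = radio_filter([s / 32768.0 for s in pcm], rate, **kwargs)
    out = array.array("h", (round(v * 32767) for v in filtered))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(out.tobytes())
```

Calling `filter_wav("in.wav", "radio.wav")` would write the processed file. One-pole filters have gentle slopes; steeper filters would sound more "radio" at the cost of more code.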
Thorsten hasn't been able to make much headway with listing the standard voice callouts, but he proposes to start with them, record and process them to sound 'real', and then see what can be done about the dynamically assembled pieces. Probably simply recording the numbers and a few text blocks is all it takes.
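Recording the numbers and a few text blocks means a dynamic message becomes an ordered list of clips to play back to back. A minimal sketch, assuming one recording per digit plus one per fixed phrase; the file layout, the `assemble` and `number_clips` helpers, and reading numbers digit by digit are all assumptions (the real phraseology should decide how numbers are spoken):

```python
# Hypothetical clip layout: one recording per digit plus one per fixed phrase.
DIGIT_CLIP = "clips/digit_{}.wav"
PHRASE_CLIP = "clips/phrase_{}.wav"
SYMBOL_CLIPS = {".": "clips/point.wav", "-": "clips/minus.wav"}


def number_clips(value: str) -> list[str]:
    """Read a number such as '-12.5' digit by digit as a list of clip paths."""
    clips = []
    for ch in value:
        if ch in "0123456789":
            clips.append(DIGIT_CLIP.format(ch))
        elif ch in SYMBOL_CLIPS:
            clips.append(SYMBOL_CLIPS[ch])
        else:
            raise ValueError(f"no clip for {ch!r} in {value!r}")
    return clips


def assemble(template: list[str], values: dict[str, str]) -> list[str]:
    """Expand fixed phrase keys and {placeholder} parts into an ordered clip list."""
    clips = []
    for part in template:
        if part.startswith("{") and part.endswith("}"):
            clips.extend(number_clips(values[part[1:-1]]))
        else:
            clips.append(PHRASE_CLIP.format(part))
    return clips
```

The resulting list could be queued for playback or concatenated into one file before the same distortion is applied as for the static callouts.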
Flite Voice Creation
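Regarding the larger issue of creating an aviation-specific voice that is aware of certain vocabulary and terminology, I'd suggest searching for "FLITE TTS CREATING NEW VOICE". The most interesting option for the aviation/spacecraft use-case is actually a limited domain voice, i.e. a voice built from recordings of prompts covering one specific domain, which can sound very natural as long as it only has to speak sentences from that domain.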
There is a bit of setup overhead involved here, but it works basically like building voices for Festival.
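Voice building starts from a list of prompts to be recorded. Festvox keeps such lists with one utterance per line in the form `( id "text" )` (as in `etc/txt.done.data` in its voice-building examples); the exact file name a limited domain build expects should be checked against the Festvox documentation. A minimal sketch of generating such a list from a callout list, where the `fgsh` ID prefix and the helper names are hypothetical, and quotes and backslashes are simply rejected to avoid escaping questions:

```python
def prompt_lines(phrases, prefix="fgsh"):
    """Yield one Festvox-style prompt line, ( <id> "<text>" ), per unique phrase."""
    # Collapse whitespace, skip empty entries, drop duplicates, keep order.
    unique = dict.fromkeys(" ".join(p.split()) for p in phrases if p.strip())
    for i, text in enumerate(unique, start=1):
        if '"' in text or "\\" in text:
            raise ValueError(f"quote or backslash in prompt: {text!r}")
        yield f'( {prefix}_{i:04d} "{text}" )'


def write_prompt_file(phrases, path, prefix="fgsh"):
    """Write the prompt list to path, one utterance per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in prompt_lines(phrases, prefix):
            f.write(line + "\n")
```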
The thing to keep in mind is that once the setup is complete, you have something that can also be scripted and automated, e.g. by feeding existing words into it (for example from public domain/FAA sources). If people are interested in exploring this, I would be happy to help document the requirements and exact steps in the wiki, because I think a tailored TTS voice could be pretty useful in general. This clearly isn't meant to distract people from contributing to the Shuttle in a more direct form.
My thinking, however, is that 30 hours spent on documenting and possibly scripting/automating the process for the FlightGear use-case specifically may translate into huge benefits in the near term, including for the Shuttle project. In fact, with a bit of Python scripting, we could gather real audio recordings, map them to common terminology and have a tailored Flite TTS voice generated automatically. To see for yourself, I'd suggest checking out http://www.festvox.org/ldom/ldom_time.html
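As a first step toward mapping recorded audio to terminology, a word-level coverage check shows which words used in the callout list already appear in recorded prompts and which still need recordings. This is a simplification (a voice build needs more than word coverage), and the `coverage` and `tokens` helpers are hypothetical:

```python
import re
from collections import Counter

_TOKEN = re.compile(r"[a-z0-9']+")


def tokens(text: str) -> list[str]:
    """Lower-case word tokens (letters, digits, apostrophes)."""
    return _TOKEN.findall(text.lower())


def coverage(callouts, recorded_prompts):
    """Return (fraction of callout word tokens seen in recordings, Counter of missing words)."""
    have = {w for prompt in recorded_prompts for w in tokens(prompt)}
    needed = Counter(w for c in callouts for w in tokens(c))
    missing = Counter({w: n for w, n in needed.items() if w not in have})
    total = sum(needed.values())
    if total == 0:
        return 1.0, missing
    return (total - sum(missing.values())) / total, missing
```

`missing.most_common()` then gives a prioritized list of words to record next.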