Text-to-Speech
TTS has strong potential in five major market segments:
education, the disabled, computer interfaces, consumer products, and telecommunications.
What is Text-to-Speech?
Text-to-speech is a process through which text is rendered as digital
audio and then "spoken." Most text-to-speech engines can be
categorized by the method that they use to translate phonemes into audible
sound.
Why Use Text-to-Speech?
Text-to-speech should be used to audibly communicate information to the
user when digital audio recordings are inadequate. Generally, text-to-speech
is better than audio recordings when:
- Audio recordings are too large to store on disk or too expensive to record.
- Audio recording is impossible because the application doesn't know
ahead of time what it will speak.
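To make the storage concern concrete, here is a rough back-of-the-envelope sketch. The audio format (22,050 Hz, 16-bit, mono, uncompressed), the two-second phrase length, and the function name pcm_bytes are all assumptions chosen for illustration; they do not come from any particular engine.

```python
def pcm_bytes(seconds: float,
              sample_rate: int = 22_050,   # assumed sample rate in Hz
              bits_per_sample: int = 16,   # assumed sample width
              channels: int = 1) -> int:   # assumed mono audio
    """Return the size in bytes of an uncompressed PCM recording."""
    samples = int(seconds * sample_rate)
    return samples * (bits_per_sample // 8) * channels


# One recording per minute of the day, each assumed to last two seconds.
phrases = 24 * 60
recorded_total = phrases * pcm_bytes(2.0)

# The same phrases stored as text cost only a few bytes each.
text_total = phrases * len("The time is 12 30 PM.".encode("utf-8"))

print(f"recorded audio: {recorded_total:,} bytes")
print(f"plain text:     {text_total:,} bytes")
```

Under these assumptions the recorded phrases need several thousand times more space than the text, which is the gap that text-to-speech avoids. Compressed audio would narrow the gap but not close it.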
Text-to-speech also offers a number of benefits. In general, text-to-speech
is most useful for short phrases or for situations when prerecording is
not practical. Text-to-speech has the following practical uses:
- Reading dynamic text. Text-to-speech is useful for phrases that vary
too much to record and store using all possible alternatives. For example,
speaking the time is a good use for text-to-speech, because recording
and storing every possible time is impractical.
- Proofreading. Audible proofreading of text and numbers helps the
user catch typing errors missed by visual proofreading.
- Conserving storage space. Text-to-speech is useful for phrases that
would occupy too much storage space if they were prerecorded in digital-audio
format.
- Notifying the user of events. Text-to-speech works well for informational
messages. For example, to inform the user that a print job is complete,
an application could say "Printing complete" rather than displaying
a message box and requiring the user to click OK. (Use this only
for noncritical notifications, because the user may have turned the
computer's sound off or may be out of hearing range.)
- Providing audible feedback. Text-to-speech can provide audible feedback
when visual feedback is inadequate or impossible. For example, the user's
eyes might be busy with another task, such as transcribing data from
a paper document. Users who have low vision may rely on text-to-speech
as their sole means of feedback from the computer.
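Two of the uses above, reading dynamic text and notifying the user of events, can be sketched in a few lines. This is a minimal illustration, not any engine's API: time_phrase and notify are hypothetical helper names, and speak and show stand in for whatever text-to-speech call and message-box call the application actually has.

```python
from datetime import time
from typing import Callable, Optional


def time_phrase(t: time) -> str:
    """Build the sentence to hand to a text-to-speech engine."""
    hour12 = t.hour % 12 or 12              # 0 and 12 are both spoken as "12"
    suffix = "AM" if t.hour < 12 else "PM"
    if t.minute == 0:
        return f"The time is {hour12} o'clock {suffix}."
    # Single-digit minutes are spoken as "oh 5" rather than "5".
    minutes = f"oh {t.minute}" if t.minute < 10 else str(t.minute)
    return f"The time is {hour12} {minutes} {suffix}."


def notify(message: str,
           speak: Optional[Callable[[str], None]],
           show: Callable[[str], None],
           critical: bool = False) -> str:
    """Speak noncritical messages; show critical ones on screen.

    `speak` is a hypothetical TTS callback (None if sound is unavailable).
    `show` is a hypothetical visual fallback such as a message box.
    """
    if critical or speak is None:
        show(message)
        return "shown"
    speak(message)
    return "spoken"


if __name__ == "__main__":
    # Demo: print() stands in for both the engine and the message box.
    notify(time_phrase(time(15, 5)), speak=print, show=print)
    notify("Printing complete", speak=print, show=print)
```

The text is assembled at run time, so nothing has to be recorded in advance, and critical messages never depend on the user being able to hear them.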
Games and Edutainment
Text-to-speech is useful in games and edutainment to allow the characters
in the application to "talk" to the user instead of displaying speech
balloons. Prerecorded speech is also an option when the dialogue is known in advance.
Text-to-Speech Voice Quality
Most text-to-speech
engines can render individual words successfully. However, as soon as
the engine speaks a sentence, it is easy to identify the voice as synthesized
because it lacks human prosody -- i.e., the inflection, accent, and timing
of speech.