Making Voice Mail More Manageable
UM and Speech Recognition Make Voice Messaging More Manageable
"Unified messaging" (UM) has traditionally been viewed as simply enabling message recipients to manage and retrieve both text and voice messages across device interfaces. Through text-to-speech technology, email text messages could be proactively delivered to, or retrieved from, any phone, giving them some of the real-time flexibility and convenience of voice mail.
Similarly, recipients of voice mail messages benefited from UM through integration with desktop email clients, where the screen interface was extended to show all the voice (and fax) messages in the user’s voice mailbox. This visual interface for voice mail enabled more efficient random-access message retrieval, rather than the time-consuming, sequential playback offered by the Telephone User Interface (TUI). Voice message content, however, still had to be listened to through traditional playback controls, and important information still had to be transcribed manually to be of practical use.
Speech recognition technology has now matured enough to let voice mail benefit from convergence with visual email interfaces, much as email benefited from text-to-speech (TTS) retrieval over the phone. Voice messages can now be managed with the productivity efficiencies of text and visual interfaces, rather than through a voice-oriented TUI. Speech-to-text conversion of voice messages is starting to appear in new subscriber offerings from service providers such as CallWave and Vonage, so it won’t be long before this capability makes its way into enterprise UM, offered both by traditional communication technology providers and by new players like SpinVox and TalkText.
Voice Message vs. Text Message: Benefits to Users
Voice mail systems provide important benefits to telephone callers by letting them leave a message when there is no answer or the line is busy. Because the caller records the message directly, costs are minimized and message integrity is retained: the message is not transcribed or relayed by a third person.
While voice messages have been highly touted as more "personal," conveying emotional content (tone of voice) and accuracy (name pronunciation) to the recipient, their informational content is really no greater than that of person-to-person text messages. So just because a caller uses the convenience of voice to create a message, it doesn’t necessarily follow that they want it delivered as voice.
As the originator, a caller should have the option of having a message spoken in voice delivered as text, especially if it contains data that has to be converted to text to be useful. As business users become more mobile and adopt multimodal "smartphones" with visual interfaces, creating a message by voice and having it delivered as text, and vice versa, will quickly catch on.
By the same token, the recipient can also control whether voice messages are transcribed into text, depending on a number of factors pertinent to the recipient’s needs. In that case, although the caller still leaves a voice message, it is delivered as text.
Pros and Cons of Converting a Voice Message to Text
The recipient of a voice message that has been converted to text will realize a number of productivity benefits including:
- Quickly scanning the message and selectively reading details of importance, rather than having to listen to the whole message
- Retrieving voice messages in noisy environments
- Embedding such converted messages in other forms of text messaging such as email, IM, and SMS
- Extending the ability to forward, reply, and add attachments to a voice message
- Making such messages searchable when archived
- Preserving the original voice message as a backup when it is needed for better understanding
To offset these benefits, there are some disadvantages to converting a voice message to text, including:
- A frequent caller may not leave a name or phone number, expecting the recipient to recognize the voice
- Some voice messages from cell phones may be unintelligible
- Speech recognition transcription may not be uniformly accurate
- Foreign languages may not be supported
- When the recipient is mobile (e.g., driving a car), listening to a voice message will be preferable to looking at text. (In such cases, however, the text message can also be retrieved with text-to-speech capabilities.)
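The recipient-controlled choice described earlier, weighed against the pros and cons above, can be sketched as a simple delivery rule. This is a minimal illustration under stated assumptions, not any vendor's implementation: the `RecipientContext` and `Transcript` types, the `choose_delivery` function, and the 0.8 confidence threshold are all hypothetical, and the sketch assumes the speech recognizer reports a confidence score for each transcript.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Delivery(Enum):
    VOICE = "voice"  # play back the original recording
    TEXT = "text"    # show the speech-recognition transcript


@dataclass
class RecipientContext:
    # Hypothetical context flags; a real UM system might derive these from
    # device state, presence information, or explicit user settings.
    driving: bool = False
    has_visual_device: bool = True


@dataclass
class Transcript:
    text: str
    confidence: float  # assumed recognizer confidence score in [0, 1]


def choose_delivery(ctx: RecipientContext,
                    transcript: Optional[Transcript],
                    min_confidence: float = 0.8) -> Delivery:
    """Hypothetical recipient-side rule for voice vs. text delivery."""
    # Eyes busy (e.g. driving) or no screen: listening beats reading.
    if ctx.driving or not ctx.has_visual_device:
        return Delivery.VOICE
    # Missing or low-confidence transcript: fall back to the original
    # recording, which is preserved as a backup.
    if transcript is None or transcript.confidence < min_confidence:
        return Delivery.VOICE
    # Otherwise deliver text, which can be scanned quickly and read
    # even in a noisy environment.
    return Delivery.TEXT


if __name__ == "__main__":
    msg = Transcript("Call me back about the contract at 555-0100.", 0.92)
    print(choose_delivery(RecipientContext(), msg))              # Delivery.TEXT
    print(choose_delivery(RecipientContext(driving=True), msg))  # Delivery.VOICE
```

Keeping the original recording as the fallback mirrors the point that converted messages preserve the voice version for when the transcript is not good enough.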
Reducing Enterprise Overhead for Voice Messages
Voice messaging has always paid a penalty in terms of the system resources it consumes. These penalties include:
- Voice network bandwidth and priorities for real-time connectivity
- Voice message storage limitations
- Limits on caller message length
- Limits on how long messages can be stored for retrieval
- Long retrieval times caused by sequential playback, ambient noise, the need to transcribe information, etc.
- Greater difficulty managing and searching voice message content
Since the overhead for recording, transmitting, storing, and retrieving a voice message is much higher than for a text version of the message, we see benefits both to end-user communication productivity and to enterprise TCO from exploiting speech recognition for voice-to-text messaging. Such capabilities will also integrate nicely with new unified communications (UC) capabilities, such as escalating to real-time IM or initiating a telephone callback in response to a message.
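To put rough numbers on the overhead gap, the sketch below compares the storage needed for one minute of voice with that of a transcript of the same message. The figures are illustrative assumptions, not measurements: audio stored as uncompressed G.711 at 64 kbit/s, a speaking rate of 150 words per minute, and about 6 bytes per word of plain text. The function names are hypothetical.

```python
# Illustrative assumptions (not from any specific UM product):
G711_BITS_PER_SECOND = 64_000  # standard uncompressed telephony codec rate
WORDS_PER_MINUTE = 150         # assumed conversational speaking rate
BYTES_PER_WORD = 6             # assumed ~5 letters plus a space, plain ASCII


def voice_bytes(seconds: float, bits_per_second: int = G711_BITS_PER_SECOND) -> int:
    """Bytes needed to store `seconds` of audio at a constant bit rate."""
    return int(seconds * bits_per_second / 8)


def text_bytes(seconds: float, wpm: int = WORDS_PER_MINUTE,
               bytes_per_word: int = BYTES_PER_WORD) -> int:
    """Approximate bytes for a plain-text transcript of `seconds` of speech."""
    words = seconds / 60 * wpm
    return int(words * bytes_per_word)


if __name__ == "__main__":
    secs = 60
    v = voice_bytes(secs)
    t = text_bytes(secs)
    print(f"{secs}s message: voice {v} bytes, text {t} bytes, ratio ~{v // t}x")
    # prints "60s message: voice 480000 bytes, text 900 bytes, ratio ~533x"
```

Under these assumptions a one-minute message takes 480,000 bytes as audio versus about 900 bytes as text. A compressed codec such as G.729 (8 kbit/s) narrows the gap to 60,000 bytes, which is still far larger than the transcript.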
Voice-to-text message conversion won’t solve all the problems of efficient business communication, and it is highly dependent on speech recognition performance. The more critical the need for accurate and personalized contact, the less likely it is that such shortcuts will be acceptable.
What Do You Think?
Send your comments to me at firstname.lastname@example.org.