By Dave Michels, June 4, 2013
What? by Dave Michels

Over the past decade we’ve made tremendous progress with audio quality. Most notably, the G.722 and Opus codecs have made great strides toward eliminating the narrow, tunnel-like tone associated with telephony audio.

More recently, Dolby and app-based conferencing solutions such as Voxeet introduced spatial audio to conferencing. This is an interesting concept that allows each speaker to hold a unique virtual position within the listener’s head. The end result is far more than just fancy stereo. Spatial audio gives the brain additional interpretive information that makes audio conferencing more effective. The downside to spatial audio is that it requires stereo headsets, at least for the full effect. Stereo headsets are increasing in popularity thanks to iPads, but single-ear headsets remain significantly more popular.
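To make the idea of a "virtual position" concrete, here is a minimal sketch of the simplest form of placement: constant-power stereo panning of mono talkers. It is an illustration only and is not how Dolby or Voxeet implement spatial audio, which likely involves far more sophisticated processing. The function names `pan_gains` and `mix_talkers` and the azimuth convention are assumptions made up for this example.

```python
import math


def pan_gains(azimuth_deg):
    """Return (left, right) gains for a talker at the given azimuth.

    Hypothetical convention: -90 is hard left, 0 is center, +90 is hard right.
    Constant-power law: left**2 + right**2 == 1 at every position.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be within [-90, 90] degrees")
    # Map [-90, 90] degrees onto [0, pi/2] radians.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


def mix_talkers(talkers):
    """Mix mono talkers into one stereo stream.

    talkers: list of (samples, azimuth_deg) pairs, where samples is a list
    of floats and every talker has the same number of samples.
    Returns a list of (left, right) tuples, one per sample.
    """
    if not talkers:
        return []
    length = len(talkers[0][0])
    mixed = [[0.0, 0.0] for _ in range(length)]
    for samples, azimuth_deg in talkers:
        if len(samples) != length:
            raise ValueError("all talkers must have the same sample count")
        left, right = pan_gains(azimuth_deg)
        for i, sample in enumerate(samples):
            mixed[i][0] += sample * left
            mixed[i][1] += sample * right
    return [tuple(frame) for frame in mixed]
```

Giving each participant a different azimuth, for instance -60, 0 and +60 degrees for three callers, spreads the voices across the stereo field. It also shows why a single-ear headset loses the effect: with only one channel, every talker collapses back to the same position.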

One aspect of audio conferencing that’s highly neglected is room equipment. There’s been very little progress in audio pick-up since Polycom raised the bar with its saucer device about a decade ago. The general solution, beyond noise-cancelling or beamforming microphones, is to simply add more microphones to rooms. That’s costly and only marginally effective. Why is it that a single pair of ears at the far end of the table can still pick up more detail than a remote party connected over broadband IP networks?

There is an answer to this, but it hasn’t quite been released yet. Dolby gave me a sneak peek of what’s going on in its lab with next-generation audio conferencing equipment. The unit I saw was a pre-production “room capture device.” This particular unit was round and powered by Power over Ethernet (PoE). It could be used as a generic SIP G.722 endpoint, but that’s not the point. Used that way, it offers results comparable to alternative solutions. The device is primarily designed to be used with Dolby’s advanced audio processing software.

Dolby’s still working out its go-to-market approach. Most likely the device won’t be sold as a product, but instead offered as part of a service from a conferencing provider that markets the service and equipment together. Dolby has already struck one such deal with BT regarding spatial audio conferencing. The advanced Dolby software will likely run in the cloud at the conference provider’s location, with the provider establishing the criteria for supplying the hardware. A new type of hybrid audio conferencing service that requires specific hardware could be coming.

The current model just doesn’t provide an in-room-like experience, especially in large, noisy rooms. Dolby intends to raise the bar with advanced processing and spatial audio techniques that capture audio potentially better than a pair of in-room ears. For example, when someone leaves the table to draw on the board, they often talk to the board as well. In-room attendees can still hear this, but the remote parties can’t. In my demo, the person talking to the board was clearly audible, and this was from a single device centered on a table in a large room. The microphones could hear everything, and often filtered out what they determined to be noise. If someone speaking can be heard in the room, they can be heard remotely as well.

This type of solution will change audio conferencing as we know it. I look forward to its arrival.  


1 Response to "What?"

Michael Graves 6/4/2013 3:14:34 PM


A few things worth noting:

(1) G.722 is vintage. Not of this decade, or even the prior one or the one before that. It harks back to the dark days of the 1980s. It is essentially the Model T of wideband audio codecs.

(2) Spatial conferencing has been tried before, a number of times. Why have you not heard of it? It flopped.

(3) Stereo video conferencing is commonplace. Just as most commercially produced music is in stereo, video conference systems seem to have leveled off with such capabilities. The HDX-4500 on my desk does great stereo audio, but it can be hard to tell sometimes given the source material.

(4) Surround sound in the common forms of 5.1 and 7.1 isn't exactly what you think. It's engineered to give the producer the ability to control sound from the sides and rear in only very general ways. It lacks a height dimension. It is in essence the-best-that-could-be-done-for-the-money-and-in-a-fashion-that-would-allow-us-to-actually-install-it-in-the-TV-room-without-offending-the-wife.

(5) If people must wear headphones to appreciate the acoustic effects, no one will likely use it. It's hard enough to get people to wear a headset at all when using a soft phone. People rave about how great the echo cancellation is in XYZ soft phone, or the quality of MacBook audio. Convenience trumps quality...always.

(6) How do you transpose the experience of Voxeet into a conference room? Does everyone wear a headset? Anyone?
