Do MCUs Make Sense?

By Dave Michels, May 22, 2013

The MCU, or multipoint control unit, has been the heart of modern videoconferencing. Historically, it has been the only way for more than two sites to join a single videoconference. MCUs are typically specialized DSP-based systems, often priced by port capacity.

An MCU is more than a switch; it is an active participant in the conversation – at least technically. The MCU decodes the incoming video streams and then encodes an appropriate mix of them into a single stream for each participant. All this decoding and encoding, or transcoding, makes MCUs very CPU intensive. MCUs also provide gateway services that improve interoperability between video systems.

But several factors threaten the future of MCU hardware. It’s becoming questionable whether it still makes sense.

Multi-Stream Endpoints. The traditional MCU architecture assumes a highly capable multi-stream MCU and simpler single-stream endpoints. As in many other aspects of IP networking, more capabilities are being pushed to the endpoints. With multi-stream endpoints, each participant’s video stream is sent to each endpoint, effectively eliminating the need for an MCU to re-encode all the content into a single stream. This is largely enabled by Scalable Video Coding (SVC), which allows a stream to be sent in layers. Often the active speaker is shown larger, in HD, while the other participants appear as miniatures that require less bandwidth. Multi-stream capable endpoints and SVC allow local control over the size and quality of each stream. This approach puts more emphasis on the endpoints and less on the infrastructure.
Software-based MCU. Moore’s Law has made it possible for software to perform many functions previously performed by hardware. The key advantage of software is that it is easier to upgrade. That’s important because things are changing quickly. The MCU role may still be necessary, but perhaps not the specialized DSP hardware that gives it a shorter useful life. Once separated from dedicated hardware, MCUs can be dynamically distributed across data centers and even run on virtual infrastructure.
Appliances. That might not be the right term, but more and more devices and applications support entry-level MCU functionality. IBM Sametime will soon offer built-in MCU services. Mitel’s UC360 endpoint has a built-in four-port video bridge. Microsoft Lync now supports limited MCU functions. Video isn’t always about three-screen conference rooms; it is increasingly about on-the-go personal communications. These shifts in processing reduce the requirements for dedicated platforms.
Cloud. There are an increasing number of providers and solutions for cloud-based MCU services. The cloud solutions address the gateway, transcoding, and bandwidth requirements of video. As with many cloud-scale solutions, the prices are compelling – often lower than what enterprises pay using internal resources. Many cloud operators also offer ongoing upgrades, broader interoperability, and a low-cost means to handle spikes in utilization.
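
To make the multi-stream idea concrete, here is a minimal Python sketch of the local control an endpoint gains when each sender offers more than one layer. It is an illustration only, not any vendor's implementation: the layer names, the bitrates in LAYER_KBPS, and the function names are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical per-layer bitrates in kbps. Real values depend on the codec,
# resolution, frame rate, and encoder settings.
LAYER_KBPS = {"thumbnail": 150, "hd": 1500}


@dataclass(frozen=True)
class Subscription:
    sender: str
    layer: str


def choose_layers(receiver, participants, active_speaker):
    """Pick one layer per remote sender: HD for the active speaker, thumbnail otherwise."""
    subscriptions = []
    for sender in participants:
        if sender == receiver:
            continue  # an endpoint does not receive its own video
        layer = "hd" if sender == active_speaker else "thumbnail"
        subscriptions.append(Subscription(sender, layer))
    return subscriptions


def downlink_kbps(subscriptions):
    """Total bandwidth the receiving endpoint needs for its chosen layers."""
    return sum(LAYER_KBPS[s.layer] for s in subscriptions)


participants = ["alice", "bob", "carol", "dave"]
bob_view = choose_layers("bob", participants, active_speaker="alice")
print(downlink_kbps(bob_view))  # one HD stream plus two thumbnails
```

Under these assumed numbers, nothing is transcoded in the network; each endpoint simply decodes several smaller streams instead of one composite stream.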

MCUs will remain a mainstay for the enterprise for quite some time in support of existing systems, but their appeal as new purchases is thinning. Still, MCUs offer no per-minute charges, no scheduling conflicts, and potentially greater security over private networks. Customers that already have major MCU investments should maintain them, but evaluate technology shifts against their long-term plans. Video usage is growing, particularly among mobile devices, many of which are capable of supporting multi-stream sessions. Sites with major investments may want to consider a hybrid approach that combines premises and cloud solutions to accommodate increased demand.

Perhaps the biggest reason organizations continue to purchase hardware-based MCUs is to support an installed base of equipment, which is self-perpetuating logic. Video systems are changing quickly: SVC, WebRTC, VP8/VP9, and H.265 are all reasons to question long-term hardware commitments that support today’s technologies.

9 Responses to "Do MCUs Make Sense?"

Kevin Kieller 5/22/2013 7:28:28 PM

Good points Dave.

Additionally, as more participants are mobile and thus on smaller-screened devices (smartphones, tablets) ...

1. The number of pixels requiring encoding/decoding diminishes; and,

2. Less screen real estate lessens demand for a "Hollywood squares" presentation layout.

Both of these factors increase the likelihood that a software-based MCU on standard server hardware can do the job.

Kevin

michael vernick 5/23/2013 8:48:02 AM

Some comments.

Scalable Resolution (spatial) SVC is not being accepted by the industry. Most SVC implementations are Temporal SVC only. Vidyo is the only company I know that is implementing Spatial SVC. Temporal SVC can be decoded by a standard AVC decoder. Spatial SVC cannot be decoded by a standard AVC decoder. For multi-resolutions, companies are implementing Multi-Stream AVC.

Programmable DSPs are coming down in price like everything else. So hardware MCUs will come down in price.

When using a software router (no transcode) instead of an MCU, each endpoint receives all the incoming streams. This increases network bandwidth and the processing requirements of an endpoint. There are environments where this will work extremely well, but in others, keeping the processing in the network is a better idea.

michael vernick 5/23/2013 8:50:10 AM

Update to above: Spatial SVC cannot be decoded by a standard AVC decoder. Not exactly true. The base layer of a Spatial SVC stream can be decoded by an AVC decoder, not any of the higher spatial layers (resolutions).

Keisuke Hashimoto 5/23/2013 5:04:07 PM

It will not be a zero-sum game between specialized DSP-based hardware MCUs and software MCUs that run on Intel chips. There is no one-size-fits-all product in the market. The market will probably need both, given the variety of end-user needs out there.

With the advent of the software MCU, the purchase cost has dropped, which has opened a new opportunity for those who could not own a standalone MCU before. This is a market segment that hardware MCUs may not be able to address, while software MCUs that run on Intel-based CPUs can.

We probably need to make clear the merits and demerits of hardware and software, and then explain them well to potential buyers (end users) so they can make the right decision based on their requirements. Arguments limited to who will win or lose will only confuse potential buyers and will not benefit the industry as a whole.

Russell Bennett 5/23/2013 8:11:23 PM

The implementer has 3 main choices in a video conference:

1. Mixing - i.e. take all the camera broadcasted streams and turn them into one inbound stream for the client screens to display. This is a bandwidth optimization at the expense of MCU CPU resources and it also introduces latency.
2. Routing - i.e. just route all the camera broadcasted streams to every participant. This requires no MCU, but consumes incremental bandwidth for every participant and commensurate client CPU to process all the inbound streams. This also minimizes latency but precludes 'active speaker' as there is no central element to manage 'floor control'.
3. 'Combo' - i.e. the MCU is mainly a router, but implements 'floor control' for the active speaker, allowing that person's outbound stream to have higher resolution (i.e. to use more bandwidth) and the listeners' streams to consume less bandwidth. This is a middle of the road solution as it optimizes bandwidth and CPU at the MCU and the client.
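
The trade-offs among the three options above can be put in rough numbers. The Python sketch below is a back-of-the-envelope model under loudly stated assumptions: the bitrates are hypothetical, the mixed composite in option 1 is assumed to cost about as much as one HD stream, and the function and key names are invented for illustration.

```python
def mixing(n, hd_kbps):
    """Option 1: a central MCU decodes n streams and encodes one composite per participant."""
    # Assumption: the composite costs roughly the same bandwidth as one HD stream.
    return {"downlink_kbps": hd_kbps, "central_decodes": n, "central_encodes": n}


def routing(n, hd_kbps):
    """Option 2: every endpoint receives every other endpoint's full stream."""
    return {"downlink_kbps": (n - 1) * hd_kbps, "central_decodes": 0, "central_encodes": 0}


def combo(n, hd_kbps, thumb_kbps):
    """Option 3: streams are routed, but only the active speaker is sent at full quality."""
    # A listener receives one HD stream plus (n - 2) low-bandwidth streams.
    return {
        "downlink_kbps": hd_kbps + (n - 2) * thumb_kbps,
        "central_decodes": 0,
        "central_encodes": 0,
    }


def compare(n, hd_kbps=1500, thumb_kbps=150):
    """Per-listener downlink and central transcoding work for an n-party call."""
    if n < 3:
        raise ValueError("a multipoint call needs at least three participants")
    return {
        "mixing": mixing(n, hd_kbps),
        "routing": routing(n, hd_kbps),
        "combo": combo(n, hd_kbps, thumb_kbps),
    }


for name, cost in compare(6).items():
    print(f"{name:8s} {cost}")
```

With these assumed figures, mixing keeps the listener's downlink flat but pays for it in central decodes and encodes, routing pays entirely in bandwidth, and the combo sits in between.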

Typically the 'legacy' video conference vendors implemented option 1, and they tend to be HW based; the free public service providers implemented option 2, which was protocol-based; and Microsoft (not aware of any others) implemented option 3, which was SW based. There is a 4th option, "barbelling" (or localized MCUs talking to other localized MCUs), that has been discussed, but not implemented, as far as I am aware.

The various implementations have been mostly vendor-proprietary to date. Similarly, H.264 SVC implementations that allow bandwidth to be optimized for the display capabilities of the client are also vendor-proprietary.

As has been mentioned above, most of the factors influencing each choice are economic factors, but economics change over time. The biggest economic change is the advent of cloud computing which allows large pools of CPU capacity to be shared on a flexible basis among large numbers of users. This is

Russell Bennett 5/23/2013 8:16:03 PM

[I must have used up my char allowance]

The result of this has been the advent of the cloud video conference services: e.g. Bluejeans, Vidtel, Vidyo.

Kevin Kieller 5/26/2013 5:20:02 PM

Russell, nice summary of different video implementation approaches.

Tsahi Levent-Levi 6/4/2013 12:09:02 PM

I think that the data center trends should be looked at as well. We are shifting into cloud and commodity hardware solutions. This means to me that the days of HW MCUs are numbered - especially when x86 chips are including more and more video coding capabilities.

H Fan 6/3/2015 8:43:24 PM

@russell just one question: what is the difference between opt2 and opt4? Both look like a p2p situation.