Google Espresso: Real-Time Application Optimization for Internet Peering
I was at the Open Networking Summit (ONS) in Santa Clara last month, speaking and watching some of the keynotes, when Google presented something very interesting. Google’s Amin Vahdat, Fellow and Technical Lead for Networking, presented Google’s implementation of SDN for their cloud in a keynote. Here's the kicker: as Amin was going through Google’s SDN roadmap and what they have implemented, he announced their latest SDN project, called Espresso, which optimizes networks via quality signals from the application (please see the diagram below; it starts around the 10-minute mark of the YouTube video).
Amin states that the goal of Google’s Espresso is to use application signals from their servers, determine how the network is affecting their performance, and have the network self-optimize to deliver a better user experience. Does this sound familiar? I have been pioneering and writing about this coming trend since 2012. While I have been writing specifically about Real Time Media (RTM) SDN, Google has harnessed the same idea for any general TCP application on their servers that uses the Internet. What they have done is disaggregate an external BGP routing construct into a very clever design that has been put into production across many metros around the world. They claim it has been in production for over two years, and that if you use Google there is a pretty good chance Espresso has carried some of your traffic.
So basically, Espresso works by taking application signals from their servers and feeding them into an SDN controller, which in turn applies analytics to alter each session’s egress path out toward the Internet BGP peers (please see below).
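As a rough illustration of that feedback loop, here is a minimal sketch of a controller picking an egress path from application quality signals. The function name, the signal fields, and the loss-then-latency scoring are my assumptions for illustration, not Google's actual analytics.

```python
# Hypothetical sketch: an SDN controller choosing the best egress path
# from application-level quality signals. Field names and the scoring
# rule (lowest loss, then lowest RTT) are illustrative assumptions.

def choose_egress(signals):
    """signals: {path_name: {"loss": float, "rtt_ms": float}} per candidate path."""
    # Prefer the path with the least loss; break ties on round-trip time.
    return min(signals, key=lambda p: (signals[p]["loss"], signals[p]["rtt_ms"]))


app_signals = {
    "peer-A": {"loss": 0.02, "rtt_ms": 40.0},
    "peer-B": {"loss": 0.00, "rtt_ms": 55.0},
}
print(choose_egress(app_signals))  # peer-B
```

In a real deployment this decision would of course weigh many more signals, but the shape of the loop is the same: measure at the application, decide at the controller, steer at the edge.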
So here is how I see it working (please refer to the diagram below).
- All their metro data centers connect to many external peering routers using eBGP.
- However, Google has disaggregated the eBGP border router into its control and data planes. The label switch fabric is responsible for data-plane forwarding, and the BGP speaker is responsible for actually communicating with the remote BGP peer. For scale-out in any data center, the label switch fabric can be implemented in a leaf/spine architecture. This is a typical hyperscale data center design that is being used everywhere. For more information, please see the white paper I co-authored for the MEF (see pages 14 and 15).
- They have servers in these data centers with specialized NICs that terminate TCP connections. These servers generally serve up video or other content that belongs to Google.
- These specialized line-rate NICs, one on each server, can insert labels on data sessions; the label switch fabric uses these labels to egress the packets to the correct eBGP peer (the label is, of course, stripped off as the packet leaves the fabric toward the peer).
- The global controller programs each metro-local controller, which in turn instructs the label switch fabric which port to egress out of for each label it sees. So in essence the label switch fabric is really dumb: it keeps no forwarding state beyond its label-forwarding rules (i.e., no need for massive Internet Forwarding Information Bases (FIBs) in the switching fabric).
- The servers send summaries of how flows are behaving, in real time, to the application signal server, which aggregates and normalizes the signals.
- The global controller is fed signals from the application signal server and, looking across all the metro-local controllers, decides whether path “x” is better than path “y.”
- In real time, the global controller is trying to figure out which Internet path will deliver the best quality of experience to the user. All of this finally moves past the multi-decade inadequacy of BGP internetworking by adding the ability to sense congestion and reroute on a per-flow basis.
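To make the "dumb fabric" point in the steps above concrete, here is a minimal sketch of a label switch fabric that holds nothing but label-to-egress-port rules pushed down by a controller. All class and port names are hypothetical; this is my illustration of the concept, not Google's implementation.

```python
# Hypothetical sketch of the "dumb" label-switched fabric described above:
# it forwards purely on an MPLS-style label and keeps no Internet FIB.

class LabelSwitchFabric:
    """Forwards packets on labels alone; state is just label -> egress port."""

    def __init__(self):
        self.label_rules = {}  # label -> egress port toward an eBGP peer

    def program(self, label, egress_port):
        # Rule pushed down by the metro-local controller.
        self.label_rules[label] = egress_port

    def forward(self, label, payload):
        port = self.label_rules[label]
        # The label is popped as the packet leaves the fabric toward the peer.
        return port, payload


fabric = LabelSwitchFabric()
fabric.program(label=100, egress_port="peer-A:eth0")  # programmed by controller
fabric.program(label=200, egress_port="peer-B:eth0")

# A server NIC stamps label 100 on a session; the fabric just looks it up.
port, pkt = fabric.forward(100, b"video-segment")
print(port)  # peer-A:eth0
```

Steering a session to a different peer is then just the controller reprogramming one rule (or the NIC stamping a different label), with no BGP reconvergence in the fabric at all.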
In conclusion, what Google has accomplished is remarkable and moves us into the modern era of machines automating networks through software abstraction. No more will we have to live with the decades-old internetworking paradigm of the 1980s. Finally, someone gets it!