Describes how SDN is being subsumed into the Software Defined Data Center (SDDC). The SDDC integrates networking, storage and server resources in an autonomous unit - resulting in a "cloud operating system."
The concept of SDDC has profound implications for everyone in the field of networking, from equipment vendors to network engineering and operations. To illustrate the impact, this article describes how SDDC is driving change in the field of network performance monitoring. However, similar changes are also taking place in every area of networking and I hope readers will see parallels to the changes affecting their own areas of interest.
SDDC impact on network performance management
Since 1999, InMon Corporation has been developing network traffic analysis software that provides graphical and reporting tools to support network operators. For most of our history, network architectures were well established, and we made incremental improvements to our software in response to incremental changes in customer requirements.
More recently, the requirements of cloud computing have been driving major changes in the way we build software, since the consumer of network performance analytics is no longer a human operator but the cloud orchestration software. On the measurement side, cloud computing fundamentally alters network architectures by shifting the network edge from physical switches to software virtual switches running on the servers. This shift presents disruptive organizational and implementation challenges for customers and developers of network monitoring software. The rapid adoption of cloud computing has made this an abrupt and complex transition.
InMon has taken a two-pronged approach to addressing the challenge of cloud performance monitoring:
Develop network analytics as a service that can be deployed as part of the SDDC stack.
As we became familiar with SDN architectures, we realized that the speed of response possible in automated SDN applications could be used to optimize network performance - provided that the application had access to real-time performance data. The diagram shows how InMon's new real-time analytics engine, sFlow-RT, can be integrated with an OpenFlow controller, such as Floodlight, to create performance-aware software defined networking solutions such as dynamic load balancing and denial of service mitigation.
The Internet Draft draft-krishnan-opsawg-large-flow-load-balancing is a good example of the type of problem that can be addressed using real-time analytics and SDN. The Internet Draft describes the need for real-time analytics to drive load balancing of long-lived flows in LAG/ECMP groups. The draft describes the challenge of managing long-lived connections in the context of service provider backbones, but similar problems occur in the data center, where long-lived storage connections (iSCSI/FCoE) and network virtualization tunnels (VXLAN, NVGRE, STT, GRE etc.) are responsible for a significant fraction of data center traffic.
A multi-path SDN load balancing system would consist of the following elements:
Measurement - The sFlow standard provides multi-vendor, scalable, low latency monitoring of the entire network infrastructure.
SDN application - The SDN application implements a load balancing algorithm, immediately responding to large flows with commands to the OpenFlow controller.
Controller - The OpenFlow controller translates high level instructions to re-route flows into low level OpenFlow commands.
Configuration - The OpenFlow protocol provides a fast, programmatic means for the controller to reconfigure forwarding in the network devices.
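To make the role of the SDN application element more concrete, here is a minimal Python sketch of one possible load balancing decision. It is an illustration under stated assumptions, not sFlow-RT's or any controller's actual algorithm: the `Flow` record, the `rebalance` function, the link names and the greedy heuristic are all hypothetical. Measurements are passed in as a plain list and the result is a set of proposed moves that a controller would then have to translate into OpenFlow commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical record for one measured flow (not an sFlow data structure)."""
    key: str    # illustrative flow identifier
    bps: float  # measured rate in bits per second
    link: str   # LAG/ECMP member link currently carrying the flow


def link_loads(flows, links):
    """Sum the measured rate of the flows on each member link."""
    loads = {link: 0.0 for link in links}
    for flow in flows:
        loads[flow.link] += flow.bps
    return loads


def rebalance(flows, links, large_flow_bps):
    """Return {flow key: new link} for the large flows that should move.

    Greedy heuristic (an assumption, chosen for brevity): visit large flows
    from biggest to smallest and move each one to the least loaded link,
    but only if that link would still carry less than the flow's current
    link carries now.
    """
    loads = link_loads(flows, links)
    moves = {}
    large = sorted(
        (f for f in flows if f.bps >= large_flow_bps),
        key=lambda f: f.bps,
        reverse=True,
    )
    for flow in large:
        target = min(links, key=lambda name: loads[name])
        if target != flow.link and loads[target] + flow.bps < loads[flow.link]:
            loads[flow.link] -= flow.bps
            loads[target] += flow.bps
            moves[flow.key] = target
    return moves


if __name__ == "__main__":
    measured = [
        Flow("A", 4e9, "link1"),
        Flow("B", 3e9, "link1"),
        Flow("C", 1e8, "link1"),  # below the threshold, never moved
        Flow("D", 1e9, "link2"),
    ]
    print(rebalance(measured, ["link1", "link2"], large_flow_bps=1e9))
```

Only flows at or above the threshold are considered, which reflects the idea that a few long-lived, high-volume flows account for most of the imbalance. Small flows are left to the normal hash-based distribution.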
We see great future potential for performance-aware SDN, but today the SDN market is still emerging. We therefore released sFlow-RT under a free-for-non-commercial-use license to encourage developers to experiment and build new performance-aware SDN applications that will help drive adoption of SDN architectures.
Collaborate with vendors and users of cloud infrastructure to include essential instrumentation.
Instrumenting the cloud
InMon is an active participant in the sFlow.org industry consortium responsible for development of the sFlow standard for instrumenting high speed switched networks. Cloud computing has driven an increased demand for bandwidth, accelerating the transition to 10 Gigabit Ethernet, adoption of merchant silicon, and consequent multi-vendor support for sFlow. In addition, network vendors have been working to extend the sFlow standard to address new challenges in data center monitoring, for example to include support for link aggregation.
Instrumenting software virtual switches requires industry engagement beyond the traditional networking companies to include leaders from the software industry such as Nicira, Citrix and Microsoft. However, implementing sFlow in virtual switches raised further questions. The virtual switch runs as software on a server and so you need to monitor server performance in order to fully characterize virtual network performance.
The high performance computing community has been monitoring the performance of large scale compute clusters for years. Open source cluster monitoring tools, such as Ganglia, have made the transition into the mainstream and are used by companies like Twitter, Pandora and Etsy to monitor their large scale cloud computing infrastructures. Engaging the open source community and the companies that use the tools resulted in extensions to the sFlow standard to report server performance metrics. Today, the open source Host sFlow project forms the core of an ecosystem of related open source projects that embed sFlow instrumentation into compute platforms and provide the unified network and server instrumentation needed to effectively manage cloud infrastructures.
Ultimately, the cloud infrastructure is there to support applications, and understanding how application performance is affected by the performance of the underlying infrastructure is essential. Cloud application architectures typically consist of pools of servers communicating using HTTP/REST APIs. Input from the cloud development and operations (devops) community led to the inclusion of HTTP and application performance metrics in the sFlow standard (e.g. URLs, response times, status codes etc.) and support for the standard in popular servers like Apache, Java, Tomcat, NGINX and Memcached.
Before leaving the topic of instrumentation, it is worth briefly describing why InMon's focus has been on sFlow rather than SNMP or NetFlow/IPFIX. When comparing technologies it is important to be clear about requirements and the basis for the comparison - the following comments focus on the task of delivering performance analytics as part of a cloud orchestration stack - if your requirements differ, you may draw different conclusions.
SNMP - Polling is slow, resource intensive, limited in scope and doesn't deal with dynamic cloud environments where agents are constantly being created, moved and destroyed. For additional information, see Push vs Pull.
NetFlow/IPFIX - Flow-based monitoring introduces delay that limits the value of the measurements to SDN applications. In addition, while Flexible NetFlow/IPFIX is theoretically capable of transporting many types of measurement, in practice flow monitoring is complex to configure and offers little uniformity across products and vendors, making it impractical to provide the complete coverage needed for effective control. For additional information, see NetFlow/IPFIX.
Working through the changes described in this article I have learned a number of lessons that I would like to share:
Partnerships and standards are essential. No single company has all the technologies, products and skills needed to deliver a complete solution. Companies need to cooperate to build architectures and interfaces that allow their products and services to be combined as modules in a larger system.
The future is multi-disciplinary. Melding of software, compute and networking technologies requires a broad appreciation for how the pieces come together. A great way to meet people in related disciplines is to find Meetups in your area.
Learn from large cloud application companies like Facebook, Google, Netflix, Amazon, Yahoo, Zynga, Etsy and eBay. Their engineers present at conferences like Velocity and many have technical blogs in which they describe the types of challenges they face and the solutions they are building. These companies are at the leading edge and are pioneering techniques that will enter the Enterprise market over the next few years.
Don't underestimate the speed of change. SDDC shifts innovation to software, a much faster moving field than traditional network hardware.
Please comment and share your experiences so we can all gain better insights into the transformation taking place in our industry.