I'm Douglas Gourlay, Vice President of Marketing for data center, high-performance computing and high-frequency trading network switch vendor Arista Networks.
Over the past week I polled some folks at different companies manufacturing products that are part of their 'fabric strategy' to get their opinion on what makes their fabric different from a more traditional network.
The best answer came from an account manager at Brocade - "It's really nothing different, but it's a different term and it gives me a reason to have a conversation with someone who might otherwise not be so inclined. It also lets me talk about something different so they (the network team) have something to bring up of interest to the rest of the IT organization."
I thought this was one of the most direct answers I received - and it makes a good bit of sense. If all you are talking about is 'same old, same old' Layer-2 and Layer-3 networks, how interesting is that? Many virtualization, server, and storage groups within IT will assert that 'we know that stuff,' so there is no reason to give the network team any airtime or to pay attention to them; but introduce 'fabric' and at least we think we have something new to discuss.
See, the general premise of 'fabric', depending which vendor-specific variant of the term you give credence to, comes from one of two schools of thought:
Fabric was derived from the use of a switched fabric architecture to connect multiple linecards together in a modular chassis so that each had the ability to talk to any other linecard, at any time. To accomplish this fabrics were often either arbitrated on ingress (such as a virtual output queue system with fabric arbitration) or the fabric was overclocked: meaning the speed the switch fabric could ingest data and forward it was faster than the expectation the linecard had of putting data on the fabric and taking it off. Thus any point in time congestion could be handled by a small but reasonable amount of buffering.
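To make the overclocked-fabric idea concrete, here is a toy model of how much buffering a speedup leaves you needing when several linecards burst toward the same fabric path. All numbers are illustrative assumptions, not any vendor's specifications.

```python
def peak_buffer_bytes(n_senders: int, line_rate_gbps: float,
                      speedup: float, burst_ms: float) -> float:
    """Peak buffering when n linecards burst into a fabric whose
    capacity is `speedup` times a single line rate, for burst_ms ms."""
    arrival_gbps = n_senders * line_rate_gbps   # offered load
    drain_gbps = speedup * line_rate_gbps       # fabric capacity
    excess_gbps = max(0.0, arrival_gbps - drain_gbps)
    gbits = excess_gbps * (burst_ms / 1000.0)   # excess that must queue
    return gbits * 1e9 / 8                      # convert Gbit -> bytes

# e.g. 2 senders at 10G into a 1.5x fabric, 1 ms burst -> 625,000 bytes:
# a "small but reasonable" buffer absorbs the point-in-time congestion.
buf = peak_buffer_bytes(2, 10.0, 1.5, 1.0)
```

With a speedup at or above the number of simultaneous senders, the excess (and hence the buffer requirement) drops to zero, which is the intuition behind overclocking the fabric.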
Fabric was derived from the storage area networks (SAN) built out, starting in the late 90s, where the intent was to have a large number of hosts (initiators) connecting to a well-defined number of storage systems (targets). The traffic patterns were well understood and almost always from initiator to target and back. These fabrics were designed to be lossless by implementing a credit-based fabric arbitration mechanism that guaranteed the SAN would not drop traffic under congestion: this was highly necessary because the SCSI protocol was not very tolerant of loss.
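The lossless behavior comes from the credit mechanism itself: a sender may only transmit when it holds a credit, so a congested receiver causes stalling rather than loss. A minimal sketch of the buffer-to-buffer credit idea (class and method names are mine, not FibreChannel terminology):

```python
from collections import deque

class CreditLink:
    """Toy credit-based flow control: no credit, no transmission."""
    def __init__(self, credits: int):
        self.credits = credits      # buffer slots advertised by receiver
        self.in_flight = deque()    # frames buffered at the receiver

    def try_send(self, frame) -> bool:
        """Transmit only if a credit is available; otherwise stall."""
        if self.credits == 0:
            return False            # back-pressure: wait, never drop
        self.credits -= 1
        self.in_flight.append(frame)
        return True

    def receiver_drain(self):
        """Receiver frees a buffer slot and returns a credit."""
        frame = self.in_flight.popleft()
        self.credits += 1
        return frame

link = CreditLink(credits=2)
sent = [link.try_send(f) for f in ("f0", "f1", "f2")]
# the third frame stalls instead of being dropped; draining one frame
# returns a credit and the stalled frame can then be sent
```

This is exactly why SCSI could tolerate the SAN: congestion shows up as latency, never as a dropped frame.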
From the genesis of each company's specific fabric architecture you can see how it has evolved:
Juniper is trying to extend the concept of a linecard to the fixed-configuration rack switches, while others such as Brocade and Cisco are trying to apply FibreChannel technologies such as FSPF to constructing an Ethernet topology. I am not going to state that one approach is necessarily much better than the other; I just want to acknowledge that the design considerations of each are somewhat different, and that is why we see disparity between vendor-specific fabric offerings.
While some of the vendors started creating fabrics for the explicit purpose of carrying FibreChannel traffic over an Ethernet substrate, it feels like all vendors have rallied around a common shift: FC is not going to be the primary driver of next-generation network architectures; optimizing network support of virtual machines (VMs) is, especially virtual machine mobility, or vMotion in VMware parlance.
The natural evolution of the conversation seems to have been:
Virtualization: "I want to put any VM on any server where I have capacity, at any time, and do all of my provisioning through my VM management GUI or automate it with a private cloud system!"
Network: "I finally got my network stable after all the years you server guys made me support that ridiculous Token Ring network, then your IPX stuff from NetWare 3.12 (the last stable version pre-directory), and the occasional pocket of NetBEUI and AppleTalk! We finally have a stable, secure, and easily supported network with no performance problems and we have all IP, we route at the top-of-rack, and everything is wonderfully summarized and simple and now you want WHAT??!?!?!?!"
VP of Infrastructure: "Hate to do this to you Network, but the Virtualization team is going to save us a ton of budget in adds/moves/changes, simplify our server consolidation project, and enable self-service compute allocation to our developers - they win, figure it out or I will find someone who will."
At this point our hero, the Network Guy, is somewhat stymied and caught between a rock and a hard place - deploy a Large and Flat Layer-2 network with all the known issues of Spanning Tree (as much as I admire the simplicity of the protocol and Radia's poem on the matter, it sort of has become a four-letter acronym in our world) and compromise the hard fought network stability, or, well, else...
This is the main issue that the Fabric crowd is rallying around, regardless of the genesis of their strategy:
Fabric can fix this problem!
With a fabric you can move VMs anywhere because it will give you a stable large flat Layer-2 network.
It's a good promise. But as with everything, the devil is in the details. The fabric trade-off is that the network team must adopt an architecture that generally requires extreme vendor lock-in:
Kind of like how standardizing your multi-protocol routing on EIGRP locked you in during the mid-1990s (though in EIGRP's defense, it was arguably a better routing protocol than IPX-RIP, or AppleTalk with its ZIP storms...)
So let's look at some data and some specifics:
The absolute limit on the size of a fabric is gated by one of two sets of factors, depending on the genesis of the fabric:
If it's an extended switch fabric, like Juniper's QFabric, then the limit is based on four variables:
The port density of the leaf box.
The number of fabric ports used versus host ports delivered on the leaf box.
The port density of the spine box.
The number of spine boxes utilized, with the maximum being a HW/SW limitation and not exceeding the number of fabric ports per leaf box noted above.
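The four variables above compose into simple back-of-envelope scale math for a leaf/spine extended fabric. The port counts below are hypothetical, not QFabric's actual specifications:

```python
def fabric_host_ports(leaf_ports: int, leaf_uplinks: int,
                      spine_ports: int, n_spines: int) -> int:
    """Max host-facing ports given leaf and spine port densities."""
    # Each leaf dedicates leaf_uplinks ports to the fabric, at most
    # one per spine, leaving the remainder for hosts.
    assert n_spines <= leaf_uplinks, "more spines than leaf uplinks"
    host_ports_per_leaf = leaf_ports - leaf_uplinks
    max_leaves = spine_ports    # each leaf consumes one port per spine
    return max_leaves * host_ports_per_leaf

# e.g. 48-port leaves with 8 fabric uplinks, 128-port spines, 8 spines:
# 128 leaves * 40 host ports each = 5120 host ports
total = fabric_host_ports(48, 8, 128, 8)
```

Shifting the split between fabric ports and host ports on the leaf trades cross-sectional bandwidth against delivered port count, which is the real design knob here.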
If it's based on TRILL, a TRILL derivative, or a pre-standard implementation, then your limitations are either:
The maximum number of bridges supported in the TRILL implementation (24-100 depending on vendor).
The oversubscription ratio of your design when measuring the cross-sectional bandwidth.
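The oversubscription ratio is just host-facing bandwidth divided by fabric-facing (cross-sectional) bandwidth. A quick sketch, with port counts and speeds that are illustrative assumptions:

```python
def oversubscription(host_ports: int, host_gbps: float,
                     uplink_ports: int, uplink_gbps: float) -> float:
    """Ratio of offered host bandwidth to fabric uplink bandwidth."""
    return (host_ports * host_gbps) / (uplink_ports * uplink_gbps)

# 48 x 10G host ports over 4 x 40G uplinks -> 3:1 oversubscribed
ratio = oversubscription(48, 10, 4, 40)
```

A ratio of 1.0 is a non-blocking design; anything above it is a bet on how much of the host bandwidth is actually offered simultaneously.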
The distribution of traffic across the fabric is going to be based either on a hash of some N-tuple of the packet headers, or on splitting packets up into normal-sized chunks or cells and reassembling them on the far side. Neither is inherently good or bad; both have limitations, which we can get into if you want a fun debate on whether packet striping, reassembly, and some jitter are preferable to out-of-order delivery, or whether either of those beats the possibility of a single high-performance flow congesting other traffic. Which path is better is a function of the performance guarantee you are offering your customers and the traffic types you have to support.
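The N-tuple hash approach can be sketched in a few lines. Real switches do this with hardware hash functions over selected header fields; this uses Python's `zlib.crc32` purely for illustration, and the field choices are a typical 5-tuple assumption:

```python
import zlib

def pick_path(src_ip: str, dst_ip: str, proto: int,
              src_port: int, dst_port: int, n_paths: int) -> int:
    """Choose a fabric path from a hash of the flow's 5-tuple."""
    five_tuple = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}"
    return zlib.crc32(five_tuple.encode()) % n_paths

# Every packet of a flow hashes to the same path, so there is no
# reordering - but one elephant flow can congest the path it lands on.
a = pick_path("10.0.0.1", "10.0.0.2", 6, 50000, 80, 8)
b = pick_path("10.0.0.1", "10.0.0.2", 6, 50000, 80, 8)
```

The cell-spraying alternative avoids the elephant-flow problem by striping every packet across all paths, at the cost of reassembly logic and some jitter, which is exactly the trade-off described above.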
The scale of the network that can be built at L2 using multi-chassis implementations of LACP (accepting this is still proprietary, but only between two boxes rather than across the span of the entire network) is a few thousand ports. If you design at L3, it's tens of thousands of ports.
VMware has one of the more scalable virtualization controllers available today - and it supports 1000 hosts per vSphere instance. Each host can have 20 or so VMs based on today's hardware capabilities with Intel Westmere and the forthcoming Sandy Bridge.
The maximum number of hosts that can participate in an automated vMotion domain that would require L2 adjacency is 32.
The maximum distance over which you can vMotion a workload is bounded by network latency: the RTT must be less than 10 msec.
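That 10 msec RTT budget translates directly into a physics bound on distance. Light in fiber propagates at roughly two-thirds of c, about 200 km per millisecond one-way; the figure below ignores serialization, queuing, and switching delay, so real limits are tighter:

```python
FIBER_KM_PER_MS = 200.0   # approx. one-way propagation speed in fiber

def max_one_way_km(rtt_budget_ms: float) -> float:
    """Upper bound on fiber distance given a round-trip latency budget."""
    one_way_ms = rtt_budget_ms / 2
    return one_way_ms * FIBER_KM_PER_MS

# 10 ms RTT -> at most ~1000 km of fiber, before any device latency
limit_km = max_one_way_km(10.0)
```

This is the "hard physics limitation" behind the follow-the-sun skepticism below: no fabric architecture can negotiate with the speed of light.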
So what does this mean? Well, in short, whether you go with a proprietary fabric or a more open protocol-based network either choice can easily support the scale needed to support the maximum density of VMs and physical hosts that can participate in a stateful vMotion.
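A quick back-of-envelope check of that claim, using the numbers above (the host and VM counts come from the text; the per-host NIC count is my assumption):

```python
VMOTION_DOMAIN_HOSTS = 32   # max hosts in an automated vMotion domain
VMS_PER_HOST = 20           # rough density on current server hardware
NICS_PER_HOST = 2           # assumed redundant network attachment

max_vms = VMOTION_DOMAIN_HOSTS * VMS_PER_HOST        # 640 VMs
ports_needed = VMOTION_DOMAIN_HOSTS * NICS_PER_HOST  # 64 switch ports
# A few-thousand-port L2 design, let alone a fabric, dwarfs this.
```

Sixty-four ports of L2 adjacency is trivial for either camp, which is the point: the scale argument for fabric is not driven by the vMotion use case.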
It means that future protocols like NVGRE and VXLAN will solve the main Virtualization care-about without requiring fundamental network change, re-architecture, or capital outlay.
It also means that anyone who is selling you the 'move your VMs around the world and follow the sun with your workloads' story doesn't know what they are talking about - there are some hard physics limitations in the way of accomplishing that.
In summary, on the specific question of "how to support large and flat L2 domains," I don't think there is a clear winner - the fabric architectures, regardless of genesis, support decent scale of L2 domains. Traditional protocol-based networks support the same scale, with some variability based on vendor-specific product densities. TRILL lets me have a wider spine than multi-chassis LACP models do, but the LACP models extend network reliability down to the host. Fabric sounds a bit sexier and does give a raison d'être for some companies, as well as a justification for asset churn; open-standard protocol-based networking sounds a bit more boring, but at least you can hire people who already know how it works.