Cisco Nexus

NANOG,

I would like to poll the collective for experiences, both positive and negative, with the Nexus line. More specifically, I am interested in hearing about FEX with N2K at the ToR: whether it has actually made any impact on opex, and any non-obvious shortcomings of using the fabric extenders. Also, if anyone is using the Nexus line for I/O convergence (FCoE), I would be interested in hearing about that experience as well.

Thank you in advance,

-A

The biggest thing we ran into was no support for spanning-tree on the FEXes. The way we are set up, being state government, our agency controls the network up to the FEX port. Beyond that, the agencies were in control of what they plugged into our FEX ports.
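If memory serves, FEX host interfaces come up as spanning-tree edge ports with BPDU Guard in effect by default, so anything downstream that sends a BPDU gets errdisabled rather than negotiated with. A rough sketch of the equivalent explicit config on the parent N5K (interface and VLAN numbers hypothetical):

    interface Ethernet101/1/1
      switchport access vlan 100
      ! FEX HIFs behave as edge ports; BPDU Guard applies by default,
      ! so an inbound BPDU errdisables the port instead of forming a loop
      spanning-tree port type edge
      spanning-tree bpduguard enable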

The N2K at ToR is not a great design for user or storage interfaces if most of your traffic is east/west. It is great as a low-cost iLO/DRAC/choose-your-OOB port, or if most of your traffic is north/south. The biggest thing to remember is that it is not a switch, and it has limitations, such as not being able to connect other switches to it. Like anything else, you have to understand the product so that you don't engineer it into something it wasn't designed to do.

Lots of very large companies are using Nexus gear...

That being said, I prefer Arista when I'm architecting DCs.

I think it depends on what the upstream product from the FEX is and what your requirements are. Last I checked, eVPC was not supported on the N7K, but it was supported as an option on the N5K platform (eVPC being a FEX dual-homed to a pair of parent switches running as a vPC pair). I know this is an old post, but here's a good reference that explains precisely what I mean:

If you look at the N5K verified scalability guide, you see this:

Maximum FEXs dual homed to a vPC Cisco Nexus 5500 Series Switch Pair: 24

http://www.cisco.com/en/US/docs/switches/datacenter/nexus5500/sw/configuration_limits/b_N5500_Config_Limits_602N11_chapter_01.html

If you look at the N7K verified scalability guide, there is *no* mention of dual-homed FEX architectures:

http://www.cisco.com/en/US/docs/switches/datacenter/sw/verified_scalability/b_Cisco_Nexus_7000_Series_NX-OS_Verified_Scalability_Guide.html#reference_E1ED6266546C444093CC27DEB0E1B38E

If I'm wrong and someone is dual-homing 2K FEXes to 7K vPC pairs, please correct me.
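For anyone who hasn't stood one up, the N5K side of a dual-homed (active-active) FEX looks roughly like the sketch below, repeated on both vPC peers. The FEX, port-channel, interface, and keepalive addresses here are hypothetical:

    feature vpc
    feature fex
    !
    vpc domain 10
      peer-keepalive destination 192.0.2.2 source 192.0.2.1
    !
    interface port-channel1
      switchport mode trunk
      vpc peer-link
    !
    ! Same FEX number and fabric port-channel/vPC number on both N5Ks
    fex 101
      pinning max-links 1
    !
    interface port-channel101
      switchport mode fex-fabric
      fex associate 101
      vpc 101
    !
    interface Ethernet1/1-2
      switchport mode fex-fabric
      fex associate 101
      channel-group 101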

If you're interested in FEX-to-FEX latency, here are some numbers provided to me by our SE:

http://jumboframe.net/jumboframe/2013/5/5/n2k-fex-to-fex-latency-and-a-reader-follow-up
<http://static1.squarespace.com/static/513c6d36e4b0cc0702f94292/t/51866c97e4b0580e000cf5bd/1367764121313/fex-to-fex-latency.jpg?format=750w>

As you can see, these numbers are decent, but you'd have to be very careful in choosing your FEX model if you're looking for better-than-average store-and-forward performance.

I wasn't the implementing engineer, but I've been at two places that did that: a large game company and a network gear manufacturer, in their engineering-support computational hubs. I was there during planning and rollout at the game company, very early in the Nexus lifespan.

Both sites brought the FEXes back to 5500s; one used a 6-something for the core, the other a pair of 7Ks.

The game company was more east-west; the telco equipment maker was very heavy east-west.

In both cases it's working fine.

George William Herbert

There are some unfortunate limitations in classifying incoming traffic.

It's been a while, but I think the rule is that Nexus 2000 devices can only classify based on incoming 802.1p CoS values.

It's a pretty strange and disappointing limitation for an edge device, where you're less likely to have incoming 802.1Q tags and less likely to trust the other end of the link to mark its own traffic.
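Concretely, the only thing you can match at the FEX host port is CoS; a minimal sketch of what that looks like on the parent N5K (class, policy, qos-group, and interface values hypothetical):

    class-map type qos match-any COS5
      match cos 5
    !
    policy-map type qos FEX-INGRESS
      class COS5
        set qos-group 4
    !
    interface Ethernet101/1/10
      service-policy type qos input FEX-INGRESS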

/chris

And remember: the Nexus 2K performs absolutely ZERO local switching. All frames received from client ports are simply copied to the upstream device, which handles the frame/packet forwarding logic.

Also remember that the Nexus (5K, at least) does cut-through switching: the frame is already being forwarded before the trailing FCS has been checked, so if an errored frame arrives on one port, the switch can and often will happily forward it once it has resolved the destination MAC address(es).
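The upshot is that CRC-errored frames propagate (with a stomped CRC) instead of dying at the first hop, so you end up walking them back hop by hop with the interface error counters. Something like the following (exact counter columns vary by platform and release):

    switch# show interface ethernet 1/1 counters errors
    ! Watch for FCS/CRC input errors incrementing on the ingress port,
    ! then follow the matching errors downstream to find the true source.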

jms

What this really does is force you to consider how much of your east-west traffic is rack-local versus off-rack.

With FEX, rack-local traffic hurts just as badly as off-rack traffic, since every frame hairpins through the parent switch.

If you want to, and can, localize east-west traffic tighter than that, then you want real ToR switching. If the average east-west flow crosses racks anyway, then FEXes are performance-equivalent. For random traffic distributions the break-even comes at a few racks; for intentional distributions it's probably better to ToR-switch from day one.

George William Herbert

I have a small setup: 2 x Nexus 5596UP + 12 x 2248TP FEXes, 2 x B22DELL, 2 x B22HP, and 1 x C2248PQ-10GE.

Been using this setup since 2012, so it's getting a bit long in the tooth. It's in an active-active setup because there wasn't much guidance at the time on which way to go. There are some restrictions with an AA setup that you probably want to avoid. We currently don't do any FCoE because we're mostly a NetApp and NFS environment.

The performance and stability have been great.

It works well for a traditional environment with a lot of wired ports to stand-alone servers. If you do a lot with virtualization, it's not a great solution. You really want to avoid connecting VM host servers to FEX ports because of all the restrictions that come with it. One restriction that's a real PITA for me right now is that a FEX port can't be a promiscuous trunk port if you're using PVLANs.
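For the curious, the config being refused is the sort of thing below, which a front-panel N5K port will take but a FEX host interface will not (VLAN and interface numbers hypothetical, and syntax from memory, so treat it as a sketch):

    interface Ethernet1/20
      switchport mode private-vlan trunk promiscuous
      switchport private-vlan mapping trunk 100 201-202
      switchport private-vlan trunk allowed vlan 100,201-202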

Using config-sync has been a lot of trouble. There are a lot of actions that will verify OK but then fail on commit. The result is that things are left partially configured and the whole system gets out of sync, blocking any further changes; the fix is to go into each switch manually and reconcile the configuration (which means comparing the running-config against the running switch-profile configuration).
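When it wedges like that, the recovery workflow amounts to roughly the following on each peer (profile name hypothetical):

    N5K-1# show switch-profile DC-PROFILE status
    ! identify which commands verified but failed to apply
    N5K-1# show running-config switch-profile
    N5K-1# show running-config
    ! diff the two outputs on BOTH switches, clean up the stray
    ! config, then re-commit from config-sync mode:
    N5K-1# config sync
    N5K-1(config-sync)# switch-profile DC-PROFILE
    N5K-1(config-sync-sp)# commit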