1GE L3 aggregation

Hey,

I've been poking around a bit, trying to find a reasonable option for a
1GE L3 full-BGP-table aggregator. It seems vendors are mostly pushing
Satellite/Fusion for this application.

I don't really like the added complexity and tight coupling
Satellite/Fusion forces on me. I'd prefer standards-based routing
redundancy, to reduce the impact of defects.

The ASR9001 and MX104 are not options, due to control-plane scale. New
boxes in the vendor pipeline completely ignore 1GE.

I've casually talked with other people, and it seems I'm not really
alone here. My dream box would be 96xSFP + 2xQSFP28, with pretty much
full edge features (BGP, LDP, IS-IS, 1M+ FIB, 5M+ RIB, per-interface
VLANs, IPFIX or sFlow, at least per-port QoS with a shaper, martini
pseudowires).

With tinfoil hat tightly fitted on my head, I wonder why vendors are
ignoring 1GE. Are business cases now entirely driven by Amazon,
Google, Facebook and the like? Are SP volumes so insignificant in
comparison that it does not make sense to produce boxes for them?
Heck, even 10GE is starting to become problematic if your application
is anything other than DC, because you can't choose arbitrary optics.

Hello Saku,

> I've casually talked with other people, and it seems I'm not really
> alone here. My dream box would be 96xSFP + 2xQSFP28, with pretty much
> full edge features (BGP, LDP, IS-IS, 1M+ FIB, 5M+ RIB, per-interface
> VLANs, IPFIX or sFlow, at least per-port QoS with a shaper, martini
> pseudowires).

I'd go with a Nokia/ALU 7750: two 48x1G IMMs, or 5x m20-1gb-xp-sfp MDAs
on 3 IOM3-XPs, plus 100G IMMs (1 or 2 CFP ports; no idea about QSFP28
options though), with SF/CPM5. This would fit in an SR7. I'm not sure
about the smaller models (-c or -a series). The 7450 might be an option
too, but I don't know that hardware.

RIB size is 20M IIRC (shared between all AFs) and FIB size is 3M or 5M
depending on the card. SR-OS is weird when you're used to IOS or Junos,
but in the end the CLI is very efficient and actually quite enjoyable.

HTH,
pierre

There's not a lot of innovation going on in lower-end 1G chipsets. The
natural consequence of that is that a high-end gig switch or router gets
built around a chipset supporting 10Gb/s ports, so its feeds, speeds and
COGS are naturally going to be rather similar to the 10Gb/s offering.

Hi

If I need to speak BGP with a customer that only has 1G, I will simply
make an MPLS L2VPN to one of my edge routers. We use the ZTE 5952E
switch, with 48x 1G plus 4x 10G, for the L2VPN endpoint. If that is not
enough, the ZTE 8900 platform will provide a ton of ports that can do
MPLS.

The tunnel is automatically redundant and will propagate link-down
events, so there is not really any downside to doing it this way for
low-bandwidth peers.
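
For illustration, in generic IOS-style syntax (our ZTE boxes have their
own CLI, and the addresses and VC ID here are made up), the access end
of such a pseudowire is just:

  ! customer-facing port on the access switch, cross-connected
  ! into an MPLS pseudowire toward the terminating edge router
  interface GigabitEthernet0/1
   description 1G BGP customer, backhauled over L2VPN
   xconnect 192.0.2.1 100 encapsulation mpls
  ! 192.0.2.1 = edge router loopback, 100 = VC ID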

Regards

Baldur

Hey,

> If I need to speak BGP with a customer that only has 1G, I will simply
> make an MPLS L2VPN to one of my edge routers. We use the ZTE 5952E
> switch, with 48x 1G plus 4x 10G, for the L2VPN endpoint. If that is not
> enough, the ZTE 8900 platform will provide a ton of ports that can do
> MPLS.

I wonder if you'd do this if you could do L3 to the edge. And why is
termination technology dependent on termination rate?

> The tunnel is automatically redundant and will propagate link-down
> events, so there is not really any downside to doing it this way for
> low-bandwidth peers.

When you say redundant, do you mean that the label can take any path
between the access port and the termination IRB/BVI? Or do you actually
have termination redundancy?
If you don't have termination redundancy, you have two SPOFs: the
access port and the termination.
If you do have termination redundancy, you're spending control-plane
resources on two devices, doubling your control-plane scale/cost.

I'm not saying it's a bad solution; I know a lot of people do it. But I
think people only do it because L3 at the port isn't offered by vendors
at lower rates.

Hey,

> If I need to speak BGP with a customer that only has 1G, I will simply
> make an MPLS L2VPN to one of my edge routers. We use the ZTE 5952E
> switch, with 48x 1G plus 4x 10G, for the L2VPN endpoint. If that is not
> enough, the ZTE 8900 platform will provide a ton of ports that can do
> MPLS.

> I wonder if you'd do this if you could do L3 to the edge. And why is
> termination technology dependent on termination rate?

The ZTE 5952E (routing switch) can do L3VPN including BGP, but it is
limited to about 30k routes. It is usable if the customer wants a
default-route solution, but not if he wants the full default-free zone.

The ZTE M6000S-2S4 (carrier-grade router) will do all you want;
however, it is more expensive. We use the MPLS routing switch because it
is a $2k device, compared to the router, which is more like $15k.

As a small ISP we have two edge routers (the slightly larger M6000-S3,
which is about $20k). Our customers are spread throughout the city and
we have 26 PoPs, so it is much more cost-effective to have the cheaper
device put the traffic in a tunnel and haul it back to the big iron.

> The tunnel is automatically redundant and will propagate link-down
> events, so there is not really any downside to doing it this way for
> low-bandwidth peers.

> When you say redundant, do you mean that the label can take any path
> between the access port and the termination IRB/BVI? Or do you actually
> have termination redundancy?

Our PoPs are connected in a ring topology (actually multiple rings). If a
link goes down somewhere, or an intermediate device crashes, the L2VPN will
reconfigure and find another path.

> If you don't have termination redundancy, you have two SPOFs: the
> access port and the termination.

For a BGP customer I could offer two tunnels, one to each of our
provider edge routers. But very few of our customers are BGP customers;
they just want normal internet. For them we run VRRP between the two
provider edge routers and have the one tunnel go to both.
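
In IOS-style syntax (group number and addresses invented for
illustration), the VRRP side of that looks roughly like this, on each of
the two provider edge routers:

  interface Vlan100
   ip address 198.51.100.2 255.255.255.0
   ! the other PE uses .3; customers point at the shared .1
   vrrp 1 ip 198.51.100.1
   vrrp 1 priority 110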

> If you do have termination redundancy, you're spending control-plane
> resources on two devices, doubling your control-plane scale/cost.

The M6000 devices can handle 64k tunnels and are generally way overpowered
for our current business. It is true that I might be limited to 1x 64k
customers instead of 2x 64k customers, but with that many customers I would
need to upgrade anyway.

> I'm not saying it's a bad solution; I know a lot of people do it. But I
> think people only do it because L3 at the port isn't offered by vendors
> at lower rates.

We actually moved away from a hybrid solution with L3 termination at the
customer edge to simply backhauling everything in L2VPNs. We did this
because the L2VPN tunnels are needed anyway for other reasons and it is
easier to have one way to do things.

Regards,

Baldur

What about the Huawei NE20E-S2F/NE40E-M2F?
4x SFP+ and 40x SFP fixed ports, plus two PICs with either 4x SFP+ or
1x QSFP each. Decent FIB. Not really sure about the IPFIX/sFlow support,
though. Pricing seems very aggressive on these devices as well.

Last I checked, you can't commit/replace configuration in VRP. Has this
changed? Can you give it a full new config and expect it to figure out
how to apply it without breaking the existing state?
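
For context, the Junos-style workflow I have in mind is roughly this
(the file name is just an example); load override replaces the entire
candidate configuration, and commit lets the box work out and apply the
delta atomically:

  configure
  load override /var/tmp/new-full-config.conf
  show | compare
  commit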

I'm definitely not excluding Huawei. No 1GE PICs? Can you give
indicative pricing; what is aggressive pricing? Are these under 5k?

What size FIB/RIB table does that Huawei have?

It has 25M RIB and 4M FIB. Same Solar NPU as their largest kit.

What's the price point though? Is that the router he was saying is in
the 15K range?

I'm all Shania Twain on 15k.

I've seen people buy an MX80 for a bit over 3k, and this isn't that
much denser. 5k would impress me much.

That's some extreme level of unheard-of discount, to get a full MX80
for 3K.

Yeah, it's the best I've seen. 8-10k isn't anything special.

> Last I checked, you can't commit/replace configuration in VRP. Has this
> changed? Can you give it a full new config and expect it to figure out
> how to apply it without breaking the existing state?

... later...

> Yeah, it's the best I've seen. 8-10k isn't anything special.

I suppose that's the reason I didn't see the Brocade CER-RT (a
best-seller some time ago) listed. Probably the lack of need for a
full-table FIB also counts.

Personally (and at work), I stay away from such topologies.
Centralizing IP connectivity like this may seem sexy and cheap at the
start, but it has serious scaling and operational issues down the line,
IMHO.

We push IP/MPLS all the way into the Metro-E Access using a fleet of
Cisco ASR920s and ME3600Xs. The value of being able to instantiate an
IP service or BGP session directly in the Metro-E Access simplifies
network operations a great deal for us. Needless to say, not having to
deal with eBGP Multi-Hop drama does not hurt.

Centralizing is just horrible, but that's just me. The goal is to make
all these unreliable boxes work together to offer a reliable service to
your customers, so making them too inter-dependent on each other has the
potential to take that away in the long run.

Mark.

A lot of people did it because there really wasn't a cheap, dense
solution until about 2010. And even then, the traditional strategy had
become so entrenched that running IP all the way into the Access was a
foreign concept, one most certainly assumed to be a lot more expensive
than the incumbent Layer 2-based Access models.

I feel this has since changed with the current offerings from Cisco,
Juniper and Brocade. The problem now is how to scale the low-speed port
density up, as well as add 10Gbps port density, without increasing the
cost or size of the platforms.

Mark.

> The ZTE 5952E (routing switch) can do L3VPN including BGP, but it is
> limited to about 30k routes. It is usable if the customer wants a
> default-route solution, but not if he wants the full default-free zone.

Might be worthwhile to ask ZTE to develop their own implementation of
BGP Selective Download.
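
For reference, Cisco's implementation of this keeps the full table in
BGP but only installs matching routes into the RIB/FIB; a minimal
sketch in IOS-style syntax (names and ASN are made up):

  ip prefix-list DEFAULT-ONLY permit 0.0.0.0/0
  !
  route-map FIB-INSTALL permit 10
   match ip address prefix-list DEFAULT-ONLY
  !
  router bgp 64500
   address-family ipv4
    table-map FIB-INSTALL filter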

> Our PoPs are connected in a ring topology (actually multiple rings).
> If a link goes down somewhere, or an intermediate device crashes, the
> L2VPN will reconfigure and find another path.

Which is what would happen anyway with your IGP if the service were
delivered in the Access, but with fewer moving parts and less
inter-dependence if the problem went beyond just a ring failure or a
device crash.

> For a BGP customer I could offer two tunnels, one to each of our
> provider edge routers. But very few of our customers are BGP customers;
> they just want normal internet. For them we run VRRP between the two
> provider edge routers and have the one tunnel go to both.

If your BGP customer count grows, while managing 2 eBGP sessions per
customer is not life-threatening, it certainly won't go unnoticed from
an operational perspective, especially if you are doing this as a matter
of (redundancy) course, in lieu of a revenue-generating request by the
customer to increase their SLA.

> We actually moved away from a hybrid solution with L3 termination at
> the customer edge to simply backhauling everything in L2VPNs. We did
> this because the L2VPN tunnels are needed anyway for other reasons and
> it is easier to have one way to do things.

I've never been one to support the confluence of infrastructure tunnels
with customer service tunnels. That's why we avoid infrastructure
tunnels in general, e.g., creating a tunnel from a data centre to a
peering point over which you will run peering traffic because the device
at the data centre can't support peering, or running a tunnel between
two PoPs to handle inter-PoP traffic, etc. When you have all these
tunnels running around for their own sake, side-by-side with
revenue-generating customer tunnels (like a site-to-site L2VPN you've
sold to a customer), it can get hairy at scale, I think. Too much
inter-dependence, too many lines coming together. But again, that's
just me.

Mark.

> Centralizing is just horrible, but that's just me. The goal is to make
> all these unreliable boxes work together to offer a reliable service to
> your customers, so making them too inter-dependent on each other has
> the potential to take that away in the long run.

One issue with pushing IP transit (L3-wise) with small boxes down to the
metro is that if a particular customer comes under attack, any DDoS in
excess of 10-30 Gbps is going to totally destroy the remote site down to
the floor and then some, until the NOC intervenes to restore service.

A Big Expensive Router at the head-end site, fed with big pipes from
your IP core, just needs a subscriber line-rate policer configured on
the customer EVC off the NNI facing your metro transport network,
largely protecting your metro PoP during an attack.
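
In IOS-style EVC syntax (rates, VLAN and names invented), such a
policer is something like:

  policy-map CUST-1G-IN
   class class-default
    police cir 1000000000
  !
  ! head-end port facing the metro NNI, one EFP per customer
  interface TenGigabitEthernet0/0/0
   service instance 100 ethernet
    encapsulation dot1q 100
    service-policy input CUST-1G-IN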

There are also issues with control-plane policing (or the limited
options thereof) on some of these low-end platforms.

> We push IP/MPLS all the way into the Metro-E Access using a fleet of
> Cisco ASR920s and ME3600Xs. The value of being able to instantiate an
> IP service or BGP session directly in the Metro-E Access simplifies
> network operations a great deal for us. Needless to say, not having to
> deal with eBGP Multi-Hop drama does not hurt.

BGP Selective Download has its own drawbacks too, in that it's largely
meant to be used in a single-tailed environment, with the FIB having
only a single point of egress.

Consider a topology where an ASR920 in the metro is dual-homed to two
peering sites over dark fiber of different lengths (say 30km to Site A,
90km to Site B), with IGP costs configured to match the fiber distances.

How will you guarantee that the best path the ASR920 chooses for your
customer taking a full table is actually congruent with the box's real
forwarding path? Your 920 may choose Site A as best path, only to have
the FIB-programmed default route force the traffic out via Site B. If
you're doing active/standby on your fiber uplinks, it would not be an
issue; and maybe in a metro environment where latency differences are
minimal (<1ms), you don't care enough in practice to be bothered by it.

Yes, there are some operational complexities and issues with L2VPN'ing
customers to a head-end router: link-state propagation needs to be
properly validated, and you're now burning two ports instead of one (one
at the terminus, one at the access), doubling the SPOFs and maintenance
liabilities.

At the end of the day, it's the lack of full-featured ports at a
reasonable cost that drives centralization to head-ends. Spamming Small
Expensive Routers (ASR9001/MX104) in every small metro site doesn't
scale (BTDT myself), but neither does hacking up BGP to work on
platforms that aren't really meant to function as heavy L3 routers
(e.g. ASR920, ME3600), IMHO.

James

Is the claim about fewer moving parts actually true? Yes, if you are
comparing to a plain native single-stack network with IPv4 (or IPv6)
directly on the wire. But we are doing MPLS, so in our case it is L2VPN
vs L3VPN. Both will reroute using the exact same mechanism, so there is
no difference here.

I found that I could remove large parts of the configuration on the
access edge devices when we went from L3VPN to L2VPN. Some people will
find the network easier to understand when all major configuration is in
only two devices, and those two devices are mostly mirrors of each
other.

I agree that L3VPN is the better solution, at least in principle. That is
why we started by implementing L3VPN. But in practice the L2VPN solution we
have now is actually easier.

Regards,

Baldur