Do ATM-based Exchange Points make sense anymore?

It appears that for analysis purposes one has to separate access
from switching. How much payload one brings to the exchange depends
on port speed and protocol overhead. In that light, Frame Relay
can bring a similar amount of payload as Ethernet (comparable overhead)
and preserve the good properties of ATM (traffic flow separation).
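To put rough numbers behind the overhead comparison above, a quick sketch (the framing figures are the usual textbook per-frame values, not measurements from any exchange):

```python
import math

# Bytes of framing carried alongside a 1500-byte IP packet. Figures are
# the usual textbook per-frame overheads, not measurements.
IP_PAYLOAD = 1500

# Ethernet: preamble+SFD 8, header 14, FCS 4, inter-frame gap 12
ethernet_oh = 8 + 14 + 4 + 12
# Frame Relay (RFC 2427 encapsulation): flags, 2-byte Q.922 header,
# control/NLPID, FCS -- roughly 8 bytes all told
frame_relay_oh = 8
# ATM/AAL5: 8-byte trailer, pad to 48-byte cells, 5-byte header per cell
cells = math.ceil((IP_PAYLOAD + 8) / 48)
atm_total = cells * 53

for name, total in [("ethernet", IP_PAYLOAD + ethernet_oh),
                    ("frame relay", IP_PAYLOAD + frame_relay_oh),
                    ("atm", atm_total)]:
    print(f"{name}: {IP_PAYLOAD / total:.1%} efficient")
```

Frame Relay and Ethernet both land within a few percent of line rate, while the ATM cell tax costs roughly 12% -- which is the "comparable overhead" claim.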

Regards,
nenad

p.s.
both the Juniper M160 and the Cisco GSR can handle OC-48 Frame Relay, and
they don't seem to be Frame Relay switches

What functionality does PVC give you that the ethernet VLAN does not?

What is the current max speed of frame relay in any common vendor
implementation (I'm talking routers here).

> What functionality does PVC give you that the ethernet VLAN does not?

That's quite easy. Endpoint liveness. An IPv4 host on a VLAN has no idea
if the guy on the "other end" died until the BGP timer expires.

FR has LMI, ATM has OAM. (and ILMI)

Pete

> What functionality does PVC give you that the ethernet VLAN does not?

> That's quite easy. Endpoint liveness. An IPv4 host on a VLAN has no idea
> if the guy on the "other end" died until the BGP timer expires.
>
> FR has LMI, ATM has OAM. (and ILMI)

Adding complexity to a system increases its cost but not nec'ily its value.
Consider the question: how often do you expect endpoint liveness to matter?

If the connection fabric between your routers has an MTBF best measured in
hours or days, then you've got bigger problems than you'll solve with LMI.

If on the other hand the MTBF is best measured in months or years, then when
it does fail the failure is likely to be *in* the extra complexity you added.
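The cost/benefit question above can be made concrete with back-of-envelope arithmetic (all numbers are illustrative, not taken from any particular exchange):

```python
HOURS_PER_YEAR = 8760

def blackhole_seconds_per_year(mtbf_hours, detection_seconds):
    """Expected seconds of blackholed traffic per year, assuming each
    failure blackholes packets until the detection mechanism fires."""
    failures_per_year = HOURS_PER_YEAR / mtbf_hours
    return failures_per_year * detection_seconds

# A fabric that fails twice a year (MTBF six months), detected by a
# default 180 s BGP hold timer vs. a 1 s LMI/OAM-style liveness check:
slow = blackhole_seconds_per_year(mtbf_hours=4380, detection_seconds=180)
fast = blackhole_seconds_per_year(mtbf_hours=4380, detection_seconds=1)
print(f"{slow:.0f} s/yr vs {fast:.0f} s/yr")  # prints "360 s/yr vs 2 s/yr"
```

Six minutes a year saved: whether that is worth a standing complexity tax, plus the false positives that come with it, is exactly the question.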

Paul Vixie wrote:

> Adding complexity to a system increases its cost but not nec'ily its value.
> Consider the question: how often do you expect endpoint liveness to matter?

The issue I'm trying to address is to figure out how to extend the robustness
that can be achieved with tuned IGPs with subsecond convergence across
an exchange point without suffering a one-to-five-minute delay blackholing
packets. Liveness is an issue when a box either loses coherency between
software and hardware state on an interface, or decides to reload all or
part of the system without bothering to reset the BGP TCP sessions before
going away.

I'd be happy to hear solutions that are in use and commonplace for this
problem. Mostly I've seen "it's the other guy's problem" as an answer, and
the solution being to migrate all connectivity to one ISP.

> If the connection fabric between your routers has an MTBF best measured in
> hours or days, then you've got bigger problems than you'll solve with LMI.
>
> If on the other hand the MTBF is best measured in months or years, then when
> it does fail the failure is likely to be *in* the extra complexity you added.

As far as I understand, this "complexity" just got added with Neighbor Discovery
in IPv6, which would solve this problem when properly propagated up the stack
from ND to TCP, with the ND timers tweaked down. No need to touch the BGP timers.
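For what it's worth, the detection time that tuned ND timers could buy can be sketched from the RFC 2461 NUD state machine (the default constants below are from the RFC; the "tuned" values are assumptions for illustration):

```python
def nud_worst_case(reachable_time, delay_first_probe, retrans, max_probes):
    """Worst-case seconds for Neighbor Unreachability Detection to declare
    a neighbor gone: the cache entry ages from REACHABLE to STALE, sits in
    DELAY, then sends up to max_probes solicitations retrans apart."""
    return reachable_time + delay_first_probe + max_probes * retrans

default = nud_worst_case(30.0, 5.0, 1.0, 3)  # RFC 2461 default constants
tuned = nud_worst_case(1.0, 0.5, 0.5, 3)     # aggressively tweaked (assumed)
print(default, tuned)  # prints "38.0 3.0"
```

A few seconds rather than a BGP hold timer's minutes; the open part is propagating the NUD verdict up to TCP and BGP, as said above.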

Pete

warning: i've had one "high gravity steel reserve" over my quota. hit D now.

> The issue I'm trying to address is to figure out how to extend the robustness
> that can be achieved with tuned IGPs with subsecond convergence across
> an exchange point without suffering a one-to-five-minute delay blackholing
> packets.

why on god's earth would subsecond anything matter in a nonmilitary situation?

are you willing to pay a cell tax AND a protocol complexity tax AND a device
complexity tax to make this happen? do you know what that will do to your
TCO and therefore your ROI? you want to pay this tax 100% of the time even
though your error states will account for less than 0.001% of the time? you
want to have the complexity as your most likely source of (false positive)
error?

> As far as I understand, this "complexity" just got added with Neighbor
> Discovery on IPv6.

if so, then, you misunderstand.

It does when you start doing streaming anything, say TV or telephony. I
agree that this won't be solved using any current L3-or-above protocol,
since BGP takes quite a while to recalculate anyway. Any redundancy has to
be pre-calculated or at a lower level; this is where, for instance, SRP/DPT
claims excellence, and I guess the same claims come from the MPLS crowd.

I guess you have to pay a 50% tax on capacity to handle this whatever you
do.

Personally I agree with you; the KISS principle is golden here. Peering
should be cheap, that is the only reason to do it, and therefore one does
not want a lot of complexity that brings up the cost. Tweaking eBGP dead
timers to 5-10 seconds works well in most cases.

I have some idea about bringing some of the signalling from
DPT/SRP into a switched ethernet environment (for instance, have some
kind of signalling between switches (propagated to hosts) that a certain
port has gone down, notifying that certain MAC addresses are no longer
reachable). I have not looked into it more carefully, and it would take
several years to get any standard implemented (even though I feel it
wouldn't be that hard to do). Just state which MAC addresses were removed
from your forwarding table due to link down, signalling this to everybody
connected to you. It probably won't scale to very large L2 domains, but would
perhaps be OK for 50-100 nodes connected to an IX.
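As a thought experiment, the withdrawal scheme described above might look something like this (everything here -- class names, message shape -- is hypothetical; no such standard exists):

```python
from collections import defaultdict

class Switch:
    """Toy switch: a forwarding table keyed by port, plus a neighbor list
    to flood hypothetical MAC-withdrawal notices to."""

    def __init__(self, name):
        self.name = name
        self.fdb = defaultdict(set)   # port -> set of learned MAC addresses
        self.neighbors = []           # directly connected Switch objects

    def learn(self, port, mac):
        self.fdb[port].add(mac)

    def link_down(self, port):
        # Flush entries learned on the dead port and tell every connected
        # switch (and, per the proposal, hosts) which MACs just vanished.
        withdrawn = self.fdb.pop(port, set())
        for peer in self.neighbors:
            peer.receive_withdrawal(self.name, withdrawn)
        return withdrawn

    def receive_withdrawal(self, origin, macs):
        # Drop the withdrawn MACs immediately instead of waiting for
        # the entries to age out (or for BGP timers upstream).
        for entries in self.fdb.values():
            entries -= macs

s1, s2 = Switch("s1"), Switch("s2")
s1.neighbors.append(s2)
s1.learn("ge-0/1", "00:aa:bb:cc:dd:ee")
s2.learn("trunk0", "00:aa:bb:cc:dd:ee")
gone = s1.link_down("ge-0/1")
print(gone)  # prints "{'00:aa:bb:cc:dd:ee'}"
```

The point is simply that a link-down event empties the right forwarding entries everywhere at signalling speed, instead of waiting for MAC aging or protocol hold timers.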

> It does when you start doing streaming anything, say TV or telephony. I
> agree that this won't be solved using any current L3-or-above protocol,
> since BGP takes quite a while to recalculate anyway. Any redundancy has to
> be pre-calculated or at a lower level; this is where, for instance, SRP/DPT
> claims excellence, and I guess the same claims come from the MPLS crowd.

For it to be of any use, this rapid failover would have to be end-to-end
too. It's no good picking on one network element, such as the exchange,
and getting them to spend significant amounts of time and energy on rapid
failover, if it's just going to fall apart on either side.

We've been down this road with multicast. Getting good (non-IGMP)
multicast containment on a switched ethernet isn't easy, nor is the
current situation ideal - there are several different approaches to
containment out there (and then we come back to getting stuff through the
standards process too).

But, the pressure isn't there either, because the access networks aren't
enabled/capable right now - certainly from talking to UK broadband
providers. There's also a non-technical driver - the bizdev people who are
in favour of per megabit billing will oppose multicast on the grounds that
the meter won't tick over as quickly (in their eyes).

> Personally I agree with you; the KISS principle is golden here. Peering
> should be cheap, that is the only reason to do it, and therefore one does
> not want a lot of complexity that brings up the cost. Tweaking eBGP dead
> timers to 5-10 seconds works well in most cases.

Agreed. However, one thing to consider is the effect that short timers
have on the routing table, in terms of announcements and withdrawals.

It takes about 20-30 seconds to warm boot a Foundry BI8000/15000 and get
it forwarding.

So, in the event of a software upgrade (or some other need to reboot,
fairly rare), as long as you don't have fast-external-fallover enabled or
your timers shortened, you will blackhole some traffic, but in the large
majority of cases BGP sessions will stay up.

With the shorter timers or fast-external-fallover, a very short
maintenance slot at a large exchange can cause ripples in the routing
table. It would be interesting to do some analysis of this - how far the
ripples spread from each exchange!

I'm not saying that one or the other is right, it's just another tax!
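The tradeoff reduces to simple arithmetic: a BGP session rides out a control-plane outage only if the hold timer outlasts it (the 20-30 s warm-boot figure is from the post above; the rest is illustrative):

```python
def session_survives(hold_time_s, outage_s):
    """True if the neighbor's hold timer outlasts the outage, so the
    session stays up and no withdrawals ripple out."""
    return outage_s < hold_time_s

reboot = 25  # rough warm-boot time of the Foundry mentioned above

print(session_survives(180, reboot))  # True: default timers ride it out
print(session_survives(10, reboot))   # False: tuned timers flap the table
```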

> I have some idea about bringing some of the signalling from
> DPT/SRP into a switched ethernet environment (for instance, have some
> kind of signalling between switches (propagated to hosts) that a certain
> port has gone down, notifying that certain MAC addresses are no longer
> reachable).

Keith Mitchell had some ideas about harnessing OSPF at the MAC layer,
which I became involved in. People thought it may have had some potential
(others thought we were on interesting drugs!), but we're back to the tax
thing again. It's yet another protocol, and some people believed its
usefulness would be overtaken by MPLS (despite the potential for more
complexity), which we already have.

> It probably won't scale to very large L2 domains, but would perhaps be OK
> for 50-100 nodes connected to an IX.

Which, some argue, reduces the number of potential applications, and
therefore the justification for building it.

Mike

Mikael Abrahamsson <swmike@swm.pp.se> writes:

> why on god's earth would subsecond anything matter in a
> nonmilitary situation?

> It does when you start doing streaming anything, say TV or telephony.

I submit that it doesn't matter for voice or video, if the MTBF is
reasonably high. Consider the reliability that people put up with
from their cable companies, and the voice quality that we accept from
our (North American) cell phones, not to mention the dropped calls.
Streaming video and VoIP are an order of magnitude better in my
experience without doing anything special.

I hate to come across (particularly in this forum) as an advocate of
purely market-driven engineering, but you have to ask yourself what you're
buying if you're spending money to fix a problem that your customers
don't (and won't) perceive as such.

Remember the words of Admiral Gorshkov, who is variously quoted as
having said: "(Better,Perfect) is the enemy of good enough."

                                        ---Rob

Paul Vixie wrote:

> warning: i've had one "high gravity steel reserve" over my quota. hit D now.

> The issue I'm trying to address is to figure out how to extend the robustness
> that can be achieved with tuned IGPs with subsecond convergence across
> an exchange point without suffering a one-to-five-minute delay blackholing
> packets.

> why on god's earth would subsecond anything matter in a nonmilitary situation?

If the software MTBF were better, convergence would not be an issue. As long
as it's an operational hazard to run core boxes (with some vendors, anyway)
on code older than six months, you end up engineering convergence
into the networks.

> are you willing to pay a cell tax AND a protocol complexity tax AND a device
> complexity tax to make this happen? do you know what that will do to your
> TCO and therefore your ROI? you want to pay this tax 100% of the time even
> though your error states will account for less than 0.001% of the time? you
> want to have the complexity as your most likely source of (false positive)
> error?

Who said anything about a cell tax? If I ask for liveness, you give me ATM?

> As far as I understand, this "complexity" just got added with Neighbor
> Discovery on IPv6.

> if so, then, you misunderstand.

As far as I understand, ND does contain the functionality I'd like to
accomplish; unfortunately it does not do that for IPv4. I'm just making
points about why, in the existing operational environment, going from ATM
to GE reduces robustness. Instead of going on the defensive, it would
probably help to discuss how to make ethernet-based solutions more robust,
since that's where everybody is moving anyway.

Pete

Odd, I think most people would say it's an operational hazard to run code
newer than 6 months old, or at least with less than 6 months of testing on
any particular image.

How they're able to completely break so many critically important things
within 2 weeks between bugfix code revs is still beyond me. :-)

>
> If the software MTBF were better, convergence would not be an issue.
> As long as it's an operational hazard to run core boxes (with some
> vendors, anyway) on code older than six months, you end up
> engineering convergence into the networks.

> Odd, I think most people would say it's an operational hazard to run code
> newer than 6 months old, or at least with less than 6 months of testing on
> any particular image.

  With all the recent software security advisories that affect
many vendors (ssh, snmp, etc.), running anything older than that is
a blatant security risk for anyone's network. Not keeping up-to-date
on these items and thinking you're fine is just asking to be
brought down.

> How they're able to completely break so many critically important things
> within 2 weeks between bugfix code revs is still beyond me. :-)

  I'm not sure which vendor you are referring to, but I've not seen
any problems like this anytime in the past 6+ months.

  - jared

Telemedicine, tele-robotics, etc. Actually, there are a lot of cases
where you want subsecond recovery. The current Internet routing
technology is not up to the task; so people who need it have to build
private networks, and pay an arm and a leg for them, too.

--vadim

> What functionality does PVC give you that the ethernet VLAN does not?

Shaping, for one.

> What is the current max speed of frame relay in any common vendor
> implementation (I'm talking routers here).

Doesn't OC48 POS on GSR and Juniper do FR?

--
Mikael Abrahamsson email: swmike@swm.pp.se

-- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben --
-- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --

> If the connection fabric between your routers has an MTBF best measured in
> hours or days, then you've got bigger problems than you'll solve with LMI.

Agreed. However, I think the debate may be over the (un)reliability of
routers connected to the exchange, not the exchange itself.

-- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben --
-- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --

Welcome to MAE Chicago/New York, http://www.mae.net/FE/. But M160's and
OC48 ports are expensive; I suspect it's overkill for the amount of traffic
that will actually be exchanged there.

I do wonder why most GigE exchange points are still doing single-LAN-segment
peering instead of having a peermaker-type service for dynamic
VLAN configurations. Manual configuration is slow and a pain, and with
some of them charging you per-VLAN what it would cost for a copper
cross-connect, it's no wonder most people don't use them.

Paul just hit on it. At how many layers do you want protection, and will they interfere with each other? Granted, not all protection schemes overlap. If there is no layer 1 failure, and a router maintains link but the card or router has somehow failed and is no longer passing packets, I suppose that would have to be caught at layer 3.

At a MAN exchange point based in South Florida, the technology is a multi-node area exchange point (layer 1 technology) based on DWDM and optical switches. The detection of nodes and failures is done with enhanced OSPF. In testing, failure and recovery between the two farthest nodes took 16 ms (approximately 95 miles between nodes).

Each individual circuit has a choice of protection level. This allows for no protection, for any of a number of reasons; one may be to not interfere with a protection scheme at a higher level. While the switches use OSPF for detection and recovery, they also use MPLS for reservation of bandwidth. None of this information is passed on to the customer routers, however.

It seems there should be a clear delineation between the layers and what protection schemes should run at each. I also believe in separation of church and state, if you will: router companies should play in their space while optical companies stay in theirs. While it makes sense for some information to pass between differing types of equipment (such as the ODSI protocol or UNI 1.0), integration of the protection schemes runs a high risk of cascade failure, or susceptibility to an exploit attack.

As an added thought, the same MAN exchange point can do intra-node connections (hairpinning), so the same node that is used for inter-nodal transport and peering can also be used within a colo as an intelligent cross-connect box. This would allow for visibility and monitoring within the colo, and even customer network management of their cross-connects.

I suppose the discussion is what you want from your exchange point operator and what you do NOT want. Many people would not feel comfortable with circuit operators having visibility and maintaining stats.

Thus spake "Alex Rubenstein" <alex@nac.net>

> What functionality does PVC give you that the ethernet VLAN does not?

> Shaping, for one.

There is nothing inherent in Ethernet which precludes shaping. Low- and
mid-range routers can do it just fine. If your core router doesn't, speak with
your vendor. Then again, do your core routers really support shaping on OC192
FR either?
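To Stephen's point that shaping is a property of the queueing discipline rather than the link layer, a minimal token-bucket sketch (rates and sizes are arbitrary):

```python
class TokenBucket:
    """Toy token-bucket shaper: works the same whether the egress port
    is Ethernet or Frame Relay."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0      # refill rate, bytes per second
        self.burst = float(burst_bytes)
        self.tokens = float(burst_bytes)
        self.last = 0.0

    def allow(self, now, frame_bytes):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= frame_bytes:
            self.tokens -= frame_bytes
            return True        # send now
        return False           # queue (or drop) to hold the flow to the rate

bucket = TokenBucket(rate_bps=2_000_000, burst_bytes=3000)
print(bucket.allow(0.0, 1500))  # True  -- within the burst
print(bucket.allow(0.0, 1500))  # True  -- burst allows a second frame
print(bucket.allow(0.0, 1500))  # False -- must wait for tokens to refill
```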

> What is the current max speed of frame relay in any common vendor
> implementation (I'm talking routers here).

> Doesn't OC48 POS on GSR and Juniper do FR?

If those boxes approached the reliability of carrier FR/ATM gear, that might be
relevant.

S

Thus spake "Petri Helenius" <pete@he.iki.fi>

> What functionality does PVC give you that the ethernet VLAN does not?
>
> That's quite easy. Endpoint liveness. An IPv4 host on a VLAN has no idea
> if the guy on the "other end" died until the BGP timer expires.
>
> FR has LMI, ATM has OAM. (and ILMI)

FR LMI and ATM ILMI are so notoriously unreliable at endpoint liveness that FR
EEK (end-to-end keepalive) and ATM OAM became necessary. Be glad Ethernet is
not stuck with such a useless "feature".

It would be trivial for someone to write up an "Ethernet EEK" or "IPv4 ND" draft
and submit it to their favorite router vendors for implementation. If nobody
has done so, it's obviously not that important.

S