RE: Winstar says there is no TCP/BGP vulnerability

Christopher / David,

Christopher L. Morrow wrote:
if you mean resetting sessions to change keys I believe
it's more code dependent than anything else.

That was my point: I thought that changing keys without resetting the
session was tied to a specific version of the "S" train, which I have
never seen on anything lower than a 7200. Anyway, given David Luyer's
post, it appears that unless you are willing to accept the risk of an
unplanned session reset, you'd better have a planned outage for it.

David Luyer wrote:
Have done around 100 of these in the past 24 hours. It's
not related to platform AFAIK - we've successfully done
the changes on a lowly 2651 and 3620 without outages, but
a 7200 with older IOS did have an outage.

Given the context, I assume that you have added MD5 to sessions that did
not have it previously; am I correct?
Then, do you mean by "without outages" that the session was not reset by
the password add/change? If I may ask, how many out of 100 did not
reset?

Christopher L. Morrow wrote:
For pure: "Don't blow me up with prefixes" just limit the
maximum-prefix to some # over your expected peer's list.

Please allow me to try to make my point again: you store the expected
peer maximum-prefix somewhere in your management system. I do understand
the added complexity, but in the big scheme of things would it be _that_
much more difficult to store a comma-delimited string or something that
contains the prefixes that could be announced by that peer instead of
the maximum-prefix? Yes, it generates more work to update the database,
but OTOH it gives the LIII engineer a lot more to troubleshoot
issues with. Is it simply not worth the work at your scale?
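To make the difference concrete, here is a minimal Python sketch, assuming the management database holds per peer both a maximum-prefix number and a hypothetical comma-delimited string of the prefixes that peer may announce. The max-prefix check only counts routes; the list check can tell the engineer exactly which announcement is unexpected. All function names are illustrative, not part of any real tool.

```python
import ipaddress

def parse_allowed(prefix_string):
    """Parse a comma-delimited prefix string (hypothetical DB field) into networks."""
    return [ipaddress.ip_network(p.strip()) for p in prefix_string.split(",") if p.strip()]

def max_prefix_ok(announced, limit):
    """maximum-prefix style check: only the number of routes matters."""
    return len(announced) <= limit

def unexpected_prefixes(announced, allowed):
    """Per-prefix check: return announcements not in the stored list.
    Exact match only, like an IOS prefix-list entry without ge/le."""
    allowed_set = set(allowed)
    return [p for p in announced if ipaddress.ip_network(p) not in allowed_set]
```

With an allowed list of 192.0.2.0/24 and 198.51.100.0/24, a peer announcing 192.0.2.0/24 and 203.0.113.0/24 passes a limit of 10, but the list check names 203.0.113.0/24 as the stray route.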

- There are cases (such as the peer being a tier-2 customer of
UUNET and me being a tier-3 customer of UUNET via a different
tier-2) when the routes seen coming from the peer will have the
same AS-PATH length as the ones coming from my transit, with some
other BGP tie-breaking criterion favoring the peer over the
transit, leading to disaster.

use a route-map to add/remove metric or localpref? or any
other settable thing on your side? or prepend or ....

Based on what criteria? Both the peer and the transit announce the same
prefix with the same AS-PATH length. I agree that in many cases,
favoring the route coming from the transit provider would work, but what
guarantees it? What we are trying to define is the idiot-proof setup for
peers; what if the misconfiguration is with the transit?

- In theory, I could add a "route-map blah deny 1" that matches
everything, then manipulate the subsequent seqs at will, then
remove the "route-map blah deny 1"; in this situation though,
I do not see a clear advantage over clearing the session.
What am I missing?
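A toy Python model of this staging trick, assuming standard IOS route-map semantics: entries are checked in ascending sequence order, the first matching entry decides, an entry with no match clause matches everything, and anything unmatched hits the implicit deny. While the catch-all `deny 1` is present, every route evaluates to deny no matter what is being edited in the later sequences. The names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Entry:
    action: str                   # "permit" or "deny"
    match: Callable[[str], bool]  # predicate standing in for the entry's match clauses

def evaluate(route_map: Dict[int, Entry], prefix: str) -> str:
    """Check entries in ascending sequence order; the first match decides.
    A route that matches no entry falls through to the implicit deny."""
    for seq in sorted(route_map):
        entry = route_map[seq]
        if entry.match(prefix):
            return entry.action
    return "deny"

def match_all(prefix: str) -> bool:
    """Stand-in for an entry with no 'match' clause: it matches every route."""
    return True
```

In use: add `route_map[1] = Entry("deny", match_all)`, rewrite seq 10 and up, then delete seq 1; until that deletion, nothing is permitted, which is also why it is not obviously better than clearing the session.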

you could tftp in your config change, that doesn't cause the
problems... then just wait for next update time.

I don't see much of a difference. AFAIK when you tftp a config into the
running-config, it is appended to the existing config, same as if you
pasted the commands into conf t. What happens if the next update time
arrives while tftp is in the middle of merging the old route-map with
the new one?

Michel.

Michel,

> David Luyer wrote:
> Have done around 100 of these in the past 24 hours. It's
> not related to platform AFAIK - we've successfully done
> the changes on a lowly 2651 and 3620 without outages, but
> a 7200 with older IOS did have an outage.

Given the context, I assume that you have added MD5 to sessions that did
not have it previously; am I correct?
Then, do you mean by "without outages" that the session was not reset by
the password add/change? If I may ask, how many out of 100 did not
reset?

98 of the first 100 did not reset. Today, I did another 12 and only one
failed.

The two that failed were: one with 12.2(17a) and 12.2(23), where we got
the timing a couple of seconds off (not sure whether the reason was IOS or
timing), and one with a 12.0S release, where the reset is automatic.
Testing also revealed that 12.1 had an automatic reset; we just didn't run
into it in production. The one which failed today was another case of 12.0S.

However, some providers run 12.0S exclusively, so I'd expect they'd see
every single session reset. And when it comes to Juniper/Foundry/Extreme,
I haven't set up passwords on any of the sessions to these vendors yet.

The important thing is the IOS version. As I stated earlier, it's easy to
test in a lab and see whether the router syslogs a reset when you put in a
password. If it doesn't log one, there's a very strong chance it won't
reset, but if it's a critical BGP session it would still be sensible to
do it in an off-peak window (which happens to be why I haven't done any
of the BGP sessions to peers using Juniper/Foundry/Extreme yet).
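As a rough lab aid, a Python sketch that scans captured syslog lines for a neighbor going Down after the password is entered. It assumes the router reports BGP state changes with IOS's `%BGP-5-ADJCHANGE` message in the form shown in the comment; the exact text varies by release, so treat the pattern as an assumption to check against your own logs.

```python
import re

# Assumed log format (varies by IOS release), e.g.
# "%BGP-5-ADJCHANGE: neighbor 192.0.2.1 Down User reset"
ADJCHANGE = re.compile(r"%BGP-5-ADJCHANGE: neighbor (\S+) (Up|Down)")

def adjacency_events(log_lines, neighbor):
    """Return the Up/Down transitions logged for one neighbor, in order."""
    events = []
    for line in log_lines:
        m = ADJCHANGE.search(line)
        if m and m.group(1) == neighbor:
            events.append(m.group(2))
    return events

def password_caused_reset(log_lines_after_change, neighbor):
    """Feed only lines logged after the password went in: any Down means a reset."""
    return "Down" in adjacency_events(log_lines_after_change, neighbor)
```

Capture the log from just before the `neighbor ... password` command onward; earlier flaps would otherwise be counted as resets.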

If you have fully redundant internal BGP and are running only
12.2S/12.3/12.2T, then you can fairly safely add the internal BGP
passwords without a customer notice: you expect no session drop, and
if one did drop you'd still have routes via a second route reflector.

One note: when you do this, the table version shown in 'show ip bgp sum'
resets to zero in some IOS versions. This appears cosmetic; routing is
not impacted, and this can also occur when simply adding a description
to a BGP peer.

David.

For pure: "Don't blow me up with prefixes" just limit the
maximum-prefix to some # over your expected peer's list.

Please allow me to try to make my point again: you store the expected
peer maximum-prefix somewhere in your management system. I do understand
the added complexity, but in the big scheme of things would it be _that_
much more difficult to store a comma-delimited string or something that
contains the prefixes that could be announced by that peer instead of
the maximum-prefix? Yes, it generates more work to update the database,
but OTOH it gives the LIII engineer a lot more to troubleshoot
issues with. Is it simply not worth the work at your scale?

Until recently, I had always worked with maximum prefixes only, but last month I suggested something along the lines of the above to a customer. However, after spending an hour trying to come up with a prefix filter for just one peer, I changed my mind. This just doesn't work unless your peers are all tiny, announcing a couple of prefixes or so, or you generate the filters from a routing registry. The latter is very problematic as well: just installing RAToolSet requires a PhD in Unix system administration, and few networks register their information correctly.
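For illustration only, a minimal Python sketch of the output side of "generate the filters from a routing registry". It assumes the route objects for a peer have already been fetched into a list (the hard part the text complains about is exactly that fetching and the registry's accuracy); the list name, sequence scheme and data are hypothetical. Output uses IOS `ip prefix-list` syntax and covers IPv4 only.

```python
import ipaddress

def build_prefix_list(name, route_objects, seq_start=5, seq_step=5):
    """Render IPv4 route objects as exact-match IOS 'ip prefix-list' lines.

    route_objects: prefixes already pulled from a routing registry (hypothetical input).
    Duplicates are removed and output is sorted so regenerated filters diff cleanly.
    """
    nets = sorted({ipaddress.IPv4Network(r) for r in route_objects})
    lines = []
    for i, net in enumerate(nets):
        seq = seq_start + i * seq_step
        lines.append(f"ip prefix-list {name} seq {seq} permit {net}")
    return lines
```

Sorting and deduplicating keeps the generated list stable between runs, so a config diff shows only real registry changes.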

With max prefixes you can just set the limit at 10k and never look back. Obviously this still allows your peer to do lots of very bad things, but announcing the full table to you isn't one of them. Or you can keep the limit at 150% of what is actually announced, but this requires more work and causes occasional session drops, because many peers don't warn you in advance when the number of prefixes they announce goes up.
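The arithmetic behind the two options, as a small Python sketch: a flat ceiling (10k in the text) versus a limit kept at 150% of what the peer currently announces. It assumes the default maximum-prefix behaviour of tearing the session down once the count exceeds the limit (not the warning-only mode); function names are illustrative.

```python
import math

def relative_limit(current_prefixes, headroom=1.5):
    """Limit at a fixed multiple of what the peer announces today, rounded up."""
    return math.ceil(current_prefixes * headroom)

def would_drop(announced, limit):
    """Assumed default behaviour: the session goes down once the count exceeds the limit."""
    return announced > limit
```

A peer announcing 200 prefixes gets a limit of 300 under the 150% rule, so growing to 310 without notice drops the session; under a flat 10k limit the same growth goes unnoticed, while a leaked full table still trips it.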

use a route-map to add/remove metric or localpref? or any
other settable thing on your side? or prepend or ....

Based on what criteria? Both the peer and the transit announce the same
prefix with the same AS-PATH length. I agree that in many cases,
favoring the route coming from the transit provider would work,

Huh? You don't pay for peering traffic by the megabit, so the idea is to always prefer routes from peers.

- In theory, I could add a "route-map blah deny 1" that matches
everything, then manipulate the subsequent seqs at will, then
remove the "route-map blah deny 1"; in this situation though,
I do not see a clear advantage over clearing the session.
What am I missing?

Traditional way: have both prefix and AS path filters. Only update one at a time. You should be ok even if one filter lets something through during an update.
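A minimal Python model of why two independent filters tolerate one-at-a-time updates: a route is accepted only if it passes both. It assumes an exact-match prefix set and a regex over the space-separated AS path; IOS `ip as-path access-list` uses its own regex conventions (such as `_`), so the Python regex is only a stand-in, and the AS numbers are documentation values.

```python
import ipaddress
import re

def prefix_ok(prefix, prefix_filter):
    """Exact-match prefix filter: the announced network must be in the allowed set."""
    return ipaddress.ip_network(prefix) in prefix_filter

def as_path_ok(as_path, as_path_regex):
    """AS-path filter: the whole space-separated path must match the regex."""
    return re.fullmatch(as_path_regex, as_path) is not None

def accept(prefix, as_path, prefix_filter, as_path_regex):
    """Inbound policy: a route needs to pass BOTH filters to be accepted."""
    return prefix_ok(prefix, prefix_filter) and as_path_ok(as_path, as_path_regex)
```

If the prefix filter is briefly too loose mid-update, a leaked route such as 203.0.113.0/24 with path `64500 64510` is still rejected by an AS-path filter that only allows `64500( 64500)*`.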

More advanced way: have route maps that tag routes with communities on ingress, and allow only routes with the right communities on egress. Any problem with either set of route maps hits the implicit deny, so the route won't be propagated: if something goes wrong, no harm, no foul.

And a nice round of "clear ip bgp * in" and "clear ip bgp * out" afterwards never hurts. :-) (CPUs can't feel pain, right?)
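The community-tagging scheme can be sketched in Python as below. It models only one direction (customer routes re-announced to peers); the community values, function names and session roles are hypothetical, since the post does not say which communities are used. The point it shows: a route that was never tagged, because an ingress map was broken or it came from a peer, is not announced.

```python
from dataclasses import dataclass, field

# Hypothetical community values; use whatever your own tagging scheme defines.
FROM_CUSTOMER = "64496:100"
FROM_PEER = "64496:200"

@dataclass
class Route:
    prefix: str
    communities: set = field(default_factory=set)

def ingress_customer(route, allowed_prefixes):
    """Ingress route map on a customer session: tag what passes, drop the rest."""
    if route.prefix in allowed_prefixes:
        route.communities.add(FROM_CUSTOMER)
        return route
    return None  # implicit deny

def ingress_peer(route):
    """Ingress route map on a peer session: tag everything as peer-learned."""
    route.communities.add(FROM_PEER)
    return route

def egress_to_peer(route):
    """Egress route map toward peers: permit only customer-tagged routes."""
    if FROM_CUSTOMER in route.communities:
        return route
    return None  # untagged or wrongly tagged routes hit the implicit deny
```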

Christopher L. Morrow wrote:
For pure: "Don't blow me up with prefixes" just limit the
maximum-prefix to some # over your expected peer's list.

Please allow me to try to make my point again: you store the expected
peer maximum-prefix somewhere in your management system. I do understand
the added complexity, but in the big scheme of things would it be _that_
much more difficult to store a comma-delimited string or something that
contains the prefixes that could be announced by that peer instead of
the maximum-prefix?

Yes.

Yes, it generates more work to update the database,
but OTOH it gives the LIII engineer a lot more to troubleshoot
issues with. Is it simply not worth the work at your scale?

Exactly.

And you do not have to be at 701's scale for this to not work.

Process is a bitch. Especially when it involves other people over whom you have no control.

And when that process involves customers calling to ask why they can't get to XXX web site (no pun intended - I'm sure no one would filter a pr0n site :), it is much more than "a bitch", it is a CLM/CEM.

>>Christopher L. Morrow wrote:
>>For pure: "Don't blow me up with prefixes" just limit the
>>maximum-prefix to some # over your expected peer's list.
>
>Please allow me to try to make my point again: you store the expected
>peer maximum-prefix somewhere in your management system. I do
>understand the added complexity, but in the big scheme of things would
>it be _that_ much more difficult to store a comma-delimited string or
>something that contains the prefixes that could be announced by that
>peer instead of the maximum-prefix?

Yes.

>Yes, it generates more work to update the database,
>but OTOH it gives the LIII engineer a lot more to troubleshoot
>issues with. Is it simply not worth the work at your scale?

Exactly.

And you do not have to be at 701's scale for this to not work.

  We've not had these issues and have been using
bgp passwords/md5 for years. We do have a fancy configuration
management system in place, whereby people put things into the
database first before they configure the router.

Process is a bitch. Especially when it involves other people over whom
you have no control.

  People generate configs based on database actions, and
if the entries are wrong they break things, which is quickly
noticed the next time someone loads/commits a config.

  We even have scripts that check that the configs on other
devices, where we can't just do 'load override', are in sync,
and that warn of pitfalls.
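The post does not describe these scripts, so the following is only a hypothetical Python sketch of the idea: diff the database-generated config against the running config, and flag one example pitfall relevant to this thread, a BGP neighbor with no `password` line. It assumes both configs are already available as text in IOS style.

```python
import difflib

def config_drift(intended, running):
    """Unified diff between the database-generated and running configs.
    An empty list means the device is in sync."""
    return list(difflib.unified_diff(
        intended.splitlines(), running.splitlines(),
        fromfile="intended", tofile="running", lineterm=""))

def missing_passwords(config):
    """Example pitfall check: neighbors with 'remote-as' but no 'password' line."""
    neighbors, with_password = set(), set()
    for line in config.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "neighbor":
            if parts[2] == "remote-as":
                neighbors.add(parts[1])
            elif parts[2] == "password":
                with_password.add(parts[1])
    return sorted(neighbors - with_password)
```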

  it takes time and effort to build a well maintained system like
this. sounds like that effort has not been expended on your side.

  then again, i'm guessing you're dealing with less clued people
and have to help them a lot with their bgp configs...

  - jared

Sorry, in this particular post, we were (or at least I was) talking about having prefix filters for all your peers. I know I've talked a lot about MD5 lately, just thought it would be a nice change of subject. :-)

If you do prefix filter all your peers, that is impressive. Do you get out of sync a lot? Does it help keep the network more stable? Or do process problems make it worse than just max-prefixes on a peer?

>>
>>>Yes, it generates more work to update the database,
>>>but OTOH it gives the LIII engineer a lot more to troubleshoot
>>>issues with. Is it simply not worth the work at your scale?
>>
>>Exactly.
>>
>>And you do not have to be at 701's scale for this to not work.
>
> We've not had these issues and have been using
>bgp passwords/md5 for years. We do have a fancy configuration
>management system in place, whereby people put things into the
>database first before they configure the router.

Sorry, in this particular post, we were (or at least I was) talking
about having prefix filters for all your peers. I know I've talked a
lot about MD5 lately, just thought it would be a nice change of
subject. :-)

  (sorry, i was speaking to the md5 issue here as well.. but
i can comment on the peer prefix-filtering issue as well..)

If you do prefix filter all your peers, that is impressive. Do you get
out of sync a lot? Does it help keep the network more stable? Or do
process problems make it worse than just max-prefixes on a peer?

  We have some peers whose prefixes fluctuate enough (even
in a 24 hr period) that it causes problems.

  we had 4MB+ router configs @ LINX when we were doing full peer
prefix filtering. It's easier to do in Europe as RIPE provides
a well-structured (yet annoying at times) registration system
whereby people need to know how to set up the route objects
to get PI space. People also tend to be more clued there than
joe-average ISP elsewhere that runs BGP. People here say "why
should i have to register my routes, just accept what i announce"
whereas people in europe have (more than) half the work already
done as part of their obligations/interaction with RIPE.

  - jared