BGP Routing problem

Hi

We are an ISP in South Africa and have a problem we are trying to
figure out rather urgently, without much success so far.

We are currently multi-homed during the day and triple-homed at
night (our peaks are at night, and we get cheap bandwidth from
another ISP whose peak is in the day). InterNIC has refused to
give us our own block of addresses, which would have made things much
simpler (despite repeated attempts; we now have seven /24s from
various places and badly need more).

Basically, we (AS6180) have been announcing all our addresses, including
196.25.116.0, 196.25.117.0 and 196.25.203.0, via AS3741 at night,
and this gets switched off during the day. The problem seems to relate
to the fact that these addresses are part of AS5713's CIDR block
196.25.0.0/16, and we thought we could get away with announcing them
like this. It worked fine for over a week, but today we have a
problem.
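
To make the relationship between the aggregate and the more-specifics
concrete, here is a minimal sketch using Python's standard ipaddress
module. It assumes all three networks are /24s (only 196.25.203.0/24 is
confirmed by the looking-glass output), and longest_match is a
hypothetical helper that only illustrates longest-prefix matching, not
full BGP best-path selection.

```python
import ipaddress

# Aggregate announced by AS5713, as stated in the message.
AGGREGATE = ipaddress.ip_network("196.25.0.0/16")

# More-specifics announced by AS6180. The /24 length is an assumption
# for the first two; only 196.25.203.0/24 appears in the looking glass.
MORE_SPECIFICS = [
    ipaddress.ip_network("196.25.116.0/24"),
    ipaddress.ip_network("196.25.117.0/24"),
    ipaddress.ip_network("196.25.203.0/24"),
]


def longest_match(address, table):
    """Return the most specific prefix in table covering address, or None."""
    addr = ipaddress.ip_address(address)
    covering = [net for net in table if addr in net]
    return max(covering, key=lambda net: net.prefixlen, default=None)


# Every more-specific falls inside the /16 aggregate.
all_covered = all(net.subnet_of(AGGREGATE) for net in MORE_SPECIFICS)

# With the /24s present they win; without them traffic follows the /16.
with_specifics = longest_match("196.25.203.1", [AGGREGATE] + MORE_SPECIFICS)
without_specifics = longest_match("196.25.203.1", [AGGREGATE])
```

A router that holds a usable /24 forwards on it; a router that holds
only the /16 sends the traffic toward AS5713 instead.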

MAE-East (see example at end of message), the Sprint NAP and who
knows who else still have entries for these via AS3741, but they are
showing as received-only with no best path. Yet via MAE-West they are
fine.

We are going to stop announcing these via AS3741 at night, but it has
been 9 hours now since we stopped announcing them, and these
received-only entries are still sitting there, blocking these routes
from what seems to be about a third of the Internet.

Can anyone offer any comments on why they are still there, what
received-only means, and how we can get it fixed?

Many thanks
Regards
Anthony Walker

MAE-East Looking Glass Results
Query: bgp
Addr: 196.25.203.0
BGP routing table entry for 196.25.203.0/24, version 5006458
Paths: (1 available, no best path, advertised over IBGP)
  1673 1239 4005 3741 6180, (received-only)
    192.41.177.141 from 192.41.177.141 (140.223.57.217)
      Origin IGP, external

We have on occasion seen a dampened route not be replaced by
a route of the same specificity for about 20-45 minutes. While we didn't
care for this behaviour, I guess it was somewhat predictable.
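
As a rough illustration of why a window of that size is plausible, here
is a sketch of RFC 2439-style exponential penalty decay. The constants
are assumptions, not figures from this thread: they are the commonly
cited Cisco defaults (15-minute half-life, suppress above 2000, reuse
below 750), and the actual values depend on each router's configuration.

```python
import math

# Assumed damping parameters (commonly cited defaults, not from this thread).
HALF_LIFE_MIN = 15.0     # minutes for the penalty to halve
REUSE_LIMIT = 750.0      # a suppressed route is reused below this penalty
SUPPRESS_LIMIT = 2000.0  # a route is suppressed once its penalty exceeds this


def penalty_after(penalty, minutes):
    """Penalty remaining after `minutes` of exponential decay."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def minutes_until_reuse(penalty):
    """Minutes until an already suppressed route decays to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)
```

Under these assumed values, a route suppressed with a penalty of 3000
would stay suppressed for 30 minutes and one at 6000 for 45 minutes,
which is in the same range as what we observed.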

What had us concerned was that we have seen each of our three providers
black-hole us on at least one occasion (sometimes more) when their
backbone or core flapped. It could be described as appearing as if their
borders did not receive the information to withdraw our advertisements.
Looking Glass showed us "history" and "received only" entries, but no
active replacements... similar to the symptoms that you are describing.
Sprint's NOC reported hearing active advertisements from our "down"
provider, but I heard this second-hand. Killing our BGP session with the
offending provider after the problem started would not fix the problem;
routes to us via a provider we no longer had a BGP session with continued
to be preferred. This situation would usually fix itself within 45 minutes.

While this doesn't happen often, the effect is that having two other
providers is useless if the third is erroneously advertising us. Is there
a failure scenario where borders will continue to advertise ASes due to
loss of connectivity with their interior routers?

Bil Herd
InterActive

Anthony Walker said that the following was seen while the BGP session
between AS6180 and AS3741 was down:

MAE-East Looking Glass Results
Query: bgp
Addr: 196.25.203.0
BGP routing table entry for 196.25.203.0/24, version 5006458
Paths: (1 available, no best path, advertised over IBGP)
  1673 1239 4005 3741 6180, (received-only)
    192.41.177.141 from 192.41.177.141 (140.223.57.217)
      Origin IGP, external

AFAIK, "received-only" means that DIGEX heard that route from AS1673
but did not install it, presumably because it failed to pass a filter.
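
For anyone wanting to read the path line mechanically, here is a small
sketch. parse_path_line is a hypothetical helper that handles only the
format shown in the output above; it is not part of any looking-glass
tool.

```python
import re


def parse_path_line(line):
    """Split a looking-glass path line into (as_path, flags)."""
    # Flags are the parenthesised annotations, e.g. "(received-only)".
    flags = re.findall(r"\(([^)]*)\)", line)
    # Drop the annotations, then read the remaining numbers as the AS path.
    path_part = re.sub(r"\([^)]*\)", "", line)
    as_path = [int(token) for token in re.findall(r"\d+", path_part)]
    return as_path, flags


as_path, flags = parse_path_line("  1673 1239 4005 3741 6180, (received-only)")
neighbor_as = as_path[0]   # AS the route was heard from
origin_as = as_path[-1]    # AS that originated the route
```

Read left to right, the path runs from the neighbour the route was heard
from (AS1673) back to the originating AS (AS6180), passing through
AS3741 on the way.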

The question is, why did that route not get withdrawn properly when the
BGP session between AS6180 and AS3741 was terminated? This looks very
much like the Cisco bug that bit Sprint during the AS7007 incident a month
or so ago.

Hey, Ravi, I thought you fixed that bug? Perhaps AS4005 is not running
the fixed code. Perhaps there's another bug.

--apb (Alan Barrett)