tracking down subnet announcement disappearing / dropping to a single provider

I apologize if this is off-topic - I rarely post, mostly just monitor. I am not posting this to [nsp] or [isp-bgp] because I don’t believe the following is a configuration problem on my side, but an operational one at a certain provider. I am not going to sling dirt or post actual names, ASNs, etc. for now. I want to describe the problem, and if it is relevant I can post more specific data or be contacted off-list with suggestions or by the provider.

The problem is completely intermittent. 99% of the time everything works fine. I have begged the customer to get their ASP’s ISP involved since I believe that is where the problem lies. They claim it is my problem since we connect to them…(Don’t want to start a war here…I know)

I am Provider A and my customer cannot connect to an ASP through Provider X. The problem lasts for what seem to be exactly 10, 15 or 30 minute intervals. I am announcing my /19 broken up into multiple /22s and /21s. Only one of the /22s has the problem. None of my links are going down, and I do have backup routes through multiple providers. It took me almost a month to confirm the problem because they would call me after it happened and everything would look fine. I put some monitors on the destination site, but they never tested bad because the monitors were on other subnets. They finally called me while it was occurring, and I confirmed that I could not reach the site from any of the IPs in that particular /22 subnet.
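
A minimal sketch of the check this implies: before trusting a monitor, confirm that its source address actually sits inside the affected /22. The prefixes and addresses below are hypothetical placeholders in private space, not the real announcements, and the function names are made up for illustration.

```python
import ipaddress

# Hypothetical stand-ins: a /19 announced as a mix of /21s and /22s.
ANNOUNCED = [ipaddress.ip_network(p) for p in (
    "10.20.0.0/21",
    "10.20.8.0/22",
    "10.20.12.0/22",
    "10.20.16.0/21",
    "10.20.24.0/21",
)]

# Hypothetical: the one /22 that shows the problem.
AFFECTED = ipaddress.ip_network("10.20.12.0/22")


def covering_prefix(addr):
    """Return the most specific announced prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in ANNOUNCED if ip in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None


def exercises_affected(addr):
    """True if a monitor sourced from addr would test the affected /22."""
    return ipaddress.ip_address(addr) in AFFECTED
```

A monitor sourced from an address for which exercises_affected() is False will keep reporting success even while the /22 is unreachable, which matches what happened here.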

So, my point and questions:

Has anyone ever heard of such a problem ? I tried to search some archives and google - but I can’t really figure out what to search on that won’t result in 1000 + matches.

Is there any function or problem that might last for exactly 10, 15 or 30 minutes (e.g. BGP flap, subnet block filter / ACL), over and over?

Should I contact Provider X directly? Will they work directly with me? I believe my customer’s ASP opened a ticket with their ISP, but with all the layers and the short time frame of the issue I don’t think it will get resolved quickly or easily.

Responses online or offline welcome. If anyone wants more specific info posted to the forum (IP subnet blocks, Provider X, ASN, etc) please let me know.


Eric Kagan
Commrail, Inc. dba
Access Northeast

More specific information would help, but this is what I'd think.

1) Flap dampening can exist in pretty discrete blocks of time.

2) Do you announce the entire /19 as a backup in case a provider does not accept your /22 (many won't)?

3) Set up a ping monitor to the service at the ASP, and a ping monitor to a few of the upstream routers between you and the ISP. Make sure they are sourced from the /22. That way, if only the ASP disappears, you can tell it's not your problem. If everything disappears, it's probably closer to home.

4) There are guys out there that monitor the flappiness of a route. If you check for yours and the destination address blocks, you will be able to see if the "global" internet saw any flaps.

5) If you are not advertising the /19 and you haven't nailed down the /22 to a loopback or similar interface, the physical interface that the /22 rides on (say, as a connected route) could be going up/down, and your own router or ISP could be withdrawing it. It's always safer to announce the larger aggregate and nail down your BGP routes.
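
To put rough numbers on point 1: under route flap dampening (RFC 2439), a route's penalty decays exponentially, and a suppressed route comes back once the penalty falls to the reuse limit. The sketch below assumes the commonly cited Cisco defaults (15-minute half-life, reuse limit 750, 60-minute maximum suppress time). Whatever the upstream actually runs is unknown and may differ, and the function name is made up for illustration.

```python
import math

# Assumed values (commonly cited Cisco defaults); the real upstream
# configuration is unknown.
HALF_LIFE_MIN = 15.0      # minutes for the penalty to halve
REUSE_LIMIT = 750.0       # penalty at which a suppressed route is reused
MAX_SUPPRESS_MIN = 60.0   # upper bound on suppression time


def minutes_until_reuse(penalty):
    """Minutes for an exponentially decaying penalty to reach the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    minutes = HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)
    return min(minutes, MAX_SUPPRESS_MIN)


if __name__ == "__main__":
    for penalty in (2000, 3000, 6000):
        print(penalty, round(minutes_until_reuse(penalty), 1))
```

With these assumed values a penalty of 3000 takes 30 minutes to decay to the reuse limit, so outages that come in tidy multiples of 15 minutes are at least consistent with dampening somewhere upstream. Note that a route is only suppressed in the first place once its penalty crosses the suppress limit (2000 in the same defaults).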

Deepak Jain

Eric Kagan wrote: