Help with bad announcement from UUnet

Hi,

  The easiest would be to contact the remote site directly. As you're
having issues reaching them, they'll also have issues reaching you. If
it's enough of an issue for them they should be happy to assist in
fixing it.

  They should not only be aware of planned maintenance, but also of
other issues that may be affecting them.

  Having a support model in which anyone can call any NOC about a
problem they're having does not scale very well.

- marcel

How about a model where any large (multiple OC12s) CUSTOMER can call a NOC
about a problem they're having???

What would work better/faster?

my-noc -> b0rken-noc

or

my-noc -> my-upstream-noc -> b0rken-noc-upstream-noc -> b0rken-noc

?

Work better for who? For you? Sure. For any provider that needs to
provide quality services to its customers and follow processes to do so,
not a chance. The Big Picture is key here.

andy

What would work better/faster?
my-noc -> b0rken-noc
or
my-noc -> my-upstream-noc -> b0rken-noc-upstream-noc -> b0rken-noc

for dinner this evening, would you prefer to walk or take a taxi to
a closed restaurant?

and there are scaling issues as well.

randy

In a message written on Fri, Mar 29, 2002 at 08:11:14AM -0600, Andy Walden wrote:

> What would work better/faster?
>
> my-noc -> b0rken-noc
>
> or
>
> my-noc -> my-upstream-noc -> b0rken-noc-upstream-noc -> b0rken-noc

Work better for who? For you? Sure. For any provider that needs to
provide quality services to its customers and follow processes to do so,
not a chance. The Big Picture is key here.

Note that in both cases, b0rken-noc takes a single call, so their
load is unchanged. The second case adds a call to both my-upstream-noc,
and b0rken-noc-upstream-noc.

It would seem going direct would put a lower load on NOC's in general,
which presumably would let them spend more time on problems and provide
better service.
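The call-count argument above can be checked with a tiny sketch. The two paths are the ones from the thread; the function name `calls_received` is made up for illustration, and it assumes every hop in a path is exactly one phone call placed by the earlier NOC to the later one.

```python
from collections import Counter

def calls_received(path):
    """Count the calls each NOC receives along a linear escalation path.

    Assumes every adjacent pair in the path is exactly one phone call,
    placed by the earlier NOC to the later one.
    """
    return Counter(path[1:])

direct = ["my-noc", "b0rken-noc"]
relayed = ["my-noc", "my-upstream-noc", "b0rken-noc-upstream-noc", "b0rken-noc"]

for label, path in (("direct", direct), ("relayed", relayed)):
    received = calls_received(path)
    print(f"{label}: {sum(received.values())} call(s) total, "
          f"b0rken-noc takes {received['b0rken-noc']}")
# direct: 1 call(s) total, b0rken-noc takes 1
# relayed: 3 call(s) total, b0rken-noc takes 1
```

Under that assumption b0rken-noc's load is the same either way; the relayed path just adds one call each to the two upstream NOCs.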

Where is the limit, though? Once I open things up to non-customers and let
any random person call me, without any sort of filters or controls, what
keeps my best guys from troubleshooting someone's mistyped SMTP server in
their mail client? Processes are put in place to scale and when they are
disregarded, things generally end up worse in the long run.

andy

Hi

Note that in both cases, b0rken-noc takes a single call, so their
load is unchanged. The second case adds a call to both my-upstream-noc,
and b0rken-noc-upstream-noc.

It would seem going direct would put a lower load on NOC's in general,
which presumably would let them spend more time on problems and provide
better service.

surely a noc's first responsibility is to its direct customers? even if the
other network experiencing the problem may affect said customer, the
service is not just about connectivity, but also about trying to deal with
calls in the best possible manner. if more time were spent on
non-customers, a paying customer would end up losing out on that warm
fuzzy feeling when his call is answered promptly, the person he speaks to
actually listens, and he doesn't walk away from his dealings with the noc
feeling cheated.

Regards

--Rob

The difference being that if the call comes from b0rken-noc-upstream-noc,
the guys at b0rken-noc have at least a snowball's chance of knowing the
person calling and whether they have any kloo.

If our NOC calls one of our upstreams and says "hey, ASnnn is sending you
bogons that you're forwarding to us", they tend to listen, and call the
guys at ASnnn and tell them to cut it out. (Yes Leo, you know most of our
NOC monkeys, so you know what the chances are they're right about something ;-))
On the other hand, if we call ASnnn directly, they have no way of knowing if
we're us, or if we're some bunch that thinks it makes sense to hang an AS
off a residential ADSL line...

Now, if we had a PGP-ish "web of clue", it would be different....

So apply filters and controls.

Your NOC: Are you one of our customers or peers, sir/madam?
Caller : Uhm, no ...
Your NOC: Ah, I see. Then could I please have your registry handle for
          route maintenance, and which registry that belongs to?

Accept peers, customers, or anyone who has clue sufficient to have a
registry handle, preferably one listed as a maintainer for one end of
whichever path they're complaining about.

Verification probably isn't needed, as by that point in the
conversation, the people who can't even configure their mail clients
will have been weeded out.
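The clue filter described above can be sketched as a small decision function. Everything here is hypothetical: the `Caller` type, the category names, and the idea of checking the handle against the maintainers (the `mnt-by` of the route objects) for either end of the disputed path are illustrations of the post's suggestion, not any NOC's actual procedure. As the post says, nothing is verified; asking for the handle is itself the filter.

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class Caller:
    asn: Optional[int] = None              # caller's AS number, if any
    registry_handle: Optional[str] = None  # hypothetical IRR maintainer handle

def classify_caller(caller: Caller, customers: Set[int], peers: Set[int],
                    path_maintainers: Set[str]) -> str:
    """Hypothetical 'clue filter' for a NOC taking a routing complaint.

    customers / peers: AS numbers we already have a relationship with.
    path_maintainers: maintainer handles (the mnt-by of the route objects)
    for either end of the path being complained about.
    """
    # Customers and peers always get through.
    if caller.asn is not None and (caller.asn in customers or caller.asn in peers):
        return "relationship"
    # Can't name a registry handle: weeded out, send them to their own provider.
    if not caller.registry_handle:
        return "reject"
    # Best case: they maintain one end of the disputed path.
    if caller.registry_handle in path_maintainers:
        return "maintainer"
    # Some clue demonstrated; accepted without verification.
    return "has-handle"

customers, peers = {64496}, {64497}   # documentation-range AS numbers
maintainers = {"MAINT-EXAMPLE"}       # made-up maintainer handle
print(classify_caller(Caller(asn=64496), customers, peers, maintainers))  # relationship
print(classify_caller(Caller(), customers, peers, maintainers))           # reject
print(classify_caller(Caller(registry_handle="MAINT-EXAMPLE"),
                      customers, peers, maintainers))                     # maintainer
```

Returning a category rather than a yes/no keeps the post's preference visible: anyone with a handle gets through, but a maintainer of one end of the path is the best case.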

Clue filters. Gotta love 'em.

I felt justified in calling UUnet. I know the conversation had
morphed by the time you made the above comment. However in my case
UUnet was propagating an announcement that was stepping on one of
ours; the owner of the netblock was there to say that he did not want
that announcement being made; the UUnet customer making the
announcement (who I would rather have dealt with) was apparently
operating without a crew. Here was a conversation between directly
affected parties. It came down to who was bothering who: was it UUnet
bothering me by announcing my route, or was it me bothering them by
asking them to stop?

The model of "I won't talk to anybody who isn't my customer" is
probably almost always right, but it does not work for every single
situation. With that stance, you wouldn't have an abuse@ contact.
Sometimes your actions directly affect somebody and you should be
willing to deal with the consequences of that.

While their initial reaction in my case was "I can't talk to you,"
they did indeed reconsider and help out. Thanks again. It happened
pretty much at the instant I asked for help here, which is the usual
sort of karma.

BTW as I mentioned when I contacted Genuity, they advised me to contact
UUnet directly. So by inference at least one large carrier (Genuity)
seems to feel that contacting them directly is appropriate.

-mm-

my once-per-year posting average is really blown now..

I believe this is the problem. Providers can't be expected to have it
both ways.

If you are a customer of provider A, and the problem is inside provider
B's network, what is the appropriate method to get provider B to fix the
problem?

   1. Call provider A. Open a trouble ticket. Provider A forwards
      the ticket through the chain of providers to Provider B. Provider
      B accepts the trouble ticket. B finds the problem in their network
      and fixes it, closing the trouble ticket back to A.

   2. Call provider A. Provider A says it's not a problem with A's
      network and closes the ticket. A tells customer, call Provider B.
      User looks up Provider B's contact information. User calls Provider
      B and is told, we don't take calls from non-customers, call Provider
      A. Rinse and Repeat.

   3. Call lawyer. Sue Provider A and B for tortious interference with
      the user's peaceful enjoyment of the Internet by negligently and/or
      fraudulently propagating false routing information and failing to
      correct the problem after being notified by the user.

I think method 1 is the best way to handle the situation. Unfortunately,
most of the time method 2 is what happens. Eventually, someone will
try method 3, and I don't want to be around when that happens.

I've worked at 2 ISPs in the last 3 years, and both have been very good about
supporting their customers and non-customers.
The ideologies were different but basically boiled down to:
  "If there's a problem on our network, let's just fix it."
Who REALLY wants to have a broken network?

It's unfortunate that monsters like UUnet have such policies in place that
allow them (internally) to have broken networks and not do anything about
it.

If I was going to pay UUnet prices for internet access, I'd want them to
fix a problem in their network regardless of who reported it.
In the above case the problem could have affected me too. Why should I
have to wait x periods of time before I realise it's broken, and then
another x periods of time before they get it fixed?

/me seems to remember the UDP (Usenet Death Penalty) put on UUnet quite a
while ago, and it got some action.
Maybe when number 3 happens, it'll wake some of these unclued people up?

When I take part in the management of a network, I like to know that I'm
doing everything I can to make sure the network is all it can be.
I can't count the sleepless nights I've spent staying up to fix things just
so the customers wouldn't notice, just so the internet could be a nicer
place.
It's sad not everyone shares the same ideologies.

I'd like to add the obvious 4th solution to your question:
    4) Providers accept complaints about their network regardless of the
       source and just fix them.

We are not talking about SMTP here, but about someone bogusly announcing
routes. I agree with you that your noc is not a helpdesk for anyone, but if
your noc announces bogus routes (which should originate from my AS) I
think I have every right to contact your noc and try to solve things.
After all, it is you doing something wrong which affects my network.

To answer Randy's remark about scaling: this scales very well; the number
of ASes is limited.

On the other side, I know how annoying it is if other people's customers
call you about their b0rked up Windows RAS configuration. That should not
happen; I am talking about professional noc-to-noc contact here, which imho
should not be too bureaucratic.

What would work better/faster?

my-noc -> b0rken-noc

or

my-noc -> my-upstream-noc -> b0rken-noc-upstream-noc -> b0rken-noc

?

OK, rant time (blame the easter long weekend... a 4 day weekend down
here... and associated excessive alcohol)...

General comment: the below isn't meant to reflect badly on any of our
past or present providers or peers... and in the most part problems
mentioned relate to previous suppliers so please don't try to guess
who they could be about :-)

Becomes much more relevant when you're not in America. Often a company
in, say, Singapore or New Zealand may manage an Australian company's
connection to the US internet. And then said Australian company may
have a problem connecting through the US internet to, say, China via
Japan (which the company I work at doesn't do anymore - one of our
providers now has connectivity via Singapore to China which is much
better, but that still isn't the case for many in Australia).

You want to think how many NOCs and language barriers there can be in
that path? And peering relationships, timezone changes (harder to get
good engineers sometimes, and 24 hour NOCs aren't common in many
countries), etc?

Or, we can directly contact a NOC in southern China and get resolution,
as well as a very satisfied customer, because all his other upstreams had
tried, and failed, to escalate NOC to upstream NOC through a massive
number of NOCs who couldn't resolve the issue. The problem is that when you
take this approach you have to be very sure which AS is causing the
real problem (and/or what the real problem is; calling your upstream's
upstream and telling them to tune their tx-ring-limits is another example,
where your direct upstream at the time may not have heard of such a
thing, and so couldn't relay the fault in a way that let the remote NOC
work out what the problem was and how to fix it. Of course, the better
thing to tell the provider in question would have been "don't try and put
that many OC3 cards in a 7206!").

Admittedly the escalation in the southern China case (which wasn't
our standard problem with providers in China turning on routers which
make classful assumptions, and us having some 61.* IP space) was:

customer's customers -> customer's NOC -> our NOC ->
   problem site's upstream's NOC (who liaised with problem site,
   and fortunately spoke english - the problem site didn't, but
   if it had been an issue, our customer's NOC had offered to
   translate)

but that cut out a _lot_ of NOCs. To me there's some maximum number
of NOCs to be involved in a problem to coordinate well, and it is
around 4 (end ISP NOC, their upstream NOC to confirm the problem,
problem site's upstream NOC to enforce fixing of problem, problem
site's actual NOC), which then becomes 3 in the case where the
problem network is someone like sprint, at&t or uunet who we
wouldn't consider to actually have an "upstream" (and for the
record in the cases I've had to, I haven't had a problem dealing
with those three directly even though we're not a customer; maybe I've
just been lucky).
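The aside about routers "which make classful assumptions" with 61.* space is easier to see in a sketch. Assuming the routers in question fell back to the pre-CIDR class boundaries, any address in 61.* is treated as part of one class A network, 61.0.0.0/8, no matter how small the block actually allocated to a given site is. This uses Python's standard `ipaddress` module; `classful_network` is a hypothetical helper, and 61.8.0.1 and 61.200.3.4 are arbitrary example addresses.

```python
import ipaddress

def classful_network(addr: str) -> ipaddress.IPv4Network:
    """Return the network a purely classful router would assume for addr.

    Ignores any real prefix length and uses the pre-CIDR class boundaries
    based on the first octet.
    """
    ip = ipaddress.IPv4Address(addr)
    first_octet = int(ip) >> 24
    if first_octet < 128:
        prefix = 8    # class A: 0.x.x.x - 127.x.x.x
    elif first_octet < 192:
        prefix = 16   # class B: 128.x.x.x - 191.x.x.x
    elif first_octet < 224:
        prefix = 24   # class C: 192.x.x.x - 223.x.x.x
    else:
        raise ValueError("class D/E address: no classful unicast network")
    return ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False)

# Any 61.* address, whoever it is actually allocated to, collapses into one /8.
print(classful_network("61.8.0.1"))    # 61.0.0.0/8
print(classful_network("61.200.3.4"))  # 61.0.0.0/8
```

Two unrelated sites holding different pieces of 61/8 look identical to such a router, which is why holding 61.* space was a recurring source of trouble.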

An Australian company who is being directly affected by a problem may
keep good staff on until the right time to contact a US or other
international NOC directly during their working hours and get decent
staff, rather than waiting on all the various NOCs to miscommunicate the
problem across various hops.

Another problem is "follow the clock" NOCs, and trying to call at the
right time to get a US operator. Operators in the UK or Singapore at
one ISP had pretty much no access to their routers and could do nothing
more than email the US staff and hope to get some resolution 12 hours
later. The country that took the call owned the problem, but had to pass
it off internally, then wait till that country was active again to call
the customer back, and repeat that a few times to convince them of the
actual fault. Glad I don't deal with that particular company anymore :-)

I haven't had a problem getting large US providers to give me a trouble
ticket, even though we're far removed from being a customer. And we've
found the "trouble" to be things as lame as a certain large US provider
putting a /32 static blackhole in one of their routers. In one case,
following the "correct" NOC-to-NOC escalation path (since it was minor and
worked around) did nothing for a week; a direct email (in that case;
calls are for more urgent issues :-)) to their US NOC got the problem
fixed within an hour.

The only group in the US I've found hard to deal with in any way
internet-operationally related was NetSol/VeriSign: a bad experience and
a waste of international calls. They had no intent to deal with a
_customer_ in a timely manner over an urgent change (a domain change for a
company that had just gone into liquidation and was about to lose the
routability of its IP space in 48 hours). NetSol's systems weren't
accepting IP changes for the nameservers due to what turned out to be
design problems in their database application: in some cases a change of
permissions needs 24+ hours to propagate internally before you can make
changes under the new permissions... ugly.

David.

I think the usual method is to find someone who IS a customer of provider
B's network -- ie whoever it is you can't access -- and have them complain.

Or

    2a. Call provider A. Provider A says it's not a problem with A's
        network and closes the ticket. A tells customer, call Provider B.
        User looks up Provider B's contact information. User calls Provider
        B and Provider B opens a trouble ticket. B finds the problem in
        their network and fixes it, closing the trouble ticket back to the
        user.

    2b. Call provider A. Provider A says it's not a problem with A's
        network and closes the ticket. A tells customer, call Provider B.
        User looks up Provider B's contact information. User calls Provider
        B and is told, we don't take calls from non-customers, call Provider
        A. User replaces Provider A with a more responsive provider and moves
        back to option 1.

I like option 2b better than option 3. Both 2b and 3 will take longer than you want, but 2b is likely to be faster than 3. 2a isn't my favorite path, but if it gets the problem fixed, I can live with it.

    -Jeff