BCP Question: Handling trouble reports from non-customers

Owen_DeLong · September 1, 2006, 4:26pm

I think my previous post may have touched on a more global issue.

Given the number of such posts I have seen over time, and, my experiences trying to
report problems to other ISPs in the past, it seems to me that a high percentage of
ISPs, especially the larger ones, simply don't allow for the possibility of a non-customer
needing to report a problem with the ability to reach one of their customers.

I'm curious how people feel about this. As I see it, there are a number of possible
responses:

1. Don't help the person at all. Tell them to contact the customer they are
  trying to reach and have the customer report the problem. This seems,
  by far, to be the most popular approach in my experience, but, it makes
  for a very frustrating experience to the person reporting the problem.

2. Accept any trouble report and attempt to resolve it or determine that it
  is outside of your network. This approach is the least frustrating to the
  end user, but, probably creates a resource allocation and cost problem.

3. Have a procedure for triage which allows a quick determination if the
  problem appears to be within your network. Using that procedure,
  reject problems which appear to be outside of your network while
  accepting problems that appear to be within your network.
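
A minimal sketch of what the quick check in option 3 might look like, assuming the report includes the address that cannot be reached and, optionally, the last hop that answered a traceroute. The prefixes and the function names are hypothetical; a real operator would load its own address space and would likely check more than this.

```python
import ipaddress
from typing import Optional

# Hypothetical prefixes (documentation ranges); replace with your own address space.
OUR_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_our_network(address: str) -> bool:
    """Return True if the address falls inside one of our prefixes."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in OUR_PREFIXES)


def triage(destination: str, last_responding_hop: Optional[str] = None) -> str:
    """Accept a report if the destination or the last responding hop is ours."""
    if in_our_network(destination):
        return "accept"
    if last_responding_hop is not None and in_our_network(last_responding_hop):
        return "accept"
    return "reject"
```

A report about 192.0.2.10 would be accepted here, while one about an address outside both prefixes, with no responding hop inside them, would be rejected with a pointer to the reporter's own provider.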

It seems to me that option 3 probably poses the best cost/benefit tradeoff,
but, it is the approach least taken from my observations. So, I figured
I'd try and start a discussion on the topic and see what people thought.

Feel free to comment on list or directly to me (I'll summarize), but, if you
want to tell me I'm off-topic or whatever, please complain directly to me
without bothering the rest of the people on the list. I believe that this
is an operational issue within scope of Nanog, but, I can see the
argument that it's a business practices question instead.

Owen

Mike_Tancsa · September 1, 2006, 5:15pm

I think it's more an issue of being able to get through to the right people than of customers versus non-customers reporting problems. We had an issue with one large ILEC here in Canada recently (and similar problems in the past with others) where they did some upgrades to their RADIUS servers that broke non-PAP logins. Some of our older VPN devices used scripted logins, so these all broke. We were only "regular" customers, so we tried our best to work through the front line tech support. Basically we got stuff like "we don't support UNIX. You need to call UNIX for help", "we don't have terminal servers", "there is nothing wrong with R-A-Y-D-E-E-U-S or even the circumference", and other crap that was an obvious 'jettison customer' leaf in the decision tree. It was an incredibly frustrating situation for 3 days despite asking to escalate etc. Ultimately, we discovered the problem had security implications, so we used that as a pretext to use a net-sec contact to pass on the info, and it was acted on almost right away.

In general, the dilemma seems to be this: a customer calls up saying stuff that makes no sense to the front line tech. Does the front line tech pass each and every "the customer is saying our ION-Dilithium deflector array is misaligned and needs to be refilled with dark neutrino particles" and "You have a bogus next hop route in your IGP" up the food chain, or just dismiss it? The answer seems to be, "if there is a bogus next hop issue, our second line will catch it on their own", so don't bother second line if you can't figure it out. Whether it's a good business decision or not, I don't know, but that seems to be the popular thing to do in my experience.

---Mike

Steve_Gibbard · September 1, 2006, 10:48pm

Anybody trying to put together such a BCP should first give some consideration to what sorts of calls from non-customers a service provider should be expected to accept. Based solely on Owen's earlier post, this looks to me like a good example of why service providers are sometimes reluctant to accept trouble reports from non-customers.

From DNS and whois, it looks like the IP address Owen cited earlier is an individual DSL customer. The equipment Owen says is dropping packets looks like DSL concentration equipment in a local POP. Owen says he's having trouble reaching the address from multiple locations. If we look at this from the service provider's perspective, we see some random person calling up to complain that somebody else, whose phone number the caller doesn't know, is having trouble with their DSL service. That's probably not a call they get a lot of, and it probably seems pretty strange. If that DSL customer were really having problems getting anywhere, wouldn't they call it in themselves? If there were a problem with the whole POP, the random outside person calling in would be more believable, but the people in the call center would probably have their hands full dealing with actual customers.

There are DSL customers who use their DSL circuits to host actual services that others might want to access, or IP phones that somebody might want to call. There are big hosting companies who specialize in making content available to lots and lots of end users, who still don't like taking calls from non-customers. In those cases, it's at least obvious why a non-customer might call and complain, but there are scaling issues because of which somebody might not want to accept such a call. The cost of passing bits through to somebody with thousands or millions of customers may be significantly less than the cost of taking phone calls from the customers' customers. Transit providers therefore tend to expect such organizations to handle their own customer support, and to call the transit provider themselves if there's a problem. That way the transit provider knows who they're dealing with, and only has to explain things once.

This isn't to say there aren't valid reasons for network operators to contact other network operators they don't have relationships with. Packet loss affecting only the providers' customers may not count, but a call saying "hi, your customer, who isn't answering their phone, is sourcing unauthorized routes to my address space" probably should be taken seriously. Of course, the challenge there is determining that the person calling *is* authorized to tell you not to announce the space. Same for customers sourcing attacks, and the like.

Some questions you might want to consider would be:

What sorts of problems should a non-customer legitimately be reporting?

Which non-customers should be reporting such things? Affected individuals? Other network operators (as defined by who?)? CERTs? Law enforcement?

What channels should be used for such contacts? Phone? E-mail? INOC-DBA? Where should the contact information be published and who should have access to it?

How should identity of callers be determined?

Also, note that lots of solutions to many of these problems have already been tried at various times with varying degrees of success, and that some of them are working fairly well. You'd probably do better to build on existing systems and practices than to start from scratch.

-Steve

Joe_Abley1 · September 2, 2006, 1:38am

A long time ago, I was a backbone engineer at 6461. There was one particular 6461 customer who ran online games, and whose customers were encouraged to submit NOC tickets to 6461 every time they had an issue with network performance.

This resulted in a lot of tickets. Gamers being their naturally twitchy selves, though, there were lots of times when we got really early notice of problems that monitoring hadn't picked up and which weren't reported by anybody else until much later (if at all).

So, there is *some* benefit in accepting tickets from non-customers and churning them through the support process, even if it's not especially cheap to do.

Joe

Per_Gregers_Bilse1 · September 2, 2006, 2:56am

You're absolutely right, but your struggle is uphill. Some considerable
time ago my "XO" (James Aldridge) had a big hand in RFC2142, but in spite
of it being Standards Track and otherwise receiving universal approval,
real uptake was patchy. In fact, in spite of most peering contracts (which
started to emerge at the time) being very specific about listing 24*7
problem resolution contact information, any issues beyond the truly banal
required one to resort to private, carefully maintained lists of names
and telephone numbers, many of which were gleaned from business cards
(just about the only useful thing to come out of Finance & Administration)
exchanged at NANOG meetings.

Has anything changed since then? Probably not ... Vive le NANOGue!

Probably, in fact, increasingly dense interconnectivity between
especially upper level providers has outright masked the absence of
out-of-band communication, and a truly catastrophic routing problem
could well separate the Net. If a really huge problem were to occur
these days, could you expect to be able to email somebody about it?

Probably not, in fact. Maybe RFC2142 should be revived and turned
into something much more extensive and formal?
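
For reference, RFC 2142 names NOC, ABUSE and SECURITY as the role mailboxes for network operations. A small sketch of the addresses a non-customer could try under that convention; `example.net` is a placeholder domain and the function name is hypothetical. Whether anyone reads those mailboxes is, as noted above, a separate matter.

```python
from typing import List

# Network-operations role mailboxes named in RFC 2142.
NETWORK_OPS_MAILBOXES = ("noc", "abuse", "security")


def role_addresses(domain: str) -> List[str]:
    """Build the RFC 2142 network-operations addresses for a domain."""
    return [f"{name}@{domain}" for name in NETWORK_OPS_MAILBOXES]


print(role_addresses("example.net"))
```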

-- Per

Sean_Donelan · September 2, 2006, 5:46am

I think you omitted at least one other option.

Contact your own ISP, i.e. the provider you pay, and report the problem. You make the choice how much support you want to pay for when you select your provider, including what type of inter-provider contacts they maintain. Your own provider can confirm who you are, knows your history of reporting problems, can perform preliminary diagnostics and sectionalization to confirm a problem exists, maintains contacts with the next provider in the chain, etc. Other ISPs are more likely to recognize the reputation of an ISP they maintain a relationship with than that of random callers.

If you want high touch support, your provider will probably charge you a high price. Other people may prefer a low price and are satisfied with that service, as is apparent from the other customer not opening a trouble ticket with their ISP. The Internet does not have uniform service levels, nor uniform pricing for any level of service.

This method seems to work better in other industries with customer contacts, e.g. you call your credit card company about problems, not the merchant's credit card company; you call your shipping company about problems, not the transit carriers a package may have traversed; you call your telephone company about problems dialing another telephone number, not other phone companies; and so forth.