PlayStationNetwork blocking of CGNAT public addresses

Simon_Lockhart1 · September 16, 2016, 1:12pm

All,

We operate an access network with several hundred thousand users. Increasingly
we're putting the users behind CGNAT in order to continue to give them an IPv4
service (we're all dual-stack, so they all get public IPv6 too). Due to the
demographic of our users, many of them are gamers.

We're hitting a problem with PlayStationNetwork 'randomly' blocking some of our
CGNAT outside addresses, because they claim to have received anomalous, or
'attack' traffic from that IP. This obviously causes problems for the other
legitimate users who end up behind the same public IPv4 address.

Despite numerous attempts to engage with PSN, they are unwilling to give us
any additional information which would allow us to identify the 'rogue' users
on our network, or to identify the 'unwanted' traffic so that we could either
block it, or use it to identify the rogue users ourselves.

Has anyone else come up against the problem, and/or have any suggestions on
how best to resolve it?

Many thanks in advance,

Simon

Mike_Hammett · September 16, 2016, 1:28pm

A network that doesn't support IPv6, yet discriminates against CGNAT? That seems like a promising future.

Dobbins_Roland · September 16, 2016, 1:32pm

I'm pretty sure that at least part of it has to do with DDoS-related activity. The best bet is to try and identify and engage with the relevant operational personnel with clue. Going the customer-service route isn't fruitful, as you indicate.

Another aspect is ensuring that one has the ability to detect, classify, traceback, and mitigate outbound badness southbound of the CGN.

This sort of thing has always been a problem with NAT; as CGN becomes more prevalent on wireline broadband networks, it's only going to get worse.

AFAIK, PSN doesn't support IPv6. That would be another topic of discussion with the operational folks.

Simon_Lockhart1 · September 16, 2016, 1:38pm

Unless PSN can tell us what traffic they consider bad, how can we detect and
classify it? We certainly have the ability to traceback and mitigate, once we
know what we're looking for.

My understanding of the issue is that there are infected PCs on our network,
which are being used as part of a distributed attack, but at the application
layer, rather than network layer - distributed password brute-force, or
similar. Unless we know what to look for, it's hard to detect and stop it.

Simon

michalis.bersimis · September 16, 2016, 1:41pm

Another aspect: for those users who need to reach PSN but experience issues via the CGNAT, an opt-out solution (giving them a public IPv4 address) should mitigate the problem that PSN does not support IPv6.

After all, what percentage of your total subscribers are gamers who use PSN? 2-3%? That might be a relatively small number of subscribers to give public IPv4 addresses to.

Michalis

Alan_Buxey · September 16, 2016, 1:49pm

Hi,

as others have said, you need to engage with one of their other units to get this sorted
out - as a network provider, their customers are relying on YOU to access their service, so PSN should
care.

technically, you could start looking at netflows to the PSN and see if anyone is engaged in DDoS
via that route... and, if you offer native IPv6 service to end users, ask PSN when they are going to
offer an IPv6 service to their users - so this CGNAT stuff can go

alan

Dobbins_Roland · September 16, 2016, 1:49pm

It's not just application-layer stuff - they're subject to all sorts of attacks. Screening out the obvious stuff would certainly help.

The main issue is a dearth of engagement of clueful folks in the global operational community. Some gaming-oriented networks are well-represented; others are not, sadly.

Ca_By · September 16, 2016, 4:02pm

Here is a picture of what you are experiencing

http://test-ipv6.com/faq_avoids_ipv6.html

Sometimes people need pictures to understand why IPv6 is important

Tony_Wicks · September 16, 2016, 9:40pm

So the pain has finally flowed down to other parts of the world. (APNIC ran
out of IP's a long time ago, so CGN has been in use here for a lot longer)
This issue is one I have been dealing with for the last four years. Only
with Sony, no other company has caused such a headache in regard to CGNAT. I
will not go into the long and painful saga of dealing with the constant
issue of Sony putting blocks on random pool addresses, refusing to supply
sufficient information to identify rogue users (timestamp, source IP,
destination IP and port) then telling our customers it is a problem at the
ISP end, but... Something happened about three months ago that proves that
if the Sony technical people want to get off their asses they are perfectly
capable of supplying adequate information to identify a rogue user for the
ISP to deal with. One of the local Sony PSN helpline managers actually
managed to convince one of their technical people to supply a spreadsheet
that magically contained sufficient information to allow us to identify a
couple of users that did indeed have multiple infections. Great I thought,
now if we can just get them to automate/regularly send this info we will
have a way forward. Alas, it appears it was a one off and we are back to the
start. I will quote below what the Sony Network guy said when explaining why
they can't send detailed information every time -

" From: SNEI-NOC-Abuse [mailto:SNEI-NOC-Abuse@am.sony.com]

Masataka_Ohta · September 17, 2016, 12:54am

Simon Lockhart wrote:

> Has anyone else come up against the problem, and/or have any suggestions on
> how best to resolve it?

The best solution is to have a common practice on a set of public
port numbers assigned to a host behind NAT.

For example, with a practice that, if a port in a range between N*8
and N*8+7 is assigned to a host, other ports in the range are not
assigned to other hosts, service providers can block packets
based on IP addresses and port ranges, especially if the correspondence
between hosts and ranges is rather stable.
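
A minimal sketch of the port-block idea described above, assuming a block size of 8 as in the N*8 to N*8+7 example. The function names and the sample address are hypothetical and only illustrate the arithmetic; no existing CGNAT product is claimed to work this way.

```python
BLOCK_SIZE = 8  # ports per block, as in the N*8 .. N*8+7 example


def port_block(port: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the contiguous block of public ports that contains `port`."""
    n = port // block_size
    return range(n * block_size, (n + 1) * block_size)


def same_host(port_a: int, port_b: int, block_size: int = BLOCK_SIZE) -> bool:
    """Under this practice, two ports in the same block belong to one host."""
    return port_a // block_size == port_b // block_size


def block_rule(public_ip: str, offending_port: int) -> tuple[str, int, int]:
    """Build an (ip, first_port, last_port) tuple a service could block on,
    instead of blocking the whole shared public address."""
    block = port_block(offending_port)
    return (public_ip, block.start, block.stop - 1)
```

For instance, a report naming source port 40003 on a shared address would translate to a rule covering ports 40000 through 40007 only, leaving every other subscriber behind that address untouched.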

But, it may be too late to make such practice common, I'm afraid.

Or, wait for a while until service providers receive enough
feedback from innocent users. To accelerate it, you can make the
correspondence between hosts and public addresses not so stable,
which makes almost all your IP addresses marked bad quickly,
which may make you lose some customers, unless other ISPs also do so.

Masataka Ohta

Tom_Smyth · September 18, 2016, 12:30pm

Hi Simon,

as other responders have said it is an inherent issue with NAT in general,
one workaround is to limit the ratio of actual users to an external IPv4
address, the other thing we have seen from our Abuse contact emails from
PSN, is that malicious activity towards the PSN is often accompanied by
other malicious activities such as SSH brute force outbound and spamming...

I would suggest that

1) limit the ratio of users to an external ipv4 address as much as possible
(which would reduce the impact of one compromised customer bringing down
play time for other clients behind the same nat)

2) do some "canary in the mine" monitoring for obviously malicious traffic
(loads of SMTP traffic outbound) and lots of connection requests to SSH
servers ... if you see that traffic from behind your CGNAT device .. just
temporarily block the internal ip of the user until they clean up their
devices.

this is the pain with NAT you have to do extra work in order to prevent
infected users interrupting internet connectivity for other innocent
users...
I think you can use simple firewall rules on your edge router to identify
multiple connections to SMTP and SSH in a short period of time..

If you do the minimum to detect that abuse then you can't be accused of
invading peoples privacy... (bear in mind obvious false positives)
(Monitoring systems etc) ...
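
A rough sketch of the "canary" check described above, written as a sliding-window counter over new outbound connections. The class name, window length, and threshold are assumptions chosen for illustration, and the code expects connection events to arrive in time order; a real deployment would feed it from flow records or firewall logs and keep an exemption list for known mail servers and monitoring systems.

```python
from collections import defaultdict, deque

WATCHED_PORTS = {22, 25}   # SSH and SMTP, per the suggestion above
WINDOW_SECONDS = 60        # assumed window length
MAX_NEW_CONNECTIONS = 30   # assumed threshold; tune against real traffic


class CanaryMonitor:
    """Count new connections per (inside IP, destination port) in a time window."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_NEW_CONNECTIONS):
        self.window = window
        self.limit = limit
        self._seen = defaultdict(deque)  # (inside_ip, dst_port) -> timestamps

    def observe(self, ts, inside_ip, dst_port):
        """Record one new outbound connection.

        Returns True when the host has exceeded the limit within the window.
        """
        if dst_port not in WATCHED_PORTS:
            return False
        q = self._seen[(inside_ip, dst_port)]
        q.append(ts)
        # Drop timestamps that have aged out of the window.
        while q and q[0] <= ts - self.window:
            q.popleft()
        return len(q) > self.limit
```

Only the destination port and a timestamp are inspected, never payload, which is in keeping with the point about doing the minimum needed to detect abuse.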

Hope this helps,

Rich_Kulawiec · September 18, 2016, 1:07pm

Seconded. This is something I've recommended for years (decades, I suppose
by now). Simple measurements of what's "normal" for your operation in
terms of connection rates, types, etc., are easy to make. That in turn
enables measurements of what's abnormal and that in turn enables manual
or automatic actions. For example: if the average number of outbound
SSH connections established per hour per host across all hosts behind CGNAT
is 3.2, and you see a host making 1100/hour: that's a problem. It might be
someone who botched a Perl script; or it might be a botted host trying
to brute-force its way into something.

These kinds of measurements are relatively easy to make and don't require
invading user privacy. They won't catch everything, of course, but they're
not intended to. They may catch enough to solve the problem in front of
you at the moment *and*, if they do that, they may reduce the scope/scale
of the rest of the problems to make them more tractable via other techniques.
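
As a sketch of the baseline comparison described above: compute the average hourly count from a period considered normal, then flag hosts far above it. The function name, the multiplier, and the floor are hypothetical tuning knobs, not values from this thread, and the inside addresses in the usage note are made up.

```python
from statistics import mean


def flag_outliers(baseline_counts, current_counts, factor=50.0, floor=100):
    """Return hosts whose current hourly count is far above the fleet baseline.

    baseline_counts: per-host hourly counts from a period considered normal
    current_counts:  mapping of host -> count for the hour being checked
    factor, floor:   assumed tuning knobs; the floor avoids flagging tiny
                     counts when the baseline itself is very small
    """
    baseline = mean(baseline_counts)
    threshold = max(baseline * factor, floor)
    return sorted(h for h, c in current_counts.items() if c > threshold)
```

With a baseline of `[3, 4, 2, 5, 2]` (an average of 3.2) and current counts of `{"100.64.0.5": 1100, "100.64.0.6": 4}`, only `100.64.0.5` is returned. A flag here is a prompt to look closer, since the cause may be a botched script rather than a compromised host.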

---rsk

Tom_Beecher · September 18, 2016, 1:15pm

This is, as many things are, a huge problem in communication.

Sony tells ISP 'Hey, you have customers abusing us. Fix it!'.
ISP says 'Oh crap, sorry, what's going on? We'll run it down.'
Sony says nothing.

Let's just stop here for a second. This is fundamentally no different than
the 'I have a problem, it's the network!' complaints we've all dealt with
forever. You spend days/weeks/months working on it. Maybe you ultimately
find a goofy switchport, or maybe you discover that the server HDDs were
crapping the bed and the problem server was chugging because of that. But
you had to spend tons of time working on it because you couldn't get the
info you need because the reporter was CONVINCED they KNEW what it was.

Why should Simon have to spend hours of engineering time fishing through
traffic captures and logs when he doesn't even know what he's LOOKING for?
What does PSN consider 'abuse' here?

Does Simon have customers infected with botnets that are targeting PSN at
times? Or does PSN assume nobody will ever have more than a couple
Playstations in a house, so if they see more than N connections to PSN from
the same IP, it's malicious, since CGN is likely not something they
considered? ( If anyone wants to place beer wagers, I'm picking the latter. )

I spent about 8 weeks this year going back and forth with a Very Large
Website Network who had blocked a /17 of IP space from accessing ANY of
their sites because of 'malicious traffic' from a specific /23. 5 of those
weeks, their responses consisted of 'it's malicious, you go find it, should
be obvious', 'you clearly don't know what you're doing, we're wasting our
time', etc. Week 5, I was able to extract that it was a specific web
crawler that they said was knocking their databases over. After a
conversation with their CIO the following week, they came back and admitted
that a junior system admin made some PHP changes on a bunch of servers that
he didn't think were in production, and when we crawled THOSE servers, Bad
Things Happened for them. We were doing nothing wrong; they just refused
to look, and found it easier to blame us.

Simon's getting screwed because he's not being given any information to try
and solve the problem, and because his customers are likely blaming him
because he's their ISP.

Sony needs to stand up and work with him here.

Mike_Hammett · September 18, 2016, 1:19pm

People love to hate incumbent telcos because of their arrogance (and frankly it's deserved), but people forget that big content can be just as arrogant and just as deserving of hatred.

Florian_Weimer · September 18, 2016, 1:56pm

* Rich Kulawiec:

> For example: if the average number of outbound SSH connections
> established per hour per host across all hosts behind CGNAT is 3.2,
> and you see a host making 1100/hour: that's a problem. It might be
> someone who botched a Perl script; or it might be a botted host
> trying to brute-force its way into something.

If you do this, you break Github.

(If I guess Simon's network correctly, then I've seen reports which
suggest that they might already be doing this.)

Florian_Weimer · September 18, 2016, 1:58pm

* Tom Beecher:

> Simon's getting screwed because he's not being given any information to try
> and solve the problem, and because his customers are likely blaming him
> because he's their ISP.

We don't know that for sure. Another potential issue is that the ISP
just cannot afford to notify its compromised customers, even if they
were able to detect them.

Simon_Lockhart1 · September 18, 2016, 2:06pm

I'd like to think that we're pretty responsive to taking our users offline
when they're compromised and we're made aware of it - either through our own
tools, or through 3rd party notifications.

The process with Sony goes something like:

- User reports they can't reach PSN
- We report to Sony/PSN, they say "Yes, it's blocked because that IP attacked
  us"
- We say "Okay, that's a CGNAT public IP, can you help us identify which
  inside user that is - (timestamp,ip,port) logs, or some way to identify the
  bad traffic so we can look for it ourselves"
- Sony say no, either through silence, or explicitly.
- We have unhappy user(s), who blame us.
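
To illustrate why the (timestamp, ip, port) tuple matters in the process above, here is a minimal sketch of looking up the inside user in a CGNAT translation log. The `Mapping` record layout and the `find_inside_user` helper are hypothetical; real CGNAT platforms log translations in their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    start: float       # epoch seconds when the translation was created
    end: float         # epoch seconds when it was torn down
    public_ip: str
    public_port: int
    inside_ip: str


def find_inside_user(log, timestamp, public_ip, public_port):
    """Return the inside IPs whose mapping matches the reported tuple."""
    return [
        m.inside_ip
        for m in log
        if m.public_ip == public_ip
        and m.public_port == public_port
        and m.start <= timestamp <= m.end
    ]
```

Without the source port and a reasonably accurate timestamp, the same query matches every subscriber who shared that public address, which is why a bare "that IP attacked us" cannot be acted on.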

Simon

Tom_Beecher · September 18, 2016, 3:07pm

An email to a user notifying them they're likely compromised costs
basically nothing. An email to their entire subscriber base also costs
nothing. If you find me an ISP that can't afford to notify users, I'll show
you one that shouldn't be in business anyways.

There's this presumption of guilt here, that Sony is right, and Simon's
subscribers are doing something malicious, yet they won't provide any
evidence of that. Even if they didn't know what it was, come back with
'We're seeing weird bursts of [traffic characteristics] aimed at PSN during
these times. We're not quite sure what it is, but it's causing [problem
X].' It would still be a question of maliciousness or not, but it would be
something to work with. Providing nothing just perpetuates this finger
pointing game, and nothing gets solved.

Florian_Weimer · September 18, 2016, 3:13pm

* Tom Beecher:

> An email to a user notifying them they're likely compromised costs
> basically nothing.

If this increases the probability that the customer contacts customer
support, in some markets, there is a risk that the account will never
turn profitable during the current contract period. (Granted, my
information may be woefully out of date, but my impression is that
price-based competition is still pretty much cut-throat over here.)

> If you find me an ISP that can't afford to notify users, I'll show
> you one that shouldn't be in business anyways.

I'm not blaming the ISP. (I may have done so in the past.) If we end
up in such a situation, it's hardly the fault of one single ISP.

> There's this presumption of guilt here, that Sony is right, and Simon's
> subscribers are doing something malicious, yet they won't provide any
> evidence of that. Even if they didn't know what it was, come back with
> 'We're seeing weird bursts of [traffic characteristics] aimed at PSN during
> these times. We're not quite sure what it is, but it's causing [problem
> X].' It would still be a question of maliciousness or not, but it would be
> something to work with. Providing nothing just perpetuates this finger
> pointing game, and nothing gets solved.

Yes, indeed. Resolving most networking problems needs cooperation,
because at the most basic level, the Internet is still about
connecting otherwise unrelated networks.

Florian_Weimer · September 18, 2016, 3:17pm

* Simon Lockhart:

* Tom Beecher:
> Simon's getting screwed because he's not being given any information to try
> and solve the problem, and because his customers are likely blaming him
> because he's their ISP.

We don't know that for sure. Another potential issue is that the ISP
just cannot afford to notify its compromised customers, even if they
were able to detect them.

I'd like to think that we're pretty responsive to taking our users offline
when they're compromised and we're made aware of it - either through our own
tools, or through 3rd party notifications.

Okay, then perhaps my guess of the ISP involved is wrong.

The process with Sony goes something like:

- User reports they can't reach PSN
- We report to Sony/PSN, they say "Yes, it's blocked because that IP attacked
  us"
- We say "Okay, that's a CGNAT public IP, can you help us identify which
  inside user that is - (timestamp,ip,port) logs, or some way to identify the
  bad traffic so we can look for it ourselves"
- Sony say no, either through silence, or explicitly.
- We have unhappy user(s), who blame us.

Yes, that's not very constructive.

Out of curiosity, how common is end-to-end reporting of
source/destination port information (in addition to source IP
addresses and destination IP addresses)? Have the anti-abuse
mechanisms finally caught on with CGNAT, or is it possible that the
PSN operator themselves do not have such detailed data?