NAT444 or ?

Valdis_Kletnieks · September 7, 2011, 10:11pm

And you store the 4 or 8 bits in what part of the IPv4 header, exactly?

Leigh_Porter · September 7, 2011, 11:08pm

Nobody uses the TOS bits, do they?

Owen_DeLong · September 8, 2011, 12:18am

From: Seth Mos [mailto:seth.mos@dds.nl]
Sent: 07 September 2011 20:26
To: NANOG
Subject: Re: NAT444 or ?

I think you have the numbers off, he started with 1000 users sharing
the same IP, since you can only do 62k sessions or so and with a
"normal" timeout on those sessions you ran into issues quickly.

The summary is that with anything less than 20 tcp sessions per user
simultaneous google maps or earth was problematic. From 15 and
downwards almost unusable.

He deduced from testing that about 10 users per IP was a more
realistic limit without taking out the entire CGN "experience".

On a personal note, this isn't even taking into question things like
broken virus scanners or other software updates that will happily try
to do 5 sessions per second, or a msn client lost trying to do 10 per
second. The most the windows IP stack will allow on client versions.

The real big issue that will be the downfall of NAT444 is the issue
with ACLS and automatic blocklists and the loss of granular access
control on that which the ISP has no control of. Which roughly
estimates to the internet.

Regards,

Seth

I was thinking of an average of around 100 sessions per user for working out how things scale to start with. It would also be handy to be able to apply sensible limits to new sessions, say limit the number of sessions to a single destination IP address and apply an overall session limit of perhaps 200 sessions per source IP address.
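To put rough numbers on the figures in this exchange, here is a minimal Python sketch. It assumes about 62,000 usable ports per public IPv4 address (the figure quoted above) and a NAT that spends one public port per session; the function names are made up for illustration, and session timeouts are ignored, so real headroom would be lower.

```python
# Approximate usable ports per public IPv4 address, as quoted in the thread.
USABLE_PORTS = 62_000


def sessions_per_user(users_per_ip: int, usable_ports: int = USABLE_PORTS) -> int:
    """Ports left for each user when one public IP is shared evenly.

    Assumes (hypothetically) one public port per session and no port reuse.
    """
    return usable_ports // users_per_ip


def users_per_ip(session_budget: int, usable_ports: int = USABLE_PORTS) -> int:
    """How many users fit on one public IP if each may hold `session_budget` sessions."""
    return usable_ports // session_budget


if __name__ == "__main__":
    for users in (1000, 300, 10):
        print(f"{users:>5} users/IP -> {sessions_per_user(users)} sessions each")
    for budget in (100, 200):
        print(f"{budget:>5} sessions/user -> {users_per_ip(budget)} users per IP")
```

With 1000 users on one address each user gets about 62 ports, while a hard cap of 200 sessions per user fits roughly 310 users per address under these assumptions.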

ACLs and blocklists are going to be a problem. Perhaps, as LSN becomes more and more common, such things will gradually die out.

I think that such things will kill the NAT444 user experience rather than having the NAT444 user experience problems kill the block lists.

The people maintaining said lists are generally trying to protect larger systems from abusers and don't have any strong motivation to preserve the user experience of particular ISPs or particular subsets of users.

Considering that offices, schools etc regularly have far more than 10 users per IP, I think this limit is a little low. I've happily had around 300 per public IP address on a large WiFi network, granted these are all different kinds of users, it is just something that operational experience will have to demonstrate.

Yes, but, you are counting individual users whereas at the NAT444 level, what's really being counted is end-customer sites not individual users, so the term "users" is a bit misleading in the context. A given end-customer site may be from 1 to 50 or more individual users.

I would love to avoid NAT444, but I do not see a viable way around it at the moment. Unless the Department of Work and Pensions release their /8, that is.

The best mitigation really is to get IPv6 deployed as rapidly and widely as possible. The more stuff can go native IPv6, the less depends on fragile NAT444.

Owen

gih · September 8, 2011, 5:26am

It may not be what Randy was referring to above, but as part of that program at APNIC32 I reported on the failure rate I am measuring for Teredo. I'm not sure it's all in the slides I was using, but what I was trying to say was that STUN is simply terrible at reliably negotiating a NAT. I was then wondering what pixie dust CGNs were going to use that would have any impact on the ~50% connection failure rate I'm observing in Teredo. And if there is no such thing as pixie dust (damn!) I was then wondering if NATs are effectively unusable if you want anything fancier than 1:1 TCP connections (like multi-user games, for example). After all, a 50% connection failure rate for STUN is hardly encouraging news for a CGN deployer if your basic objective is not to annoy your customers.

regards,
Geoff

Seth_Mos · September 8, 2011, 5:41am

The striking thing I picked up is that NTT considers the CGN equipment a big black hole where money goes into. Because it won't solve their problem now or in the future and it becomes effectively a piece of equipment they need to buy and then scrap "soon" after.

They acknowledge the need, but they'd rather not buy one.
That and they (the isp) get called for anything which doesn't work.

Regards,

Seth

Leigh_Porter · September 8, 2011, 8:48am

From: Owen DeLong [mailto:owen@delong.com]
Sent: 08 September 2011 01:22
To: Leigh Porter
Cc: Seth Mos; NANOG
Subject: Re: NAT444 or ?

> > Considering that offices, schools etc regularly have far more than 10
> > users per IP, I think this limit is a little low. I've happily had
> > around 300 per public IP address on a large WiFi network, granted these
> > are all different kinds of users, it is just something that operational
> > experience will have to demonstrate.
>
> Yes, but, you are counting individual users whereas at the NAT444
> level, what's really being counted is end-customer sites not individual
> users, so the term "users" is a bit misleading in the context. A given
> end-customer site may be from 1 to 50 or more individual users.

Indeed, my users are using LTE dongles mostly so I expect they will be single users. At the moment on the WiMAX network I see around 35 sessions from a WiMAX modem on average rising to about 50 at peak times. These are a combination of individual users and "home modems".

We had some older modems that had integrated NAT that was broken and locked up the modem at 200 sessions. Then some old base station software died at about 10K sessions. So we monitor these things now.

> > I would love to avoid NAT444, I do not see a viable way around it at
> > the moment. Unless the Department of Work and Pensions release their /8
> > that is
>
> The best mitigation really is to get IPv6 deployed as rapidly and
> widely as possible. The more stuff can go native IPv6, the less depends
> on fragile NAT444.

Absolutely. Even things like google maps, if that can be dumped on v6, it'll save a load of sessions from people. The sooner services such as Microsoft Update turn on v6 the better as well. I would also like the CDNs to be able to deliver content in v6 (even if the main page is v4) which again will reduce the traffic that has to traverse any NAT.

Soon, I think content providers (and providers of other services on the 'net) will roll v6 because of the performance increase as v6 will not have to traverse all this NAT and be subject to session limits, timeouts and such.

Leigh_Porter · September 8, 2011, 8:52am

From: Seth Mos [mailto:seth.mos@dds.nl]
Sent: 08 September 2011 06:43
To: NANOG
Subject: Re: NAT444 or ?

> > It may not be what Randy was referring to above, but as part of that
> > program at APNIC32 I reported on the failure rate I am measuring for
> > Teredo. I'm not sure it's all in the slides I was using, but what I was
> > trying to say was that STUN is simply terrible at reliably negotiating
> > a NAT. I was then wondering what pixie dust CGNs were going to use that
> > would have any impact on the ~50% connection failure rate I'm observing
> > in Teredo. And if there is no such thing as pixie dust (damn!) I was
> > then wondering if NATs are effectively unusable if you want anything
> > fancier than 1:1 TCP connections (like multi-user games, for example).
> > After all, a 50% connection failure rate for STUN is hardly encouraging
> > news for a CGN deployer if your basic objective is not to annoy your
> > customers.

I have a concern about some weird and wonderful VPN solutions that people may be using. I am quite sure that some of them will just not work through NAT444, though I have no evidence of this. People have problems with some VPN solutions with single NAT (especially with no ALGs). NAT444 will just be a mess.

> The striking thing I picked up is that NTT considers the CGN equipment
> a big black hole where money goes into. Because it won't solve their
> problem now or in the future and it becomes effectively a piece of
> equipment they need to buy and then scrap "soon" after.

Well if you buy the 'right' solution then you can re-use it elsewhere. Many solutions use multi-purpose processing cards to deliver NAT functionality which can be used for other stuff such as firewalling or some other manner of traffic mangling.

> They acknowledge the need, but they'd rather not buy one.
> That and they (the isp) get called for anything which doesn't work.

Well at least these little problems that pop up keep people in jobs. If everything just worked (tm) there would be nothing to do.

Mike_Jones1 · September 8, 2011, 11:33am

As HTTP seems to be a major factor causing a lot of short lived
connections, and several large ISPs have demonstrated that large scale
transparent HTTP proxies seem to work just fine, you could also move
the IPv4 port 80 traffic from the CGN to a transparent HTTP proxy. As
well as any benefits from caching keeping connections local, it can
also combine 1000 users trying to load facebook into a handful of
persistent connections to the facebook servers. The proxy can of
course also have its own global IPv4 address rather than going through
the NAT. I have no experience with large scale HTTP proxy deployments,
but I strongly suspect a single HTTP proxy can handle traffic for a
lot more users than the low hundreds currently being suggested for
NAT444, and it can be scaled out separately if required.

As an end user this is probably a little worse with HTTP coming from a
different IP address to everything else, but not that much worse. As a
provider it may be much easier to scale to larger numbers of
customers. The proxy can also take IPv4-only users to a dual stacked
site over IPv6, as I am under no illusions that even with IPv6 to
every customer you will still have customers behind IPv4-only NAT
routers they bought themselves for quite a while. With some DNS tricks
this might be useful for those users reaching IPv6-only sites, however
it would probably be better if they were unable to reach those sites
at all to give them an incentive to fix their IPv6.

Ca_By · September 8, 2011, 2:02pm

> From: Owen DeLong [mailto:owen@delong.com]
> Sent: 08 September 2011 01:22
> To: Leigh Porter
> Cc: Seth Mos; NANOG
> Subject: Re: NAT444 or ?
>
> > Considering that offices, schools etc regularly have far more than 10
> users per IP, I think this limit is a little low. I've happily had
> around 300 per public IP address on a large WiFi network, granted these
> are all different kinds of users, it is just something that operational
> experience will have to demonstrate.
> >
> Yes, but, you are counting individual users whereas at the NAT444
> level, what's really being counted is end-customer sites not individual
> users, so the term
> "users" is a bit misleading in the context. A given end-customer site
> may be from 1 to 50 or more individual users.

Indeed, my users are using LTE dongles mostly so I expect they will be
single users. At the moment on the WiMAX network I see around 35 sessions
from a WiMAX modem on average rising to about 50 at peak times. These are a
combination of individual users and "home modems".

We had some older modems that had integrated NAT that was broken and
locked up the modem at 200 sessions. Then some old base station software
died at about 10K sessions. So we monitor these things now.

>
> > I would love to avoid NAT444, I do not see a viable way around it at
> the moment. Unless the Department of Work and Pensions release their /8
> that is
> >
>
> The best mitigation really is to get IPv6 deployed as rapidly and
> widely as possible. The more stuff can go native IPv6, the less depends
> on fragile NAT444.

Absolutely. Even things like google maps, if that can be dumped on v6,
it'll save a load of sessions from people. The sooner services such as
Microsoft Update turn on v6 the better as well. I would also like the CDNs
to be able to deliver content in v6 (even if the main page is v4) which
again will reduce the traffic that has to traverse any NAT.

Soon, I think content providers (and providers of other services on the
'net) will roll v6 because of the performance increase as v6 will not have
to traverse all this NAT and be subject to session limits, timeouts and
such.

What do you mean by performance increase? If performance equals latency, v4
will win for a long while still. CGN does not add measurable latency.

Cb

Christian · September 8, 2011, 3:04pm

I wonder if the discussion, as useful as it is, isn't forgetting that the edge of the Internet has a stake in getting this right too! This is not just an ISP problem but one where content providers and services that is the users need to get from here to there in good order.

So

What can users do to encourage ISPs to deploy v6 to them?
What can users do to ease the pain in reaching IPv4 only sites once they are on IPv6 tails?

Is there not a bit of CPE needed here? What should the CPE do, and not do? Should it deprecate NAT/PAT when it receives a 1918 allocation from a CGN?
Less technical, but relevant I think, is the question of cost: who pays?

Christian

Lyle · September 8, 2011, 3:49pm

Can we really push an IPv6 agenda for CDN's when IPv6 routing at high backend levels is still not complete? I certainly don't have the 'clout' to push that, but full routing between Cogent and HE needs to be fixed.

Lyle Giese
LCR Computer Services, Inc.

Bandy_Rush1 · September 8, 2011, 3:54pm

Can we really push an IPv6 agenda for CDN's when IPv6 routing at high
backend levels is still not complete? I certainly don't have the
'clout' to push that, but full routing between Cogent and HE needs to be
fixed.

if you are worried about full v4 or v6 or v8-juice routing between
cogent and X, for many values of X, then you will never be unworried.

randy

Joel_Jaeggli · September 8, 2011, 4:22pm

Can we really push an IPv6 agenda for CDN's when IPv6 routing at high
backend levels is still not complete? I certainly don't have the
'clout' to push that, but full routing between Cogent and HE needs to be
fixed.

It's your job to run your network such that you have connectivity to the
destinations your customers want to reach not Cogent's or HE's...

Dan_Wing1 · September 8, 2011, 4:44pm

From: Geoff Huston [mailto:gih@apnic.net]
Sent: Wednesday, September 07, 2011 10:27 PM
To: Leigh Porter
Cc: nanog@nanog.org list; Daniel Roesen
Subject: Re: NAT444 or ?

>
>
>> From: Daniel Roesen [mailto:dr@cluenet.de]
>> Sent: 07 September 2011 17:38
>> To: nanog@nanog.org
>> Subject: Re: NAT444 or ?
>>
>>>> I'm going to have to deploy NAT444 with dual-stack real soon now.
>>>
>>> you may want to review the presentations from last week's apnic
>>> meeting in busan. real measurements. sufficiently scary that people
>>> who were heavily pushing nat444 for the last two years suddenly
>>> started to say "it was not me who pushed nat444, it was him!" as if
>>> none of us had a memory.
>>
>> Hm, I fail to find relevant slides discussing that. Could you please
>> point us to those?
>>
>> I'm looking at http://meetings.apnic.net/32
>
> There is a lot in the IPv6 plenary sessions:
>
> http://meetings.apnic.net/32/program/ipv6
>
> This is what I am looking at right now. Randy makes some good
> comments in those sessions. I have not found anything yet, but I am
> only on session 3, pertaining specifically to issues around NAT444.

It may not be what Randy was referring to above, but as part of that
program at APNIC32 I reported on the failure rate I am measuring for
Teredo. I'm not sure it's all in the slides I was using, but what I was
trying to say was that STUN is simply terrible at reliably negotiating
a NAT.

Teredo does not use STUN; Teredo was implemented before STUN was fully
spec'd and does its own thing.

Teredo tries to determine the type of NAT it is behind ("cone",
"symmetric", etc.). Determining the type of NAT has been found to
be difficult, nearly impossible (*), and was removed from the current
RFC for STUN (**). I suspect most of Teredo's failures are related
to this procedure not working effectively. A CGN can't improve that.

(*) RFC 5780 - NAT Behavior Discovery Using Session Traversal Utilities for NAT (STUN)
(**) RFC 5389 - Session Traversal Utilities for NAT (STUN)

I was then wondering what pixie dust CGNs were going to use that
would have any impact on the ~50% connection failure rate I'm observing
in Teredo. And if there is no such thing as pixie dust (damn!) I was
then wondering if NATs are effectively unusable if you want anything
fancier than 1:1 TCP connections (like multi-user games, for example).
After all, a 50% connection failure rate for STUN is hardly encouraging
news for a CGN deployer if your basic objective is not to annoy your
customers.

If the CGN has both Endpoint-Independent Filtering (***) behavior
and Endpoint-Independent Mapping (****) behavior, the CGN won't make
Teredo worse -- Teredo will be as bad as today, caused by the home
user's own pretty NAT. With that behavior, it also won't make
applications that use STUN or ICE worse, or applications that use
STUN-like or ICE-like techniques such as Skype.

(***) Endpoint-Independent Filtering: means it doesn't filter incoming
packets after a mapping is created. See RFC 4787 - Network Address
Translation (NAT) Behavioral Requirements for Unicast UDP for the
canonical definition.

(****) Endpoint-Independent Mapping: means when the internal host sends a
packet with the same source port, to any destination, the same public port
is mapped. See RFC 4787 - Network Address Translation (NAT) Behavioral
Requirements for Unicast UDP for the canonical definition.
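
As a rough illustration of the mapping half of those definitions, here is a toy Python model. It is only a sketch: `ToyNat` and its methods are invented for this example, it allocates ports sequentially, and it does not model filtering, timeouts or port exhaustion.

```python
from itertools import count


class ToyNat:
    """Toy NAT that only models how outbound mappings are keyed.

    endpoint_independent=True  -> one mapping per internal (ip, port)
    endpoint_independent=False -> one mapping per (internal, destination) pair,
                                  i.e. what is loosely called "symmetric"
    """

    def __init__(self, endpoint_independent: bool, first_port: int = 1024):
        self.endpoint_independent = endpoint_independent
        self._next_port = count(first_port)  # hypothetical sequential allocation
        self._mappings = {}

    def map_outbound(self, src, dst):
        """src and dst are (ip, port) tuples; returns the public port used."""
        key = src if self.endpoint_independent else (src, dst)
        if key not in self._mappings:
            self._mappings[key] = next(self._next_port)
        return self._mappings[key]

    def state_entries(self) -> int:
        """Number of mapping entries the NAT currently has to remember."""
        return len(self._mappings)


if __name__ == "__main__":
    host = ("10.0.0.2", 5000)
    peers = [("192.0.2.1", 3478), ("198.51.100.7", 3478)]
    for eim in (True, False):
        nat = ToyNat(endpoint_independent=eim)
        ports = [nat.map_outbound(host, peer) for peer in peers]
        print(f"EIM={eim}: public ports {ports}, state entries {nat.state_entries()}")
```

With endpoint-independent mapping both peers see the same public port, which is what STUN-like techniques rely on; with per-destination mapping each peer sees a different one.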

-d

Dan_Wing1 · September 8, 2011, 4:47pm

...

The striking thing I picked up is that NTT considers the CGN equipment
a big black hole where money goes into. Because it won't solve their
problem now or in the future and it becomes effectively a piece of
equipment they need to buy and then scrap "soon" after.

It would get scrapped when all servers support dual stack. What year
is that predicted to occur?

They acknowledge the need, but they'd rather not buy one.
That and they (the isp) get called for anything which doesn't work.

-d

Dan_Wing1 · September 8, 2011, 4:52pm

From: Christian de Larrinaga [mailto:cdel@firsthand.net]
Sent: Thursday, September 08, 2011 8:05 AM
To: Cameron Byrne
Cc: NANOG
Subject: what about the users re: NAT444 or ?

I wonder if the discussion, as useful as it is, isn't forgetting that the
edge of the Internet has a stake in getting this right too! This is not
just an ISP problem but one where content providers and services that
is the users need to get from here to there in good order.

So

What can users do to encourage ISPs to deploy v6 to them?
What can users do to ease the pain in reaching IPv4 only sites once
they are on IPv6 tails?

Is there not a bit of CPE needed here? What should the CPE do, and not
do? Should it deprecate NAT/PAT when it receives a 1918 allocation from a
CGN?

Careful with that idea -- people like their in-home network to continue
functioning even when their ISP is down or having an outage. Consider
a home NAS delivering content to the stereo or the television.
It is possible to eliminate reliance on the ISP's network and still
have the in-home network function, but it's more difficult than just
continuing to run NAT44 in the home like today. (Dual Stack-Lite
can accomplish this pretty easily, because the IPv4 addresses in
the home can be any IPv4 address whatsoever -- which allows the
in-home CPE ("B4", in Dual Stack-Lite parlance) to assign any address
it wants with its built-in DHCP server.)

-d

Dan_Wing1 · September 8, 2011, 5:04pm

From: Leigh Porter [mailto:leigh.porter@ukbroadband.com]
Sent: Wednesday, September 07, 2011 1:38 PM
To: David Israel; nanog@nanog.org
Subject: RE: NAT444 or ?

> From: David Israel [mailto:davei@otd.com]
> Sent: 07 September 2011 21:23
> To: nanog@nanog.org
> Subject: Re: NAT444 or ?
>
> > I think you have the numbers off, he started with 1000 users sharing
> > the same IP, since you can only do 62k sessions or so and with a
> > "normal" timeout on those sessions you ran into issues quickly.
> >
>
> Remember that a TCP session is defined not just by the port, but by the
> combination of source address:source port:destination
> address:destination port. So that's 62k sessions *per destination* per
> ip address. In theory, this particular performance problem should only
> arise when the NAT gear insists on a unique port per session (which is
> common, but unnecessary) or when a particular destination is
> inordinately popular; the latter problem could be addressed by
> increasing the number of addresses that facebook.com and google.com
> resolve to.
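
The point about the 4-tuple can be sketched in a few lines of Python. This is a hypothetical allocator, not how any particular NAT product works: the class name is invented, the port pool is tiny so the effect is visible, and it ignores the endpoint-independent mapping behavior discussed elsewhere in the thread.

```python
from collections import defaultdict


class PerDestinationAllocator:
    """Hands out public ports that only need to be unique per destination.

    Because a TCP session is identified by the full 4-tuple, the same public
    port can be in use toward different destinations at the same time.
    """

    def __init__(self, ports):
        self.ports = list(ports)
        self.in_use = defaultdict(set)  # destination -> public ports in use

    def allocate(self, dst):
        """dst is an (ip, port) tuple; returns a public port free toward dst."""
        used = self.in_use[dst]
        for port in self.ports:
            if port not in used:
                used.add(port)
                return port
        raise RuntimeError(f"no free public port toward {dst}")


if __name__ == "__main__":
    # Three public ports only, to keep the example small.
    alloc = PerDestinationAllocator(range(1024, 1027))
    for dst in (("192.0.2.1", 80), ("198.51.100.7", 80)):
        print(dst, [alloc.allocate(dst) for _ in range(3)])
    # Six sessions now share three public ports; a NAT that insisted on a
    # unique port per session would have stopped at three.
```

The limit then only bites when one destination address is very popular, which matches the remark about facebook.com and google.com above.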

Good point, but aside from these scaling issues which I expect can be
resolved to a point, the more serious issue, I think, is applications
that just do not work with double NAT. Now, I have not conducted any
serious research into this, but it seems that
draft-donley-nat444-impacts does appear to have highlighted issues that
may have been down to implementation.

Draft-donley-nat444-impacts conflates bandwidth constraints with CGN
with in-home NAT. Until those are separated and then analyzed carefully,
it is harmful to draw conclusions such as "NAT444 bad; NAT44 good".

Other simple tricks such as ensuring that your own internal services
such as DNS are available without traversing NAT also help.

Yep. But some users want to use other DNS servers for performance
(e.g., Google's or OpenDNS servers, especially considering they
could point the user at a 'better' (closer) CDN based on Client
IP), to avoid ISP DNS hijacking, or for content control (e.g.,
"parental control" of DNS hostnames). That traffic will, necessarily,
traverse the CGN. To avoid users burning through their UDP port
allocation for those DNS queries it is useful for the CGN to
have short timeouts for port 53.
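
A minimal Python sketch of that idea follows, assuming a mapping table keyed by public port. The timeout values and function names are invented for illustration and are not taken from any CGN product or standard.

```python
# Hypothetical idle timeouts in seconds; real values are an operator decision.
DEFAULT_UDP_TIMEOUT = 120.0
DNS_UDP_TIMEOUT = 10.0


def udp_timeout(dst_port: int) -> float:
    """Idle timeout for a UDP mapping, shorter when the destination is port 53."""
    return DNS_UDP_TIMEOUT if dst_port == 53 else DEFAULT_UDP_TIMEOUT


def expire(mappings: dict, now: float) -> dict:
    """Return the mappings that are still alive at time `now`.

    `mappings` maps public_port -> (dst_port, last_seen_time).
    """
    return {
        public_port: (dst_port, last_seen)
        for public_port, (dst_port, last_seen) in mappings.items()
        if now - last_seen < udp_timeout(dst_port)
    }


if __name__ == "__main__":
    table = {40000: (53, 0.0), 40001: (5060, 0.0)}
    # 30 seconds later the DNS mapping is gone and its port can be reused.
    print(expire(table, now=30.0))
```

A DNS query is normally a single request and response, so freeing its port quickly costs the user little while returning ports to their allocation.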

Certainly some more work can be done in this area, but I fear that the
only way to get a real idea as to how much NAT444 really does break
things will be operational experience.

Yep. (Same as everything else.)

-d

Dan_Wing1 · September 8, 2011, 5:10pm

There are two dimensions of that scalability, of course:

Endpoint-independent mapping means better scaling of the NAT itself,
because it stores less state (slightly less memory for each active
mapping and slightly less per-packet processing). This savings
is exchanged for worse IPv4 utilization -- which I agree is not so
good for scalability.

-d

Dan_Wing1 · September 8, 2011, 5:22pm

Try it at home. With aggressive usage of Microsoft's Terraserver,
mapquest, or google maps, I'm able to burn through 120 or so
TCP connections. Move the map around, zoom in/out, enable/disable
traffic, switch between satellite and map and overlay, repeat those
steps 2-3 times. Don't be slow and don't wait for everything
to paint.

Or crash your browser and when it restarts watch how many connections
it makes to re-open all your tabs.

I understand iTunes opens lots of connections, but I haven't looked
at that.

To experiment with limited ports at home, load 3rd party firmware
onto your NAT -- most of them allow controlling the number
of mappings (and by default, have higher limits than stock
firmware). Or consume a bunch of your mappings with a
script (such as the brain-dead Perl script below) and then
start your testing. Some NATs and some servers will kill the
TCP sessions after inactivity (which is why I describe the
script as brain-dead).
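
The script itself is not reproduced in this post. As a rough stand-in, and explicitly not the original Perl, here is a hypothetical Python sketch of the same idea: open a number of TCP connections and hold them so they occupy NAT mappings. The function names are made up, and like the script described it sends no keepalives, so idle sessions may be torn down by the NAT or the server.

```python
import socket


def hold_connections(host: str, port: int, count: int, timeout: float = 5.0):
    """Open up to `count` TCP connections to host:port and return the sockets.

    Stops at the first failure (refused, timed out, NAT out of mappings...),
    so the length of the returned list is how many connections succeeded.
    """
    held = []
    for _ in range(count):
        try:
            held.append(socket.create_connection((host, port), timeout=timeout))
        except OSError:
            break
    return held


def release(connections) -> None:
    """Close every held socket, freeing the corresponding NAT mappings."""
    for conn in connections:
        conn.close()
```

Something like `conns = hold_connections(server, 80, 100)` against a server you control, followed by your browser test and then `release(conns)`, would approximate the experiment; the server name and counts are placeholders.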

-d

Dan_Wing1 · September 8, 2011, 5:44pm

Many of the problems are due to IPv4 address sharing, which will be
problems for A+P, CGN, HTTP proxies, and other address sharing
technologies. RFC 6269 discusses most (or all) of those problems.
There are workarounds to those problems, but most are not
pretty. The solution is IPv6.

-d