State of QoS peering in NANOG

Folks,

The Canadian telecommunications regulator, the CRTC, has just opened a public consultation with possible worldwide implications, IMHO: Telecom Notice of Consultation CRTC 2011-206:

http://www.crtc.gc.ca/eng/archive/2011/2011-206.htm

I think this is the very first regulatory inquiry into IP-to-IP interconnection for PSTN local interconnection.

One of the postulates that I intend to defend is that in the PSTN today, in addition to interconnecting for the purpose of exchanging voice calls, it is possible to LOCALLY (at the Local Interconnection Region, roughly a US LATA) interconnect with guaranteed QoS for ISDN video conferencing.

In other words, there is more to PSTN interconnection than the support of the G.711 CODEC. Other CODECs are supported, such as H.320.

This brings me to a point: why should we lose this important feature of the PSTN, support for multiple CODECs, as we carelessly level IP-to-IP interconnection down to G.711 only?

Video conferencing on the Internet, particularly at high resolution, is not a reality today, to say the least, let alone guessing what the future will hold.

Why not consider HD audio?

Therefore:

A) I want to capture all instances where this issue has been addressed worldwide.

B) I also want to understand what is going on, insofar as enabling guaranteed QoS peering across BGP-4 interconnections in the NANOG community.

C) I also want to understand whether there is inter-service-provider RSVP or other per-session QoS establishment protocols.

I call upon the NANOG community to consider this proceeding as very important and to contribute to this thread.

And I will try to provide a forum for discussing this outside of NANOG when required.

Regards,

-=Francois=-

In a message written on Sat, Apr 02, 2011 at 04:00:30PM -0400, Francois Menard wrote:

> One of the postulates that I intend to defend is that in the
> PSTN today, in addition to interconnecting for the purpose of
> exchanging voice calls, it is possible to LOCALLY (at the Local
> Interconnection Region, roughly a US LATA) interconnect with
> guaranteed QoS for ISDN video conferencing.

The PSTN "features" fixed, known bandwidth. QoS isn't really the
right term. When I nail up a BRI, I know I have 128 kbit/s of
bandwidth, never more, never less. There is no function on that
channel similar to IP QoS.

When talking about IP QoS, people like to talk about guaranteed or
reserved bandwidth for particular applications. The reality, though,
is that's not how IP QoS works. IP QoS is really about identifying
which traffic can be thrown away first in the face of congestion.
Guaranteeing 128 kbit/s for a video call really means making sure all
other traffic is thrown away first, in the face of congestion.
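To make the drop-first framing concrete, here is a minimal sketch (illustrative Python, not any vendor's implementation) of a strict-priority scheduler: nothing is reserved, the link simply decides which packets die first when it is full.

```python
# Sketch: strict-priority scheduling under congestion.
# The link can send `capacity` packets per tick; high-priority packets
# are served first, and whatever does not fit is dropped.

def schedule(packets, capacity):
    """packets: list of (priority, name); lower number = higher priority."""
    sent, dropped = [], []
    for pkt in sorted(packets, key=lambda p: p[0]):
        if len(sent) < capacity:
            sent.append(pkt)
        else:
            dropped.append(pkt)
    return sent, dropped

# 3 voice packets (prio 0) and 5 bulk packets (prio 1); the link fits 4.
offered = [(0, "voice")] * 3 + [(1, "bulk")] * 5
sent, dropped = schedule(offered, capacity=4)
# All voice survives only because bulk was thrown away first.
```

Note that the voice packets got their "guarantee" purely at the expense of the bulk traffic; with no congestion, the policy never fires at all.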

> In other words, there is more to PSTN interconnection than the
> support of the G.711 CODEC. Other CODECs are supported, such as
> H.320.

> This brings me to a point: why should we lose this important
> feature of the PSTN, support for multiple CODECs, as we carelessly
> level IP-to-IP interconnection down to G.711 only?

IP networks can't tell the difference between G.711, H.320, and the
SMTP packets used to deliver this e-mail. IP networks know nothing
about CODECs, and operate entirely on IP address and port information.
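As an illustration (hypothetical Python; the port-to-class mapping is an assumption for the example, not something an ISP is bound to), a classifier sees only header fields:

```python
# Sketch: an IP classifier sees only header fields, never the codec.

def classify(pkt):
    """pkt is a dict of header fields: src, dst, proto, dport, dscp."""
    if pkt.get("dscp") == 46:      # EF marking, commonly used for voice
        return "priority"
    return "best-effort"           # everything else, SMTP included

rtp_like = {"src": "192.0.2.1", "dst": "198.51.100.2", "proto": "udp",
            "dport": 40000, "dscp": 46}
mail = {"src": "192.0.2.1", "dst": "198.51.100.2", "proto": "tcp",
        "dport": 25, "dscp": 0}
# Whether the UDP payload is G.711 audio or H.26x video is invisible
# at this layer; only the markings and the 5-tuple are.
```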

> B) I also want to understand what is going on, insofar as enabling
> guaranteed QoS peering across BGP-4 interconnections in the NANOG
> community.

You're looking at the wrong point in the network. In my experience,
full (congested) peering circuits are very much the exception, not the
rule. While almost all the exceptions hit NANOG and are the subject of
fun and lively discussion, the reality is they are rare.

When there is no congestion, there is no reason to drop a packet.
A QoS policy would go unused, or if you want to look from the other
direction everything has 100% bandwidth across that link.

In an IP network, the bandwidth constraints are almost always across
an administrative boundary. This means in the majority of cases
across transit circuits, not peering. 80-90% of the packet loss
in the network happens at the end-user access port, inbound or
outbound. Another 5-10% occurs where regional or non-transit-free
providers buy transit. Lastly, 3-5% occurs where there are geographic
or geopolitical issues (oceans to cross, country borders with
restrictive governments to cross).

Basically, you could mandate QoS on every peering link in the
Internet and I suspect 99% of the end users would never notice any
change.

If you want to advocate for useful changes to end users that provide a
better network experience, you need to focus your efforts in three
areas:

1) Fight bufferbloat. See "Bufferbloat" on Wikipedia, "Understanding
   bufferbloat and the network buffer arms race" at Ars Technica, and
   http://www.bufferbloat.net/

2) Get access ISPs to offer QoS on customer access ports, ideally in
   some user configurable way.

3) Get ISPs who purchase transit further up the line to implement QoS
   with their transit provider for their customers' traffic, if they
   are going to run those links full.

> The PSTN "features" fixed, known bandwidth. QoS isn't really the
> right term. When I nail up a BRI, I know I have 128 kbit/s of
> bandwidth, never more, never less. There is no function on that
> channel similar to IP QoS.

The PSTN also has exactly one unidirectional flow per access port.
This is not true of IP networks, where an end-user access port may
have dozens of flows going at once for common web browsing, and
perhaps hundreds of flows when using P2P file sharing applications,
etc. The lifetime of these flows may be several hours (streaming
movie) or under a second (web browser.)

Where the PSTN has channels between two access ports (which might be
packetized within the backbone) and a relatively complex control plane
for establishing flows, the IP network has little or no knowledge of
flows. If it does have any knowledge of them, it is not because a
control plane exists to establish them; it is because punting from the
data plane to the control plane allows flow state to be established
for things like NAT.
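The punt-to-create-state pattern can be sketched as follows (illustrative Python; the field names and the idea of a per-flow packet counter are assumptions for the example):

```python
# Sketch: flow state created by punting, not by a signaling protocol.
# The first packet of an unknown flow is "punted" so state can be set
# up (e.g. a NAT translation); later packets hit the existing entry.

flow_table = {}

def forward(pkt):
    key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])
    if key not in flow_table:
        # Punt: the control plane allocates state for this flow.
        flow_table[key] = {"packets": 0}
    flow_table[key]["packets"] += 1
    return flow_table[key]["packets"]

p = {"src": "10.0.0.1", "sport": 12345,
     "dst": "203.0.113.7", "dport": 443, "proto": "tcp"}
forward(p)   # first packet creates the state
forward(p)   # subsequent packets match it in the data plane
```

No RSVP-style signaling ever ran; the "flow" exists only as a side effect of forwarding.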

> Basically, you could mandate QoS on every peering link in the
> Internet and I suspect 99% of the end users would never notice any
> change.

I don't agree with this. IMO all DDoS traffic would suddenly be
marked into the highest priority forwarding class that doesn't have an
absurdly low policer for the DDoS source's access port, and as a
result, DDoS would more easily cripple the network, either from
hitting policers on the higher-priority traffic and killing streaming
movies/voip/etc, or in the absence of policers, it would more easily
cause significant packet loss to best-effort traffic.

I think end-users would notice because their ISP would suddenly grind
to a halt anytime a clever DDoS was directed their way.

We will no sooner see a practical solution to this than we will one
for large-scale multicast in backbone and subscriber access networks.
The limitations are similar: to be effective, you need a lot more
state for multicast. For a truly good QoS implementation, you need a
lot more hardware counters and policers (more state.) If you don't
have this, all your QoS setup will do, deployed across a large
Internet subscriber access network, is work a little better under
ideal conditions, and probably a lot worse when subjected to malicious
traffic.

> 2) Get access ISPs to offer QoS on customer access ports, ideally in
>    some user configurable way.

I do agree that QoS should be available to end-users across access
links, but I don't agree with pushing it further towards the core
unless per-subscriber policers are available beyond those on access
routers. Otherwise, all someone has to do to be mean to Netflix is
send a short-term, high-volume DoS attack that looks like Netflix
traffic towards an end-user IP, which would interrupt movie-viewing
for a potentially larger number of users, or at least as many
end-users as the same DoS would in the absence of any QoS. The case
of per-subscriber policers pushed further towards the ISP core fares
better.

In a message written on Sat, Apr 02, 2011 at 07:00:52PM -0400, Jeff Wheeler wrote:

> I don't agree with this. IMO all DDoS traffic would suddenly be
> marked into the highest priority forwarding class that doesn't have an
> absurdly low policer for the DDoS source's access port, and as a
> result, DDoS would more easily cripple the network, either from
> hitting policers on the higher-priority traffic and killing streaming
> movies/voip/etc, or in the absence of policers, it would more easily
> cause significant packet loss to best-effort traffic.

Agree in part, and disagree in part.

No doubt DDoS programs will try to masquerade as "high priority"
traffic. This will create a new set of problems, and require some
new solutions.

Let's separate the problem into two parts. The first is "best
effort" traffic. Provided the QoS policy only prioritizes a fraction
of the bandwidth (20 to maybe 40%), the impact of a DDoS that came
in prioritized would only be a few percentage points worse than a
standard DDoS.

Today it takes about 10x link speed to make a link "completely
unusable" (although YMMV, and it depends a lot on your traffic mix
and definition of unusable). With a 25% priority queue and the
DDoS hitting it, that may drop to 8x. I think that is statistically
interesting, but relatively minor.

The second problem is what happens to priority traffic. You are
correct that if DDoS traffic can come in prioritized then you only
need fill the priority queue 2x-4x to generate issues (as streaming
traffic is more sensitive), assuming traffic over the limit is not
dropped but rather allowed best effort. This is likely a lower
threshold than filling the entire link 5x-10x, and thus easier for
the attacker.
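A back-of-envelope sketch (illustrative Python; the multiples are the rough figures above, and the 25% priority share is an assumption) of why the priority queue is the cheaper target:

```python
# Back-of-envelope sketch of why the priority queue is easier to attack
# (all numbers illustrative, following the 2x-4x / 5x-10x figures above).
link = 1.0            # normalized link capacity
prio_share = 0.25     # priority queue capped at 25% of the link

# Degrading priority traffic: fill the priority queue ~3x over.
attack_on_prio = 3 * (prio_share * link)      # 0.75 link-capacities

# Degrading everything: fill the whole link ~7.5x over.
attack_on_link = 7.5 * link                   # 7.5 link-capacities

# In this toy model the priority class falls over with ~10x less
# attack traffic than the link as a whole.
advantage = attack_on_link / attack_on_prio
```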

But it also only affects priority queue traffic. I realize I'm
making a value judgment, but many customers under DDoS would find
things vastly improved if their video conferencing went down, but
everything else continued to work (if slowly), compared to today
when everything goes down.

In closing, I want to push folks back to the buffer bloat issue
though. More than once I've been asked to configure QoS on the
network to support VoIP, Video Conferencing or the like. These
things were deployed and failed to work properly. I went into the
network and _reduced_ the buffer sizes, and _increased_ packet
drops. Magically these applications worked fine, with no QoS.

Video conferencing can tolerate a 1% packet drop, but can't tolerate
a 4-second buffer delay. Many people today who want QoS are actually
suffering from buffer bloat. :(
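The multi-second figure falls straight out of queue arithmetic; a worked example (numbers illustrative, in Python):

```python
# Worked example of why buffers, not loss, are often the real problem.
# A queue's worst-case delay is its depth divided by the drain rate.
buffer_bytes = 1_000_000        # 1 MB of buffering in a device
link_bps = 2_000_000            # draining over a 2 Mbit/s uplink

delay_s = (buffer_bytes * 8) / link_bps   # seconds of standing queue
# A 1% drop barely registers on a video call; a standing 4-second
# queue makes any interactive application unusable.
```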

This is very hard to explain. While people on NANOG might get it,
99% of the network engineers in the world think minimizing packet loss
is the goal. It is very much an uphill battle to make them understand
that higher packet loss often _increases_ end-user performance on full
links.

Hi Leo,

I think you bring up some interesting points here, and my experience and
observations largely lend credence to what you are saying. I'd like to
know, however, just for my own personal knowledge: are the numbers you
are using above based on some broad analysis or study of multiple
providers, or are you deriving them from your own personal observations?

Thanks,

Stefan Fouant

From: Leo Bicknell [mailto:bicknell@ufp.org]
Sent: Saturday, April 02, 2011 10:24 PM

> But it also only affects priority queue traffic. I realize I'm making
> a value judgment, but many customers under DDoS would find things
> vastly improved if their video conferencing went down, but everything
> else continued to work (if slowly), compared to today when everything
> goes down.

I'd like to observe that discussion when the Netflix guys come calling on
the support line - "Hey Netflix, yeah you're under attack and your
subscribers can't watch videos at the moment, but the good news is that all
other apps running on our network are currently unaffected". ;>

> In closing, I want to push folks back to the buffer bloat issue though.
> More than once I've been asked to configure QoS on the network to
> support VoIP, Video Conferencing or the like. These things were
> deployed and failed to work properly. I went into the network and
> _reduced_ the buffer sizes, and _increased_ packet drops. Magically
> these applications worked fine, with no QoS.
>
> Video conferencing can tolerate a 1% packet drop, but can't tolerate a
> 4 second buffer delay. Many people today who want QoS are actually
> suffering from buffer bloat. :(

Concur 100%. In my experience, I've gotten much better performance with
VoIP/Video Conferencing and other delay-intolerant applications when
setting buffer sizes to a temporal value rather than basing them on a
_fixed_ number of packets.
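A sketch of what temporal sizing means in practice (illustrative Python; the rates, delay target, and average packet size are assumptions):

```python
# Sketch of "temporal" buffer sizing: pick a delay target and derive
# the buffer depth from the link rate, instead of a fixed packet count.

def buffer_packets(link_bps, target_delay_s, avg_pkt_bytes=1500):
    """Packets of buffering that drain within target_delay_s."""
    return int(link_bps * target_delay_s / (8 * avg_pkt_bytes))

# A fixed 1000-packet buffer means wildly different delays at different
# speeds; a 5 ms temporal target adapts the depth to the rate instead:
slow = buffer_packets(10_000_000, 0.005)        # 10 Mbit/s -> 4 packets
fast = buffer_packets(1_000_000_000, 0.005)     # 1 Gbit/s -> 416 packets
```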

Stefan Fouant

There is no magic here at all.

There are dark buffers all over the Internet; some network operators run routers and broadband gear without RED enabled, our broadband gear suffers from excessive buffering, as do our home routers and hosts.

What is happening, as I outlined at the transport area meeting at the IETF in Prague, is that by putting in excessive buffers everywhere in the name of avoiding packet loss, we've destroyed TCP congestion avoidance and badly damaged slow start while adding terrible latency and jitter. Tail drop with long buffers delays notification of congestion to TCP, and defeats the algorithms. Even without this additional problem (which causes further havoc), TCP will always fill buffers on either side of your bottleneck link in your path.

So your large buffers add latency, and when a link is saturated, the buffers on either side of the saturated links fill, and stay so (most commonly in the broadband gear, but often also in the hosts/home routers over 802.11 links).

By running with AQM (or small buffers), you reduce the need for QOS (which doesn't yet exist seriously in the network edge).
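For reference, the RED-style AQM mentioned above can be sketched as a drop probability that rises with average queue depth, signaling congestion to TCP early rather than after the buffer is full (illustrative Python with made-up thresholds):

```python
# Minimal sketch of RED-style active queue management: as the average
# queue grows past min_th, packets are dropped with rising probability,
# so TCP backs off before the buffer ever fills (parameters illustrative).

def red_drop_prob(avg_q, min_th=5, max_th=15, max_p=0.1):
    if avg_q < min_th:
        return 0.0           # short queue: never drop
    if avg_q >= max_th:
        return 1.0           # queue out of control: drop everything
    # linear ramp between the thresholds
    return max_p * (avg_q - min_th) / (max_th - min_th)

# Short queue: no drops; growing queue: gentle early drops; full: drop all.
probs = [red_drop_prob(q) for q in (3, 10, 20)]
```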

See my talk in http://mirrors.bufferbloat.net/Talks/PragueIETF/ (slightly updated since the Prague IETF) and you can listen to it at

  http://ietf80streaming.dnsalias.net/ietf80/ietf80-ch4-wed-am.mp3

A longer version of that talk is at: http://mirrors.bufferbloat.net/Talks/BellLabs01192011/

Note that there is a lot you can do immediately to reduce your personal suffering, by using bandwidth shaping to reduce/eliminate the buffer problem in your home broadband gear, and by ensuring that your 802.11 wireless bandwidth is always greater than your home broadband bandwidth (since the bloat in current home routers can be even worse than in the broadband gear).

See http://gettys.wordpress.com for more detail. Please come help fix this mess at bufferbloat.net.
The bloat mailing list is bloat@lists.bufferbloat.net.

We're all in this bloat together.
        - Jim