We’re seeing consistent +100ms latency increases to Verizon customers in Pennsylvania during peak business hours, and have been for the past couple of weeks.
If someone is able to assist, could they please contact me off-list?
Pennsylvania is largish; maybe “to Philadelphia customers behind devices X, Y, Z” or “Pittsburghers behind devices M, N, O”, or something else helpful.
PSA to people running transit networks.
a) During congestion you are not buffering just the excess traffic; you
delay every packet in the class for the duration of the congestion
b) Adding buffering does not increase RX rate during persistent
congestion, it only increases delay (a rough sketch of a and b follows this list)
c) Occasional persistent congestion is normal, because of how we've
modeled the economics of transit
d) A typical device a transit network operates can add >100ms latency on a
single link, but you don't want more than 5ms latency on a BB (backbone) link
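A minimal sketch of (a) and (b), assuming a toy single-FIFO model; the 10Gb/s link and 12Gb/s offered load are made-up numbers, not from this thread. During persistent congestion the delivered rate is capped by the drain rate no matter how big the buffer is, while every packet in the class sits behind the standing backlog, so a bigger buffer only buys more delay:

# Toy FIFO model of a congested egress queue; illustrative only,
# not any vendor's scheduler.
def steady_state(drain_rate_bps, offered_rate_bps, buffer_bytes):
    # (b) delivered rate can never exceed the drain rate
    delivered_bps = min(offered_rate_bps, drain_rate_bps)
    # (a) with a full standing queue, every packet waits buffer/drain_rate
    queue_delay_s = buffer_bytes * 8 / drain_rate_bps
    return delivered_bps, queue_delay_s

# 10Gb/s link with 12Gb/s offered: compare a 5ms buffer and a 100ms buffer.
for buf_ms in (5, 100):
    buf_bytes = 10e9 / 8 * buf_ms / 1e3
    rate, delay = steady_state(10e9, 12e9, buf_bytes)
    print(f"{buf_ms:>3}ms buffer: {rate/1e9:.0f}Gb/s delivered, "
          f"{delay*1e3:.0f}ms queueing delay")

Either way the class delivers 10Gb/s; only the delay changes.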
Fix for IOS-XR:
 class BE
  bandwidth percent 50
  queue-limit 5 ms

Fix for Junos:
 BE {
  transmit-rate percent 50;
  buffer-size temporal 5k;
 }
The actual byte value programmed is interface_rate * percent_share *
time. If your class is by design out-of-contract, your actual rate is
higher, which means the programmed buffer byte value results in a
smaller queueing delay. The configured byte value only results in the
configured queueing delay when actual rate == g-rate.
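A hedged worked example of that arithmetic (the 10Gb/s interface and the 8Gb/s out-of-contract rate are assumed figures for illustration only):

# programmed_bytes = interface_rate * percent_share * time, per the model above
def programmed_bytes(interface_bps, percent_share, temporal_s):
    return interface_bps / 8 * percent_share * temporal_s

def queueing_delay_ms(buffer_bytes, actual_service_bps):
    # The queue drains at whatever rate the class actually receives, so a
    # class running above its g-rate empties the same byte buffer faster.
    return buffer_bytes * 8 / actual_service_bps * 1e3

iface = 10e9                                # assumed 10Gb/s interface
buf = programmed_bytes(iface, 0.50, 0.005)  # percent 50, temporal 5ms
print(f"programmed buffer: {buf/1e6:.3f} MB")                                 # 3.125 MB
print(f"delay at g-rate (5Gb/s): {queueing_delay_ms(buf, 5e9):.1f}ms")        # 5.0ms
print(f"delay out-of-contract (8Gb/s): {queueing_delay_ms(buf, 8e9):.1f}ms")  # 3.1ms

So the configured 5ms is the worst case, seen at g-rate; an out-of-contract class sees less.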
The buffers are not large to facilitate buffering a single queue for
100ms; the buffers are large to support configurations with a large
number of logical interfaces, each with a large number of queues. If you
are configuring just a few queues, the assumption is that you are
dimensioning your buffer sizes yourself.
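A rough, hypothetical illustration of why that matters; the 4GB shared pool, 1000 logical interfaces, and even split below are assumptions for the sake of arithmetic, not any specific platform's allocation scheme. A pool that is sensible spread across thousands of queues becomes an enormous per-queue default when only a few queues exist, which is why you are expected to dimension it yourself:

# Assumed 4GB shared packet buffer on a line card, split evenly per queue.
total_buffer_bytes = 4e9

per_queue_many = total_buffer_bytes / (1000 * 8)  # 1000 IFLs x 8 queues each
per_queue_few = total_buffer_bytes / 8            # only 8 queues configured

for label, per_q in (("1000 IFLs x 8 queues", per_queue_many),
                     ("8 queues total", per_queue_few)):
    delay_ms = per_q * 8 / 10e9 * 1e3             # worst case draining at 10Gb/s
    print(f"{label}: {per_q/1e6:.1f} MB/queue -> up to {delay_ms:.1f}ms at 10Gb/s")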
Hopefully this motivates some networks to limit buffer sizes.
Thanks!
Here are traceroutes, for those interested. Times are UTC. The issue is present to Verizon customers in both Pittsburgh and Blue Bell. I don't have any other PA Verizon customers to reference against, though all of our other Verizon customers outside of PA look fine.
phil@debian:~$ mtr -zwc1 108.16.123.123
Start: Tue Mar 12 00:19:43 2019
HOST: debian Loss% Snt Last Avg Best Wrst StDev
1. AS32334 192.30.36.123 0.0% 1 2.7 2.7 2.7 2.7 0.0
2. AS??? 10.11.11.1 0.0% 1 2.6 2.6 2.6 2.6 0.0
3. AS2914 129.250.199.37 0.0% 1 1.5 1.5 1.5 1.5 0.0
4. AS2914 ae-6.r24.nycmny01.us.bb.gin.ntt.net 0.0% 1 9.4 9.4 9.4 9.4 0.0
5. AS2914 ae-1.r08.nycmny01.us.bb.gin.ntt.net 0.0% 1 6.6 6.6 6.6 6.6 0.0
6. AS701 et-7-0-5.BR3.NYC4.ALTER.NET 0.0% 1 8.5 8.5 8.5 8.5 0.0
7. AS??? ??? 100.0 1 0.0 0.0 0.0 0.0 0.0
8. AS701 ae203-0.PHLAPA-VFTTP-302.verizon-gni.net 0.0% 1 137.2 137.2 137.2 137.2 0.0
9. AS701 static-108-16-123-123.phlapa.fios.verizon.net 0.0% 1 118.4 118.4 118.4 118.4 0.0
phil@debian:~$ mtr -zwc1 108.16.123.123
Start: Tue Mar 12 07:48:25 2019
HOST: debian Loss% Snt Last Avg Best Wrst StDev
1. AS32334 192.30.36.123 0.0% 1 2.7 2.7 2.7 2.7 0.0
2. AS??? 10.11.11.1 0.0% 1 1.0 1.0 1.0 1.0 0.0
3. AS2914 129.250.199.37 0.0% 1 2.9 2.9 2.9 2.9 0.0
4. AS2914 ae-6.r24.nycmny01.us.bb.gin.ntt.net 0.0% 1 7.2 7.2 7.2 7.2 0.0
5. AS2914 ae-1.r08.nycmny01.us.bb.gin.ntt.net 0.0% 1 9.1 9.1 9.1 9.1 0.0
6. AS701 et-7-0-5.BR3.NYC4.ALTER.NET 0.0% 1 7.1 7.1 7.1 7.1 0.0
7. AS??? ??? 100.0 1 0.0 0.0 0.0 0.0 0.0
8. AS701 ae203-0.PHLAPA-VFTTP-302.verizon-gni.net 0.0% 1 14.7 14.7 14.7 14.7 0.0
9. AS701 static-108-16-123-123.phlapa.fios.verizon.net 0.0% 1 17.8 17.8 17.8 17.8 0.0
Smokeping graph at https://ibb.co/g4VQR8k
I’m not in Philly, but from the IAD area the path back is via HE.net. It seems quick enough from IAD, but as a data point, PHL may head back via NYC or it may go through IAD and HE.net.
From: NANOG <nanog-bounces@nanog.org> On Behalf Of Saku Ytti
Sent: Tuesday, March 12, 2019 7:58 AM
+1 to that.
The overall system works so much better if the network nodes don't interfere and instead report the actual network conditions accurately and in a timely fashion to the end hosts, i.e. by inducing drops as and when they occur.
There are a number of papers on this topic, btw.
adam
Verizon reached out shortly after my e-mail to say they had resolved the issue - latency has been within normal bounds since. Many thanks