PPP multilink help

Hey folks, I'm sure this is peanuts to you, but I'm a bit puzzled (most likely
because of my lack of knowledge).

I'm buying an IP backbone from VNZ (presumably MPLS). I get an MLPPP handoff
at all sites, so I don't do the actual labeling and switching; for practical
purposes, I have no control over the other side of my MLPPP links.

When I transfer a large file over FTP (or CIFS, or anything else), I'd
expect it to max out one or both T1s, but instead utilization on the
T1s is hovering around 70% on both, and sometimes MLPPP link utilization even
drops below 50%. What am I not getting here?

Tx,
Andrey

Below is a snip of my config.

controller T1 0/0/0
cablelength long 0db
channel-group 1 timeslots 1-24
!
controller T1 0/0/1
cablelength long 0db
channel-group 1 timeslots 1-24
!
ip nbar custom rdesktop tcp 3389
ip cef
!
class-map match-any VoIP
match dscp ef
class-map match-any interactive
match protocol rdesktop
match protocol telnet
match protocol ssh
!
policy-map QWAS
class VoIP
    priority 100
class interactive
    bandwidth 500
class class-default
    fair-queue 4096
!
interface Multilink1
description Verizon Business MPLS Circuit
ip address x.x.x.150 255.255.255.252
ip flow ingress
ip nat inside
ip virtual-reassembly
load-interval 30
no peer neighbor-route
ppp chap hostname R1
ppp multilink
ppp multilink links minimum 1
ppp multilink group 1
ppp multilink fragment disable
service-policy output QWAS
!
interface Serial0/0/0:1
no ip address
ip flow ingress
encapsulation ppp
load-interval 30
fair-queue 4096 256 0
ppp chap hostname R1
ppp multilink
ppp multilink group 1
!
interface Serial0/0/1:1
no ip address
ip flow ingress
encapsulation ppp
load-interval 30
fair-queue 4096 256 0
ppp chap hostname R1
ppp multilink
ppp multilink group 1

Andrey Gordon wrote:

[snip]

When I transfer a large file over FTP (or CIFS, or anything else), I'd
expect it to max out one or both T1s, but instead utilization on the
T1s is hovering around 70% on both, and sometimes MLPPP link utilization even
drops below 50%. What am I not getting here?
  
I seem to be in a similar situation to yours (but with AT&T), and I have not noticed any unexpected missing bandwidth.

I don't see any specific problems with your config, but I'll include mine in hopes it will be useful:

controller T1 0/0/0
framing esf
linecode b8zs
channel-group 0 timeslots 1-24
!
controller T1 0/0/1
framing esf
linecode b8zs
channel-group 0 timeslots 1-24
!
class-map match-any imaging
match access-group 112
class-map match-any rdp
match access-group 113
class-map match-any voice
match ip dscp ef
match access-group 110
!
policy-map private_wan
class voice
  priority percent 60
  set ip dscp ef
class rdp
  bandwidth percent 32
  set ip dscp af31
class imaging
  bandwidth percent 4
  set ip dscp af21
class class-default
  bandwidth percent 4
  set ip dscp default
!
interface Multilink1
ip address x.x.x.38 255.255.255.252
no keepalive
no cdp enable
ppp chap hostname xxxxxxx
ppp multilink
ppp multilink fragment disable
ppp multilink group 1
max-reserved-bandwidth 100
service-policy output private_wan
!
interface Serial0/0/0:0
no ip address
encapsulation ppp
no cdp enable
ppp chap hostname xxxxxxxxxx
ppp multilink
ppp multilink group 1
max-reserved-bandwidth 100
!
interface Serial0/0/1:0
no ip address
encapsulation ppp
no cdp enable
ppp chap hostname xxxxxxxxxx
ppp multilink
ppp multilink group 1
max-reserved-bandwidth 100
!
access-list 110 permit ip any 10.0.0.0 0.0.255.255
access-list 110 permit ip 10.0.0.0 0.0.255.255 any
access-list 110 permit icmp any any
access-list 112 permit ip any host x.x.x.x
access-list 113 permit ip any host x.x.x.x

Gents,

Andrey Gordon wrote:

[snip]

When I transfer a large file over FTP (or CIFS, or anything else), I'd
expect it to max out one or both T1s, but instead utilization on the
T1s is hovering around 70% on both and sometimes MLPPP link utilization even
drops below 50%. What am I not getting here?

Sounds like the TCP window is either set 'small', or TCP window scaling
isn't enabled or isn't scaling to your bandwidth-delay product
(for the hosts in question). Since FTP is a stream-based transfer
of file data (like HTTP), you should see it scale to most or nearly all
of your link capacity (assuming TCP isn't your issue).
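
For a rough sense of the numbers, here's a quick Python back-of-the-envelope;
the 70 ms round-trip time is just an assumed example (substitute your own ping
times), and the 3.072 Mbit/s comes from your two bundled T1s:

# Back-of-the-envelope bandwidth-delay product and per-window throughput.
# The RTT below is an assumption for illustration -- measure yours.
link_bps = 2 * 1_536_000        # two T1 channel groups in the bundle
rtt_s = 0.070                   # assumed round-trip time in seconds

bdp = link_bps / 8 * rtt_s
print(f"bandwidth-delay product: {bdp:.0f} bytes")        # ~26.9 KB here

# Ceiling a single TCP flow can reach with a given receive window:
for window in (8_192, 17_520, 65_535):
    print(f"{window:>6}-byte window -> {window * 8 / rtt_s / 1e6:.2f} Mbit/s")

If I remember right, 17,520 bytes was the old Windows default receive window,
and with an RTT in that ballpark it tops out right around the 2 Mbit/s range
you're describing, so the hosts' window settings are worth checking first.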

Additionally, when using CIFS, SMB, TFTP, NFS, and other
command->acknowledgment style protocols over wide-area links (which
aren't stream-based operations, but rather iterative operations on
blocks or parts of a file), you likely will never observe a single
transfer filling up the links.
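
To put rough numbers on that ceiling (the block sizes and the 70 ms RTT below
are assumed for illustration, not measurements):

# One-block-per-round-trip ceiling, ignoring serialization time.
rtt_s = 0.070
for name, block_bytes in (("TFTP (512 B blocks)", 512),
                          ("4 KB blocks", 4096),
                          ("60 KB blocks", 61440)):
    print(f"{name:>20}: ~{block_bytes * 8 / rtt_s / 1e6:.3f} Mbit/s max")

So a single transfer over one of these protocols can sit far below line rate
no matter how much bandwidth is available, while several in parallel add up.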

-Tk

[snip]

When I transfer a large file over FTP (or CIFS, or anything else), I'd
expect it to max out one or both T1s,

Most MLPPP implementations don't hash flows at the IP layer to an
individual MLPPP member link. The bundle is a virtual L3 interface, and
the packets themselves are distributed over the member links. Some people
describe it as a "load balancing" scenario vs. "load sharing", as the
traffic is given to whichever link isn't currently "busy".
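
A crude way to picture the difference (a conceptual Python sketch, not how IOS
actually implements it; the link names are just your two member interfaces):

import hashlib

links = [{"name": "Se0/0/0:1", "queued_bytes": 0},
         {"name": "Se0/0/1:1", "queued_bytes": 0}]

def per_flow_hash(src, dst, size):
    # CEF-style per-destination load sharing: a flow sticks to one link,
    # so a single transfer could never exceed one T1.
    idx = int(hashlib.md5(f"{src}-{dst}".encode()).hexdigest(), 16) % len(links)
    links[idx]["queued_bytes"] += size
    return links[idx]["name"]

def mlppp_distribute(size):
    # Multilink bundle: each packet (or fragment) goes to whichever member
    # link is least busy, so one flow can use both T1s at once.
    link = min(links, key=lambda l: l["queued_bytes"])
    link["queued_bytes"] += size
    return link["name"]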

  but instead utilization on the
T1s is hovering around 70% on both and sometimes MLPPP link utilization even
drops below 50%. What am I not getting here?

If you have multilink fragmentation disabled, it sends whole packets down each
path. It could be reordering delay causing just enough variance in
the packet stream that the application throttles back. If you have a bunch
of individual streams going, you would probably see higher throughput.
Remember there is also additional overhead for MLPPP.

Rodney

I would also think the problem is flow control not allowing the maximum bandwidth. Trying multiple FTP streams and seeing whether that maxes it out would help.

I would think you would want to add WRED to the class-default entry to prevent global TCP synchronization:

...
class class-default
  fair-queue 4096
  random-detect dscp-based

To address the concerns about the overhead (FTP is still transferring that
file):

core.bvzn#sh proc cpu hist

core.bvzn 12:44:07 PM Monday May 11 2009 EST

[show processes cpu history graphs snipped: CPU% per second (last 60 seconds)
ran 2-4%, CPU% per minute (last 60 minutes) peaked in the 40s, and CPU% per
hour (last 72 hours) peaked mostly in the 30s-40s with one spike to 100%]

core.bvzn#sh inv
NAME: "2821 chassis", DESCR: "2821 chassis"
<snip>

Serial0/0/0:1 is up, line protocol is up
  Hardware is GT96K Serial
  Description:
  MTU 1500 bytes, BW 1536 Kbit/sec, DLY 20000 usec,
     reliability 255/255, txload 149/255, rxload 15/255
  Encapsulation PPP, LCP Open, multilink Open
  Link is a member of Multilink bundle Multilink1, loopback not set
  Keepalive set (10 sec)
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of "show interface" counters 14w0d
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: weighted fair [suspended, using FIFO]
  FIFO output queue 0/40, 0 drops
  30 second input rate 93000 bits/sec, 86 packets/sec
  30 second output rate 899000 bits/sec, 122 packets/sec
     105433994 packets input, 3520749026 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     155813204 packets output, 1174780375 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 unknown protocol drops
     0 output buffer failures, 0 output buffers swapped out
     1 carrier transitions
  Timeslot(s) Used:1-24, SCC: 0, Transmitter delay is 0 flags
Serial0/0/1:1 is up, line protocol is up
  Hardware is GT96K Serial
  Description:
  MTU 1500 bytes, BW 1536 Kbit/sec, DLY 20000 usec,
     reliability 255/255, txload 149/255, rxload 15/255
  Encapsulation PPP, LCP Open, multilink Open
  Link is a member of Multilink bundle Multilink1, loopback not set
  Keepalive set (10 sec)
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of "show interface" counters 14w0d
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: weighted fair [suspended, using FIFO]
  FIFO output queue 0/40, 0 drops
  30 second input rate 94000 bits/sec, 86 packets/sec
  30 second output rate 898000 bits/sec, 122 packets/sec
     105441924 packets input, 3518841511 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     155734625 packets output, 1156759105 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 unknown protocol drops
     0 output buffer failures, 0 output buffers swapped out
     1 carrier transitions
  Timeslot(s) Used:1-24, SCC: 1, Transmitter delay is 0 flags

Multilink1 is up, line protocol is up
  Hardware is multilink group interface
  Description: Verizon Business MPLS Circuit
  Internet address is x.x.x.150/30
  MTU 1500 bytes, BW 3072 Kbit/sec, DLY 100000 usec,
     reliability 255/255, txload 148/255, rxload 14/255
  Encapsulation PPP, LCP Open, multilink Open
  Listen: CDPCP
  Open: IPCP, loopback not set
  Keepalive set (10 sec)
  DTR is pulsed for 2 seconds on reset
  Last input 00:00:00, output never, output hang never
  Last clearing of "show interface" counters 14w0d
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 252140
  Queueing strategy: Class-based queueing
  Output queue: 3/1000/0 (size/max total/drops)
  30 second input rate 179000 bits/sec, 172 packets/sec
  30 second output rate 1795000 bits/sec, 243 packets/sec
     207501114 packets input, 1445648459 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
     42 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     307484312 packets output, 2277871516 bytes, 0 underruns
     0 output errors, 0 collisions, 3 interface resets
     0 unknown protocol drops
     0 output buffer failures, 0 output buffers swapped out
     0 carrier transitions

I also ran 6 flows worth of iperf between a server at the site and my laptop
while the transfer was running (iperf -i 2 -P 6 -t 120 -c 10.1.150.4), in
the same direction.

core.bvzn#sh policy-map int mu1
Multilink1

  Service-policy output: QWAS

    queue stats for all priority classes:

      queue limit 64 packets
      (queue depth/total drops/no-buffer drops) 0/0/0
      (pkts output/bytes output) 0/0

    Class-map: VoIP (match-any)
      0 packets, 0 bytes
      30 second offered rate 0 bps, drop rate 0 bps
      Match: dscp ef (46)
        0 packets, 0 bytes
        30 second rate 0 bps
      Priority: 100 kbps, burst bytes 2500, b/w exceed drops: 0

    Class-map: interactive (match-any)
      31490239 packets, 14882494949 bytes
      30 second offered rate 4000 bps, drop rate 0 bps
      Match: protocol rdesktop
        10981329 packets, 1277510597 bytes
        30 second rate 3000 bps
      Match: protocol telnet
        1104192 packets, 183832229 bytes
        30 second rate 0 bps
      Match: protocol ssh
        9263601 packets, 11659456657 bytes
        30 second rate 0 bps
      Queueing
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops) 0/1103/0
      (pkts output/bytes output) 31489136/14887505365
      bandwidth 500 kbps

    Class-map: class-default (match-any)
      275000011 packets, 120951145536 bytes
      30 second offered rate 1494000 bps, drop rate 0 bps
      Match: any
      Queueing
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops/flowdrops) 0/251092/0/251092
      (pkts output/bytes output) 276085337/122442704318
      Fair-queue: per-flow queue limit 16
core.bvzn#

It could very well be microbursts in the flow creating congestion,
as seen in the default class:

  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 252140

  30 second output rate 1795000 bits/sec, 243 packets/sec

    Class-map: class-default (match-any)
      275000011 packets, 120951145536 bytes
      30 second offered rate 1494000 bps, drop rate 0 bps
      Match: any
      Queueing
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops/flowdrops) 0/251092/0/251092
      (pkts output/bytes output) 276085337/122442704318
      Fair-queue: per-flow queue limit 16

Which mostly matches the default class. I don't recall whether the per-flow
queue limit kicks in without congestion or not.
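
Rough numbers on how little burst room that leaves a single flow (the 1500-byte
packet size is an assumption; the other figures come from the output above):

# Burst absorption for one flow in class-default before flow drops start.
pkt_bytes = 1500                  # assumed average packet size
per_flow_limit_pkts = 16          # "Fair-queue: per-flow queue limit 16"
bundle_bps = 2 * 1_536_000        # the two bundled T1s

burst_bytes = per_flow_limit_pkts * pkt_bytes
drain_ms = burst_bytes * 8 / bundle_bps * 1000
print(f"~{burst_bytes} bytes of queue per flow, ~{drain_ms:.0f} ms at line rate")
# -> roughly 24 KB / ~62 ms; anything burstier than that gets dropped even
#    though the 30-second average stays well under the bundle rate.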

You could try a few things:

a) remove WFQ in the default class
b) add a bandwidth statement to it to allocate a dedicated amount
c) implement WRED in the class

to see if one of those improves it.

BTW, the overhead I was referring to was the additional MLPPP overhead
added to each packet, which reduces effective throughput.
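
For what it's worth, a rough estimate of that per-packet overhead (the header
size below is from memory -- PPP framing plus the multilink sequence header --
so treat it as approximate):

# Per-packet MLPPP overhead estimate; header size is an approximation.
overhead_bytes = 8    # ~2 B HDLC addr/ctrl + 2 B PPP protocol + 4 B MP header
for payload in (1500, 64):
    share = overhead_bytes / (payload + overhead_bytes)
    print(f"{payload}-byte packet: ~{share * 100:.1f}% overhead")
# -> about 0.5% on full-size FTP packets and ~11% on small ones, so the
#    MLPPP overhead alone shouldn't explain a 30% shortfall on a bulk transfer.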

Rodney