2008.02.19 NANOG 42 simple effective 50ms resilience for IPTV

Gotta bop these out fast, CRG West gathering is about
to start. ^_^;


2008.02.19 simple, effective O(50ms) resilience for IPTV

Dino and Clarence from Cisco talk about multicast-only
fast reroute (MoFRR).

go through problem, solution, two examples, some

packet loss is the biggest source of impairment on video apps
expected human MTBA is about 2 hours
node/link failures MTBA is about 100 hours

losing I frame with 50ms reroute has same
visual impact as with 400ms reroute

switching time requirements
500-1000ms unicast routing with FC
100-500ms unicast routing
50-100ms problem space for MoFRR

multicast streams need resiliency for
network outages
need fast switchover times with near 0 packet loss
50-100ms, definitely <<< 1second
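
To put the loss budget in packet terms, a rough sketch of how many
packets a constant bit-rate stream drops during an outage. The 4 Mb/s
rate and 1316-byte payload (7 x 188-byte MPEG-TS packets) are
assumptions for illustration only, not numbers from the talk.

```python
import math

def packets_lost(bitrate_bps, payload_bytes, outage_ms):
    """Packets a constant bit-rate stream would send during an outage.

    Assumes every packet carries payload_bytes of stream data and that
    everything sent during the outage is lost.
    """
    bits_during_outage = bitrate_bps * outage_ms / 1000
    return math.ceil(bits_during_outage / (8 * payload_bytes))

# Hypothetical stream: 4 Mb/s, 1316-byte payloads.
for outage_ms in (50, 100, 1000):
    print(outage_ms, "ms ->", packets_lost(4_000_000, 1316, outage_ms), "packets")
```

Under those assumptions a 50ms gap costs a couple dozen packets, a
1 second gap costs a few hundred.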

existing redundancy
one source, multiple diverse network paths
multiple sources send same stream on diverse paths
device receives, drops dupes.
  (source redundancy model)
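
A minimal sketch of the "receives, drops dupes" step in the source
redundancy model. It assumes each packet carries a sequence number that
is identical in both copies of the stream (e.g. an RTP-style sequence
number); the class name and window size are made up for illustration.

```python
from collections import deque

class DupeFilter:
    """Keep the first copy of each packet, drop later copies."""

    def __init__(self, window=1024):
        self.window = window      # how many recent sequence numbers to remember
        self.seen = set()
        self.order = deque()      # oldest remembered sequence number first

    def accept(self, seq):
        """Return True if seq has not been seen recently, else False."""
        if seq in self.seen:
            return False
        self.seen.add(seq)
        self.order.append(seq)
        if len(self.order) > self.window:
            # Forget the oldest sequence number to bound memory.
            self.seen.discard(self.order.popleft())
        return True

# Two copies of the same stream arriving interleaved from diverse paths.
f = DupeFilter()
arrivals = [1, 1, 2, 3, 2, 3, 4]
print([seq for seq in arrivals if f.accept(seq)])
```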

for really fast switchover, can't use messaging, takes
too long
can't repair when failure occurs
need to make before break
can't depend on unicast routing, takes too long
needs to be relatively low cost
incremental deployment is good

MoFRR, depends solely on PIM, doesn't wait for
unicast routing protocol to reconverge
make before break
alternative to source redundancy; don't have to provision
extra sources, no dupe frames
upstream routers don't need to have MoFRR

Disadvantages: depends on equal cost multipath
could work with NECMP
  tweak costs
  using feasible successor technology
extensions for ring tech
redundant data in some parts of network
  not so bad with dense receivers

Allow Dest router to send alternate-join message
along secondary path.
A would have 2 OIFs leading to single receiver
When RPF path is up, dupes come to D from C,
but D RPF fails on packets from B

So, there's some wasted bandwidth, but only as long
as there's no membership on the alternate path.

Decision to accept packets from the alternate
path can happen quickly; it's a local decision, no
signalling needed.
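
A rough sketch of that local decision at D, assuming one multicast
route entry per (S,G) that also remembers its alternate interface (the
"alt interface" field that comes up in the Q&A). The class, field, and
interface names are hypothetical, not taken from any implementation.

```python
from dataclasses import dataclass

@dataclass
class MofrrEntry:
    source: str
    group: str
    rpf_iface: str   # interface packets are currently accepted from
    alt_iface: str   # pre-joined alternate path; its packets fail RPF

    def accept(self, in_iface):
        """RPF check: only packets arriving on the RPF interface pass."""
        return in_iface == self.rpf_iface

    def switch_to_alternate(self):
        """Purely local: swap the two roles, no messages are sent."""
        self.rpf_iface, self.alt_iface = self.alt_iface, self.rpf_iface

# D normally accepts from C and RPF-fails the duplicate copy from B.
entry = MofrrEntry("10.0.0.1", "232.1.1.1", rpf_iface="to_C", alt_iface="to_B")
print(entry.accept("to_C"), entry.accept("to_B"))
entry.switch_to_alternate()      # primary path judged dead
print(entry.accept("to_C"), entry.accept("to_B"))
```

Since the alternate join is already in place, the switch is just a
field swap; nothing has to be signalled upstream first.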

More redundant data as you have more ECMP path layers.
RPF failures help reduce the redundant data, but paths tend
to converge towards a single point of failure.
If NECMP paths are used, though longer, they may be
less congested, so data arrives faster there, and more
packet loss could occur.

Ring topology extensions
mark ring interfaces as such
allow alt-join to go on longer path
only two interfaces
shortest path is RPF interface
other is alt-interface
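
A small sketch of the ring rule above: with exactly two ring
interfaces, the one on the shortest path toward the source is the RPF
interface and the other becomes the alt-interface. The function name
and the cost values are illustrative assumptions.

```python
def pick_ring_roles(costs):
    """Pick (rpf_iface, alt_iface) from two ring interfaces.

    costs maps each ring interface name to its path cost toward the
    source. On a tie, the interface listed first becomes the RPF one.
    """
    if len(costs) != 2:
        raise ValueError("a ring node has exactly two ring interfaces")
    rpf = min(costs, key=costs.get)
    alt = next(iface for iface in costs if iface != rpf)
    return rpf, alt

# Source is 2 hops away going east, 5 hops going west around the ring.
print(pick_ring_roles({"east": 2, "west": 5}))
```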

Only the immediate dest router sends alt-join;
rest of hops upstream send regular RPF join.

Need receivers to be able to accept join messages
along the alt-join path, even if it is upstream
on the RPF interface for this to work.

routers only forward to RPF interface when data is
received on an alt-RPF interface because it's upstream

Doesn't matter if you're on repair path or main path,
you still have to forward alt-join messages around
the ring. If you receive data on your RPF interface,
you also send it along the alternate path.

It's like counter-rotating data on FDDI rings, you're
just picking which path you accept it from.

A cube implements diverse paths...wow, look at the
slide, that's hard to describe.

each pop ends up having connections to each ring
that way.

Failure detection--hardest part to solve.
direct link failures detected fast.
neighbor failures can be detected fast via BFD
upstream router or link failures take time.
use one solution to detect all cases.

Monitor data flow on the RPF path
constant bit-rate apps have expected packet arrival times
use counters to see if packets have been received
polling interval is loss budget
if counter doesn't increment within interval, you might
have a failure; switch to alternate interface, you'll
get data, may be redundant, but that's ok.
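
The counter check above could look roughly like this. It is a sketch
under assumptions: the caller runs poll() once per polling interval
(i.e. once per loss budget, say every 50ms) and passes in per-interface
packet counters; the class and interface names are hypothetical.

```python
class FlowMonitor:
    """Watch the packet counter on the RPF interface, switch if it stalls."""

    def __init__(self, rpf_iface, alt_iface):
        self.rpf_iface = rpf_iface
        self.alt_iface = alt_iface
        self.last_count = None   # no baseline until the first poll

    def poll(self, counters):
        """counters maps interface name -> packets received so far.

        Returns True if this poll triggered a switch to the alternate.
        """
        current = counters[self.rpf_iface]
        stalled = self.last_count is not None and current == self.last_count
        if stalled:
            # Possible upstream failure: start accepting from the alternate.
            self.rpf_iface, self.alt_iface = self.alt_iface, self.rpf_iface
            self.last_count = counters[self.rpf_iface]
            return True
        self.last_count = current
        return False

mon = FlowMonitor("to_C", "to_B")
print(mon.poll({"to_C": 100, "to_B": 100}))   # baseline
print(mon.poll({"to_C": 200, "to_B": 200}))   # still flowing
print(mon.poll({"to_C": 200, "to_B": 300}))   # stalled on to_C
print(mon.rpf_iface)
```

The same check covers link, neighbor, and further-upstream failures,
since all of them show up as the counter not moving.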

MoFRR patent application filed 4/26/2007
extensions for uh...stuff...

Q: Eric notes he's doubled the state in the
network. A: no additional (S,G) state in the network,
no additional entries in the MRIB; you need a new
field in multicast RIB for which interface is your
alt interface.

Q: Anne Johnson, CPAC networks--50ms to 100ms as timing
target; is there actually a timing dependency given that
you're setting up repair paths in advance of actually
needing them? But what if your link is more than
200ms away?
Are there implementations of this that currently
exist? Not yet, but they're working on it.
Sounds like she'll be a beta tester. :)

Randy is back up for the IPv6.
How many people are successfully on IPv6? A bunch.
How many are successfully on IPv4? A bunch, but not
quite as many.

PC reminds you to fill out your survey forms once
you get your connectivity back.

1400 hours resume with lightning talks.