What's so difficult about ISSU?

Kasper_Adel · November 8, 2012, 11:22pm

Hello,

We've been hearing about ISSU for so many years, and I haven't heard of any
vendor actually achieving it yet.

What is the technical reason behind that?

If I understand correctly, the way to do it would simply be to have
extra ASICs/HW to build dual circuits accessing the same memory,
and gracefully switch from one to the other. Is that right?

Thanks,
Kim

Zaid_A_Kahn · November 8, 2012, 11:38pm

The Cisco Nexus platform does it pretty well, so they have achieved it.

Zaid

Kenneth_McRae · November 8, 2012, 11:42pm

Juniper also offers it on the EX virtual switching platform. Works if you
have the correct version of JunOS.

Alex4 · November 9, 2012, 12:19am

http://www.juniper.net/techpubs/en_US/junos/topics/concept/issu-oveview.html

The Juniper ISSU guide.

You need two things:

1. Separation of the control plane and forwarding plane
2. Two routing engines in the same chassis -- the non-active RE upgrades first, then when it's up and running the active one goes into upgrade mode and control fails over to the secondary RE, which is running the upgraded version of the software.

I assume it works on any vendor that has two REs in the same chassis, where the fwd and control planes are separated and there is a redundancy protocol running between the two REs (like Graceful Switchover on Juniper gear).

Phil · November 9, 2012, 12:48am

The major vendors have figured it out for the most part by moving to stateful synchronization between control plane modules and implementing non-stop routing.

ALU has supported ISSU on minor releases for many years and just added support for major releases.

The Cisco Nexus ISSU works well; I've done an upgrade on a 5K switch and it was completely hitless.

Juniper and Cisco with the 9K have gone through some hurdles but ISSU is actually usable now if the software versions support it.

The main remaining hurdle is updating microcode on linecards, they still need to be rebooted after an upgrade.

Phil

Kasper_Adel · November 9, 2012, 12:52am

What I was asking about is full ISSU, including microcode. I assume that between
major releases there will be a microcode upgrade most of the time.

Kenneth_McRae · November 9, 2012, 12:55am

I have executed it successfully on the MX960 with no issues. EX, on the other
hand, really depends on your version of JunOS.

Kenneth_McRae · November 9, 2012, 12:56am

I have performed microcode upgrades using ISSU on the Juniper platform.

Kasper_Adel · November 9, 2012, 1:00am

Does that mean they are the only vendor capable of doing this today?

I am interested in the technology behind this if this is something public,
any ideas?

Thx

Oliver_Garraux · November 9, 2012, 1:22am

I know some people here have mentioned good experiences with ISSU on
Nexus. I don't doubt that it usually works right, but in my latest
experience with upgrading NX-OS on dual-SUP'ed 7k's, it was "hitless"
if, by "hitless", you mean ~20% packet loss while troubleshooting with
TAC before we found that we had to remove and re-apply QoS policies
from every interface.

Also, depending on the update, linecards might have to be reset.

Oliver

Phil · November 9, 2012, 3:12am

Heh, you will find vendors avoid using the term hitless. I can't think of any router that supports ISSU and is truly hitless. ASR9K ISSU is stated to sustain less than 6 seconds of loss...

ISSU is still rife with caveats and incompatibilities as well if you are doing more advanced things.

Phil

Mikael_Abrahamsson · November 9, 2012, 4:13am

> The major vendors have figured it out for the most part by moving to stateful synchronization between control plane modules and implementing non-stop routing.

NSR isn't ISSU.

ISSU contains the wording "in service". 6 seconds of outage isn't "in service". 0.5 seconds of outage isn't "in service". I could accept a few microseconds of outage as being "ISSU", but tenths of seconds isn't in service.
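
To put rough numbers on those outage lengths, here is a back-of-the-envelope sketch. The 10 Gb/s link rate and 500-byte average packet size are assumptions picked purely for illustration, not figures from any vendor.

```python
def packets_lost(outage_seconds, rate_bps=10e9, avg_packet_bytes=500):
    """Packets dropped on one fully loaded link during an outage.

    rate_bps and avg_packet_bytes are illustrative assumptions.
    """
    return rate_bps * outage_seconds / (avg_packet_bytes * 8)


six_seconds = packets_lost(6)        # the "less than 6 seconds" case
half_second = packets_lost(0.5)      # a 0.5 second outage
few_micros = packets_lost(5e-6)      # "a few microseconds"
```

Under those assumptions a 6 second outage is about 15 million packets on a single link and half a second is still over a million, while a few microseconds comes to roughly a dozen packets.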

> The main remaining hurdle is updating microcode on linecards, they still need to be rebooted after an upgrade.

... and as long as this is the case, there is no ISSU. There is only "shorter outages during upgrade compared to a complete reboot".

Jonathan_Lassoff · November 9, 2012, 5:15am

This.
There is some wonderfully reconfigurable router hardware out in the
world, and platforms that can dynamically program their forwarding
hardware make this seem possible.

It's possible to build things such that portions of a single box can
be upgraded one at a time. With multiple links or forwarding paths out to
a remote destination, it seems to me that the upgrade process could
just coordinate things and update each piece of forwarding hardware
in turn, letting traffic cut over and waiting for it to come back before
moving on.

I could envision a Juniper M/TX box, where MPLS FRR or an "ae"
interface across FPCs could take backup traffic while a PFE is
upgraded.
Of course, every possible path would need to be able to survive an FPC
being down, and the process would have to have hooks into protocols to
know when everything is switched back.
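
A minimal sketch of that coordination, assuming a made-up data model where each destination maps to the list of FPCs that can carry traffic towards it. The names (`fpc0`, `peer-a`, and so on) are hypothetical, and the drain/upgrade/restore steps are only logged, not tied to any real platform API.

```python
def stranded_destinations(paths, fpc):
    """Destinations that would have no path left while `fpc` is down."""
    return sorted(dest for dest, via in paths.items() if set(via) <= {fpc})


def rolling_upgrade(paths, fpcs):
    """Upgrade one FPC at a time, refusing to start if any path cannot survive."""
    blockers = {f: stranded_destinations(paths, f) for f in fpcs}
    blockers = {f: dests for f, dests in blockers.items() if dests}
    if blockers:
        raise ValueError(f"not every path survives an FPC being down: {blockers}")

    log = []
    for fpc in fpcs:
        log.append(f"drain {fpc}")     # let FRR / the ae bundle take the traffic
        log.append(f"upgrade {fpc}")   # reprogram the forwarding hardware
        log.append(f"restore {fpc}")   # wait for protocols to switch back
    return log


# Hypothetical topology: every destination is reachable via two FPCs.
paths = {"peer-a": ["fpc0", "fpc1"], "peer-b": ["fpc1", "fpc2"]}
log = rolling_upgrade(paths, ["fpc0", "fpc1", "fpc2"])
```

The up-front check is the "every possible path would need to survive" condition; the hard part in practice is the restore step, which needs the protocol hooks mentioned above.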

Saku_Ytti1 · November 9, 2012, 7:36am

I'd say code quality in routers is generally really, really bad; I'm not
sure why this is.
I think one problem is that we start from the premise that code will be
written correctly. When we start from that premise, we can do silly things
like write run-to-completion operating systems like IOS and JunOS (rpd).
That means a single person making one bad judgement call makes the whole OS
bad.

Of course run-to-completion is the most efficient way to execute code if
your code is flawless, but that ship has sailed. Possibly when IOS started,
CPU time was at a premium and it was cheaper to throw code-review money at
the problem.
But today it clearly is cheaper to add power to the control plane and have
levels of abstraction in the control plane that save the system from bad
code, i.e. design your control plane assuming the code you deliver isn't good.
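
A toy illustration of the difference, not a model of any real router OS: tasks are just (name, units of work) pairs, "run to completion" is simplified to "runs until it finishes", and the alternative is plain round-robin time slicing. The task names and work amounts are made up.

```python
def run_to_completion(tasks):
    """Each task keeps the CPU until it is done; nothing can interrupt it."""
    order = []
    for name, work in tasks:
        order.extend([name] * work)
    return order


def time_sliced(tasks, quantum=1):
    """Round-robin: a task gets at most `quantum` units, then goes to the back."""
    queue = [[name, work] for name, work in tasks]
    order = []
    while queue:
        name, work = queue.pop(0)
        run = min(quantum, work)
        order.extend([name] * run)
        if work - run > 0:
            queue.append([name, work - run])
    return order


# One badly behaved task ahead of a protocol hello that needs a single unit.
tasks = [("buggy", 1000), ("hello", 1)]
wait_rtc = run_to_completion(tasks).index("hello")
wait_sliced = time_sliced(tasks).index("hello")
```

With run-to-completion the hello waits behind all 1000 units of the buggy task; with time slicing it runs after one. Both versions do the same total work, so the slicing overhead is what buys protection from one bad piece of code.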

Take a page from the Erlang team on design principles. I think Arista is
walking the right path. They have a (hopefully) stable and simple
state-storage process, from which separate processes can download their
state when they crash, which can make crashing virtually transparent to the
operator.
However, I think Arista is still running a single BGPd etc. I think you
should at least run iBGP and eBGP, or maybe even peer groups, in different
daemons, so when you get a bad UPDATE, it'll crash your eBGP sessions or one
peer group instead of all neighbours. Or of course, if you keep TCP state
and the various BGP RIBs in a separate location, you won't need to tear down
the TCP session just because you crash.
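
A rough sketch of the idea, not Arista's actual design: `StateStore`, `BgpWorker` and `deliver` are hypothetical names, a "crash" is modelled as an exception, and a "restart" is simply creating a new worker that reloads its state from the store.

```python
class StateStore:
    """Deliberately tiny: it only holds state, so there is little to get wrong."""

    def __init__(self):
        self._state = {}

    def save(self, owner, key, value):
        self._state.setdefault(owner, {})[key] = value

    def load(self, owner):
        return dict(self._state.get(owner, {}))


class BgpWorker:
    """One worker per peer group; it recovers its routes from the store on start."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.routes = store.load(name)

    def handle_update(self, prefix, next_hop):
        if next_hop is None:
            raise ValueError("malformed UPDATE")  # stands in for a crash
        self.routes[prefix] = next_hop
        self.store.save(self.name, prefix, next_hop)


def deliver(workers, store, group, prefix, next_hop):
    """Hand an update to one worker; if it crashes, restart only that worker."""
    try:
        workers[group].handle_update(prefix, next_hop)
    except ValueError:
        workers[group] = BgpWorker(group, store)


store = StateStore()
workers = {g: BgpWorker(g, store) for g in ("ebgp", "ibgp")}
ebgp_before, ibgp_before = workers["ebgp"], workers["ibgp"]

deliver(workers, store, "ebgp", "192.0.2.0/24", "198.51.100.1")
deliver(workers, store, "ibgp", "203.0.113.0/24", "10.0.0.1")
deliver(workers, store, "ebgp", "bad-update", None)  # crashes the eBGP worker only
```

After the bad update the eBGP worker is a fresh instance that still has its earlier route, and the iBGP worker was never touched.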

Someone might argue the overhead is too large, but is it though? MX routers
ship with a 4-core RP, of which you're using one core. The overhead isn't
that high.

Some people wrote positive things about ISSU in reply; the only box where
I've seen it work reliably is the CAT4500 switch. I've not seen it work in
routers. On the MX960 my personal hit/miss ratio is about 4/5 ISSUs working
and 1/5 failing catastrophically, like the PFE suddenly dropping packets as
if a FW filter was applied, while none is. So we've stopped using ISSU.
The point of ISSU is that you're not sending change management notices to
your customers, so it positively has to work, or you're in breach of
contract.
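
To see why a 4/5 success rate isn't good enough, here is the arithmetic as a small sketch. It assumes each upgrade succeeds or fails independently, which is a simplification.

```python
def p_at_least_one_failure(success_rate, upgrades):
    """Chance that at least one of `upgrades` independent ISSU runs fails."""
    return 1 - success_rate ** upgrades


one_box = p_at_least_one_failure(0.8, 1)
five_boxes = p_at_least_one_failure(0.8, 5)
```

At an 80% success rate, five upgrades already give roughly a two-in-three chance of at least one catastrophic failure.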

Juuso_Lehtinen1 · November 9, 2012, 7:36am

In vendor-speak, ISSU usually refers to a 'minimal traffic impact' upgrade.
The definition of minimal varies from vendor to vendor and from upgrade to
upgrade, depending on which parts of the code need to be upgraded. In
general, traffic loss during ISSU is an order of magnitude less than when
reloading the whole box or line card as in a conventional upgrade.

At a high level, ISSU can be divided into two areas:
* Control plane / controller card software upgrade
* Forwarding plane / line card software upgrade

Control card software upgrade is the easy part. In a 1+1 controller design,
the standby controller card is upgraded first. Next, a control card
switchover is performed. Last, the remaining controller card is
upgraded.
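
As a rough sketch of that three-step sequence (the card names and version strings are made up for illustration, and a real implementation would also have to sync state between the cards before switching over):

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    version: str
    active: bool


def issu_control_plane(cards, new_version):
    """Upgrade a 1+1 controller pair; returns the ordered list of steps taken."""
    standby = next(c for c in cards if not c.active)
    active = next(c for c in cards if c.active)
    steps = []

    # 1. Upgrade the standby while the active card keeps running the box.
    standby.version = new_version
    steps.append(f"upgrade {standby.name}")

    # 2. Switch over: the freshly upgraded card takes control.
    standby.active, active.active = True, False
    steps.append(f"switchover to {standby.name}")

    # 3. Upgrade the old active card, which is now the standby.
    active.version = new_version
    steps.append(f"upgrade {active.name}")
    return steps


cards = [Controller("RE0", "11.4", active=True),
         Controller("RE1", "11.4", active=False)]
steps = issu_control_plane(cards, "12.1")
```

At every point in the sequence exactly one card is active, which is why this half of the upgrade can be done without touching forwarding.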

Line card upgrade is the trickier part. At a high level, the line card can
be divided into forwarding plane and control plane (yes - there is a CPU
complex on line cards as well). The control plane part of the line card can
be upgraded separately and then restarted. If the line-card CPU is responsible
for generating OSPF hellos, the OSPF session might time out during the
restart. However, for most protocols, graceful restart extensions help get
over any such issues. While the control plane is rebooting, the forwarding
bits on the line card continue packet forwarding.
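
A small sketch of the hello timing problem, without graceful restart. The 10 second hello and 4x dead multiplier are the commonly seen defaults and are used here as assumptions; the restart durations in the example are invented.

```python
def adjacency_survives(restart_seconds, hello_interval=10, dead_multiplier=4):
    """True if the neighbour's dead timer does not expire during the restart.

    Worst case: the CPU goes down just before the next hello was due, so the
    neighbour has already waited almost one hello interval, and the first
    hello is sent as soon as the CPU is back.
    """
    dead_interval = hello_interval * dead_multiplier
    worst_case_silence = hello_interval + restart_seconds
    return worst_case_silence < dead_interval


quick_restart = adjacency_survives(20)   # 30 s of silence against a 40 s timer
slow_restart = adjacency_survives(35)    # 45 s of silence, adjacency drops
```

With aggressive timers (say 1 second hellos) the budget shrinks to a few seconds, which is where the graceful restart extensions become necessary rather than nice to have.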

The forwarding plane upgrade of the line card is the tricky part. This is
the part that will cause the 'short outage' during ISSU. If the code
upgrade needs to touch microcode or FPGA code, you will see some
traffic loss. It is just the way these chips are built - you cannot
reprogram an FPGA without taking it out of service first. The same
applies to network processors as well.

In theory you could duplicate these forwarding plane chips on line cards
and implement a simple switch before the PHY. However, I doubt any vendor
has gone this way, as it would push line card prices much higher.

If your SLAs are built so that no packet loss is acceptable, you need to
work around the ISSU limitations:
* Use line-level protection on adjacent line cards (LAG, APS 1+1, MSP 1+1) -
when the primary card goes down, the backup card will carry the traffic
* When upgrading a transit router, route traffic via a redundant path before
starting the transit router upgrade
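
For the first workaround, a pre-upgrade sanity check could look something like this sketch. The data model (LAG name mapped to a list of (line card, port) members) and the `ae0`/`ae1` names are assumptions for illustration, not any vendor's inventory format.

```python
def unprotected_lags(lags):
    """Return the LAGs whose members all sit on a single line card."""
    return sorted(
        name for name, members in lags.items()
        if len({card for card, _port in members}) < 2
    )


# Hypothetical inventory: ae0 spans cards 0 and 1, ae1 lives entirely on card 2.
lags = {
    "ae0": [(0, 1), (1, 1)],
    "ae1": [(2, 0), (2, 1)],
}
at_risk = unprotected_lags(lags)
```

Anything this returns would take a hit when its line card is reloaded, so it needs to be re-homed or drained before the upgrade starts.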

BR,
Juuso

Pete_Lumbis · November 9, 2012, 1:02pm

I can't speak for JunOS, but none of the "new" IOS operating systems
are run to completion. This includes IOS-XE, XR and NX-OS.

Saku_Ytti1 · November 9, 2012, 1:27pm

Really? I thought IOS XE is a Linux control plane on top of which you have
a monolithic IOSd process?
I had a chat with Michael Beesley when the ASR1k was coming up, and he said
Cisco had plans to move processes out of IOS and run them directly on top of
Linux in XE, starting with BGP. But I don't think that has materialized?

To me JunOS and IOS XE look very much the same: a *NIX control plane and a
magic process which has its own memory management and cooperative
multitasking/scheduling?

Pete_Lumbis · November 9, 2012, 6:33pm

I apologize, I realized I forgot a critical word in my reply.

The new Cisco OSes are /NOT/ run to completion.

For IOS-XE we have Linux in charge of the scheduler with a
multi-threaded IOSd process responsible for the control plane. I'm
not familiar with movements to put processes directly on top of the
kernel, but this would be a lot more like the NX-OS model where a
process like BGP can crash without taking down the system (or the
critical IOSd process for example). The down side of this model is
that control plane scaling, due to message passing, starts to have a
lot of overhead. You can see this in the fact that the NX-OS routing
scale is not where IOS-XE is.

-Pete

Saku_Ytti1 · November 9, 2012, 9:00pm

> I apologize, I realized I forgot a critical word in my reply.
>
> The new Cisco OSes are /NOT/ run to completion.

I did not notice that :). I assumed the "not" was there, and was arguing
that I thought IOS XE still is. I know XR and NX-OS aren't.

> For IOS-XE we have Linux in charge of the scheduler with a
> multi-threaded IOSd process responsible for the control plane. I'm

I'm sceptical that this means there isn't a normal IOS run-to-completion
scheduler; surely not all IOS processes are separate threads to the Linux
kernel? But I guess this is a moving target. It would be interesting to hear
how many threads there are, what their relative priorities are, what runs in
each thread, etc.
But anyhow, just hearing that it is threaded is good news. Does this mean
IOSd can capitalize on multiple cores? (Something JunOS cannot do today.)

> critical IOSd process for example). The down side of this model is
> that control plane scaling, due to message passing, starts to have a
> lot of overhead. You can see this in the fact that the NX-OS routing
> scale is not where IOS-XE is.

Yup, luckily you guys stopped using the Freescale PQ3 and switched to Xeon
in the NG Nexus sup (unfortunately you also killed CMP, which I think every
vendor should have). I think the overhead is worth it; built correctly, you
can scale horizontally and just keep throwing faster RP CPUs at it.

Pete_Lumbis · November 9, 2012, 9:58pm

I do not believe that the Linux scheduler is run to completion, but to
be honest I'm not 100% certain. I know a big reason for IOS-XE was to
be able to operate in multicore environments. From a high level you
have IOSd as a process, with each traditional process (BGP, OSPF, IP
Input) as a thread within IOSd. Overall, IOS-XE is Linux managing a few
processes: IOSd, FMan-RP, CMan-RP (and a few others). FMan deals with
adjacencies, CMan deals with modules/cards, and IOSd handles all the
interesting stuff. Since Linux is the piece actually running the show,
IOS-XE gets all the memory management and scheduling benefits that
Linux has.