OOB core router connectivity wish list

Mikael_Abrahamsson · January 9, 2013, 2:37pm

I have, together with some other people, collected a wish list for OOB support, mainly aimed at core routers. This is to replace the legacy serial port usually present on core routing equipment and to move/collapse all its functionality to an ethernet-only port. Some equipment already has an mgmt ethernet port, but usually this can't do "everything", meaning today one has to have OOB ethernet *and* OOB serial, which just brings more pain than before.

I would like to post it here to solicit feedback on it. Feel free to use it to tell your vendor account teams you want this if you feel it useful. I've already sent it to one vendor.

http://swm.pp.se/oob.txt

Priorities:

[P1] -> must have, otherwise not useful
[P2] -> would be very useful, to most operators
[P3] -> nice to have, useful to some

From the OOB ethernet port it should be possible to:

[P1]: Powercycle the RP, switchfabrics and linecards (hard, as in they might be totally dead and I want to cut power to it via the back plane. Also useful for FPGA upgrades).

[P1]: Connect to manage the RP(s) and linecards (equivalent of today's "connect" on GSR and ASR9k or connecting to RP serial port).

[P2]: It should be possible to connect to the OOB from the RP as well (to diagnose OOB connectivity problems).

[P2]: Upload software to the RP or otherwise make information available to the RP (for later re-install/turboboot for example). RP should have access to local storage on the OOB device to transfer configuration or software from the OOB device to the RP.

[P2]: Read logs and other state of the components in the chassis (displays and LEDs) plus what kind of card is in each slot.

[P1]: The OOB port should support (configurable), telnet, ssh and optionally [P3] https login (with a java applet or equivalent to give CLI access in the browser) with ACLs to limit wherefrom things can be done. OOB should support ssh key based logins to admin account.

[P1]: The IP address of the OOB port should be set via DHCP/DHCPv6/SLAAC and should have both IPv4 and IPv6 support. If not both, then IPv6 only.

[P1]: It should be possible to transfer data using tftp, ftp and scp (ftp client on the OOB device; scp being used to transfer data *to* the device, with the OOB being the scp server).

[P2]: OOB device should have tacacs and radius and [P1] local user/password database support for authentication. [P3] OOB should support ssh-key based authentication.

[P3] Chassis should have a character display or LEDs with configurable blink pattern from OOB, to aid remote hands identification.

[P3] OOB should have two USB ports, one to use to insert storage to transfer files to/from device. The other should be a USB port that presents itself as either a USB serial port or a USB ethernet port, where the OOB device would have a built-in DHCP/DHCPv6 server to give IPv4/IPv6 access to a laptop connected to the OOB, so the onsite engineer can then use ssh/telnet to administer the OOB. Optionally this port could be an ethernet port (compare today's CON and AUX ports).

[P2] OOB should have procedure to factory default its configuration, perhaps physical button that can be pressed and held for duration of time. The fact that this is done should be logged to the RP.

[P3] OOB should have possibility to show power supply and environmental state.

[P3] The factory default configuration should not include an empty or obvious login password. The factory-default login password should be the MAC address (without punctuation) of the OOB ethernet interface, which should be printed on the chassis next to the OOBE port.
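
As an illustration of the last item, here is a minimal Python sketch of deriving such a default password from the printed MAC address. The function name is hypothetical, and lowercasing the digits is an assumption: the wish list only says "without punctuation" and does not specify case.

```python
import re


def default_password_from_mac(mac: str) -> str:
    """Return the 12 hex digits of a MAC address with punctuation removed.

    Lowercasing is an assumption made for this sketch; the wish list
    does not say which case the password should use.
    """
    # Drop everything that is not a hex digit (colons, dashes, dots, spaces).
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits.lower()


if __name__ == "__main__":
    # Colon-separated and dotted notations yield the same password.
    print(default_password_from_mac("00:1B:2C:3D:4E:5F"))
    print(default_password_from_mac("001b.2c3d.4e5f"))
```

Accepting several notations means whatever is printed next to the port can be typed as-is by remote hands.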

Saku_Ytti1 · January 9, 2013, 3:58pm

equipment already has an mgmt ethernet port, but usually this can't
do "everything", meaning today one has to have OOB ethernet *and*
OOB serial which just brings more pain than before.

The key difference is, that those are not OOB at all, they are on-band as
they fate-share the control-plane.
Having a real OOB port is a demand everyone should put in their RFQ/RFP.

http://swm.pp.se/oob.txt

Fully support. In essence, all CSCO needs to do, is bring CMP back that
they had in Nexus7k/SUP1 and SUP2T but removed again in Nexus7k/SUP2 citing
thermal, pincount and lack of customer adoption.
Other vendors need to check out CMP and copy it.

I emailed freescale, since they are the ones who can solve the thermal and
pincount problem by implementing mgmt board directly. They replied
'something similar will be implemented in the next generation of our
multicore processors' (big kudos to a large semi for replying quickly and
cluefully to non-customer queries)
Once there is hardware for this, getting vendors to implement software
should be easy peasy.

William_Herrin · January 9, 2013, 4:18pm

I have, together with some other people, collected a wish list for OOB
support, mainly aimed at core routers.

Hi Mikael,

I generally agree but have several quibbles:

[P1]: The IP address of the OOB port should be set via DHCP/DHCPv6/SLAAC and
should have both IPv4 and IPv6 support. If not both, then IPv6 only.

(a) This is a P2 not a P1. Asking the OOB to be critically dependent
on an external network element is dubious to begin with but even if
desired it's usable without.

About the only time you'd strictly *need* dynamic configuration in an
OOB is when directly connecting it to a commodity Internet link. If
you're willing to give your poorly secured and rarely updated OOB a
public IP address, you're a braver man than I am. If you are that
"brave" then you'll need a more robust set of dynamic configuration
tools than just the ones you've listed and you'll also need a dynamic
dns client or some other mechanism for the OOB to let you know
what addresses it ended up on.

(b) IPv6-only in an OOB won't be broadly acceptable for at least
another 5 years if then. You'd be foolish not to include IPv6 support
in a greenfield design -- the writing is on the wall -- but there are
today very few scenarios in which an IPv4 only OOB would not be
usable.

[P1]: It should be possible to transfer data using tftp, ftp and scp (ftp
client on the OOB device; scp being used to transfer data *to* the device,
with the OOB being the scp server).

For security and performance reasons, FTP has no place in a modern
network. If you're still using it anywhere, you're borrowing grief.
Replace with an http/https client.

TFTP has such a strong legacy of use on routers that its presence
remains just barely tolerable. For now.

Have a look at how HP iLO3 makes use of http to implement virtual
media. You can upload an ISO image to a web server somewhere and then
instruct ilo to mount the URL as a virtual dvdrom. Best of all, if
your management session disconnects, the virtual media remains mounted
via the web server.

Regards,
Bill Herrin

Justin_M_Streiner · January 9, 2013, 4:20pm

Ethernet/Serial/USB management is useful, but I would not be in favor of eliminating the serial port entirely, because having an OOB serial console connection has saved me more than a few times. Plus, having working connectivity from something like a terminal server or a headless Linux box has proven its value countless times in the past.

I would also add the following to your list:

If $vendor's device provides an IP-based management interface over an ethernet port, it should provide both a web-based interface and a CLI. The web interface should be as platform-agnostic as possible (not restrict people to specific platforms and browsers), and be as gentle as possible in requiring things like Java. Being forced to deal with Java runtime version dependency hell in a critical situation would not fill my heart with joy.

jms

Christopher_Morrow · January 9, 2013, 4:21pm

it's possible that he's thinking of a world where your dhcp is not
'dynamic' but a management system which can keep all the other bits of
information updated (and easily updatable!) for the remote nodes:
  ip address
  def-gw
  dns servers

for instance.
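
A minimal Python sketch of that idea: the per-node details live in an inventory, and static DHCP reservations are generated from it. Everything concrete here is an assumption for illustration. The thread does not name a DHCP server, so ISC dhcpd `host` stanzas are used as one possible output format, and the node name, MAC and addresses (documentation range 192.0.2.0/24) are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OobNode:
    """One OOB port as recorded in a (hypothetical) management inventory."""
    name: str
    mac: str
    address: str
    gateway: str
    dns_servers: Tuple[str, ...]


def render_host(node: OobNode) -> str:
    """Render a static reservation as an ISC dhcpd 'host' stanza."""
    dns = ", ".join(node.dns_servers)
    return (
        f"host {node.name} {{\n"
        f"  hardware ethernet {node.mac};\n"
        f"  fixed-address {node.address};\n"
        f"  option routers {node.gateway};\n"
        f"  option domain-name-servers {dns};\n"
        f"}}\n"
    )


if __name__ == "__main__":
    node = OobNode(
        name="pop1-core1-oob",       # hypothetical node name
        mac="00:1b:2c:3d:4e:5f",     # hypothetical MAC
        address="192.0.2.10",
        gateway="192.0.2.1",
        dns_servers=("192.0.2.53", "192.0.2.54"),
    )
    print(render_host(node))
```

The address each node gets is then fixed by the inventory, and changing the gateway or DNS servers is a matter of regenerating the file rather than touching each device.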

William_Herrin · January 9, 2013, 4:25pm

Sure, but in that scenario you don't *need* a dhcp system, it's merely
a "nice to have." Hence a P2 not a P1.

Regards,
Bill Herrin

Mikael_Abrahamsson · January 9, 2013, 4:47pm

Well, I was actually thinking more about initial factory default configuration.

After I can reach the device, I would like to be able to set a static address. I'll consider adding this to the document.

My grief with this is that if we're going to go into that kind of level, we need an RFC-style document with a lot of detail, and that wasn't what I was initially aiming for. I wanted more to spark the discussion and see what came out of it. If there indeed is a lot of interest in this, I'd gladly try to create a more detailed document.

I would be very happy if multiple vendors could standardise on a functionality and software though, perhaps even with API. Don't know which standards body would be right for this though.

Saku_Ytti1 · January 9, 2013, 4:48pm

(a) This is a P2 not a P1. Asking the OOB to be critically dependent
on an external network element is dubious to begin with but even if
desired it's usable without.

Agreed that P2 suffices. Usage scenario is installing fresh router. You
order router from vendor to remote location, notsosmarthands plug it to
wires, boom you configure it remotely.

About the only time you'd strictly *need* dynamic configuration in an
OOB is when directly connecting it to a commodity Internet link. If
you're willing to give your poorly secured and rarely updated OOB a
public IP address, you're a braver man than I am. If you are that

This is not absolute truth, but depends on what hat you wear. If you are DC
guy, you have handful of POPs, arranging proper OOB network there is a
breeze.
If you are incumbent, you can't buy anything externally, as everyone buys
from you, so you need to build separate network just for OOB.

All other service providers may have hundreds of pops, you're not going to
build non-revenue generating network to reach all those hundreds of pops,
just to build OOB.
You get cheapest connection you can get there, maybe competitor ADSL, cable
modem, 3G, public WLAN, ISDN, whatever is available which is not
fate-sharing with your network.
Then plug in say cisco CPE to the OOB port, which offers address via DHCP
and connect over IPSEC DMVPN to your own network. 0 touch installation of
new router. Some might be ghetto and omit the CPE and use IPSEC from the
management plane to openswan linux.

(b) IPv6-only in an OOB won't be broadly acceptable for at least
another 5 years if then. You'd be foolish not to include IPv6 support
in a greenfield design -- the writing is on the wall -- but there are
today very few scenarios in which an IPv4 only OOB would not be
usable.

Agreed. IPv4 would be priority for most.

For security and performance reasons, FTP has no place in a modern
network. If you're still using it anywhere, you're borrowing grief.
Replace with an http/https client.

http(s), scp would be my picks. Hell with FTP.

TFTP has such a strong legacy of use on routers that its presence
remains just barely tolerable. For now.

There is no standard way to send arbitrary size files over TFTP, not worth
the pain.
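
To put a number on the size problem, here is a small Python sketch of the arithmetic, under the assumption that block numbers do not wrap: RFC 1350 uses a 16-bit block number and 512-byte data blocks, and RFC 2348 lets the two ends negotiate a larger block size (up to 65464 bytes). Many implementations do wrap the block counter, but that behaviour is not part of those RFCs. The function name is made up for this example.

```python
# Block numbers are 16 bits wide and start at 1, so at most 65535 data blocks.
MAX_BLOCKS = 2**16 - 1


def max_tftp_file_size(block_size: int) -> int:
    """Largest file (in bytes) transferable without block-number rollover.

    A transfer ends with a block shorter than block_size, so the final
    block can carry at most block_size - 1 bytes.
    """
    return (MAX_BLOCKS - 1) * block_size + (block_size - 1)


if __name__ == "__main__":
    for size in (512, 1428, 65464):
        limit = max_tftp_file_size(size)
        print(f"blksize {size:>5}: {limit} bytes (~{limit / 2**20:.0f} MiB)")
```

With the default 512-byte blocks the limit is just under 32 MiB, which is smaller than many router software images.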

Mikael_Abrahamsson · January 9, 2013, 4:55pm

Today yes. In 2-4 years when this might be a reality, I don't want an IPv4-only device. I'd rather go for IPv6 only immediately.

Leo_Bicknell1 · January 9, 2013, 5:12pm

I think this list goes too far, and has a decent chance of introducing
other fun failure modes as a result. The goal of OOB is generally
to gain control of a "misbehaving" device. Now, misbehaving can
take many forms, from the device actually being ok and all of its
circuits going down (fiber cut isolating it), to the device being
very much not ok with a bad linecard trying to lock up the bus,
core dumps, etc.

I'm going to pick on one specific example:

In a message written on Wed, Jan 09, 2013 at 03:37:16PM +0100, Mikael Abrahamsson wrote:

[P1]: Powercycle the RP, switchfabrics and linecards (hard, as in they
might be totally dead and I want to cut power to it via the back plane.
Also useful for FPGA upgrades).

Most Cisco high end devices can do this today from the CLI (test
mbus power off on a GSR, for example). Let's consider what it would
take to move that functionality from the live software to some sort
of "ethernet oob" as proposed...

The first big step is that some sort of "computer" to operate the
ethernet oob is required. I think where you're going with this is
some sort of small SoC type thing connected to the management bus
of the device, not unlike an IPMI device on a server. Using IPMI
devices on servers as an example this is now another device that
must be secured, upgraded, and has the potential for bugs of its
own. Since only a small fraction of high end users will use the
OOB at all (inband is fine for many, many networks), there will not
be a lot of testing of this code, or demand for features/fixes.

So while I agree with the list of features in large part, I'm not sure I
agree with the concept of having some sort of ethernet interface that
allows all of this out of band. I think it will add cost, complexity,
and a lot of new failure modes.

The reality is the current situation on high end gear, a serial console
plus ethernet "management" port is pretty close to a good situation, and
could be a really good situation with a few minor modifications. My
list would be much simpler as a result:

1) I would like to see serial consoles replaced with USB consoles. They
   would still appear to be serial devices to most equipment, but would
   enable much faster speeds making working on the console a much more
   reasonable option. For bonus points, an implementation that presents
   2-4 serial "terminals" over the same USB cable would allow multiple
   people to log into the device without the need for any network
   connectivity.

This would also allow USB hubs to be used to connect multiple devices
in a colo, rather than the serial terminal servers needed today.

2) I would like to see "management" ethernets that live in their own
   walled off world out of the box. Yes, I know with most boxes you can
   put them in a VRF or similar, but that should be the default. I
   should be able to put an IP and default route on a management ethernet
   and still have a 100% empty (main) routing table. This would allow
   the management port to be homed to a separate network simply and
   easily.

3) I would like to see "legacy protocols" dumped. TFTP, bye bye. FTP,
bye bye. rcp, bye bye. HTTP, HTTPS, and SCP should be supported
for all operations at all levels of the OS.

In this ideal world, the deployment model is simple. A small OOB
device would be deployed (think like a Cisco 1900, or Juniper SRX
220), connected to a separate network (DSL, cable modem, cell modem,
ethernet to some other provider, or gasp, even an old school analog
modem). Each large router would get an ethernet port and usb console
to that device. SSH to the right port would get the USB console,
ideally with the 2-4 consoles exposed where hitting the same port just
cycles through them.

At that point all of the functionality described in the original
post should be available in the normal CLI on the device. File
transfer operations should be able to specify the management port
"copy [management]http://1.2.3.4/newimage.code flash:" to use that
interface/routing table.
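
A rough Python sketch of how such a source argument could be interpreted. The bracketed prefix is the hypothetical syntax from the example above, not an existing CLI feature, and every name in the code is invented for illustration. The scheme whitelist follows point 3 (HTTP, HTTPS and SCP only).

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class CopySource(NamedTuple):
    routing_instance: Optional[str]  # None means "use the main routing table"
    scheme: str
    host: str
    path: str


# Optional "[instance]" tag followed by a URL; hypothetical syntax.
_TAGGED_URL = re.compile(r"^(?:\[(?P<instance>[^\]]+)\])?(?P<url>\S+)$")

# Legacy protocols (tftp, ftp, rcp) are deliberately left out.
ALLOWED_SCHEMES = {"http", "https", "scp"}


def parse_copy_source(text: str) -> CopySource:
    """Split an optional routing-instance tag from the URL that follows it."""
    match = _TAGGED_URL.match(text)
    if match is None:
        raise ValueError(f"cannot parse copy source: {text!r}")
    parts = urlsplit(match.group("url"))
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return CopySource(
        match.group("instance"), parts.scheme, parts.hostname or "", parts.path
    )


if __name__ == "__main__":
    print(parse_copy_source("[management]http://1.2.3.4/newimage.code"))
    print(parse_copy_source("https://1.2.3.4/newimage.code"))
```

The point of the tag is that the transfer picks its interface and routing table explicitly, so the main routing table can stay empty while the management port is still usable.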

I also think on most boxes this would require no hardware changes. The
high end boxes have Ethernet, they have USB...it's just updating the
software to make them act in a much more useful way, rather than the
half brain-dead ways they act now...

Saku_Ytti1 · January 9, 2013, 5:34pm

It already exists, CMP is its name, and it is great. Server people woke up
to this over decade ago, so should networking people.

Failure modes are somewhat uninteresting, as long as it does not fate-share
with control-plane.

Having RS232 or USB console on forwarding-plane is not OOB. And even OOB
version of these is of limited value, you can't send images over them, you
can't multiplex over them and RS232 OOB 'server' costs more than switch. So
you get less and you pay more.
HW + SW wise it's extremely simple contraption, all the code and HW needed
is proven.

RS232 on-band management is case of 'well it's always done like this, so it
must be the right way to do it'

Mikael_Abrahamsson · January 9, 2013, 5:39pm

of the device, not unlike an IPMI device on a server. Using IPMI

IPMI is exactly what we're going for.

In this ideal world, the deployment model is simple. A small OOB
device would be deployed (think like a Cisco 1900, or Juniper SRX
220), connected to a separate network (DSL, cable modem, cell modem,
ethernet to some other provider, or gasp, even an old school analog
modem). Each large router would get an ethernet port and usb console
to that device. SSH to the right port would get the USB console,
ideally with the 2-4 consoles exposed where hitting the same port just
cycles through them.

This is added cost and complexity. Sometimes there are only 1-2 devices in the pop, and now there is need to install a serial console router with DC (limits options) just to connect to the serial console, which might not work anyway because the control plane might be so screwed up that it actually needs power cycling.

So I want serial ports in the front to no longer be needed for normal operation. Look at the XR devices from Cisco for instance. For "normal maintenance" you pretty much require both serial console (to do rommon stuff one would imagine shouldn't be needed) and also mgmt ethernet (to use tftp for downloading software when you need to turbo-boot because the system is now screwed up because the XR developer ("install") team messed up the SMUs *again*).

For instance, if you have a single RP, the upgrade instructions for 4.2.1 list going into rommon and doing "boot -s", *or* power cycling the box, after FPGA upgrade.

TGLASSEY · January 9, 2013, 5:54pm

I think this list goes too far, and has a decent chance of introducing
other fun failure modes as a result. The goal of OOB is generally
to gain control of a "misbehaving" device. Now, misbehaving can
take many forms, from the device actually being ok and all of its
circuits going down (fiber cut isolating it), to the device being
very much not ok with a bad linecard trying to lock up the bus,
core dumps, etc.

I'm going to pick on one specific example:

In a message written on Wed, Jan 09, 2013 at 03:37:16PM +0100, Mikael Abrahamsson wrote:

[P1]: Powercycle the RP, switchfabrics and linecards (hard, as in they
might be totally dead and I want to cut power to it via the back plane.
Also useful for FPGA upgrades).

Most Cisco high end devices can do this today from the CLI (test
mbus power off on a GSR, for example). Let's consider what it would
take to move that functionality from the live software to some sort
of "ethernet oob" as proposed...

Install an Embedded Peer (a Bus Level Peering Card like a SlotServer or the Fuji Module) and provide console level access through the peer. Then the Peer itself becomes the controller interface.

Todd

Leo_Bicknell1 · January 9, 2013, 6:18pm

In a message written on Wed, Jan 09, 2013 at 06:39:28PM +0100, Mikael Abrahamsson wrote:

IPMI is exactly what we're going for.

For Vendors that use a "PC" motherboard, IPMI would probably not be
difficult at all!

I think IPMI is a pretty terrible solution though, so if that's your
target I do think it's a step backwards. Most IPMI cards are prime
examples of my worries, Linux images years out of date, riddled with
security holes and universally not trusted. You're going to need a
"firewall" in front of any such solution to deploy it, so you can't
really eliminate the extra box I proposed just change its nature.

I also still think there's a lot of potential here to take gigantic
steps backwards. Replacing a serial console with a Java applet in
a browser (a la most IPMI devices) would be a huge step backwards.
Today it's trivial to script console access, in a Java applet world,
not so much.

Having an IPMI-like device with dedicated ethernet and connection to the
management bus would allow it to have a web interface to do things like
power cycle individual line cards and may be a win, but I would posit
these things are to work around horribly broken upgrade procedures that
vendors have not given enough thought. They could be solved with more
intelligent software in the ROM and on the main box without needing any
add on device.

So I want serial ports in the front to no longer be needed for normal
operation. Look at the XR devices from Cisco for instance. For "normal
maintenance" you pretty much require both serial console (to do rommon
stuff one would imagine shouldn't be needed) and also mgmt ethernet (to
use tftp for downloading software when you need to turbo-boot because the
system is now screwed up because the XR developer ("install") team messed
up the SMUs *again*).

Your vendor is going to hire those same developers to write the code for
your OOB device. The solution here is not bad developers writing and
deploying even more code, it's to demand your vendors uplevel their
developers and software.

Ever have these problems on Vendor J? No, the upgrade process there is
smooth as silk. Not to say that vendor is perfect, they just have
different warts.

Saku_Ytti1 · January 9, 2013, 6:41pm

I also still think there's a lot of potential here to take gigantic
steps backwards. Replacing a serial console with a Java applet in
a browser (a la most IPMI devices) would be a huge step backwards.
Today it's trivial to script console access, in a Java applet world,
not so much.

P1 requirement was ssh.

Ever have these problems on Vendor J? No, the upgrade process there is
smooth as silk. Not to say that vendor is perfect, they just have
different warts.

I'm maybe getting a bit too far off topic.

We have different opinions of smooth as silk. I hate how in J once you do
the upgrade, current config is stored with it, so when you finally boot,
you're using that configuration.
So this means you can't install the new image to all boxes at once when you
decide on a new standard release and then reload them when you get a maint
window, without having any extra work during expensive maint hours. Also I've
seen very poor hit-miss ratio with ISSU, so I can't use it at all.

Dobbins_Roland · January 9, 2013, 11:17pm

Flow telemetry export - many of these so-called 'management' ports can't be used to export flow, oddly enough.

Randy_Carpenter · January 10, 2013, 3:05am

My main requirements would be:

1. Something that is *not* network (ethernet or otherwise) (isn't that the point of OOB?)
2. Something that is standard across everything, and can be aggregated easily onto a "console server" or the like

I don't really see what is wrong with keeping the serial port as the standard.

Things like servers and RAID cards and such are coming with "BIOS"es that are graphical and even require a mouse to use. What use is that when I need to get into the BIOS from a remote site that is completely down?

Likewise OS vendors are increasingly dropping support for installing OSes via serial port (RHEL, VMWare, etc.)

At least with RHEL, you can make your own boot image that gets rid of the asinine splash screen (which is the only thing that causes the requirement for a full VGA console)

It might be nice to have a "management-only" port of some sort to do more advanced things that serial cannot do, but the serial port is ubiquitous already, and I don't see any reason to remove it as the very low-level access method.

thanks,
-Randy

Chris_Adams3 · January 10, 2013, 3:15am

Once upon a time, Randy Carpenter <rcarpen@network1.net> said:

Likewise OS vendors are increasingly dropping support for installing OSes via serial port (RHEL, VMWare, etc.)

At least with RHEL, you can make your own boot image that gets rid of the asinine splash screen (which is the only thing that causes the requirement for a full VGA console)

RHEL installs with a serial console just fine. You also don't have to
"make your own boot image" to get a non-graphical boot.

Warren_Bailey1 · January 10, 2013, 3:16am

Uplogix has a pretty rad solution..

Randy_Carpenter · January 10, 2013, 3:55am

Probably a bit off topic for this thread, but...

If I boot the default install disc/image on any of my servers (mostly Supermicro), it hangs at a blank screen when isolinux loads. If you get rid of the splash screen, it works fine. This has been an issue since RHEL4, I think.

Maybe other server manufacturers handle the video a little differently, and are able to get past the splash screen.

-Randy