Managing IOS Configuration Snippets

Howdy network operator cognoscenti,

I'd love to hear your creative and workable solutions for a way to track
in-line the configuration revisions you have on your cisco-like devices.
Let me clarify/frame:

You have a set of tested/approved configurations for your routers which use
IOS style configuration. These configurations of course are always refined
and updated. You break these pieces of configuration into logical sections
(for example, a file for NTP configuration and a file for the control
plane filter) and store these in some revision control system. Put aside for
the moment whether this is a reasonable way to comprehend deployed
configurations. What methods do some of you use to know which version of a
configuration you have deployed to a given router for auditing and update
purposes? Remarks are a convenient way to do this for ACLs, but I don't
have a similar mechanism for top-level configuration. About a decade ago I
thought I'd be super clever and encode versioning information into the SNMP
location, but that is just awful and there is a much better way everyone
is using, right? Flexible commenting on other vendors/platforms makes this a
bit easier.

Assume that this version encoding perfectly captures what is on the router
and that no person is monkeying with the config... version 77 of the
control plane filter is the same everywhere.

I started a long email that really should just be a blog post. I need to get a blog or something.

Short story is this:

NETCONF is probably the future of change management on all types of routers and switches. It's not supported everywhere yet and is missing lots of features, but they're working on it. Look at the talk given at NANOG60 for more information.

There is a Puppet module, which is also incomplete. I'm not sure it is the right way to go (http://puppetlabs.com/blog/puppet-network-device-management).

Most people roll their own solution. If you're looking to do that, consider using Augeas for parsing the configuration files. It can be really useful for documenting changes, and probably for diffing parts of the config. You might also consider RabbitMQ or another message queue to handle scheduling and deploying the changes; it can retry failed updates. You should work towards all-or-nothing commits (not all Cisco gear supports this, but you can fake it in a couple of ways). Ultimately you want to roll back to a known good configuration if things go wrong.

If you have money and want this right now:

Consider looking at Tail-F's NCS, which according to marketing presentations appears to do everything I want right now. I'd like to believe them, but I don't have any money so I can't test it out. :)

Cheers,
Robert

I should add that, even though I recommend all this, I haven't used any of it for networking. I guess those are more shiny-ball ideas than things I've actually used. We have perl scripts that wrap an in-house API to access our IPAM which generates initial configuration. The template files are a mix of m4 and Template::Toolkit.

We use basically one-off perl scripts for auditing sections of the configs to find discrepancies. We use rancid to collect configs. We just started using netdot which is nice for topology discovery. TACACS and DHCP logs are parsed and stored in logstash. All of those tools provide the who, what, where and when but not the why. The why would require a bit more custom stuff and forcing people to use a frontend interface instead of directly touching the routers. We aren't ready for that yet.

Robert - all great suggestions. Big cross-vendor configuration generation
and deployment is outside the scope of what I was hoping for here. The goal
is to have the version information somehow encoded into the configuration,
and I'm not sure that NETCONF has anything to say about that matter.
Certainly the same problem of which-versions-are-where exists in the
puppet/chef world and there are platform specific ways to answer those
questions. Deep analysis of the router configuration itself can give pretty
strong hints about which versions are deployed, but let's assume full config
digestion and comparison is out of the question. From some off-list
responses I am hearing that some folks do similar kludges with other text
fields, whether they be remark/banner/snmp-foo/interface descriptions.

puppet solves this by comparing a complete md5(file) with deployed
md5(file)... not as simple to do that on:
  access-list 150 permit icmp any any
  access-list 150 permit tcp any eq 80 any
  access-list 150 deny ip any any

it'd be super nice if you could grab out just the hermetic bit of
config you care about and md5sum() that, eh? Provided, of course, that
your stored config was written out with the same (IOS-version-specific?)
spacing/etc. as the device uses.
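A minimal Python sketch of that "hermetic bit" fingerprint for a numbered ACL, assuming the stored copy uses exactly the line format the device prints in 'show run' (the caveat just mentioned). The function name and the sample configs are illustrative, not from any existing tool.

```python
import hashlib


def acl_fingerprint(config: str, acl_number: int) -> str:
    """MD5 over one numbered ACL's entries, in the order they appear."""
    prefix = f"access-list {acl_number} "
    # Keep only this ACL's lines (the trailing space stops 150 matching 1500)
    # and strip whitespace so indentation differences don't change the hash.
    entries = [line.strip() for line in config.splitlines()
               if line.strip().startswith(prefix)]
    return hashlib.md5("\n".join(entries).encode("utf-8")).hexdigest()


# Stored copy from revision control vs. a (trimmed) 'show run' from the device.
stored = """access-list 150 permit icmp any any
access-list 150 permit tcp any eq 80 any
access-list 150 deny ip any any"""

running = """hostname r1
!
access-list 150 permit icmp any any
access-list 150 permit tcp any eq 80 any
access-list 150 deny ip any any
access-list 151 permit ip any any
!"""

print(acl_fingerprint(running, 150) == acl_fingerprint(stored, 150))
```

Line order is part of the hash on purpose, since ACL entry order matters.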

this makes me sad... but go 'state of the art network equipment!'

is it time to start asking vendors for more operable configuration
storage and access?

We are evaluating a piece of software called Skybox:

  http://www.skyboxsecurity.com/

It's geared towards security analytics, but it does allow you to
define the configuration expected on a device and check what
software version it is running, whether commands that shouldn't be
there are present, and whether those that should be there are missing, etc.

It supports all major network equipment vendors, and also
allows for simple or complex regular expressions that can be
used to search configuration files more easily.

It is an offline system, so all you do is regularly present
it with a text file of the device's running configuration,
and it will do the necessary checks per the policy you have
defined.

Based on the configuration files it has, it can also create
a visual model of your network. Not something you'd rely on
given you have other tools for that, but kind of cool,
nonetheless.

Worth a look, I'd say.

Mark.

For a large install I set up a solution that might help. I used a
MediaWiki install and its API to create, update and pull the
configuration on many IOS devices. A wiki page for each hostname was
dynamically created, and the configuration was placed there daily or
hourly. This allowed support to review the configuration and advise
customers more quickly. Additional hacks for updating the devices via the
wiki were used. The goal was transparency for the support team, and a
side effect was wiki page history showing on what day and which lines
changed. As mentioned, the answer to your question would likely make a
good article.

To clarify a bit: there are systems to grab or store the running config or
keep track of intent. Let's assume, though, that comparing the deployed
configuration of an individual device to intent derived from a bunch of
configuration bits from an RCS system is *hard*.

For example, let's say you have a vty configuration which has a couple
sections, line vty 0 2 and line vty 3 5. Someone updates this configuration
in your RCS, removing the access-class from line vty 0 2 and adding it to
line vty 3 5. Let's also assume that you have *lots*
of devices and *lots* of configurations and you cannot reasonably
egrep/regexp your way to success here.

I thank you all for your responses. I was hoping there was some trick I was
not seeing and that someone would say "oh, you just need to do..."

Tail-F is probably the least bad option out there.

In configuration management, this is super easy:

DB => Template => Network

This is super hard:

Network => DB

The first one keeps all platform-specific logic in flat ASCII template files,
filled in with variables from the DB.
When you introduce a new product, feature or vendor to the network, you only add
new ASCII templates: extremely easy, and no platform-specific logic in the DB.
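A minimal Python sketch of the DB => Template => Network direction, using the standard library's string.Template. The template strings, the device record and the render() helper are hypothetical stand-ins for a real DB and template set; the point is that adding a vendor means adding a template, not touching the DB.

```python
from string import Template

# Hypothetical per-platform templates: all platform-specific syntax lives
# here, and the "DB" only supplies vendor-neutral variables.
TEMPLATES = {
    "ios": Template("ntp server $server\nntp source $source_if\n"),
    "junos": Template("set system ntp server $server\n"),
}

# Stand-in for a DB row describing intent for one device.
device = {
    "hostname": "r1",
    "platform": "ios",
    "ntp": {"server": "192.0.2.1", "source_if": "Loopback0"},
}


def render(dev: dict) -> str:
    """Fill the device's platform template with its NTP variables."""
    # substitute() raises KeyError if a template needs a variable the DB
    # lacks; unused extra variables (source_if on junos) are ignored.
    return TEMPLATES[dev["platform"]].substitute(dev["ntp"])


print(render(device))
```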

With the second one, every little change in the network requires parser changes
to model it back into the DB. This is not sustainable. We can kid ourselves that
NETCONF/YANG will solve this, but they won't. Look at SNMP, which is old
technology: when a vendor ships a new feature, it may take _years_ before the MIB
arrives. There is no reason to expect you will be able to get a feature out via
NETCONF just because NETCONF is there. And if you can't do it 100%, then you have
to write a parser which can understand it.

You only need the second one if 100% of the config does not come from the DB. But
it is actually trivial to produce 100% from the DB. You don't want the DB to model
the base configuration; that's a lot of work for no gain. That will come from a
template, or at most a vendor-specific blob in the DB.
Then, after you push configuration from the DB to the network, you immediately
collect the configuration and record the relation of DB config to network config;
now you can keep ensuring the network has the correct config. If it does not, you
don't know why not and you can't fix the error itself, but you can reprovision the
whole box. So you do get a configuration conformance check; it's just very crude.
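A rough Python sketch of that crude conformance check, assuming you can collect the config right after each push. The volatile-line filter, the in-memory baseline dict and the function names are hypothetical; a real system would persist the baseline next to the DB revision. SHA-256 is used here, but any stable hash would do.

```python
import hashlib

# Lines that change without anyone touching the box. This list is a
# hypothetical minimum; real configs have more (e.g. ntp clock-period).
VOLATILE = ("!", "Building configuration", "Current configuration")

# hostname -> (DB revision pushed, digest of config collected right after)
baseline: dict[str, tuple[int, str]] = {}


def digest(config: str) -> str:
    kept = [ln.rstrip() for ln in config.splitlines()
            if ln.strip() and not ln.startswith(VOLATILE)]
    return hashlib.sha256("\n".join(kept).encode()).hexdigest()


def record_push(hostname: str, db_revision: int, collected: str) -> None:
    """Remember what the device looked like immediately after a push."""
    baseline[hostname] = (db_revision, digest(collected))


def conforms(hostname: str, current: str) -> bool:
    # Crude by design: says *that* the box drifted, not *what* drifted.
    _, expected = baseline[hostname]
    return digest(current) == expected
```

If conforms() returns False, the remedy described above is to reprovision from the recorded DB revision.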

But the alternative, trying to understand network config, is just a never-ending
path to pain. If someone is going to do it, model it in a Python or Ruby ORM
and put it on GitHub so others can contribute and we don't need to do it
alone.

Agree with this.

We started out with rancid, then quickly moved to a homebrew scp- and git-backed
system with gitweb/cgit as the user interface. If you are lucky, your
network equipment supports "advanced features" like SSH keys. If not, you
might be stuck using sshpass to ease config collection.

We built a config parsing system that would decompose monolithic configs into
configlet files, md5sum each file, and use the hash as part of the filename.
You can then see "version" information for parts of the config tree. We
quickly realized that maintaining this system is a full-time job, due to the
advanced status of network equipment software...
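A small Python sketch of that decomposition, assuming the usual IOS layout where a non-indented line starts a stanza and indented lines belong to it. One-line global commands (each 'ntp server' or 'access-list' entry, for example) become their own stanzas here; grouping them into logical configlets such as "NTP" would need extra rules. The function names and filename scheme are hypothetical.

```python
import hashlib
import re


def split_configlets(config: str) -> dict[str, list[str]]:
    """Group an IOS-style config into top-level stanzas.

    A non-indented line starts a stanza; indented lines that follow belong
    to it. Blank lines and '!' separators are dropped.
    """
    stanzas: dict[str, list[str]] = {}
    current = None
    for line in config.splitlines():
        if not line.strip() or line.strip() == "!":
            continue
        if line[0].isspace() and current is not None:
            stanzas[current].append(line)
        else:
            current = line.strip()
            stanzas.setdefault(current, []).append(line)
    return stanzas


def configlet_filename(header: str, body: list[str]) -> str:
    """Filesystem-safe stanza name plus a short MD5 of the stanza contents."""
    safe = re.sub(r"[^A-Za-z0-9]+", "_", header).strip("_")
    md5 = hashlib.md5("\n".join(body).encode()).hexdigest()[:8]
    return f"{safe}.{md5}.cfg"


cfg = "hostname r1\n!\nline vty 0 2\n access-class 10 in\nline vty 3 5\n transport input ssh\n!\n"
for header, body in split_configlets(cfg).items():
    print(configlet_filename(header, body))
```

Moving the access-class between vty stanzas (the earlier example) changes the hash in both filenames, which is exactly the per-section "version" signal described above.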

Now looking at Tail-F NCS. Demo is impressive. I'm hopeful.

Stating the obvious: the software running on most network equipment is of
poor quality. The tools to manage this are a combination of high quality
engineers and homebrew tools. Vendor tools are of a similar quality to the
equipment software. I'd like to think "SDN" is an attempt to improve this,
but I have my doubts.

We've gone off the rails a bit here. The 'in-line' bit was really at the
heart of my question. With the number of responses so far it's starting to
sound like there is not an answer other than kludge.

"workable solutions for a way to track *in-line* the configuration
revisions you have on your cisco-like devices"

This could be on brocade/hp/arista/ios, so "ask the vendor to..." is not a
(short term) solution. Assume we've got rancid-git, super awesome network
engineers who write configuration bits and test and review them, and a
super dooper templating config pushy tool with retries and a blinky
dashboard.

Now, I hand you the 'show run' output and ask you if version 77 of the vty
config is on this device. Can you answer the question? Now I hand you the
'show run' from 10,000 more device configs - and 100 more configuration
chunks from revision control. Can you still answer the question? Assume a
magical revision-history-aware configuration cross reference parser (while
a noble and lovely goal) is not available.

A couple more thoughts, regarding

Network => DB

I completely agree that trying to use the network config itself as the
authority for what we intend to be on a device is not the right long-term
approach. There is still a problem with Network => DB that I see. Assuming
you have *many* devices that may or may not be up at a given time, or may
be in various stages of turn-up / burn-in / decom, it is expected that a
config change will not successfully make it to all devices. There are other
timing issues, like a config built for a device being turned up, followed
by a push of an update to all devices that "succeeds", followed by the
final turn-up of this device. Even if you have a fancy config pushing
engine, let's just take as a given that you'll need to scrub through your
rancid-git backups to determine what needs to be updated.
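If each configlet leaves an in-line marker on the device, scrubbing the rancid-git backups reduces to a text scan. A minimal Python sketch, assuming a hypothetical "CFG-VER-<NAME> <version>" marker and a directory holding one plain-text config file per device (for example a rancid group's configs directory, though the exact layout is an assumption). All names are illustrative.

```python
import re
from pathlib import Path

# Hypothetical in-line marker: "CFG-VER-<NAME> <version>" on one line,
# e.g. inside an ACL remark or a banner.
MARKER = re.compile(r"CFG-VER-([A-Z0-9_]+)[ \t]+(\S+)")


def deployed_versions(config_text: str) -> dict[str, str]:
    """Configlet name -> version, as recorded by in-line markers."""
    return dict(MARKER.findall(config_text))


def needs_update(backup_dir: Path, intended: dict[str, str]) -> dict[str, list[str]]:
    """Hostname -> configlets whose recorded version differs from intent."""
    stale: dict[str, list[str]] = {}
    # Assumes one config file per device, named after the device.
    for path in sorted(p for p in backup_dir.iterdir() if p.is_file()):
        found = deployed_versions(path.read_text(errors="replace"))
        behind = [name for name, ver in intended.items() if found.get(name) != ver]
        if behind:
            stale[path.name] = behind
    return stale
```

Devices that were down or mid-turn-up during a push simply show up in the result until they are brought current.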

Regarding the MD5 approach, let's also keep in mind that configlets could
have "no" commands in them. In the NTP example I had before, if we wanted to
remove an NTP server, the configlet would need the "no" version, but the
rancid backup obviously would not contain it. I'm not trying to build a unit
test assertion framework here either. Some vendors have more robust
commenting, and this can be quite convenient for explicitly stating what
was pushed to the device. What are you using in your network... banner,
snmp-location, hope, prayer?

We don't do this, but the only flexible commenting in IOS style configs is
ACLs.

You could have an ACL that contains remarks only, and include version
information:

ip access-list standard CFG-VER
 remark CFG-VER-NTP 1.0.3
 remark CFG-VER-VTY 4.3.2
end

You could break this into individual ACLs if you prefer:

ip access-list standard CFG-VER-NTP
 remark CFG-VER-NTP 1.0.3
end

ip access-list standard CFG-VER-VTY
 remark CFG-VER-VTY 4.3.2
end

Seems ridiculous, but that is the sorry state of the network OS.
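Having the templating system write these remarks, rather than a person, keeps them in step with what was actually pushed. A hypothetical Python helper that renders the combined form from a version table; it removes the ACL first so stale remarks don't accumulate, which assumes your push tool tolerates that "no" line when the ACL doesn't exist yet.

```python
def version_acl(versions: dict[str, str], acl_name: str = "CFG-VER") -> str:
    """Render a remarks-only named ACL recording configlet versions.

    Hypothetical helper; the ACL name and remark format follow the
    CFG-VER example above.
    """
    lines = [
        f"no ip access-list standard {acl_name}",  # drop stale remarks first
        f"ip access-list standard {acl_name}",
    ]
    # Sorted so the rendered text (and any hash of it) is deterministic.
    lines += [f" remark {acl_name}-{name} {ver}"
              for name, ver in sorted(versions.items())]
    lines.append("exit")
    return "\n".join(lines)


print(version_acl({"VTY": "4.3.2", "NTP": "1.0.3"}))
```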

For DB => Template => Network it's very easy to me, but yes, each template you
make must have an anti-template version.
So let's say you have an NTP model, which may contain some access restriction
information, the NTP version and NTP peers. When you apply this model to a device,
some platform-specific NTP template is called. If you remove this from the device,
you need to call the 'anti' version of the template. Very simple and easy.
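A minimal Python sketch of the template/anti-template pairing for the NTP-peers part of that model. The templates and the ntp_delta() helper are hypothetical; the idea is that every line the template adds has a matching "no" line, so moving a device from one model state to another is a set of removals plus a set of additions.

```python
from string import Template

# Hypothetical IOS NTP template and its "anti" counterpart.
NTP_APPLY = Template("ntp server $server\n")
NTP_REMOVE = Template("no ntp server $server\n")


def ntp_delta(old_servers: set[str], new_servers: set[str]) -> str:
    """Render the config that moves a device from old to new NTP peers."""
    # Peers no longer in the model get the anti-template...
    out = [NTP_REMOVE.substitute(server=s) for s in sorted(old_servers - new_servers)]
    # ...and newly added peers get the normal template.
    out += [NTP_APPLY.substitute(server=s) for s in sorted(new_servers - old_servers)]
    return "".join(out)


print(ntp_delta({"192.0.2.1", "192.0.2.2"}, {"192.0.2.2", "192.0.2.3"}))
```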

You also wondered how I know which version of config a network device has.
That is a hard problem, if you want to know exactly what is wrong and how to
address just that. If you can relax the requirement to knowing whether the
configuration is correct or incorrect, it becomes trivial.
But fixing incorrect config means either a full reprovision with the new config
(at least in IOS and JunOS not a problem, it won't break the unchanged bits), or
having a human resolve it (of course, as custom dictates, first you punish the
responsible severely but swiftly).

At a previous job, our roll-your-own solution was a template-based system (*) generating full configs; all the version history for template sections, per-router local tweaks, and generated results was kept in RCS, and the actual last-configured version, plus any incremental changes, was kept in the login banner.
So at login you'd see something like:

blah blah authorized users blah blah
$Id: routername-confg,v 1.23 2013/08/20 03:07:16 username Exp
INCR: 1.2,1.3,1.4,1.5,1.6

and that version tracking made its way through to rancid for easy offline auditing. This made it nice and easy to tell when and what had been updated, though it would still take a couple of steps to identify exactly which subsections had changed over time (since the incremental version tags were specific deltas in per-device configurations). You could probably do it in a more global way too (git commit ids, maybe?), but you also don't want to make the version string too wordy.
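Pulling those tags back out of rancid backups for auditing can be a couple of regular expressions. A Python sketch against the banner format shown above; the regexes and the keys of the returned dict are assumptions, not part of the original tooling.

```python
import re

# "$Id: <file>,v <rev> <date> <time> <user> ..." as written by RCS.
ID_RE = re.compile(r"\$Id: (?P<file>\S+),v (?P<rev>[\d.]+) (?P<date>\S+ \S+) (?P<user>\S+)")
# "INCR: 1.2,1.3,..." listing incremental changes since the last full push.
INCR_RE = re.compile(r"^INCR: (?P<revs>[\d.,]+)\s*$", re.MULTILINE)


def banner_version(banner: str) -> dict:
    """Extract the full-config revision and incremental revisions, if any."""
    m = ID_RE.search(banner)
    if m is None:
        return {}
    incr = INCR_RE.search(banner)
    return {
        "file": m["file"],
        "rev": m["rev"],
        "date": m["date"],
        "user": m["user"],
        "incremental": incr["revs"].split(",") if incr else [],
    }


banner = ("blah blah authorized users blah blah\n"
          "$Id: routername-confg,v 1.23 2013/08/20 03:07:16 username Exp\n"
          "INCR: 1.2,1.3,1.4,1.5,1.6\n")
print(banner_version(banner))
```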

-e

* based on ftp://ftp.cac.washington.edu/pub/config-generator/, but substantially enhanced beyond the last public domain version. I know I'd be really happy if the current version was ever open-sourced...

This has been around for several years now -
http://sourceforge.net/projects/cisco-conf-rep/

Along those same lines, we've been using alias exec for the same thing for a
while:

alias exec NTP 6500_NTP_V1.0.1
alias exec bgp 6500_peer_V2.0.0
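Reading those tags back out of a 'show run' or rancid backup takes a single regular expression. A Python sketch, assuming the <platform>_<template>_V<version> naming convention visible in the example above; the helper name is hypothetical.

```python
import re

# Matches e.g. "alias exec NTP 6500_NTP_V1.0.1": alias name, template stem,
# version. Real aliases ("alias exec s show ip int brief") don't fit the
# *_V<digits> shape, so they are skipped.
ALIAS_RE = re.compile(r"^alias exec (\S+) (\S+)_V([\d.]+)\s*$",
                      re.MULTILINE | re.IGNORECASE)


def alias_versions(show_run: str) -> dict[str, tuple[str, str]]:
    """Alias name -> (template stem, version) for version-tag aliases."""
    return {name: (stem, ver) for name, stem, ver in ALIAS_RE.findall(show_run)}


cfg = ("alias exec NTP 6500_NTP_V1.0.1\n"
       "alias exec bgp 6500_peer_V2.0.0\n"
       "alias exec s show ip int brief\n")
print(alias_versions(cfg))
```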

Thanks,

Chuck

But that's just archiving, like rancid, right? It still doesn't have any correlation to the template-management side of things. While having the backups makes it easy to check for simple things ("do all my routers have the right syslog host set?"), the OP's question is about tracking which versions of templates may have been applied to routers; if there's any complex logic (like "are all active customer routes on this device included in the bcp38 acl on the upstream interface?") or site-specific things, that can get a lot harder to audit without the metadata on how the configuration got there.
-e

A lot of template management discussion focuses on using the network
configs as the canonical model of the network.
Storing the network model in the DB (whatever form that takes) is much
more sane.

There is the brownfield issue of populating that database and then
building device state from there, but once that's done a lot of the
problems go away.
A solution like rancid/tail-f then simply becomes the mechanics to
push your device state to the devices.

Some good stuff here:
https://www.nanog.org/meetings/nanog44/presentations/Monday/Gill_programatic_N44.pdf

--Simon

Very cool, thanks Erik. I can think of many ways to encode version
metadata. Probably best to be somewhere in between overly verbose (full
version $Id / date / author for every config chunk) and being unreadable
(base64 encoded gzip of unique configlet identifiers and versions).
Updating a banner feels a bit easier when you are pushing a full
device-specific configuration from a templating system. Regardless of where
it is stored, keeping the metadata in one of these fields (banner for
example) means that checking the contents of the banner configlet now
requires slightly more logic - which is fine.
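One possible middle ground, sketched in Python: a single short banner line of name=version pairs, readable by a human yet trivial to parse. The "CFG " prefix and the helper names are hypothetical; decode_versions() is the "slightly more logic" needed to check a banner that also carries this metadata.

```python
def encode_versions(versions: dict[str, str]) -> str:
    """One short, human-readable banner line, e.g. 'CFG ntp=1.0.3 vty=4.3.2'."""
    return "CFG " + " ".join(f"{k}={v}" for k, v in sorted(versions.items()))


def decode_versions(banner: str) -> dict[str, str]:
    """Find the first 'CFG ' line in a banner and parse its name=version pairs."""
    for line in banner.splitlines():
        if line.startswith("CFG "):
            return dict(item.split("=", 1) for item in line[4:].split())
    return {}


line = encode_versions({"vty": "4.3.2", "ntp": "1.0.3"})
print(decode_versions("Authorized users only\n" + line + "\n"))
```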

Chuck, interesting use of alias.

Simon, completely agree that the network itself should not be the intent
store. The real focus here is: when your intent is in a DB/templating system
thingy, how do you operationally ensure that intent matches reality? Again,
this is with many devices going through upgrades, plus disabled/unreachable
devices, new devices and pre-configured devices. The intent pusher is not
blindly and constantly pushing to all devices, and it's likely not safe to do that.