Best practices inquiry: tracking SSH host keys

We all know that the weakest link of SSH is key management: if
you do not confirm by a secure out of band channel that the
public host key of the device you are connecting to is correct,
then SSH's crypto will not help you.

SSH implements neither a CA hierarchy (like X.509 certificates) nor
a web of trust (like PGP) so you are left checking the validity of
host keys yourself. Still, it's not so bad if you only connect to a
small handful of well known servers. You will either have verified
them all soon enough and not be bothered with it anymore, or system
administrators will maintain a global known_hosts file that lists
all the correct ones.

But it's quite different when you manage a network of hundreds or
thousands of devices. I find myself connecting to devices I've
never connected to before on a regular basis and being prompted
to verify the public host keys they are offering up. This happens
in the course of something else that I am doing and I don't
necessarily have the time to check a host key. If I did have time,
it's hard to check it anyway: the device is just one of a huge
number of network elements of no special significance to me and
I didn't install it and generate its key and I don't know who did.

From time to time I also get hit with warning messages from my
SSH client about a changed host key and it's probably just that
someone swapped out the router's hardware sometime since the
last time I connected and a new key got generated. But I'm not sure.
Worst of all, my problem is repeated for every user because each
user is working with their own private ssh_known_hosts database
into which they accept host keys.

A possible solution is:

- Maintain a global known_hosts file. Make everyone who installs
a new router or turns up SSH on an existing one contribute to it.
Install it as the global (in /etc/) known_hosts file on all the
ssh clients you can.

Pro: The work to accept a new host key is done once, and it's
done by the person who installed the router, who is in the best
position to confirm that no man in the middle attack is taking
place.

Con: You need to make sure updates to this file are authentic
(its benefit is lost if untrusted people are allowed to
contribute), and you need to be sure it gets installed on the
ssh clients people use to connect to the network elements.

Con: If a host key changes but it is found to be benign (such as
the scenario I describe above), users can't do much about it
until the global file is corrected and redeployed (complicated
openssh options which users will generally not know to bypass
the problem notwithstanding).
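
Concretely, the global file is just the standard known_hosts format,
and OpenSSH clients can be pointed at it in ssh_config. A sketch
(hostnames made up, commands untested):

    # run by the installer, over a trusted path such as the console LAN:
    ssh-keyscan -t rsa router1.example.net >> /etc/ssh/ssh_known_hosts

    # in the clients' /etc/ssh/ssh_config (this is the default path):
    GlobalKnownHostsFile /etc/ssh/ssh_known_hosts

    # dropping a stale entry after a benign key change:
    ssh-keygen -R router1.example.net -f /etc/ssh/ssh_known_hosts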

I'm looking for information on best practices that are in use to
tackle this problem. What solutions are working for you?

Thanks

-Phil

> We all know that the weakest link of SSH is key management: if
> you do not confirm by a secure out of band channel that the
> public host key of the device you are connecting to is correct,
> then SSH's crypto will not help you.

SSH's crypto won't help you accomplish what? Using host-based auth
between "trusted" (secured, point-of-entry) hosts and hosts that don't
have a public-facing facility for authentication/login simplifies my
life: I don't _need_ root passwords, and my points of entry are much
easier to monitor for foul play. Requiring everybody who needs
superuser access to use their own ssh private keys allows us to
efficiently manage access without having to hand out passwords to
geographically dispersed locations.

> SSH implements neither a CA hierarchy (like X.509 certificates) nor
> a web of trust (like PGP) so you are left checking the validity of
> host keys yourself. Still, it's not so bad if you only connect to a
> small handful of well known servers. You will either have verified
> them all soon enough and not be bothered with it anymore, or system
> administrators will maintain a global known_hosts file that lists
> all the correct ones.
>
> But it's quite different when you manage a network of hundreds or
> thousands of devices. I find myself connecting to devices I've
> never connected to before on a regular basis and being prompted
> to verify the public host keys they are offering up. This happens
> in the course of something else that I am doing and I don't
> necessarily have the time to check a host key. If I did have time,
> it's hard to check it anyway: the device is just one of a huge
> number of network elements of no special significance to me and
> I didn't install it and generate its key and I don't know who did.
> From time to time I also get hit with warning messages from my
> SSH client about a changed host key and it's probably just that
> someone swapped out the router's hardware sometime since the
> last time I connected and a new key got generated. But I'm not sure.
> Worst of all, my problem is repeated for every user because each
> user is working with their own private ssh_known_hosts database
> into which they accept host keys.

This seems like it can easily be fixed via security policy.

> A possible solution is:
>
> - Maintain a global known_hosts file. Make everyone who installs
> a new router or turns up SSH on an existing one contribute to it.
> Install it as the global (in /etc/) known_hosts file on all the
> ssh clients you can.
>
> Pro: The work to accept a new host key is done once, and it's
> done by the person who installed the router, who is in the best
> position to confirm that no man in the middle attack is taking
> place.
>
> Con: You need to make sure updates to this file are authentic
> (its benefit is lost if untrusted people are allowed to
> contribute), and you need to be sure it gets installed on the
> ssh clients people use to connect to the network elements.
>
> Con: If a host key changes but it is found to be benign (such as
> the scenario I describe above), users can't do much about it
> until the global file is corrected and redeployed (complicated
> openssh options which users will generally not know to bypass
> the problem notwithstanding).
>
> I'm looking for information on best practices that are in use to
> tackle this problem. What solutions are working for you?

I'm assuming you have in place procedures that happen on a regular
basis, a finite number of IP addresses that will be connected to by
any and all technicians, and access to standard openssh tools.

Why not, on a regular basis, use ssh-keyscan and diff or something
similar, to scan your range of hosts that DO have ssh on them (maybe
nmap subnet scans for port 22?) to retrieve the host keys, compare
them to the last time the scan was run, see if anything changed,
cross-reference that with work orders by IP or any other identifiable
information present, and let the tools do the work for you. Cron is
your friend. Using rsync, scp, nfs or something similar it wouldn't be
very difficult to maintain an automated way of updating such a list
once per day across your entire organization.
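
Something like this from cron, say (a sketch only; the paths, host
list and notification address are made up):

    #!/bin/sh
    # Snapshot the fleet's SSH host keys and diff against yesterday's.
    cd /var/db/hostkeys || exit 1
    ssh-keyscan -t rsa -f routers.txt 2>/dev/null | sort > today
    if [ -f yesterday ] && ! cmp -s yesterday today; then
        # something changed: mail the delta for cross-referencing
        diff -u yesterday today | mail -s "SSH host key changes" noc@example.net
    fi
    mv today yesterday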

If you're worried about password security and having a rogue device
sniff a globally-used password, perhaps it's time to start looking
into private ssh-keys and/or a single point of entry policy. If you're
capable of preventing chaotic access, that might even keep you from
having to deploy your /etc/ssh/ssh_known_hosts to every machine in
your organization.

It really seems to me that you need to get lazy and see what
kind of automation is there for you to exploit for your sanity.
I've only got 20 machines, but I've got very defined paths for how to
get from an untrusted (home) network to a trusted (internal cluster)
network, and on top of not being required to type 100 passwords to get
from point A to point B, I've now got a simplified way to see who's
been doing what.

HTH,
Allen

The answer to your question: RFC4255
"Using DNS to Securely Publish Secure Shell (SSH) Key Fingerprints"
http://www.ietf.org/rfc/rfc4255.txt

You will only need to stuff the FPs into SSHFP DNS RRs and turn on
verification for these records on the clients. Done.

In combo with DNSSEC this is a (afaik ;) 100% secure way to at least
get the fingerprints right.
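
For example (a sketch; hostname made up, and generate the record from
a key you have already verified):

    # on a box holding the verified public host key:
    ssh-keygen -r router1.example.net -f ssh_host_rsa_key.pub
    # prints a record of the form:
    #   router1.example.net IN SSHFP 1 1 <40 hex digits>

    # and in the clients' ssh_config:
    VerifyHostKeyDNS ask    # or "yes" once DNSSEC validation is in place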

Greets,
Jeroen

Jeroen Massar writes:

> The answer to your question: RFC4255
> "Using DNS to Securely Publish Secure Shell (SSH) Key Fingerprints"
> http://www.ietf.org/rfc/rfc4255.txt

Yes, that's cool if your SSH client supports it (recent versions of
OpenSSH do).

> You will only need to stuff the FPs into SSHFP DNS RRs and turn on
> verification for these records on the clients. Done.

How do you get the SSH host key fingerprint of a Cisco into SSHFP syntax?

> In combo with DNSSEC this is a (afaik ;) 100% secure way to at least
> get the fingerprints right.

Exactly.
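
(Partially answering my own Cisco question: RFC 4255's type-1
fingerprint is just the SHA-1 of the raw public key blob, so
presumably you can scan the device -- after verifying the key out
of band! -- and hash the blob yourself. An untested sketch:)

    ssh-keyscan -t rsa cisco1.example.net | awk '{print $3}' \
        | openssl base64 -d -A | openssl sha1
    # yields the 40 hex digits for an "IN SSHFP 1 1" record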

_wow_.

That's a massive "why not just" paragraph. I can only imagine how
long a paragraph you'd write for finding and removing ex-employees'
public keys from all your systems.

So, here's my "why not just":

  Why not just use Kerberos?

> Why not, on a regular basis, use ssh-keyscan and diff or something
> similar, to scan your range of hosts that DO have ssh on them (maybe

--snip-200-words-or-less---

> _wow_.
>
> That's a massive "why not just" paragraph. I can only imagine how
> long a paragraph you'd write for finding and removing ex-employees'
> public keys from all your systems.
>
> So, here's my "why not just":
>
>   Why not just use Kerberos?

apparently kerberos scares people... I'm not sure I 'get' that, but :(
A corp security group once for a long time 'didn't believe in
kerberos'; some people 'get it', some don't :(

I think that one possible answer to this question is that Kerberos
is not well supported (if at all) on most commercial routers and
switches. It would be nice to change that somehow.

Of the routers that we use (Cisco, Juniper, Foundry, Extreme) only
Cisco supports Kerberos (specifically Kerberized telnet), and only
in some of their IOS images on some platforms. At least that was the
case last time I checked. I'd love to be corrected ..

The Cisco implementation also had some deployment issues for us (poor
integration with authz mechanisms among other things). And during a
competitive eval a few years back, one router vendor even delivered
to us a signed letter from the CEO promising that they'd implement
Kerberized telnet in a few months. They still haven't delivered. That's
the last time we fall for that trick :)

I don't know of any vendors that have Kerberized ssh on their
roadmaps. SSH2 with GSSAPI key exchange (RFC 4462) would be ideal;
we do run that on a variety of UNIX servers here.

As for verifying host keys with SSH, there is one project that
provides x.509 certificate authority integration for openssh:

  http://www.roumenpetrov.info/openssh/

It can even check an OCSP server for revocation status! But
presumably you'll have to get this functionality implemented
on your router's ssh server ..

Kerberos is a single point of failure; that scares people. You *know* you
have to keep the Kerberos server locked down tight, highly available (very
tricky for some ISP scenarios!), etc.

SSH is a distributed single point of failure, just like the old thick
yellow Ethernet. Remember how reliable and easy to debug that was?

More seriously, the original virtue of SSH was that it could be deployed
without centralized infrastructure. That's great for many purposes; it's
exactly what you don't want if you're an ISP managing a lot of servers and
network elements. You really do want a PKI, complete with CRLs. I know
that (most) SSH implementations don't do that -- complain to your vendor.
(Note: the CAs are also single points of failure. However, they can be
kept offline or nearly so, booted from a FooLive CD that logs to a
multi-session CD or via a write-only network port through a tight
firewall, etc. Yes, you have to worry about procedures, physical access,
and people, but you *always* have to worry about those.)

    --Steven M. Bellovin, http://www.cs.columbia.edu/~smb

Speaking purely from a system administration point of view, Kerberos
is also a nightmare. Not only does the single-point-of-failure
induce red flags in most SAs I know (myself included), but having
to "kerberise" every authentication-oriented binary on the system
that you have is also a total nightmare. Kerberos 4 is also
completely incompatible with 5. Let's also not bring up the issue
of globally-readable Kerberos tickets lying around /tmp on
machines which use Kerberos, okay? ;)

Admittedly, the rebuttals to this are a) "most things use PAM which
can use Kerberos transparently" and b) "most network utilities
these days support Kerberos". I run into things every day that
support neither Kerberos nor PAM.

The bottom line is that SSH is "easier", so more people will use
it. That may not be the best attitude, I'll admit, but that's
reality.

At my current workplace, our SAs + developers wrote a distributed
key system (client + daemon) that runs on all of our machines. It
handles distribution and receiving of SSH keys, creating home dirs,
and deciding who gets their public key stuck into
/root/.ssh/authorized_keys as well. I haven't looked, but it wouldn't
surprise me if something like this was already available via
SourceForge or some other open-source publishing medium.
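
(A crude approximation of such a system -- a sketch only, not our
actual tool; the host list and paths are made up:

    #!/bin/sh
    # push the centrally managed authorized_keys out to every host
    for h in $(cat hosts.txt); do
        rsync -a keys/authorized_keys "root@$h:/root/.ssh/authorized_keys"
    done

The real thing is a client/daemon pair with the receiving and
home-directory logic on top.)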

> >
> > > So, here's my "why not just":
> > >
> > >   Why not just use Kerberos?
> > >
> >
> > apparently kerberos scares people... I'm not sure I 'get' that, but :(
> > A corp security group once for a long time 'didn't believe in
> > kerberos'; some people 'get it', some don't :(
>
> Kerberos is a single point of failure; that scares people. You *know* you
> have to keep the Kerberos server locked down tight, highly available (very
> tricky for some ISP scenarios!), etc.

remote datacenters, firewall/ipf/ipfw/iptables/blah, disable local
console, only absolutely necessary user accounts... there are other
protections, but really: make 10 copies and spread them around your
'network'. It's not that bad, really.
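
With MIT Kerberos, the "10 copies" are just slave KDCs fed from a
cron job on the master (a sketch; paths vary by OS):

    kdb5_util dump /var/krb5kdc/slave_datatrans
    kprop -f /var/krb5kdc/slave_datatrans kdc2.example.net
    # repeat per slave running kpropd; clients fail over via
    # multiple "kdc =" lines in krb5.conf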

> SSH is a distributed single point of failure, just like the old thick
> yellow Ethernet. Remember how reliable and easy to debug that was?
>
> More seriously, the original virtue of SSH was that it could be deployed
> without centralized infrastructure. That's great for many purposes; it's
> exactly what you don't want if you're an ISP managing a lot of servers and
> network elements. You really do want a PKI, complete with CRLs. I know

ssh+kerb works, well... so do kerberized r* services... I'm not sure I see
how they are that different from PKI. There may be some advantages to PKI,
but there are risks and operational concerns as well. I suppose people
should pick what works for them...

> that (most) SSH implementations don't do that -- complain to your vendor.
> (Note: the CAs are also single points of failure. However, they can be
> kept offline or nearly so, booted from a FooLive CD that logs to a
> multi-session CD or via a write-only network port through a tight
> firewall, etc. Yes, you have to worry about procedures, physical access,
> and people, but you *always* have to worry about those.)

right, just like kerberos... I do admit I'm a fan of kerberos; I run
it at home even. Anyway :) there are obviously many ways to skin this
cat.

> > apparently kerberos scares people... I'm not sure I 'get' that, but :(
> > A corp security group once for a long time 'didn't believe in
> > kerberos'; some people 'get it', some don't :(
> >
> Kerberos is a single point of failure; that scares people. You *know* you
> have to keep the Kerberos server locked down tight, highly available (very
> tricky for some ISP scenarios!), etc.

> Speaking purely from a system administration point of view, Kerberos
> is also a nightmare. Not only does the single-point-of-failure
> induce red flags in most SAs I know (myself included), but having
> to "kerberise" every authentication-oriented binary on the system
> that you have is also a total nightmare. Kerberos 4 is also
> completely incompatible with 5. Let's also not bring up the issue
> of globally-readable Kerberos tickets lying around /tmp on
> machines which use Kerberos, okay? ;)

These really are issues of 1994 (or before); most things people care
about are kerberized, or could be substituted with things that are.

> Admittedly, the rebuttals to this are a) "most things use PAM which
> can use Kerberos transparently" and b) "most network utilities
> these days support Kerberos". I run into things every day that
> support neither Kerberos nor PAM.

I've not run into them, but I've not been looking hard since most of what
I do uses it...

> The bottom line is that SSH is "easier", so more people will use
> it. That may not be the best attitude, I'll admit, but that's
> reality.

ssh+kerb works, even out of the box without the nasty patch-foo you used
to have to live with. It even uses kerb tickets to make up host keys on
the fly (in v2), so you don't have to worry about someone stealing your
host key and finding a way into your tunnel that way anymore.
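
Client-side it's a couple of ssh_config knobs (a sketch; note that
GSSAPIKeyExchange comes from the gssapi-keyex patch, which some
vendor builds ship and stock OpenSSH does not):

    Host *.example.net
        GSSAPIAuthentication yes
        GSSAPIKeyExchange yes
        GSSAPIDelegateCredentials no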

> At my current workplace, our SAs + developers wrote a distributed
> key system (client + daemon) that runs on all of our machines. It

anyone do a security assessment of that? :( is it better/worse than the
alternatives? I honestly don't know, I'm just asking to make a point.
Folks have been beating on kerberos for a long time...

anyway :) cats with skin, there are many ways to remove said skin.

> Speaking purely from a system administration point of view, Kerberos
> is also a nightmare. Not only does the single-point-of-failure
> induce red flags in most SAs I know (myself included),

If a deployed kerberos environment has a single point of failure then
it's been deployed poorly. Kerberos has replication mechanisms to
provide redundancy. The only thing you can't replicate in K5 is the
actual master, meaning that if the master is down you can't change
passwords, create users, etc. While that's a single point of failure,
it's not typically a real-time critical one.

> but having
> to "kerberise" every authentication-oriented binary on the system
> that you have is also a total nightmare.

As you pointed out, one trivial rebuttal to that is PAM; another is
GSSAPI and SASL. Authentication-oriented systems shouldn't be hard
coding a single auth method these days, they should be using an
abstraction layer like GSSAPI or SASL. If they are, then the GSSAPI
Kerberos auth mechanisms should just work. GSSAPI/SASL enabled
versions of many major applications are available (Thunderbird,
Mail.app, openssh, putty, oracle calendar). (Sadly, Microsoft
applications are fairly lacking in this category, which is surprising
considering that AD servers use Kerberos heavily under the hood.)

> Kerberos 4 is also
> completely incompatible with 5.

Not true. With a correctly set up environment, K5 tickets can be used
to get K4 tickets automatically for those few legacy applications
that require K4. But really there are very few K4-only applications
left.

> Let's also not bring up the issue
> of globally-readable Kerberos tickets lying around /tmp on
> machines which use Kerberos, okay? ;)

Again, that's an indicator of a poorly set up system. Ticket files
should be readable only by the user. If they're readable by anyone
else except root, something isn't set up right. And on OSes that
support it, the tickets are often stored in a more protected
location; e.g., on OS X the tickets are stored in a memory-based
credential cache.

> The bottom line is that SSH is "easier", so more people will use
> it. That may not be the best attitude, I'll admit, but that's
> reality.

I think the bottom line for the original poster was that ssh was the
only secure mechanism supported by the devices he was using. For
network switches this is common. I think the only answer there is to
either make gathering the ssh key from the device part of your
build/deployment process (see the sketch below), or design your
network in a way that reduces the opportunity for man-in-the-middle
ssh key exchange attacks and pray.
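
The build-time variant can be as small as this (a sketch; names made
up), run from the staging network before the device is racked:

    ssh-keyscan -t rsa new-switch.staging.example.net \
        >> /srv/ssh/ssh_known_hosts.master
    # then push the master file out however you already push configs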

-David

Furthermore, it isn't impossible to design a multi-master Kerberos
service. I can think of a number of designs, but it would have to
be done carefully. I've heard people talking about this in the
past, but I haven't yet seen any implementations.

--Shumon.

The problem is how do you ensure that you've distributed the most
current CRLs to all your SSH clients. You might need to deploy
a redundant highly available set of OCSP responders. Which means
that at least a part of your centralized infrastructure is now
online and inline :) Admittedly not the part that necessarily
needs access to the CA's private key, so not terrible from a
security paranoia point of view.

We already have a deployed key management infrastructure at our
site (Kerberos). If it were (practically) possible to authenticate
login sessions to routers with it, we'd definitely use it. I can't
see us deploying a PKI just to authenticate SSH host keys.

There is the general chicken-and-egg concern about using network
based authentication services to access critical network hardware.
But I think many (most?) of us have other means to access routers
during catastrophic failures or unavailability of the former. We
have an out of band ethernet connected to the router consoles, which
can be dialed into (needs authentication with a hardware token).

--Shumon.

> >
> > SSH is a distributed single point of failure, just like the old thick
> > yellow Ethernet. Remember how reliable and easy to debug that was?
> >
> > More seriously, the original virtue of SSH was that it could be deployed
> > without centralized infrastructure. That's great for many purposes; it's
> > exactly what you don't want if you're an ISP managing a lot of servers and
> > network elements. You really do want a PKI, complete with CRLs. I know
> > that (most) SSH implementations don't do that -- complain to your vendor.
> > (Note: the CAs are also single points of failure. However, they can be
> > kept offline or nearly so, booted from a FooLive CD that logs to a
> > multi-session CD or via a write-only network port through a tight
> > firewall, etc. Yes, you have to worry about procedures, physical access,
> > and people, but you *always* have to worry about those.)
> >
> >     --Steven M. Bellovin, http://www.cs.columbia.edu/~smb

> The problem is how do you ensure that you've distributed the most
> current CRLs to all your SSH clients. You might need to deploy
> a redundant highly available set of OCSP responders. Which means
> that at least a part of your centralized infrastructure is now
> online and inline :) Admittedly not the part that necessarily
> needs access to the CA's private key, so not terrible from a
> security paranoia point of view.

CRLs contain serial numbers and the date of the next-to-be-issued CRL.
You'll always know if you haven't gotten one. What you do then is a
matter of policy -- it's perfectly reasonable to accept keys even if
you've missed an update or two. I'll further assert that the need for
really prompt certificate revocation is often greatly overstated.

Someone you don't want to have one obtains a private key at time T0. You
discover this at time T1, T1 > T0. You go through assorted internal
processes, including the time to generate and push the next CRL; that
happens at T2, T2 > T1. Most of the time, T1-T0 > T2-T1. That is, the
key will be compromised for longer (and probably much longer) than it
takes to send out a CRL. But the window of avoidable trouble is T2-T1.
Furthermore, this being NANOG, the real issue is whether or not the
bad guy can *cause* network trouble during that interval -- ordinary
network failures are presumably rare enough that the odds on trouble
happening during that interval *and* the bad guy trying something are low.
Most of their trouble probably happened during [T0,T1], a much longer
time. Trying to optimize the rest of the infrastructure to avoid [T1,T2]
trouble isn't worth it.

> We already have a deployed key management infrastructure at our
> site (Kerberos). If it were (practically) possible to authenticate
> login sessions to routers with it, we'd definitely use it. I can't
> see us deploying a PKI just to authenticate SSH host keys.

Why not? PKIs don't have to be big and scary, especially if it's a "pki"
instead of a "PKI". Assertion: with a few scripts to invoke OpenSSL,
anyone capable of running a Kerberos server is capable of running their
own special-purpose pki for this purpose.
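
Such a "pki" might be little more than this (a sketch, all names made
up; the CRL step wants a minimal CA section in openssl.cnf):

    # one time, on the offline CA box:
    openssl req -x509 -newkey rsa:2048 -days 3650 -nodes \
        -keyout ca.key -out ca.crt -subj "/CN=NetOps Router CA"

    # per host, sign a request generated for the device:
    openssl x509 -req -in router1.csr -CA ca.crt -CAkey ca.key \
        -CAcreateserial -days 365 -out router1.crt

    # periodically, issue the CRL:
    openssl ca -gencrl -config openssl.cnf -out current.crl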

> There is the general chicken-and-egg concern about using network
> based authentication services to access critical network hardware.
> But I think many (most?) of us have other means to access routers
> during catastrophic failures or unavailability of the former. We
> have an out of band ethernet connected to the router consoles, which
> can be dialed into (needs authentication with a hardware token).

But the inband schemes are better, or you wouldn't bother with them.

    --Steven M. Bellovin, http://www.cs.columbia.edu/~smb

> > We already have a deployed key management infrastructure at our
> > site (Kerberos). If it were (practically) possible to authenticate
> > login sessions to routers with it, we'd definitely use it. I can't
> > see us deploying a PKI just to authenticate SSH host keys.

> Why not? PKIs don't have to be big and scary, especially if it's a "pki"
> instead of a "PKI". Assertion: with a few scripts to invoke OpenSSL,
> anyone capable of running a Kerberos server is capable of running their
> own special-purpose pki for this purpose.

Not fear, but operational overhead. I guess I was trying to say
that I already have one key management system, and would like to
avoid running another one.

But the point is rather moot. SSH implementations on most routers
support neither Kerberos (gss-keyex) nor server authentication with
certificates, never mind about revocation lists! We could fix that
by switching to open source routers :)

Judging from this thread, it appears that some of us are already
using or planning to use some low-tech (and less automatable)
method to verify the server's public key. Some of these methods may
qualify as lowercase "pki", depending on your definition.

> > There is the general chicken-and-egg concern about using network
> > based authentication services to access critical network hardware.
> > But I think many (most?) of us have other means to access routers
> > during catastrophic failures or unavailability of the former. We
> > have an out of band ethernet connected to the router consoles, which
> > can be dialed into (needs authentication with a hardware token).
>
> But the inband schemes are better, or you wouldn't bother with them.

Obviously! :)

--Shumon