Hi,
I believe, based on what I have heard, that some operators turn on
cryptographic authentication because the Internet checksum that OSPF
and similar protocols use for packet sanity is quite weak and offers
little protection against many known errors, such as:
- re-ordering of 2-byte aligned words
- various bit flips that keep the 1s complement sum the same (e.g.
0x0000 to 0xffff and vice versa)
So a corrupted packet could still pass the Ethernet CRC check and the
IP and OSPF checksums. Or it could be valid until the Ethernet CRC
check is done and get corrupted after that (PCI transmission errors,
DMA errors, memory issues, line card corruption). Last but not least,
CRCs and Internet checksums could miss wire-corrupted packets.
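The two failure modes listed above can be reproduced with a few lines
of Python. This is only a sketch: internet_checksum() is a plain
16-bit one's complement checksum written for illustration, and the
packet bytes are made up, not taken from a real OSPF packet.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Illustrative payloads only (four 16-bit words each).
original = bytes.fromhex("000100020000abcd")
reordered = bytes.fromhex("000200010000abcd")  # first two words swapped
flipped = bytes.fromhex("00010002ffffabcd")    # 0x0000 replaced by 0xffff

for name, packet in (("original", original),
                     ("reordered", reordered),
                     ("flipped", flipped)):
    print(f"{name:10s} checksum=0x{internet_checksum(packet):04x}")
```

All three payloads differ, yet they produce the same checksum, so a
receiver relying on this check alone would accept any of them.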
Currently an operator can do the following:
- Use the poor Internet checksum, OR
- Turn on cryptographic authentication in the routing protocols to
catch such bit errors, which could be caused by line card corruption,
etc.
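To illustrate the second option, here is a minimal sketch of how a
keyed digest catches the corruptions the checksum misses. It uses
HMAC-SHA-256 from the Python standard library as a stand-in; the key,
the packet bytes and the helper names are assumptions for
illustration, and this is not how any particular router implements
OSPF authentication.

```python
import hashlib
import hmac

KEY = b"example-shared-key"  # hypothetical key, for illustration only


def keyed_digest(packet: bytes, key: bytes = KEY) -> bytes:
    """Compute an HMAC-SHA-256 digest over the packet bytes."""
    return hmac.new(key, packet, hashlib.sha256).digest()


def verify(packet: bytes, digest: bytes, key: bytes = KEY) -> bool:
    """Return True only if the digest matches the packet under this key."""
    return hmac.compare_digest(keyed_digest(packet, key), digest)


# Illustrative payloads only; the corruptions below leave a 16-bit
# one's complement checksum unchanged.
original = bytes.fromhex("000100020000abcd")
reordered = bytes.fromhex("000200010000abcd")  # first two words swapped
flipped = bytes.fromhex("00010002ffffabcd")    # 0x0000 replaced by 0xffff

digest = keyed_digest(original)  # what the sender would attach

for name, packet in (("original", original),
                     ("reordered", reordered),
                     ("flipped", flipped)):
    print(f"{name:10s} accepted={verify(packet, digest)}")
```

Only the unmodified payload verifies. The cost is key management and
a digest computation per packet, which is the trade-off behind the
question in this mail.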
One can go through http://portal.acm.org/citation.cfm?id=294357.294364
to understand the issues with the internet checksums.
I would be interested in knowing whether operators use cryptographic
authentication for detecting the errors that I just described above.
You could send me a mail offline and I will consolidate the responses
and send a summary to the list in a few days' time.
Cheers, Manav
Additionally, one might venture to understand the effects of such mechanisms and
why knobs such as IS-IS's "ignore-lsp-errors" were added ~15 years ago. LSP
corruption storms driven by receivers that purge corrupted LSPs and originators that
re-originate and flood on receipt of said purged LSPs are very problematic and
otherwise difficult to identify in practice.
Coincidentally, it's also why logging LSPs that trigger such errors is important, whether
you ignore them or propagate them.
-danny
I really wish there was a good way to (generically) keep a 4-6 hour buffer of all control-plane traffic on devices. While you can do that with some, the forensic value is immense when you have a problem.
- Jared
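As a rough illustration of the kind of buffer described above, here is
a sketch of a time-bounded store that keeps only the last N seconds of
records. The class name, the record format and the use of plain
numeric timestamps are all assumptions for illustration; it is not a
feature of any router platform.

```python
from collections import deque


class ControlPlaneBuffer:
    """Keep (timestamp, packet) records no older than window_seconds."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self._records = deque()

    def add(self, timestamp: float, packet: bytes) -> None:
        """Store a record, then drop anything that fell out of the window."""
        self._records.append((timestamp, packet))
        cutoff = timestamp - self.window
        while self._records and self._records[0][0] < cutoff:
            self._records.popleft()

    def snapshot(self) -> list:
        """Return the retained records, oldest first."""
        return list(self._records)


# Tiny example with a 10-second window instead of 4-6 hours.
buf = ControlPlaneBuffer(window_seconds=10)
buf.add(0, b"hello-1")
buf.add(5, b"hello-2")
buf.add(12, b"hello-3")  # the record from t=0 is now outside the window
print(buf.snapshot())
```

Records are assumed to arrive in timestamp order, so expiry only ever
needs to look at the oldest end of the queue.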
Buffering for 4-6 hours worth of control traffic is HUGE! What about
mirroring your control traffic arriving on your network ports to some
other dedicated port?
Manav
If 4-6 hours of *control-plane* traffic on a given device is 'HUGE!', for some reasonable modern value of 'HUGE!', then there's definitely a problem on the network in question.
;>
With BFD alone (assuming 20 sessions, 50ms timer) you will have
400pps. In 6 hours you will have roughly 8.6 million BFD packets
(400 pps x 21,600 s). Add OSPF, RSVP, BGP, LACP (for LAGs), dot1ag
and EFM and you would really get a significant number of packets to
buffer.
Cheers, Manav
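The BFD arithmetic above can be checked, and turned into a rough byte
count, with a short script. The packet size is an assumption, not
something stated in this thread: 52 bytes corresponds to a 24-byte BFD
control packet without authentication plus UDP and IPv4 headers, and
it ignores link-layer framing and any per-packet capture overhead.
Only one direction of each session is counted.

```python
SESSIONS = 20
TX_INTERVAL_MS = 50
WINDOW_HOURS = 6
ASSUMED_BYTES_PER_PACKET = 52  # assumption: 24 BFD + 8 UDP + 20 IPv4

# Each session sends 1000 / 50 = 20 packets per second.
packets_per_second = SESSIONS * 1000 // TX_INTERVAL_MS
total_packets = packets_per_second * WINDOW_HOURS * 3600
total_bytes = total_packets * ASSUMED_BYTES_PER_PACKET

print(f"rate:    {packets_per_second} pps")
print(f"packets: {total_packets:,} in {WINDOW_HOURS} hours")
print(f"bytes:   {total_bytes:,} (~{total_bytes / 1e6:.0f} MB)")
```

Under these assumptions six hours of BFD comes to a few hundred
megabytes before other protocols are added.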
Which isn't a 'HUGE!' amount of packets.
;>
Not precisely what you're looking for, but you can monitor the OSPF
database in other ways. See some of the early OSPF work described
here, for instance:
<http://www2.research.att.com/~ashaikh/presentations.php>
I had written a simple utility to grab the LSA counts and checksum
values from a set of routers when I converted a RIP network to OSPF.
The network consisted of about 25 routers and 300 routes. It was
invaluable as a sanity check to see if all routers were in
agreement.
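The comparison step of such a utility might look something like the
sketch below. It is not the original utility: how the values are
collected from the routers is left out entirely, and the function
name, router names and numbers are made up for illustration.

```python
from collections import Counter


def find_outliers(summaries: dict) -> tuple:
    """Compare per-router (lsa_count, checksum_sum) pairs.

    Returns the most common pair and a dict of routers that differ
    from it.
    """
    if not summaries:
        return None, {}
    majority, _ = Counter(summaries.values()).most_common(1)[0]
    outliers = {router: value
                for router, value in summaries.items()
                if value != majority}
    return majority, outliers


# Made-up example values: router -> (LSA count, sum of LSA checksums).
summaries = {
    "r1": (312, 0x00A1B2C3),
    "r2": (312, 0x00A1B2C3),
    "r3": (311, 0x009FE210),
}

majority, outliers = find_outliers(summaries)
print("majority view:", majority)
for router, value in sorted(outliers.items()):
    print(f"{router} disagrees: {value}")
```

A router that keeps showing up as an outlier is worth a manual look
before concluding the database is actually out of sync.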
Packet Design's Route Explorer may be a commercial implementation of
this sort of thing. I've only seen an early version of that at an
earlier NANOG and have never used it. It seemed like cool technology
at the time, but don't take that as an endorsement.
Ob op note: I do recall one older IOS router that would never have
exactly the same checksum values as the others. After manually
inspecting the routes I concluded that it was an artifact of the
IOS code being run, which was an old 11.x train and the only one in the
net at the time.
John
Yup, but when trying to figure out the root cause of some problem, having a few gigs of data would be helpful.
In the event people have not noticed, hard drives are semi-popular in routers now, so it is feasible to assume you have some amount of disk space beyond the 8MB needed for an image.
- Jared
on at least one platform you can get some details with traceoptions, no?
Hi,
I received 7 replies, of which 3 stated that they were using crypto
only to detect the issues that I described in my original email.
Another 3 said that they were using it for authentication, and 1 person
replied saying that they were using crypto for both authentication and
integrity.
Folks who are using cryptographic authentication mechanisms only for
integrity may want to look at
http://www.ietf.org/id/draft-jakma-ospf-integrity-00.txt
Cheers, Manav