I'm looking for good reference material for doing
"reliability engineering" calculations and projections.
The goal is to justify increased redundancy, and I want to
include quantifiable numbers based on MTBF data and other
reliability factors: a scientific justification
instead of just the typical emotional appeal using
I'd appreciate references on how to do this in a network
environment (what data to collect, how to collect it, how to
analyze it, etc.). I'd also welcome any data (or rules of thumb)
on typical MTBFs for network events that I won't find on vendor
product slicks (e.g., the MTBF of IOS, or of human-caused service
outages of various types).
If someone has put together something remotely like this
that they'd care to share, that'd be incredibly helpful.
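
To make the request concrete, here is a minimal sketch of the kind of calculation I have in mind. It uses the textbook steady-state relation availability = MTBF / (MTBF + MTTR) and assumes redundant units fail independently, which is probably optimistic for software bugs and human error. The function names and the MTBF/MTTR figures are hypothetical placeholders, not measured data.

```python
HOURS_PER_YEAR = 8760.0


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of a single component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(single, n):
    """Availability of n redundant components where any one is sufficient.

    ASSUMPTION: failures are independent. Correlated failures (shared
    software bug, shared power, operator error) make the real figure lower.
    """
    return 1.0 - (1.0 - single) ** n


def annual_downtime_hours(avail):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical inputs, for illustration only (not vendor or field data).
single = availability(mtbf_hours=50_000.0, mttr_hours=4.0)
dual = parallel_availability(single, 2)

print(f"single unit:    {annual_downtime_hours(single):.4f} h/year down")
print(f"redundant pair: {annual_downtime_hours(dual):.6f} h/year down")
```

What I'm missing is realistic inputs for those two parameters, and a better model for the cases where the independence assumption doesn't hold.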