Reportable Metrics

I am trying to take stock of all the network tools we use and come up with
a concise list of metrics I can compile from them and report to management
on a rolling basis to reflect the health of our enterprise network.

I've started a list and was hoping others might help me out by adding to it.
If a measurement you suggest is not achievable with a tool I have, I'll
pester you to find out what you are using. But to keep it simple, I'm just
looking for things to measure. Assuming a monthly reporting schedule,
here's the list I have so far:

1. Uptime per WAN or Internet circuit
2. Number and average length of outages
3. Bandwidth utilization per WAN/Internet circuit and "important" VLANs
4. Overall network latency: RTT measured from various parts of the network
   to various other parts (Cisco IPM)
5. Top talkers per WAN circuit
6. Top destinations per WAN circuit
7. Top 10 most utilized WAN circuits (% burst above CIR, etc.)
8. Protocol distribution per WAN circuit
9. Syslog/sniffer alarms by severity
10. Application response time for key apps (e.g., SAP, HTTP)
11. Security incidents
12. TACACS reports on number of logins, changes, etc.
13. Bandwidth/latency trending
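Items 1 and 2 come down to simple arithmetic once you have a log of outage
start and end times per circuit. Below is a minimal Python sketch. The
`Outage` record and the `outage_report` function are hypothetical, standing
in for whatever your monitoring tool exports. The sketch also assumes that
outages on the same circuit never overlap. The dates are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """Hypothetical outage record: one circuit-down event."""
    circuit: str
    start: datetime
    end: datetime


def outage_report(outages, circuit, period_start, period_end):
    """Return (uptime %, outage count, average outage length) for one circuit.

    Outages are clipped to the reporting period. Overlapping outages on the
    same circuit are assumed not to occur; they would be double-counted.
    """
    period = period_end - period_start
    durations = []
    for o in outages:
        if o.circuit != circuit:
            continue
        # Clip each outage to the reporting window.
        start = max(o.start, period_start)
        end = min(o.end, period_end)
        if end > start:
            durations.append(end - start)
    downtime = sum(durations, timedelta())
    uptime_pct = 100.0 * (1 - downtime / period)
    average = downtime / len(durations) if durations else timedelta()
    return uptime_pct, len(durations), average


# Illustrative data: June has 30 days (720 hours); WAN-A is down 3 hours total.
log = [
    Outage("WAN-A", datetime(2024, 6, 3, 8, 0), datetime(2024, 6, 3, 10, 0)),
    Outage("WAN-A", datetime(2024, 6, 17, 14, 0), datetime(2024, 6, 17, 15, 0)),
    Outage("WAN-B", datetime(2024, 6, 9, 1, 0), datetime(2024, 6, 9, 1, 30)),
]
uptime, count, avg = outage_report(log, "WAN-A", datetime(2024, 6, 1), datetime(2024, 7, 1))
print(f"{uptime:.2f}% uptime, {count} outages, average {avg}")
# prints: 99.58% uptime, 2 outages, average 1:30:00
```

The function clips outages to the reporting window. As a result, an outage
that spans a month boundary is split between the two monthly reports rather
than counted twice.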

What am I missing?


If you're going to all this trouble, you might as well collect some
interface statistics too, such as:

- CRC errors: an important clue to lower-layer problems
- collisions, if you use any non-switched Ethernet (and even on switched
  links: a duplex mismatch is a bad thing, and it does happen)
- overruns and underruns: these indicate (transient) high CPU load
- input/output drops: these show whether you are experiencing congestion
- router CPU load

These should all be easy to measure with MRTG, provided you can find out
how to read each value from the device via SNMP.
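SNMP interface counters are running totals, so a rate or an error count
comes from the difference between two polls. MRTG does this for you, but
the arithmetic is simple if you want to roll the numbers into your own
monthly report.

The Python sketch below shows that arithmetic. It assumes the counter
values have already been fetched by MRTG, snmpget, or similar tools, and
that a counter wraps at most once between polls. The function names are
hypothetical. The object names in the example are standard MIB objects:
ifInOctets/ifHCInOctets, ifSpeed, ifInDiscards and ifOutDiscards from
IF-MIB, and dot3StatsFCSErrors (CRC errors) from EtherLike-MIB.

```python
COUNTER32_MODULUS = 2**32  # Counter32 objects wrap back to 0 at 2^32


def counter_delta(old, new, modulus=COUNTER32_MODULUS):
    """Increase of a wrapping SNMP counter between two polls.

    Assumes at most one wrap between polls and no reset (e.g. a reload).
    Pass modulus=2**64 for 64-bit counters such as ifHCInOctets.
    """
    return (new - old) % modulus


def utilization_percent(octets_old, octets_new, interval_s, if_speed_bps,
                        modulus=COUNTER32_MODULUS):
    """Average link utilization over one polling interval, in percent."""
    bits = counter_delta(octets_old, octets_new, modulus) * 8
    return 100.0 * bits / interval_s / if_speed_bps


def counters_that_grew(before, after, threshold=0):
    """Return {name: delta} for error/drop counters that rose above threshold."""
    grew = {}
    for name, old in before.items():
        if name in after:
            delta = counter_delta(old, after[name])
            if delta > threshold:
                grew[name] = delta
    return grew


# 5-minute poll on a T1 (ifSpeed = 1,544,000 bps); the counter wrapped
# between polls, but the modular delta still recovers 28,950,000 octets.
print(utilization_percent(4_294_967_000, 28_949_704, 300, 1_544_000))  # 50.0

# Illustrative error/drop counters from two consecutive polls.
before = {"dot3StatsFCSErrors": 10, "ifInDiscards": 500, "ifOutDiscards": 4_294_967_290}
after = {"dot3StatsFCSErrors": 10, "ifInDiscards": 520, "ifOutDiscards": 4}
print(counters_that_grew(before, after))  # {'ifInDiscards': 20, 'ifOutDiscards': 10}
```

On fast links, prefer the 64-bit ifHC* counters. A 32-bit octet counter on a
gigabit interface can wrap more than once in a 5-minute poll, which breaks
the one-wrap assumption.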