Arista hardware health and environmental nagios plugin

Hello All,

Does anyone have a ready-to-use Nagios/Icinga plugin for hardware health
and temperature monitoring of Arista devices (7050, 7280 and 7500) that
they are willing to share?

I couldn't find one with Google searches.

Arista TAC replied: "nagios does snmp, so that should fit you needs"

There is https://github.com/ncsa/nagios-plugins which should be able to be
augmented to do the extra checks.
And with pyeapi it shouldn't be rocket science either. (for a developer,
which I am not)

If I were to ask our devops department to build it, it would probably be
put at the back of a very long queue.

So if there is anyone out there that is willing to share it would be
greatly appreciated.

Thanks,

Bas

Get the MIBs for the devices you want to monitor, then build SNMP check
programs to pull the information you need. The Nagios manuals should
describe how to do this.

Hello All,

Thanks for your replies.

Especially the lmgtfy and RTFM.. most helpful. :)

I had hoped not to have to re-invent the wheel.

Bas

Bas,

Arista EOS supports ENTITY-SENSOR-MIB and exposes temperature sensors, etc,
via that MIB so you should be able to use any NAGIOS plugins that can pull
ENTITY-SENSOR-MIB data for environmental monitoring. For example,
https://exchange.nagios.org/directory/Plugins/Hardware/Others/check_entPhySensorValue/details
I haven't used that specific NAGIOS plugin myself -- it just turned up when
I searched and looked like it would do the job.

To find the index of the temp sensor(s) you want to monitor (e.g. CPU, back
panel, front panel, etc) you can drop into a bash shell on your Arista
switches and run something like "snmptable localhost
ENTITY-MIB::entPhysicalTable" and look at the entPhysicalDescr column to
see the available sensors. The actual sensor values are provided in
ENTITY-SENSOR-MIB::entPhySensorTable.

The indices in entPhySensorTable are constructed by
adding entPhysicalContainedIn + entPhysicalParentRelPos. For example, on my
switch I see a sensor named "Back-panel temp sensor" with
entPhysicalContainedIn=100006000 and entPhysicalParentRelPos=3 so the
index into the ENTITY-SENSOR-MIB::entPhySensorTable would be 100006000+3 =
100006003:

$ snmpwalk localhost ENTITY-SENSOR-MIB::entPhySensorTable |grep 100006003
ENTITY-SENSOR-MIB::entPhySensorType.100006003 = INTEGER: celsius(8)
ENTITY-SENSOR-MIB::entPhySensorScale.100006003 = INTEGER: units(9)
ENTITY-SENSOR-MIB::entPhySensorPrecision.100006003 = INTEGER: 1
ENTITY-SENSOR-MIB::entPhySensorValue.100006003 = INTEGER: 326
ENTITY-SENSOR-MIB::entPhySensorOperStatus.100006003 = INTEGER: ok(1)
ENTITY-SENSOR-MIB::entPhySensorUnitsDisplay.100006003 = STRING: Celsius
ENTITY-SENSOR-MIB::entPhySensorValueTimeStamp.100006003 = Timeticks: (1063007379) 123 days, 0:47:53.79
ENTITY-SENSOR-MIB::entPhySensorValueUpdateRate.100006003 = Gauge32: 5000 milliseconds

The entPhySensorValue value of 326 means 32.6 degrees Celsius because
entPhySensorPrecision=1 (meaning entPhySensorValue equals "degrees C times
10").
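For anyone who wants to script this, here is a minimal Python sketch of the index rule and the value scaling described above. It only parses the text that snmpwalk prints and does not talk to a switch. The function names (sensor_index, parse_walk, decode_value) are made up for illustration, the scale table covers just three of the entPhySensorScale values, and the entPhysicalContainedIn value is taken to be 100006000 so that the result matches the index shown in the walk output.

```python
from decimal import Decimal

# Powers of ten for a few entPhySensorScale names (subset, for illustration).
SCALE_EXPONENT = {"milli": -3, "units": 0, "kilo": 3}


def sensor_index(contained_in, parent_rel_pos):
    """Index into entPhySensorTable, following the rule described above."""
    return contained_in + parent_rel_pos


def parse_walk(lines):
    """Map 'column.index' to the raw value text of each snmpwalk line."""
    result = {}
    for line in lines:
        if "::" not in line or " = " not in line:
            continue  # skip wrapped or unrelated lines
        name, value = line.split(" = ", 1)
        column = name.split("::", 1)[1]
        # Drop the type prefix such as "INTEGER: " if present.
        raw = value.split(": ", 1)[1] if ": " in value else value
        result[column] = raw.strip()
    return result


def decode_value(raw, precision, scale="units"):
    """Apply entPhySensorPrecision and entPhySensorScale to a raw reading."""
    return Decimal(raw).scaleb(SCALE_EXPONENT[scale] - precision)


walk = [
    "ENTITY-SENSOR-MIB::entPhySensorScale.100006003 = INTEGER: units(9)",
    "ENTITY-SENSOR-MIB::entPhySensorPrecision.100006003 = INTEGER: 1",
    "ENTITY-SENSOR-MIB::entPhySensorValue.100006003 = INTEGER: 326",
]
idx = sensor_index(100006000, 3)
cols = parse_walk(walk)
temp = decode_value(
    int(cols[f"entPhySensorValue.{idx}"]),
    int(cols[f"entPhySensorPrecision.{idx}"]),
    cols[f"entPhySensorScale.{idx}"].split("(")[0],  # "units(9)" -> "units"
)
print(idx, temp)  # prints: 100006003 32.6
```

Decimal is used instead of float so that 326 with precision 1 comes out as exactly 32.6.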

Nathan

See it as tweaking the wheel...

Now a Perl script (with caching) to monitor VCP ports on QFX5100s is re-inventing the wheel, just because their engineers opted out of the usual way to handle network interfaces.

They could have simply named them VCP-<Member ID>/0/x instead of naming them all VCP-255/0/x

Hello,

Message written by bas <kilobit@gmail.com> on 19.05.2017, at 21:34:

I had hoped not to have to re-invent the wheel.

Some custom scripts I use on 7050SX are in the GitHub repository piwanejko/Arista-monitoring-tools (scripts for monitoring Arista switches).
Nagios checks:

Service                  check_command
CPU1 temperature         check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006001'!'550'!'600'
CPU1 load                check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.25.3.3.1.2.1'!'70'!'90'
CPU2 temperature         check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006002'!'550'!'600'
CPU2 load                check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.25.3.3.1.2.2'!'70'!'90'
CPU3 temperature         check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006003'!'550'!'600'
CPU3 load                check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.25.3.3.1.2.3'!'70'!'90'
CPU4 temperature         check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006004'!'550'!'600'
CPU4 load                check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.25.3.3.1.2.4'!'70'!'90'
Fan tray 1 status        check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100601111'!''!'1'
Fan tray 2 status        check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100602111'!''!'1'
Fan tray 3 status        check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100603111'!''!'1'
Fan tray 4 status        check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100604111'!''!'1'
Lower board temperature  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006011'!'500'!'600'
PSU1 fan status          check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100711211'!''!'1'
PSU1 in current status   check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100711103'!''!'1'
PSU1 in voltage status   check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100711105'!''!'1'
PSU2 fan status          check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100721211'!''!'1'
PSU2 in current status   check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100721103'!''!'1'
PSU2 in voltage status   check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.5.100721105'!''!'1'
SUP temperature          check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006005'!'550'!'600'
Upper board temperature  check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006009'!'500'!'600'
Uptime                   check_snmp_sw!'2c'!'public'!'.1.3.6.1.2.1.1.3.0'!'@60000:70000'!'60000:'
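
The last two arguments of each check above are the warning and critical thresholds, written in the usual Nagios plugin range syntax. As a rough illustration of how those ranges are read, here is a small Python sketch. It is not the check_snmp code itself; the function names in_alert and status are made up, and treating an empty threshold as "never alert" is my assumption.

```python
import math


def in_alert(value, spec):
    """True if value triggers an alert for a Nagios-style range spec."""
    if spec == "":
        return False  # assumption: an empty threshold never alerts
    inside = spec.startswith("@")  # "@" inverts the range: alert when inside
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        start_text, end_text = "0", spec  # a bare "N" means the range 0..N
    start = -math.inf if start_text == "~" else float(start_text or 0)
    end = math.inf if end_text == "" else float(end_text)
    within = start <= value <= end
    return within if inside else not within


def status(value, warn, crit):
    """Critical is checked first, then warning, otherwise OK."""
    if in_alert(value, crit):
        return "CRITICAL"
    if in_alert(value, warn):
        return "WARNING"
    return "OK"


print(status(326, "550", "600"))                 # prints: OK
print(status(575, "550", "600"))                 # prints: WARNING
print(status(2, "", "1"))                        # prints: CRITICAL
print(status(65000, "@60000:70000", "60000:"))   # prints: WARNING
print(status(30000, "@60000:70000", "60000:"))   # prints: CRITICAL
```

Read this way, the temperature checks warn above 55.0 C and go critical above 60.0 C (the values are in tenths of a degree), the status checks go critical on anything other than ok(1), and the uptime check goes critical below 60000 timeticks, i.e. within the first 10 minutes after a reboot.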

check_snmp_sw -> check_snmp -H $HOSTADDRESS$ -P $ARG1$ -C $ARG2$ -o $ARG3$ -w $ARG4$ -c $ARG5$
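
To make the mapping explicit, here is a simplified Python sketch of how a check_command string is turned into the check_snmp command line: the part before the first "!" is the command name and the remaining parts fill $ARG1$ to $ARG5$. This only imitates the macro substitution for illustration and is not how Nagios is implemented; the expand function and the host address 192.0.2.10 are made up.

```python
import shlex

# Command line from the check_snmp_sw definition above.
COMMAND_LINE = ("check_snmp -H $HOSTADDRESS$ -P $ARG1$ -C $ARG2$ "
                "-o $ARG3$ -w $ARG4$ -c $ARG5$")


def expand(check_command, host_address):
    """Substitute $HOSTADDRESS$ and $ARGn$ and return the resulting argv."""
    _name, *args = check_command.split("!")
    line = COMMAND_LINE.replace("$HOSTADDRESS$", host_address)
    for n, arg in enumerate(args, start=1):
        line = line.replace(f"$ARG{n}$", arg)
    # shlex.split strips the single quotes the way a shell would.
    return shlex.split(line)


argv = expand(
    "check_snmp_sw!2c!'public'!'.1.3.6.1.2.1.99.1.1.1.4.100006003'!'550'!'600'",
    "192.0.2.10",  # hypothetical address from the documentation range
)
print(" ".join(argv))
```

The single quotes in the arguments matter for the status checks: an empty warning threshold written as '' survives as an empty argument instead of disappearing from the command line.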

I also made a custom script to check disk and memory utilization, but it's too old and terribly written to be shared.

Best regards,