do folk have experience with platforms where ifIndexes are not stable
across reboots etc? how do you deal with it? do some of those
platforms trap on change?
randy, who hates ifIndex changes
Most platforms I’ve worked with have a method to make the indexes persistent, often by additional command-line options.
-Steve
Once upon a time, Randy Bush <randy@psg.com> said:
> do folk have experience with platforms where ifIndexes are not stable
> across reboots etc? how do you deal with it? do some of those
> platforms trap on change?
Is there any good excuse that SNMP client software can't handle a basic
design of SNMP - indexed tables? ifIndex is far from the only index in
SNMP, and many of them still change today at various times.
It isn't that hard to fetch the indexed field in a bulk get, rewalking
the table if you don't get what you expected. Cricket did this in 1999.
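A rough sketch of that approach in Python, for illustration only. The `walk` and `get` callables are hypothetical stand-ins for whatever SNMP library the poller uses (they are not a real API), and `poll_by_name` is a made-up name. The idea is to ask for ifDescr in the same request as the counter, and rewalk the table once if the description is not the one expected:

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport hooks (assumptions, not a real library API):
#   walk() returns the whole table as {ifIndex: ifDescr}
#   get(ifIndex) returns (ifDescr, ifInOctets) from one request, or None
Walk = Callable[[], Dict[int, str]]
Get = Callable[[int], Optional[Tuple[str, int]]]


def poll_by_name(name: str, cache: Dict[str, int], walk: Walk, get: Get) -> int:
    """Return the counter for interface `name`, rewalking if its index moved."""
    for _ in range(2):
        if name not in cache:
            # Build (or rebuild) the name -> ifIndex map from a fresh walk.
            cache.clear()
            cache.update({descr: idx for idx, descr in walk().items()})
        if name not in cache:
            raise KeyError(f"interface {name!r} not present on device")
        row = get(cache[name])
        if row is not None and row[0] == name:
            return row[1]
        # The index now points at something else: drop the map and retry.
        cache.clear()
    raise RuntimeError(f"ifIndex for {name!r} changed again during poll")
```

The cache is keyed by interface name rather than by index, so a renumbering costs one extra walk instead of a delete and re-add of the device.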
I see this all the time, especially in modular chassis. It seems like sometimes it has to do with when each board goes to a ready state as the system boots. We also see renumbering due to virtual interface and board additions. While the system is running they seem to get the next ifIndex available, but when you reboot they seem to be in the order they come up or the order they are in the configuration. It is a real pain. Some software allows us to rescan a device; with other software we have no easy way other than to delete and then re-add the device. I feel your pain on this one.
I have no idea why most NMS systems can't seem to understand this and just rescan at a set interval or after an up/down device event.
Steven Naslund
Chicago IL
Cisco has a feature you can enable called “Interface Index Persistence”:
This solves the problem, at least with Cisco gear.
-mel beckman
It's never going to be provably correct, depending on what stability means.
You fetch the relation at t0, then at t1 you fetch data. Was the relation
the same at t0 and t1? You can gain some confidence by fetching the relation
again at t2 and disregarding the data if t0 != t2. But this gets expensive
to poll quite fast, and is still not provably correct. This may be
nitpicking, but I've always felt uneasy about the lack of guarantee.
I wonder if those who have stable indices have them for all cases,
all logical interfaces and virtual interfaces?
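The t0/t1/t2 check could look roughly like this in Python. This is a sketch only: `fetch_relation` and `fetch_data` are hypothetical callables standing in for the real SNMP requests, and `sample_if_stable` is a made-up name. It shares the weakness described above, since the relation could change and change back between t0 and t2 without being noticed.

```python
from typing import Callable, Dict, Optional


def sample_if_stable(
    fetch_relation: Callable[[], Dict[int, str]],  # hypothetical: {ifIndex: ifName}
    fetch_data: Callable[[], Dict[int, int]],      # hypothetical: {ifIndex: counter}
) -> Optional[Dict[str, int]]:
    """Return counters keyed by interface name, or None if the mapping moved."""
    before = fetch_relation()  # t0
    data = fetch_data()        # t1
    after = fetch_relation()   # t2
    if before != after:
        # The index-to-name relation changed around the sample: discard it.
        return None
    # Relation looked the same at t0 and t2, so attribute the data by name.
    return {before[idx]: value for idx, value in data.items() if idx in before}
```

Every sample costs two extra fetches of the relation, which is the polling expense mentioned above.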
Saku,
The issue isn't that ifindexes change during operation. That would truly make SNMP useless. The issue is that they change across reboots. That's where features such as Cisco's Interface Index Persistence help out.
-mel via cell
Cisco tries very hard to make such useless data occur in XR. If you have a gigE SFP in an SFP+ port, a new ifindex will appear for the resulting GigabitEthernetX port, then it remains even if both the config and SFP have been removed. Automated systems will keep querying it as if it were a downed port, but wait, reboot, and suddenly it vanishes. I went back and forth with TAC for weeks explaining that SNMP interfaces should not disappear as a result of a reboot, I should either be able to remove it, or it's stuck there forever, but a reboot should not cause a change. They didn't care; it is 'by design'.
David,
I do that too, but I’m referring to XR when you use different-speed optics in a multi-speed port. If you have an SFP+ port and a 10gig SFP, you’ll get one ifindex. If a new use case requires swapping to a gigE SFP, you’ll get a new ifindex. Take the port out of service, remove the gigE SFP and the related config, and both ifindexes still remain until the device is reloaded. At that point the gigE ifindex goes away, leaving just the native-speed ifindex.
It’s a pain for management because we’re forced to make exclusions in our NMS for ifindexes that may disappear at some point, because they show as down and there is no way to make that not the case. Worse, if that port is put to use again at the non-native speed while such an exclusion is in place, we don’t auto-learn the new usage because of the exclusion.
I tried to argue with TAC that if the gigE SFP has been removed from the SFP+ port, and its config has been deleted, the corresponding ifindex and related counters should be gone; it no longer exists in any form. If you reload, it will disappear, but that’s the only way.