Hey Everyone -
We have two 7507 routers configured with dual RSP4s with 256MB RAM,
VIP2-50s with 128MB/8MB RAM, and Gigabit Ethernet, POSIP OC3, and Fast
Ethernet interfaces.
These routers had run flawlessly for over two years. But about two
weeks ago, we suddenly started having serious crashing problems with
both of them. The routers would lose BGP connectivity (one at a time)
to the upstreams configured on each router. First we would see a
"keepalive not sent" message, then the BGP hold timer would expire,
and then the BGP peering session would go down. OSPF would start
crashing, then we would see the memory error messages, and then all
interfaces would briefly go offline. (Note: both the RSPs and the VIPs
already have the maximum memory they support.)
Within a minute, the exact same thing would happen to the other
router. Often we had to reboot a router, sometimes several times, to
get it back online. These are the errors we would see:
%SYS-2-MALLOCFAIL: Memory allocation of 704 bytes failed from
0x60329F00, alignment 0
Pool: Processor Free: 92744 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "Pool Manager", ipl= 0, pid= 6
-Traceback= 6038049C 60382200 60329F08 6038DEDC
%TCP-6-NOBUFF: TTY0, no buffer available
-Process= "BGP Router", ipl= 0, pid= 132
%% Low on memory; try again later
GigabitEthernet1/1/0: keepalive not sent
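For anyone comparing these failures across both routers' logs, here is a minimal Python sketch that pulls the requested size, calling address, pool name, free bytes, and cause out of each %SYS-2-MALLOCFAIL entry. It assumes the exact message format shown above (other IOS trains may word these lines differently), and the names `parse_mallocfail` and `MallocFail` are hypothetical. In the entry above, the Processor pool reports 92744 bytes free yet a 704-byte request still failed, which is what the "Memory fragmentation" cause points to: free memory exists, but not in a usable contiguous block.

```python
import re
from dataclasses import dataclass
from typing import List

# Header line of a MALLOCFAIL message; \s+ tolerates the address being
# wrapped onto the next line, as in the log excerpt above.
ALLOC_RE = re.compile(
    r"%SYS-2-MALLOCFAIL: Memory allocation of (\d+) bytes failed from\s+(0x[0-9A-Fa-f]+)"
)
# Primary "Pool:" line only; anchoring at line start skips "Alternate Pool:".
POOL_RE = re.compile(r"^Pool: (\w+) Free: (\d+) Cause: (.+)$", re.MULTILINE)


@dataclass
class MallocFail:
    requested: int  # bytes the process asked for
    caller: str     # address the allocation was requested from
    pool: str       # memory pool name, e.g. "Processor"
    free: int       # total free bytes reported for that pool
    cause: str      # cause text as reported by IOS


def parse_mallocfail(text: str) -> List[MallocFail]:
    """Pair each MALLOCFAIL header with the first 'Pool:' line after it."""
    records = []
    for m in ALLOC_RE.finditer(text):
        pool = POOL_RE.search(text, m.end())
        if pool is None:
            continue  # truncated entry with no Pool line; skip it
        records.append(
            MallocFail(
                requested=int(m.group(1)),
                caller=m.group(2),
                pool=pool.group(1),
                free=int(pool.group(2)),
                cause=pool.group(3).strip(),
            )
        )
    return records


# Sample taken from the log excerpt above.
SAMPLE = """%SYS-2-MALLOCFAIL: Memory allocation of 704 bytes failed from
0x60329F00, alignment 0
Pool: Processor Free: 92744 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "Pool Manager", ipl= 0, pid= 6
"""

if __name__ == "__main__":
    for rec in parse_mallocfail(SAMPLE):
        print(rec)
```

Matching on the whole text rather than line by line keeps the wrapped header intact; a simple pairing like this could mismatch if a log is cut off between a header and its Pool line.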
We are running the latest S-train IOS, patched for the IPv4 issue.
However, downgrading to the code we had run for the previous year did
not solve the problem, nor did replacing the RSPs, VIPs, and interface
cards with new ones. We have also followed Cisco's recommendations for
mitigating the effects of the Nachi worm.
We also shut down one of the routers completely, and the other router
still experienced the same issue.
None of these updates or fixes has solved the problem.
I suspect it may have something to do with all the virus activity
going around (the same thing was crashing my Lucent TNTs), but I can't
get an answer from Cisco, and I can't find anyone else seeing the same
issue.
Hopefully someone here can shed some light on this problem.
Thanks in advance,
I fly because it releases my mind
from the tyranny of petty things . .