I haven't seen specific details posted here, so:
Like many others, we've had a few TNTs online for years without hiccups or
reboots until this week. Beginning late Sunday, we saw seemingly random
blade reboots and total system crashes. Errors ranged from memory leaks
to infinite loops on the controller blade, but all blades were
susceptible. HDLC2 blades seemed to be particularly vulnerable.
Boxes that had been rock-solid for very long periods suddenly began
rebooting at intervals ranging from 20 minutes to 4 hours, with no obvious
cause (i.e., nothing more specific than the above). Filtering ICMP echo
traffic at the border and core did little good.
On the suggestion of some folks on another list, and against my better
judgment, we disabled route caching to free up additional memory (though
memory did not appear fragmented). This stabilized all affected boxes
and, surprisingly, did not significantly degrade end-user performance.
Granted, it's not a true fix, but it may get you a few extra Z's at night.