Yes, exactly the same issue for us, and fortunately we had run into it once before, a few years ago. Any chance the route takes a Level 3 (3356) path? I’m just theorizing here, but my belief is they have some kind of link aggregation in the path from TB to 3356 (or maybe just internally near some edge), and some traffic is getting hashed onto a problematic link, interface, linecard, etc. where IPSec gets dropped. One of our locations lost IPSec connectivity to some normal VPN endpoints but not others. And here’s why I think this is the issue: if you change the source and/or destination IP address by one, you may find some or all of your sessions magically work again.
In our case, one of our office locations has a static assignment of (fortunately) five IPs. We only have one exposed externally, carrying four site-to-site VPNs. Two began failing Saturday morning. I moved the office firewall’s external IP down by one and that fixed both, but broke one that had been fine. On the remote end, fortunately, I have equipment that’s able to override the local IP for VPN traffic, so without impacting the other things it talks to, I was able to add a new IP one off from the previous and use it for traffic to just this office location; that fixed the remaining issue.
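For anyone trying to picture why shifting an address by one would matter, here is a toy sketch in Python. Everything in it is an assumption for illustration: the four-member bundle, the bad member's index, the XOR-and-modulo hash, and the addresses (documentation ranges, not our real ones). I have no idea what hash their gear actually uses; the point is only that a hash over source and destination IP can move a flow to a different member link when either address changes.

```python
from __future__ import annotations

from ipaddress import IPv4Address

N_LINKS = 4    # assumed number of member links in the bundle
BAD_LINK = 2   # assumed index of the member that drops IPSec


def pick_link(src: str, dst: str, n_links: int = N_LINKS) -> int:
    """Toy hash: XOR the two addresses, then take the result modulo n_links."""
    return (int(IPv4Address(src)) ^ int(IPv4Address(dst))) % n_links


def broken_peers(src: str, peers: list[str]) -> list[str]:
    """Return the peers whose flow from src lands on the assumed bad link."""
    return [p for p in peers if pick_link(src, p) == BAD_LINK]


# Hypothetical VPN peers, taken from documentation address ranges.
PEERS = ["198.51.100.20", "198.51.100.24", "192.0.2.7", "192.0.2.9"]

if __name__ == "__main__":
    # Compare the original office IP with the same IP minus one.
    for src in ("203.0.113.10", "203.0.113.9"):
        print(src, "->", broken_peers(src, PEERS))
```

With these made-up values, the office at 203.0.113.10 loses two of its four peers; moving it to 203.0.113.9 brings those two back but puts 192.0.2.7 on the bad member, and moving that peer to 192.0.2.8 clears it. That is the same pattern we saw, but it proves nothing about their network; it only shows the theory is self-consistent.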
If I hadn’t seen this several years ago, and wasted who knows how many hours figuring it out back then, it would once again have taken forever to resolve. Getting through their support layer to someone who can really help is impossible. Support is complete garbage at this point after the Verizon dump; I was going to say service, but the service itself has been stable outside of these random weird issues that are impossible to resolve with support.
I tried to be a nice guy and raise this through the support channels, but could not make it past the layer where they want me to take our office down so someone can plug in a laptop with our normal WAN IP and “prove” IPSec isn’t working with different equipment. I was like, dude, I just told you what I did to get it working again and offered packet captures, just escalate it. Ultimately I gave up and hung up.