Supposedly Facebook uses this tool internally, but… that doesn’t help much.
I’ve tried it on four different platforms/OSes (WSL Ubuntu, Red Hat, Debian, OpenBSD), with Go versions v1.10 through v1.16, in three very different environments (on-prem with a public IP, on-prem behind NAT, and cloud with a public IP), and I have yet to see it produce any meaningful output: each run/iteration/thread detects only a single hop out of the entire chain of routers, which makes it less than useful. Granted, that’s not a full regression test by any means, but if anyone here has ever used it successfully, could you please let me know what sort of environment you ran it in/on?
I have used it successfully in a test environment where I was using ECMP. Most of the public networks I’ve worked with rely on other methods for steering traffic (LAGs, BGP MEDs, etc.) more often than on ECMP.
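For anyone wondering why ECMP matters so much here: a router doing ECMP typically hashes each packet’s 5-tuple to pick one of the equal-cost next hops, so a tracer that keeps its ports fixed only ever walks one path. ECMP-aware tracers (as far as I know, fbtracert included) vary the source port across probes so they land on different paths. The sketch below simulates that effect; the SHA-256 hash, the 4-path fan-out, and the addresses and ports are purely hypothetical stand-ins, since real routers use vendor-specific hash functions and field selections.

```python
import hashlib


def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, n_paths):
    """Pick one of n_paths equal-cost next hops from a flow's 5-tuple.

    Hypothetical hash: SHA-256 stands in for a router's vendor-specific
    hash so the example is deterministic and self-contained.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def paths_hit(src_ip, dst_ip, src_ports, dst_port=80, proto=6, n_paths=4):
    """Return the set of next-hop indices reached by probes from src_ports."""
    return {
        ecmp_next_hop(src_ip, dst_ip, proto, sport, dst_port, n_paths)
        for sport in src_ports
    }


if __name__ == "__main__":
    # Fixed source port: every probe hashes to the same next hop.
    print(paths_hit("192.0.2.10", "198.51.100.20", [40000] * 32))
    # Varying the source port spreads probes across the equal-cost paths.
    print(paths_hit("192.0.2.10", "198.51.100.20", range(40000, 40256)))
```

Taking the hash modulo the path count is the simplest possible scheme; real gear may use resilient or consistent hashing, so the exact port-to-path mapping will differ, but the "fixed ports means one path" behavior is the same.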
Where I have found it fantastically useful is in troubleshooting a transit provider, e.g. when they were congested or had a flapping core link. Granted, I think it’s still subject to ICMP deprioritization (most SPs use it prodigiously), and most MPLS cores don’t decrement TTL, but it was still useful to be able to show them: “No, at this IP, I always drop traffic, when…”
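On the ICMP deprioritization point, the usual rule of thumb when reading per-hop loss (mtr-style output) is that loss which shows up at one hop but not at the hops after it is probably that router rate-limiting or deprioritizing its own TTL-exceeded replies, while loss that starts at a hop and persists all the way to the destination is more likely real forwarding loss. Here’s a minimal sketch of that heuristic; the function name, the 1% threshold, and the labels are my own hypothetical choices, not anything fbtracert outputs.

```python
def classify_hop_loss(loss_by_hop, threshold=1.0):
    """Label each hop's loss as 'ok', 'icmp-ratelimit?' or 'forwarding-loss?'.

    loss_by_hop: per-hop loss percentages in TTL order; the last entry is
    the destination. Heuristic (an assumption, not a rule): loss that does
    not persist at every later hop is most likely the router deprioritizing
    ICMP generation; loss that persists through the destination more likely
    means packets are really being dropped at or before that hop.
    """
    labels = []
    for i, loss in enumerate(loss_by_hop):
        if loss < threshold:
            labels.append("ok")
        elif all(later >= threshold for later in loss_by_hop[i + 1:]):
            labels.append("forwarding-loss?")
        else:
            labels.append("icmp-ratelimit?")
    return labels


if __name__ == "__main__":
    # Hop 2 drops 30% of probes but hop 3 drops none: likely ICMP rate limiting.
    # Loss that appears at hop 4 and persists to the destination looks real.
    print(classify_hop_loss([0, 30, 0, 5, 6, 5]))
```

One caveat: hops hidden inside an MPLS core that doesn’t decrement TTL won’t appear in the list at all, so "the first hop showing persistent loss" really means "the first visible hop after the problem", which may be some distance downstream of the actual culprit.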
Historically, the bufferbloat effort has used irtt, ping, and mtr in combination with a set of TCP flows to attempt to induce and graph the problem via the flent tool. I hadn’t thought all that much about ECMP or isolating the bloated hop until recently, as an outgrowth of Apple’s networkQuality effort here:
TCP_INFO, at least in Linux, has now accumulated an amazing number of useful-looking statistics that few people are using as yet. Monitoring hop count, and perhaps changes to the flow ID in transit, might also be useful. Key to my thinking at the moment is that I think it’s possible, after observing RTT inflation, to drop the TTL during a fat flow to find where the bloated hop is, and although I started drafting the ideas for the new tool here
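To make the TCP_INFO and TTL ideas a bit more concrete, here is a rough sketch (Python; the socket call is Linux only). parse_tcp_info decodes the leading fields of Linux’s struct tcp_info, including tcpi_rtt and tcpi_rttvar in microseconds, which is one way to watch RTT inflation on the fat flow itself. find_bloated_hop is a hypothetical helper for one possible reading of the drop-the-TTL idea: given per-TTL RTT samples taken with and without the fat flow running (probes sent with IP_TTL set via setsockopt), it returns the first TTL whose median RTT inflates and stays inflated at every later hop. Requiring persistence is my own assumption, meant to avoid blaming a router that is merely slow to generate ICMP, and the 20 ms threshold is arbitrary.

```python
import socket
import statistics
import struct

# Leading part of Linux's struct tcp_info (linux/tcp.h): six u8 fields, two
# bytes of bitfields (wscale, app_limited/fastopen), then u32 fields. Only
# the fields up to tcpi_snd_cwnd are decoded here.
_TCP_INFO_FMT = "=8B19I"
_U32_FIELDS = (
    "rto", "ato", "snd_mss", "rcv_mss", "unacked", "sacked", "lost",
    "retrans", "fackets", "last_data_sent", "last_ack_sent",
    "last_data_recv", "last_ack_recv", "pmtu", "rcv_ssthresh",
    "rtt", "rttvar", "snd_ssthresh", "snd_cwnd",
)


def parse_tcp_info(buf):
    """Decode leading tcp_info fields; rtt and rttvar are in microseconds."""
    values = struct.unpack_from(_TCP_INFO_FMT, buf)
    info = {"state": values[0], "ca_state": values[1], "retransmits": values[2]}
    info.update(zip(_U32_FIELDS, values[8:]))
    return info


def read_tcp_info(sock):
    """Linux only: fetch and decode TCP_INFO for a connected TCP socket."""
    return parse_tcp_info(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 256))


def set_probe_ttl(sock, ttl):
    """Limit how far a probe travels; the hop where TTL hits 0 replies via ICMP."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)


def find_bloated_hop(idle_rtt_ms, loaded_rtt_ms, min_inflation_ms=20.0):
    """Return the first TTL whose RTT inflates under load and stays inflated.

    Hypothetical helper. idle_rtt_ms / loaded_rtt_ms map TTL -> list of RTT
    samples (ms) to that hop, without and with a fat flow running.
    """
    ttls = sorted(set(idle_rtt_ms) & set(loaded_rtt_ms))
    inflation = {
        t: statistics.median(loaded_rtt_ms[t]) - statistics.median(idle_rtt_ms[t])
        for t in ttls
    }
    for i, t in enumerate(ttls):
        # Persistence check: a single slow ICMP responder should not count.
        if all(inflation[u] >= min_inflation_ms for u in ttls[i:]):
            return t
    return None
```

Using medians rather than means keeps a few stray ICMP-delayed samples from dominating the comparison; with real data you would probably want many samples per TTL before trusting the answer.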
Thank you!! Some of those tools are proving much more useful for me than fbtracert. (In particular, traceflow has been updated recently enough that it “just works” in common environments that have Python3. And while it may not be perfect, it’s good enough to show what I need.)
-Adam
(who apparently has lost the skills needed to Google usefully, in his decrepitude)