the little ssh that (sometimes) couldn't



Bush league. I debugged a similar issue on Sprint's network about 15 years ago and also nailed it down to which router hop had the problem: a misconfigured interface that couldn't pass certain bit patterns, which made a particular file we were hosting for a customer non-downloadable by any client whose packets took the bad path. I also used ping, but with a pattern much more interesting than large packets of nulls... and I had to figure out the problematic pattern before I could do the ping tests.
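For anyone who wants to try the pattern-ping trick, here is a minimal sketch in Python. It assumes the iputils `ping` found on most Linux boxes, where `-p` takes up to 16 "pad" bytes in hex and `-s` sets the payload size. The helper name `ping_pattern_cmd` is made up for illustration, and it only builds the command; it does not claim to be what was actually run back then.

```python
import re

# Hypothetical helper: builds an iputils-style ping command whose payload
# is filled with a repeating byte pattern instead of the default data.
def ping_pattern_cmd(host, pattern_hex, size=1400, count=5):
    # iputils ping accepts at most 16 pad bytes, i.e. 2 to 32 hex digits.
    if not re.fullmatch(r"(?:[0-9a-fA-F]{2}){1,16}", pattern_hex):
        raise ValueError("pattern must be 1 to 16 bytes written as hex digits")
    return ["ping", "-c", str(count), "-s", str(size), "-p", pattern_hex, host]
```

Hand the returned list to `subprocess.run` and compare packet loss hop by hop for a suspect pattern versus a harmless one (the pattern and addresses are whatever your own investigation turns up).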

But if you want really bizarre, this one never got solved to my satisfaction.

When I was working for Sprint about 12 years ago, we had a circuit where the customer complained that we were blocking executable downloads.

We essentially dismissed his complaints because they were ridiculous. We would test his T1 and it would show everything fine. I was willing to entertain his concern because it sounded weird and he had a UNIX box I could log in to.

Running wget, I saw the same issue. If I zipped a file I could download it without trouble, but anything that was an exe would not download.

We narrowed it down to 2-4 bytes of the exe header that the circuit just wouldn't pass. We called the local telco and had them test the circuit from the customer premises; they found errors in the reverse direction.

We fixed it and he could download executables again. I got an award for persistence and the customer canceled his account.


I ran into a similar issue with a customer just a few days ago! The
customer's theory was that there was something badly wrong with their
dorky gateway/switch (which we sold and support <sigh>). ssh was
timing out, with an SSH2_MSG_KEX_DH_GEX_GROUP hang/failure during the
ssh protocol exchange. Based on that, some Wireshark captures, and
stray Google droppings, I advised them to ratchet down the MTU to
make things work. By bisecting MTU settings and pinging, we
arrived at an MTU of 850. And I initially started cursing at the
switch (because that helps move packets, really :-) ).
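The bisection itself is easy to script. What follows is a rough sketch, not the exact procedure we used: instead of changing the interface MTU by hand each round, it probes with unfragmentable pings (iputils `ping -M do`, Linux only) and assumes that if one size gets through, every smaller size does too. The names `ping_probe` and `largest_working_mtu` and the 576/1500 bounds are my own choices for illustration.

```python
import subprocess

IP_ICMP_OVERHEAD = 28  # 20-byte IPv4 header (no options) + 8-byte ICMP header

def ping_probe(host):
    """Return probe(mtu): True if one don't-fragment ping sized for mtu gets a reply."""
    def probe(mtu):
        cmd = ["ping", "-c", "1", "-W", "2", "-M", "do",
               "-s", str(mtu - IP_ICMP_OVERHEAD), host]
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        return result.returncode == 0
    return probe

def largest_working_mtu(probe, lo=576, hi=1500):
    """Bisect for the largest mtu in [lo, hi] where probe(mtu) succeeds.

    Assumes probe is monotonic: working sizes all sit below failing ones.
    """
    if not probe(lo):
        raise ValueError("even the lower bound fails; size is not the only problem")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            lo = mid       # mid works, so the answer is mid or larger
        else:
            hi = mid - 1   # mid fails, so the answer is below mid
    return lo
```

Something like `largest_working_mtu(ping_probe("server.example.com"))` takes about ten probes to cover the 576 to 1500 range. Because the probe is passed in as a function, you can swap in an ssh connection attempt or anything else that returns True or False.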

Turns out -- the ssh server in question was running RHEL 5.x Linux,
and that was the key. Even though "ip route show cache" looked sane,
"ip route flush cache" (which I had them run, just on a lark) made
the problem go away. So it probably wasn't my switch (unless it had
done something untoward in the distant past that induced some weird
Linux stack bug).

I'm mostly posting this because I was wondering if anyone else had
run into an MTU of 850 before. Is that a "magic number" that rings
any bells? Or has anyone else seen the Linux route cache behavior I did?