TCP torture testing

Does anyone know of a good way to simulate oddball TCP happenings like:

* Out of order delivery
* Variable delivery delays
* (Especially) Unusual segmentation e.g. splitting part of a stream that would and should normally be sent in a single segment into several smaller segments sent back-to-back

And especially doing so with traffic from an existing TCP-speaking application i.e. something like a TCP proxy that lets you deliberately mess with the segmentation and delivery order.

My focus here isn't on volume of traffic but rather trying to tickle unusual receive paths in my TCP/IP stack (which is not, for various reasons, a mainstream well-known PC OS one) and how it interacts with the application.

There are lots of so-called "chaotic proxies" out there that do this sort of thing to the degree that it's a bit overwhelming to get started. I'm looking for suggestions of things that have worked well for folks in this regard.

Hello,

Does anyone know of a good way to simulate oddball TCP happenings like:

* Out of order delivery
* Variable delivery delays

I would suggest taking a look at Linux tc-netem.

* (Especially) Unusual segmentation e.g. splitting part of a stream that
would and should normally be sent in a single segment into several
smaller segments sent back-to-back

And especially doing so with traffic from an existing TCP-speaking
application i.e. something like a TCP proxy that lets you deliberately
mess with the segmentation and delivery order.

This is more difficult because a TCP proxy (as in a userspace
application) does not do the TCP segmenting; the kernel does. Sure, the
application may set flags like TCP_NODELAY to toggle Nagle, but beyond
that the application does not really have control over TCP segmentation. So
a tool like this would basically need to reimplement TCP in userspace.

Not sure something like this is out there.

Lukas

This is more difficult because a TCP proxy (as in a userspace
application) does not do the TCP segmenting, the kernel does. Sure the
application may set flags like TCP_NODELAY to toggle Nagle, but beyond
that the application has not really control over TCP segmentation.

Well... In theory, TCP closes the segment at the end of the
application's send() and sets the PSH flag. Likewise, on the receiving
side the recv() returns before filling the buffer upon receipt of a
segment with the PSH flag set.

In theory. In practice, it doesn't always work out that way and
applications which depend on a short recv() meaning that was where the
sender's send() ended tend to flake out in unexpected ways.
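To make that concrete, here is a minimal sketch of a receiver that never assumes one recv() equals one send(). It assumes a hypothetical protocol in which every message carries a 2-byte big-endian length prefix; the names `FrameBuffer` and `frame` are made up for illustration and are not from any library.

```python
import struct
from typing import List


class FrameBuffer:
    """Reassemble length-prefixed messages from arbitrary recv() chunks."""

    # Assumption: hypothetical 2-byte big-endian length prefix per message.
    HEADER = struct.Struct("!H")

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add one recv() result; return every message completed by it."""
        self._buf += chunk
        messages = []
        while True:
            if len(self._buf) < self.HEADER.size:
                break  # the header itself is still split across segments
            (length,) = self.HEADER.unpack_from(self._buf)
            end = self.HEADER.size + length
            if len(self._buf) < end:
                break  # body incomplete; wait for more data
            messages.append(bytes(self._buf[self.HEADER.size:end]))
            del self._buf[:end]  # keep any bytes belonging to the next message
        return messages


def frame(payload: bytes) -> bytes:
    """Encode one message with the assumed length prefix."""
    return FrameBuffer.HEADER.pack(len(payload)) + payload


if __name__ == "__main__":
    stream = frame(b"hello") + frame(b"") + frame(b"world!")
    fb = FrameBuffer()
    received = []
    # Worst case: the stream arrives one byte per recv().
    for i in range(len(stream)):
        received += fb.feed(stream[i:i + 1])
    print(received)
```

Feeding the same stream in one piece, or split at any other offsets, should yield the same list of messages; a receive path that behaves differently depending on the split is the kind of bug being described.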

Does anyone know of a good way to simulate oddball TCP happenings like:

* Out of order delivery
* Variable delivery delays

I would suggest to take a look at linux tc-netem

Yeah, tc will do most of this without too much fuss.
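For anyone who has not used netem before, a rough sketch of what such a setup might look like follows. It only builds and prints the command line rather than running it. The interface name `eth0` and the numbers are placeholders, and the helper name `netem_argv` is made up for illustration. Actually applying the qdisc needs root on a Linux box sitting in the traffic path.

```python
import shlex


def netem_argv(dev, delay_ms, jitter_ms, reorder_pct, correlation_pct):
    """Build (but do not run) a tc command that adds a netem qdisc.

    With 'reorder' set, roughly reorder_pct of packets are sent
    immediately and the rest are held for the configured delay, which
    is what produces out-of-order arrival at the receiver.
    """
    return [
        "tc", "qdisc", "add", "dev", dev, "root", "netem",
        "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
        "reorder", f"{reorder_pct}%", f"{correlation_pct}%",
    ]


if __name__ == "__main__":
    # Placeholder values: 100ms +/- 20ms delay, 25% reordering, 50% correlation.
    print(shlex.join(netem_argv("eth0", 100, 20, 25, 50)))
```

Removing the qdisc again is `tc qdisc del dev eth0 root`. Note that netem acts on packets the sending stack has already built, so it can delay and reorder segments but will not re-segment the stream.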

* (Especially) Unusual segmentation e.g. splitting part of a stream that
would and should normally be sent in a single segment into several
smaller segments sent back-to-back

And especially doing so with traffic from an existing TCP-speaking
application i.e. something like a TCP proxy that lets you deliberately
mess with the segmentation and delivery order.

This is more difficult because a TCP proxy (as in a userspace
application) does not do the TCP segmenting, the kernel does. Sure the
application may set flags like TCP_NODELAY to toggle Nagle, but beyond
that the application has not really control over TCP segmentation. So
a tool like this would basically need to reimplement TCP in userspace.

Not sure something like this is out there.

Not only is it more difficult, it's the part that I think is causing me problems. The thing talking to me is segmenting its TCP stream in a way that I suspect is due to setting TCP_NODELAY but then feeding messages piecemeal (e.g. as they're generated by some state machine) into send/write syscalls. The segments are usually sent back-to-back with no meaningful delay and are of consistent layout, but they're tiny - one is only 4 bytes, and they're not all the same size.

I suspect I have an issue somewhere with my buffer handling, TCP re-assembly, etc. but don't have a good place to look without being able to re-create it while speaking a protocol that I actually speak (which precludes some common things like HTTP, in this case). I may end up modifying an open-source implementation of the protocol (which thankfully exists) to basically do the same thing by the same means.
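As a starting point for re-creating that behaviour, here is a minimal sketch of a sender that disables Nagle and pushes one message through several small send calls. The helper names and the example chunk sizes are hypothetical. With TCP_NODELAY set, each call will usually leave as its own segment, but the kernel is still free to coalesce data if the send queue backs up, so the result should be checked with a packet capture.

```python
import socket
import time


def split_by_sizes(payload, sizes):
    """Cut payload into chunks of the given (positive) sizes.

    Any bytes left over after the last size form one final chunk.
    """
    chunks, pos = [], 0
    for size in sizes:
        if pos >= len(payload):
            break
        chunks.append(payload[pos:pos + size])
        pos += size
    if pos < len(payload):
        chunks.append(payload[pos:])
    return chunks


def send_piecemeal(sock, payload, sizes, gap=0.0):
    """Issue one sendall() per chunk, like a state machine emitting
    a message field by field."""
    for chunk in split_by_sizes(payload, sizes):
        sock.sendall(chunk)
        if gap:
            time.sleep(gap)  # optional pause between chunks


def connect_nodelay(host, port):
    """Open a TCP connection with Nagle's algorithm disabled."""
    sock = socket.create_connection((host, port))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

For example, `send_piecemeal(connect_nodelay(host, port), message, [4, 12, 7])` would try to emit a 4-byte chunk, then 12 bytes, then 7, then whatever remains of `message`.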

Well... In theory, TCP closes the segment at the end of the
application's send() and sets the PSH flag. Likewise, on the receiving
side the recv() returns before filling the buffer upon receipt of a
segment with the PSH flag set.

Every segment this thing sends has PSH set which again makes me think that they've got TCP_NODELAY set but are sending their messages piecemeal across multiple send/write calls.

The actual high-level messages are fairly small, typically under 100 bytes. Most implementations end up sending the entire message in a single TCP segment.

In theory. In practice, it doesn't always work out that way and
applications which depend on a short recv() meaning that was where the
sender's send() ended tend to flake out in unexpected ways.

I don't think that's the issue in this case, but it's a useful thing to go looking for.

If you want to go nuts, check out Scapy.

Shopify built a Go app called Toxiproxy that allows injecting TCP oddballs into an HTTP stream.

https://github.com/Shopify/toxiproxy

Toxiproxy is a framework for simulating network conditions. It’s made specifically to work in testing, CI and development environments, supporting deterministic tampering with connections, but with support for randomized chaos and customization. Toxiproxy is the tool you need to prove with tests that your application doesn’t have single points of failure. We’ve been successfully using it in all development and test environments at Shopify since October, 2014. See our blog post on resiliency for more information.
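For the segmentation question specifically, Toxiproxy has a "slicer" toxic that cuts proxied data into small pieces. The sketch below only builds the JSON bodies one would send to its HTTP API; the field names are as I recall them from the project README and should be checked against the version in use. The proxy name, addresses, and sizes are placeholders, and the helper names are made up.

```python
import json


def proxy_payload(name, listen, upstream):
    """Body for creating a proxy (POST /proxies on the Toxiproxy API)."""
    return {"name": name, "listen": listen, "upstream": upstream, "enabled": True}


def slicer_payload(average_size, size_variation, delay_us, stream="downstream"):
    """Body for adding a slicer toxic (POST /proxies/<name>/toxics).

    average_size and size_variation are in bytes; delay is in
    microseconds between slices (field names assumed from the README).
    """
    return {
        "type": "slicer",
        "stream": stream,
        "toxicity": 1.0,  # apply to every connection
        "attributes": {
            "average_size": average_size,
            "size_variation": size_variation,
            "delay": delay_us,
        },
    }


if __name__ == "__main__":
    # Placeholder endpoints: listen locally, forward to the real peer.
    print(json.dumps(proxy_payload("torture", "127.0.0.1:9000", "192.0.2.10:9001")))
    print(json.dumps(slicer_payload(average_size=8, size_variation=4, delay_us=0)))
```

Since the slicing happens in userspace writes on the proxy, the segments seen on the wire still depend on the proxy host's kernel, so a capture on the receiving side is worth doing to confirm the split actually occurs.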

–Pete

In my lab work, I use Ostinato to craft
(a) packet streams and
(b) sequences of packet streams.

Ostinato is stateless,
so you’ll have to co-ordinate the 3-way handshake yourself to open the connection.
That is:
you’d have to send the SYN segment (segment 1) from Ostinato,
read the sequence number (SN) that your OS sends back in its SYN-ACK (segment 2), and then
set the acknowledgement number AN = SN+1
in the ACK segment (segment 3) that Ostinato sends to the peer.
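The arithmetic for segment 3 is small but easy to get wrong at the wraparound, since sequence numbers are 32-bit. A minimal sketch, with made-up function names:

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def handshake_ack_number(peer_isn):
    """AN for segment 3: the peer's SYN consumes one sequence number."""
    return (peer_isn + 1) % SEQ_MODULUS


def first_data_seq(own_isn):
    """SN of the first data byte: our own SYN also consumed one number."""
    return (own_isn + 1) % SEQ_MODULUS


if __name__ == "__main__":
    captured_sn = 0xFFFFFFFF  # example value read from the capture of segment 2
    print(handshake_ack_number(captured_sn))
```

The value printed is what would go into the AN field of the hand-built ACK; the SN of that same segment is the SYN's own sequence number plus one.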

Your OS (acting as server) would need to tolerate lengthy delays
between its SYN-ACK (segment 2) and the corresponding ACK (segment 3).
You can mitigate these delays by preparing all the fields beforehand,
leaving only the peer’s OS’s SN to be captured.
AFAIK, the delay a stack will tolerate here runs between 30s and 90s,
giving you some time to copy and paste from your capture program to Ostinato’s UI.

Cheers,

Etienne