>> The Endace DAG cards claim they can move 7 Gbps over a PCI-X bus from
>> the NIC to main DRAM. They claim a full 10 Gbps on a PCIe bus.
>
> I wonder, has anyone heard of these being used for IDS? I've been
> looking at building a commodity Snort solution, and wondering whether
> a powerful network card will help, or whether the bottleneck would be
> in processing the packets and overhead from the OS.
The first bottleneck is the interrupts from the NIC. With a generic
Intel NIC under Linux, you start to lose a non-trivial number of
packets around 700 Mbps of "normal" traffic because the host can't
service the interrupts quickly enough.
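
For what it's worth, a rough way to see whether that is happening on
Linux is to watch the rx "drop" and "fifo" counters in /proc/net/dev
(the same counters ifconfig reports). Below is a minimal sketch that
just dumps them; exactly which counter a given driver increments on an
overrun varies from driver to driver.

    /* dropwatch.c - dump the rx counters from /proc/net/dev.
     * Build: cc -o dropwatch dropwatch.c
     * Which counter a driver bumps on an rx overrun is driver-specific.
     */
    #include <stdio.h>

    int main(void)
    {
        char line[512];
        FILE *f = fopen("/proc/net/dev", "r");

        if (!f) { perror("/proc/net/dev"); return 1; }

        fgets(line, sizeof line, f);   /* skip the two header lines */
        fgets(line, sizeof line, f);

        while (fgets(line, sizeof line, f)) {
            char ifname[32];
            unsigned long long bytes, pkts, errs, drop, fifo;

            /* rx columns: bytes packets errs drop fifo frame ... */
            if (sscanf(line, " %31[^:]: %llu %llu %llu %llu %llu",
                       ifname, &bytes, &pkts, &errs, &drop, &fifo) == 6)
                printf("%-8s rx_pkts=%llu errs=%llu drop=%llu fifo=%llu\n",
                       ifname, pkts, errs, drop, fifo);
        }
        fclose(f);
        return 0;
    }
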
Most modern high-performance network cards support MSI (Message
Signaled Interrupts), which generate real interrupts only on an
intelligent basis and only at a controlled rate. Windows, Solaris, and
FreeBSD have support for MSI, and I think Linux does, too. It requires
both hardware and software support.
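
As a quick sanity check on the Linux side, you can see whether a NIC's
interrupt actually came up as message-signaled by looking at
/proc/interrupts, where MSI vectors are labelled "PCI-MSI" (the exact
label text varies a bit between kernel versions). A small sketch:

    /* msicheck.c - list interrupt lines that came up message-signaled.
     * Build: cc -o msicheck msicheck.c
     * The "PCI-MSI" label text varies a little across kernel versions.
     */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[4096];
        FILE *f = fopen("/proc/interrupts", "r");

        if (!f) { perror("/proc/interrupts"); return 1; }

        while (fgets(line, sizeof line, f))
            if (strstr(line, "MSI"))      /* PCI-MSI / MSI-X entries */
                fputs(line, stdout);

        fclose(f);
        return 0;
    }
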
With hardware that supports MSI, TSO, LRO, and PCIe, 9.5 Gbps TCP
flows between systems are possible with minimal tuning. That puts the
bottleneck back on the forwarding software running on the CPU to do
the forwarding at high rates.
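
It is worth confirming that those offloads are actually enabled on the
interface, which is what "ethtool -k" reports. As a sketch of the
underlying mechanism on Linux, the snippet below queries just the TSO
setting through the same SIOCETHTOOL ioctl that ethtool uses; the
interface name is whatever you pass on the command line.

    /* tsocheck.c - ask the kernel whether TSO is enabled on an interface.
     * Build: cc -o tsocheck tsocheck.c    Run: ./tsocheck eth0
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    int main(int argc, char **argv)
    {
        struct ethtool_value ev = { .cmd = ETHTOOL_GTSO };
        struct ifreq ifr;
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <ifname>\n", argv[0]);
            return 1;
        }

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        memset(&ifr, 0, sizeof ifr);
        strncpy(ifr.ifr_name, argv[1], IFNAMSIZ - 1);
        ifr.ifr_data = (char *)&ev;

        if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
            perror("SIOCETHTOOL");
            close(fd);
            return 1;
        }

        printf("%s: TSO is %s\n", argv[1], ev.data ? "on" : "off");
        close(fd);
        return 0;
    }
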
"ethtool -c". Thanks Sargun for putting me on to "I/O Coalescing."
But cards like the Intel Pro/1000 have 64k of memory for buffering
packets, both in and out. Few have very much more than 64k. 64k means
32k to tx and 32k to rx. Means you darn well better generate an
interrupt when you get near 16k so that you don't fill the buffer
before the 16k you generated the interrupt for has been cleared. Means
you're generating an interrupt at least for every 10 or so 1500 byte
packets.
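
To put rough numbers on that, here is a sketch of the back-of-the-
envelope arithmetic, using the 1500-byte frames and 16 KB threshold
from above (it ignores Ethernet framing overhead, so the real figures
come out a little lower):

    /* irqmath.c - interrupt rates with and without coalescing, using
     * the 1500-byte frames and 16 KB threshold from the discussion.
     * Build: cc -o irqmath irqmath.c
     */
    #include <stdio.h>

    int main(void)
    {
        const double line_rate_bps  = 1e9;      /* gigabit Ethernet      */
        const double frame_bytes    = 1500.0;   /* full-size frame       */
        const double coalesce_bytes = 16384.0;  /* interrupt per ~16 KB  */

        double frames_per_sec  = line_rate_bps / (frame_bytes * 8.0);
        double frames_per_intr = coalesce_bytes / frame_bytes;
        double intr_per_sec    = frames_per_sec / frames_per_intr;

        printf("frames/sec at line rate      : %8.0f\n", frames_per_sec);
        printf("frames per interrupt (16 KB) : %8.1f\n", frames_per_intr);
        printf("interrupts/sec, coalesced    : %8.0f\n", intr_per_sec);
        printf("interrupts/sec, per-packet   : %8.0f\n", frames_per_sec);
        return 0;
    }
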
"ethtool -c". Thanks Sargun for putting me on to "I/O Coalescing."
But cards like the Intel Pro/1000 have 64k of memory for buffering
packets, both in and out. Few have very much more than 64k. 64k means
32k to tx and 32k to rx. Means you darn well better generate an
interrupt when you get near 16k so that you don't fill the buffer
before the 16k you generated the interrupt for has been cleared. Means
you're generating an interrupt at least for every 10 or so 1500 byte
packets.
This is not true in the bus-master DMA mode in which the cards are
usually used. The memory mentioned is used only as temporary storage
until the card can DMA the data into buffers in main memory. Most
Pro/1000 cards have buffering capability for up to 4096 frames.
I'll confess to some ignorance here; we're at the edge of my skill set.
The Pro/1000 does not need to generate an interrupt in order to start
a DMA transfer? Can you refer me to some documents that explain in
detail how a card on the bus sets up a DMA transfer?
Bus-master DMA does not need to generate an interrupt on the host CPU
to get started. The interrupt is used only to trigger processing on
the host CPU, and it can be deferred until several frames have been
written.
The driver provides the adapter with a ring buffer of memory locations
to bus-master DMA the data into (which does not require interrupting
the CPU). The interrupts are triggered after the DMA completes, at a
moderated rate controllable by the driver. For FreeBSD the default
maxes interrupts out at 8000 per second, and on some of the adapters
there are firmware optimizations for lowering the latency below the
obvious maximum of 125 microseconds. When an interrupt fires, the
driver restocks the ring buffer with new addresses to put data into,
whether for one frame or for 4000, depending on how many were used up.
With IOAT and various offloads this gets somewhat more complicated,
and more effective.
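
To make the ring-buffer mechanism described above a bit more concrete,
here is a toy sketch of the receive side in C. The names, sizes, and
structure are made up for illustration (this is not the e1000/em
driver): the driver posts buffer addresses into a ring, the simulated
"NIC" DMAs frames in and marks descriptors done without involving the
CPU, and a deferred interrupt handler later harvests whatever has
completed and restocks the ring.

    /* ringsketch.c - toy model of a bus-master-DMA receive descriptor
     * ring. The "NIC" fills posted buffers and sets a done flag with no
     * CPU involvement; a deferred "interrupt handler" later harvests
     * whatever completed and reposts the buffers. Purely illustrative.
     * Build: cc -o ringsketch ringsketch.c
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define RING_SIZE 8           /* real drivers use 256-4096 slots */
    #define BUF_SIZE  2048

    struct rx_desc {
        unsigned char *buf;       /* host memory the NIC may DMA into */
        size_t         len;       /* filled in by the "NIC"           */
        int            done;      /* set by the "NIC" on completion   */
    };

    static struct rx_desc ring[RING_SIZE];
    static unsigned next_to_clean;   /* next slot the driver checks */

    /* Driver: hand a fresh buffer to the NIC (no interrupt involved). */
    static void post_buffer(struct rx_desc *d)
    {
        d->buf  = malloc(BUF_SIZE);
        d->len  = 0;
        d->done = 0;
    }

    /* "NIC": DMA a received frame into a descriptor's buffer. */
    static void nic_dma_frame(unsigned slot, const char *payload)
    {
        struct rx_desc *d = &ring[slot % RING_SIZE];
        d->len = strlen(payload);
        memcpy(d->buf, payload, d->len);
        d->done = 1;              /* completion, still no interrupt */
    }

    /* Driver: deferred interrupt handler - harvest all completions. */
    static void rx_interrupt(void)
    {
        while (ring[next_to_clean].done) {
            struct rx_desc *d = &ring[next_to_clean];
            printf("harvested %zu bytes from slot %u\n",
                   d->len, next_to_clean);
            free(d->buf);
            post_buffer(d);       /* restock the slot for the NIC */
            next_to_clean = (next_to_clean + 1) % RING_SIZE;
        }
    }

    int main(void)
    {
        unsigned i;

        for (i = 0; i < RING_SIZE; i++)   /* initial fill of the ring */
            post_buffer(&ring[i]);

        /* Several frames land back to back; the interrupt fires later. */
        nic_dma_frame(0, "frame one");
        nic_dma_frame(1, "frame two");
        nic_dma_frame(2, "frame three");
        rx_interrupt();           /* one interrupt services all three */
        return 0;
    }

The real drivers do the same dance with hardware descriptor formats
and head/tail registers, but the shape of the flow (post, complete via
DMA, deferred harvest and restock) is roughly the same.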