latency (was: RE: cooling door)

Understandably, some applications fall into a class that requires very-short
distances for the reasons you cite, although I'm still not comfortable with the
setup you've outlined. Why, for example, are you showing two Ethernet switches
for the fiber option (which would naturally double the switch-induced latency),
but only a single switch for the UTP option?

Now, I'm comfortable in ceding this point. I should have made allowances for this
type of exception in my introductory post, but didn't, as I also omitted mention
of other considerations for the sake of brevity. For what it's worth, propagation
over copper is faster than propagation over fiber, as copper has a higher nominal
velocity of propagation (NVP) rating than fiber does, but not by enough to cause
the difference you've cited.
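
To put rough numbers on the comparison -- the NVP values below are assumed,
typical figures rather than measurements of any particular cable plant -- the
difference over in-building distances works out to a few tens of nanoseconds
at most:

  # Back-of-the-envelope propagation-delay comparison.  The NVP values are
  # assumed typical figures: ~0.70c for copper UTP, ~0.67c for glass fiber.
  C = 299_792_458.0  # speed of light in vacuum, m/s

  def prop_delay_ns(length_m, nvp):
      """One-way propagation delay in nanoseconds."""
      return length_m / (nvp * C) * 1e9

  for length in (30, 100, 300):  # typical in-building runs, meters
      copper = prop_delay_ns(length, 0.70)
      fiber = prop_delay_ns(length, 0.67)
      print(f"{length:>4} m: copper {copper:6.1f} ns, fiber {fiber:6.1f} ns, "
            f"difference {fiber - copper:4.1f} ns")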

As an aside, the manner in which o-e-o and e-o-e conversions take place when
transitioning from electronic to optical states, and back, affects latency
differently across the different link-assembly approaches used. In cases where
10Gbps or greater is being sent across a "multi-mode" fiber link in a data
center or other in-building venue, for instance, "parallel optics" are most
often used, i.e., multiple optical channels (either fibers or wavelengths) that
undergo multiplexing and de-multiplexing (collectively: inverse multiplexing,
or channel bonding) -- as opposed to a single fiber (or a single wavelength)
operating at the link's rated wire speed.
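
For readers unfamiliar with the term, the toy sketch below shows what inverse
multiplexing amounts to: striping one stream round-robin across several lanes
and reassembling it at the far end. It is purely illustrative and ignores lane
skew, alignment markers, and line coding.

  # Toy illustration of inverse multiplexing / channel bonding: stripe one
  # byte stream round-robin across N "lanes", then reassemble it at the far
  # end.  Real parallel optics operate on encoded symbols and must handle
  # lane skew; this only shows the mux/demux idea.
  def mux(data: bytes, lanes: int) -> list[bytes]:
      return [data[i::lanes] for i in range(lanes)]

  def demux(channels: list[bytes]) -> bytes:
      out = bytearray()
      for i in range(max(len(c) for c in channels)):
          for c in channels:
              if i < len(c):
                  out.append(c[i])
      return bytes(out)

  payload = b"example payload sent over four parallel optical channels"
  assert demux(mux(payload, 4)) == payload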

By chance, is the "deserialization" you cited earlier related to this inverse
muxing process? If so, that would explain the disconnect, and one shouldn't
despair, because there is a direct path to avoiding it.

In parallel optics, the e-o and o-e processing at each end of the 10G link is
intensive, which adds more latency than a single-channel approach would. Yet
most of the TIA activity taking place today that is geared to increasing data
rates over in-building fiber links continues to favor multi-mode fiber and
parallel optics, as opposed to specifying single-mode fiber supporting a single
channel. Single-mode solutions are, however, also available to those who dare
to be different.

I'll look more closely at these issues and your original exception during the
coming week, since they represent an important aspect in assessing the overall
model. Thanks.

Frank A. Coluccio
DTI Consulting Inc.
212-587-8150 Office
347-526-6788 Mobile

On Sat Mar 29 20:30, Mikael Abrahamsson sent:

Understandably, some applications fall into a class that requires very-short
distances for the reasons you cite, although I'm still not comfortable with the
setup you've outlined. Why, for example, are you showing two Ethernet switches
for the fiber option (which would naturally double the switch-induced latency),
but only a single switch for the UTP option?

Yes, I am showing a case where you have switches in each rack, so each rack is uplinked with a fiber to a central aggregation switch, as opposed to having a lot of UTP from the rack directly into the aggregation switch.

Now, I'm comfortable in ceding this point. I should have made allowances for this
type of exception in my introductory post, but didn't, as I also omitted mention
of other considerations for the sake of brevity. For what it's worth, propagation
over copper is faster than propagation over fiber, as copper has a higher nominal
velocity of propagation (NVP) rating than fiber does, but not by enough to cause
the difference you've cited.

The roughly 2/3 speed of light in fiber, as opposed to the propagation speed in copper, was not what I had in mind.

As an aside, the manner in which o-e-o and e-o-e conversions take place when
transitioning from electronic to optical states, and back, affects latency
differently across the different link-assembly approaches used. In cases where 10Gbps

My opinion is that the major factors in the added end-to-end latency in my example are that the packet has to be serialised three times instead of once, and that there are three lookups instead of one. Lookups take time, and putting the packet on the wire takes time.

Back in the 10 megabit/s days, there were switches that did cut-through, i.e., if the output port was not in use the instant the packet came in, the switch could start sending the packet on the outgoing port before it had been completely received on the incoming port (once the header was in, the forwarding decision was made and the switch would start sending the packet out before the rest of it had arrived on the input port).

By chance, is the "deserialization" you cited earlier related to this inverse
muxing process? If so, that would explain the disconnect, and one shouldn't
despair, because there is a direct path to avoiding it.

No, it's the store-and-forward architecture used in all modern equipment (that I know of). A packet has to be completely taken in over the wire into a buffer, a lookup has to be done as to where this packet should be put out, it needs to be sent over a bus or fabric, and then it has to be clocked out on the outgoing port from another buffer. This adds latency in each switch hop on the way.
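
To put rough numbers on that -- the per-hop lookup/fabric figure below is an
assumed placeholder, not a measurement -- the store-and-forward penalty scales
directly with the number of switch hops:

  # Rough store-and-forward model: each hop must receive the whole frame,
  # look it up, cross the fabric, and clock it out again.  The 4 us per-hop
  # lookup/fabric/queuing figure is an assumed placeholder.
  FRAME_BITS = 1500 * 8       # standard-sized Ethernet payload
  LINK_BPS = 1_000_000_000    # GigE
  HOP_OVERHEAD_US = 4.0       # assumed lookup + fabric + queuing per hop

  def one_way_us(hops):
      serialization_us = FRAME_BITS / LINK_BPS * 1e6  # ~12 us per hop
      return hops * (serialization_us + HOP_OVERHEAD_US)

  print("1 switch hop :", round(one_way_us(1), 1), "us")
  print("3 switch hops:", round(one_way_us(3), 1), "us")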

As Adrian Chadd mentioned in the email sent after yours, this can of course be handled by modifying existing protocols, or creating new ones, that take this into account. It's just that, with what is available today, this is a problem. Each directory listing or file access takes a bit longer over NFS with added latency, and this reduces performance in current protocols.
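
The effect compounds with the number of dependent operations, since each one
costs at least a round trip; a rough sketch (the operation count and RTT values
are illustrative):

  # Each dependent NFS/SMB operation costs at least one round trip, so added
  # latency multiplies by the number of sequential operations.  The counts
  # and RTTs are illustrative assumptions.
  OPERATIONS = 500  # e.g., per-file calls behind one directory listing

  for rtt_ms in (0.1, 1.0, 5.0):
      print(f"RTT {rtt_ms:4.1f} ms -> at least {OPERATIONS * rtt_ms:6.0f} ms "
            f"for {OPERATIONS} sequential operations")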

Programmers who do client/server applications are starting to notice this, and I know of companies that put latency-inducing applications in the development servers so that the programmer is exposed to the same conditions in the development environment as in the real world. This means, for some, that they have to write more advanced SQL queries to get everything done in a single query instead of issuing several queries in sequence and changing the later ones depending on what the first query returned.

Also, protocols such as SMB and NFS that use message blocks over TCP have to be abandoned and replaced with real streaming protocols and large window sizes. Xmodem wasn't a good idea back then, it's not a good idea now (even though the blocks now are larger than the 128 bytes of 20-30 years ago).
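
The window-size arithmetic is simple enough to write down: a stop-and-wait,
one-block-per-RTT protocol can never exceed block size divided by RTT, while
keeping a link full needs a window of at least bandwidth times RTT. A quick
sketch with illustrative numbers:

  # Bandwidth-delay product: the window needed to keep a link busy, versus
  # the throughput ceiling of a stop-and-wait, one-block-per-RTT protocol.
  # Link speed, RTTs and the 64 KiB block size are illustrative values only.
  LINK_BPS = 1_000_000_000  # GigE
  BLOCK_BYTES = 64 * 1024

  def bdp_bytes(bps, rtt_s):
      return bps * rtt_s / 8

  def stop_and_wait_bps(block_bytes, rtt_s):
      return block_bytes * 8 / rtt_s

  for rtt_ms in (0.5, 20.0):
      rtt = rtt_ms / 1000
      ceiling = min(LINK_BPS, stop_and_wait_bps(BLOCK_BYTES, rtt))
      print(f"RTT {rtt_ms:4.1f} ms: window to fill GigE ~"
            f"{bdp_bytes(LINK_BPS, rtt) / 1024:.0f} KiB; "
            f"64 KiB stop-and-wait tops out at {ceiling / 1e6:.0f} Mbit/s")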

swmike@swm.pp.se (Mikael Abrahamsson) writes:

...
Back in the 10 megabit/s days, there were switches that did cut-through,
i.e., if the output port was not in use the instant the packet came in,
the switch could start sending the packet on the outgoing port before it
had been completely received on the incoming port (once the header was in,
the forwarding decision was made and the switch would start sending the
packet out before the rest of it had arrived on the input port).

had packet sizes scaled with LAN transmission speed, i would agree. but
the serialization time for 1500 bytes at 10MBit was ~1.2ms, and went down
by a factor of 10 for FastE (~120us), another factor of 10 for GigE (~12us)
and another factor of 10 for 10GE (~1.2us). even those of us using jumbo
grams are getting less serialization delay at 10GE (~7us) than we used to
get on a DEC LANbridge 100 which did cutthrough after the header (~28us).
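
the arithmetic behind those figures, for anyone who wants to check them (the
9000-byte jumbo size is an assumption on my part):

  # Serialization delay = frame bits / line rate.  Reproduces the figures
  # above; the 9000-byte jumbo-frame size is an assumed value.
  def serialization_us(frame_bytes, bps):
      return frame_bytes * 8 / bps * 1e6

  for name, bps in (("10M", 10e6), ("FastE", 100e6), ("GigE", 1e9), ("10GE", 10e9)):
      print(f"1500 B @ {name:>5}: {serialization_us(1500, bps):8.1f} us")
  print(f"9000 B @  10GE: {serialization_us(9000, 10e9):8.1f} us")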

..., it's the store-and-forward architecture used in all modern equipment
(that I know of). A packet has to be completely taken in over the wire
into a buffer, a lookup has to be done as to where this packet should be
put out, it needs to be sent over a bus or fabric, and then it has to be
clocked out on the outgoing port from another buffer. This adds latency in
each switch hop on the way.

you may be right about the TCAM lookup times having an impact, i don't know
if they've kept pace with transmission speed either. but per someone's
theory here yesterday, software (kernel and IP stack) architecture is more
likely to be at fault: there are still plenty of "queue it here, it'll go
out next time the device or timer interrupt handler fires" paths, and the
resulting delay can be in the ~1ms or even ~10ms range. this doesn't show
up on file transfer benchmarks since packet trains usually do well, but
miss an ACK, or send a ping, and you'll see a shelf.

As Adrian Chadd mentioned in the email sent after yours, this can of
course be handled by modifying existing protocols, or creating new ones,
that take this into account. It's just that, with what is available
today, this is a problem. Each directory listing or file access takes a
bit longer over NFS with added latency, and this reduces performance in
current protocols.

here again it's not just the protocols, it's the application design, that
has to be modernized. i've written plenty of code that tries to cut down
the number of bytes of RAM that get copied or searched, which ends up not
going faster on modern CPUs (or sometimes going slower) because of the
minimum transfer size between L2 and DRAM. similarly, a program that sped
up on a VAX 780 when i taught it to match the size domain of its disk I/O
to the 512-byte size of a disk sector, either fails to go faster on modern
high-bandwidth I/O and log structured file systems, or actually goes slower.

in other words you don't need NFS/SMB, or E-O-E, or the WAN, to erode what
used to be performance gains through efficiency. there's plenty enough new
latency (expressed as a factor of clock speed) in the path to DRAM, the
path to SATA, and the path through ZFS, to make it necessary that any
application that wants modern performance has to be re-oriented to take a
modern (which in this case means, streaming) approach. correspondingly,
applications which take this approach, don't suffer as much when they move
from SATA to NFS or iSCSI.

Programmers who do client/server applications are starting to notice this,
and I know of companies that put latency-inducing applications in the
development servers so that the programmer is exposed to the same
conditions in the development environment as in the real world. This
means, for some, that they have to write more advanced SQL queries to get
everything done in a single query instead of issuing several queries in
sequence and changing the later ones depending on what the first query
returned.

while i agree that turning one's SQL into transactions that are more like
applets (such that, for example, you're sending over the content for a
potential INSERT that may not happen depending on some SELECT, because the
end-to-end delay of getting back the SELECT result is so much higher than
the cost of the lost bandwidth from occasionally sending a useless INSERT)
will take better advantage of modern hardware and software architecture
(which means in this case, streaming), it's also necessary to teach our
SQL servers that ZFS "recordsize=128k" means what it says, for file system
reads and writes. a lot of SQL users who have moved to a streaming model
using a lot of transactions have merely seen their bottleneck move from the
network into the SQL server.
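
to illustrate the round-trip-collapsing half of that (table and column names
here are invented, and SQL dialects differ on SELECT without FROM), the
applet-like form ships the INSERT payload together with its condition and
lets the server decide:

  # Sketch of the "applet-like" transaction described above: send the INSERT
  # payload along with its condition in one statement, instead of waiting a
  # full RTT for a SELECT result first.  Table/column names are invented.
  chatty = [
      "SELECT 1 FROM inventory WHERE sku = :sku",              # round trip 1
      "INSERT INTO inventory (sku, qty) VALUES (:sku, :qty)",  # round trip 2
  ]
  applet_like = [
      "INSERT INTO inventory (sku, qty) "
      "SELECT :sku, :qty "
      "WHERE NOT EXISTS (SELECT 1 FROM inventory WHERE sku = :sku)",
  ]
  print("worst-case round trips: chatty =", len(chatty),
        ", applet-like =", len(applet_like))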

Also, protocols such as SMB and NFS that use message blocks over TCP have
to be abandoned and replaced with real streaming protocols and large
window sizes. Xmodem wasn't a good idea back then, it's not a good idea
now (even though the blocks now are larger than the 128 bytes of 20-30
years ago).

i think xmodem and kermit moved enough total data volume (expressed as a
factor of transmission speed) back in their day to deserve an honourable
retirement. but i'd agree, if an application is moved to a new environment
where everything (DRAM timing, CPU clock, I/O bandwidth, network bandwidth,
etc) is 10X faster, but the application only runs 2X faster, then it's time
to rethink more. but the culprit will usually not be new network latency.

From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of
Paul Vixie
Sent: Sunday, March 30, 2008 10:35 AM
To: nanog@merit.edu
Subject: Re: latency (was: RE: cooling door)

swmike@swm.pp.se (Mikael Abrahamsson) writes:

> Programmers who do client/server applications are starting to notice this,
> and I know of companies that put latency-inducing applications in the
> development servers so that the programmer is exposed to the same
> conditions in the development environment as in the real world. This
> means, for some, that they have to write more advanced SQL queries to get
> everything done in a single query instead of issuing several queries in
> sequence and changing the later ones depending on what the first query
> returned.

while i agree that turning one's SQL into transactions that are more like
applets (such that, for example, you're sending over the content for a
potential INSERT that may not happen depending on some SELECT, because the
end-to-end delay of getting back the SELECT result is so much higher than
the cost of the lost bandwidth from occasionally sending a useless INSERT)
will take better advantage of modern hardware and software architecture
(which means in this case, streaming), it's also necessary to teach our
SQL servers that ZFS "recordsize=128k" means what it says, for file system
reads and writes. a lot of SQL users who have moved to a streaming model
using a lot of transactions have merely seen their bottleneck move from the
network into the SQL server.

I have seen first hand (worked for a company and diagnosed issues with their
applications from a network perspective, prompting a major re-write of the
software), where developers work with their SQL servers, application
servers, and clients all on the same L2 switch. They often do not duplicate
the environment they are going to be deploying the application into, and
therefore assume that the "network" is going to perform the same. So, when
there are problems they blame the network. Often the root problem is the
architecture of the application itself and not the "network." All the
servers and client workstations have Gigabit connections to the same L2
switch, and they are honestly astonished when there are issues running the
same application over a typical enterprise network with clients of different
speeds (10/100/1000, full and/or half duplex). Surprisingly, to me, they
even expect the same performance out of a WAN.

Application developers today need a "network" guy on their team. One who
can help them understand how their proposed application architecture would
perform over various customer networks, and that can make suggestions as to
how the architecture can be modified to allow the performance of the
application to take advantage of the networks' capabilities. Mikael seems
to complain that developers have to put latency-inducing applications into
the development environment. I'd say that those developers are some of the
few who actually have a clue, and are doing the right thing.
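
For what it's worth, one common way to do what Mikael describes is to inject
artificial delay at the OS level on the development hosts. A sketch using
Linux tc/netem (the interface name and delay value are placeholders, not
anything from Mikael's setup):

  # Sketch: add artificial delay on a development host so applications get
  # exercised under WAN-like latency.  Requires root and Linux tc/netem;
  # the interface name and delay are placeholder values.
  import subprocess

  def add_delay(interface: str = "eth0", delay_ms: int = 40) -> None:
      subprocess.run(
          ["tc", "qdisc", "add", "dev", interface, "root",
           "netem", "delay", f"{delay_ms}ms"],
          check=True,
      )

  def remove_delay(interface: str = "eth0") -> None:
      subprocess.run(["tc", "qdisc", "del", "dev", interface, "root"],
                     check=True)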

> Also, protocols such as SMB and NFS that use message blocks over TCP have
> to be abandoned and replaced with real streaming protocols and large
> window sizes. Xmodem wasn't a good idea back then, it's not a good idea
> now (even though the blocks now are larger than the 128 bytes of 20-30
> years ago).

i think xmodem and kermit moved enough total data volume (expressed as a
factor of transmission speed) back in their day to deserve an honourable
retirement. but i'd agree, if an application is moved to a new environment
where everything (DRAM timing, CPU clock, I/O bandwidth, network bandwidth,
etc) is 10X faster, but the application only runs 2X faster, then it's time
to rethink more. but the culprit will usually not be new network latency.
--
Paul Vixie

It may be difficult to switch to a streaming protocol if the underlying data
sets are block-oriented.

Fred Reimer, CISSP, CCNP, CQS-VPN, CQS-ISS
Senior Network Engineer
Coleman Technologies, Inc.
954-298-1697

I was definitely not complaining; I brought it up as an example where developers have clue and are doing the right thing.

I've too often been involved in customer complaints which ended up being the fault of Microsoft SMB, with the customers holding the firm idea that it must be a network problem, since MS is a world standard and that can't be changed. Even proposing to change TCP window settings to make FTP transfers quicker is met with the same scepticism.

Even after explaining the propagation delay of light in fiber and the physical limitations it imposes, they're still very suspicious about it all.

Thanks for the clarification; that's why I put the "seems to" in the reply.

Fred Reimer, CISSP, CCNP, CQS-VPN, CQS-ISS
Senior Network Engineer
Coleman Technologies, Inc.
954-298-1697