Verizon EVDO Issues

Alex_Harrowell · April 8, 2009, 10:27am

Do they maintain a continuous data link in normal operation (like, say,
connectivity for a LAN, or backhaul for a camera or some such), or do they
request the data link when they need to send [whatever] (like a discrete SCADA
system)? My (user only) experience is that cellular data service doesn't
handle long sessions well.

Nathan_Ward · April 8, 2009, 10:40am

I've had great success with it. We have done live audio streaming over IP through a cellular service before. 64kbps ogg encoding.

About 7 or so hours in one session.

We used to do a cheap live broadcast from an outdoor event for a radio station.

Fouant_Stefan · April 8, 2009, 3:00pm

Any good clueful network Engineers from Equinix on-list? If so, please
contact me off-line as I noticed some oddball network behavior at some
of your peering points.

Regards,

Stefan Fouant: NeuStar, Inc.
Principal Network Engineer
46000 Center Oak Plaza Sterling, VA 20166
[ T ] +1 571 434 5656 [ M ] +1 202 210 2075
[ E ] stefan.fouant@neustar.biz [ W ] www.neustar.biz

Niels_Bakker · April 8, 2009, 4:17pm

* Stefan.Fouant@neustar.biz (Fouant, Stefan) [Wed 08 Apr 2009, 17:04 CEST]:

> Any good clueful network Engineers from Equinix on-list? If so, please contact me off-line as I noticed some oddball network behavior at some of your peering points.

You do realise that the people who run an Internet exchange only manage the Ethernet switch and have no influence on participants' routing, right?

If you're seeing odd things on your router directly connected to the IX switch you should have a better way of contacting your vendor than through the nanog mailing list.

-- Niels.

Fouant_Stefan · April 8, 2009, 5:17pm

Niels - this was an issue with the internet exchange netblock being
leaked out to upstream providers and causing peering adjacencies to be
established through indirect paths. It wasn't an issue with the router
and it wasn't an issue with a peer.

Thanks for your concern though... I think we got it handled now

From: Niels Bakker [mailto:niels=nanog@bakker.net]
Sent: Wednesday, April 08, 2009 12:17 PM
To: nanog@nanog.org
Subject: Re: Equinix contact

* Stefan.Fouant@neustar.biz (Fouant, Stefan) [Wed 08 Apr 2009, 17:04 CEST]:
> Any good clueful network Engineers from Equinix on-list? If so, please
> contact me off-line as I noticed some oddball network behavior at some
> of your peering points.

You do realise that the people who run an Internet exchange only manage
the Ethernet switch and have no influence on participants' routing,
right?

If you're seeing odd things on your router directly connected to the IX

Seth_Mattinen · April 8, 2009, 6:41pm

Alexander Harrowell wrote:

>> Been troubleshooting a very strange problem for a couple of weeks now.
>>
>> I have a few hundred systems deployed throughout the United States
>> utilizing EVDO connectivity with Verizon as a carrier. They are stationary.
>>
>> Over the past few weeks clusters of them in SF and Lewisville TX and a
>> few other areas have been failing intermittently. They are offline for
>> several days, then online for a few days then go offline again. They are
>> running Linux and PPPD.
>
> Do they maintain a continuous data link in normal operation (like, say,
> connectivity for a LAN, or backhaul for a camera or some such), or do they
> request the data link when they need to send [whatever] (like a discrete SCADA
> system)? My (user only) experience is that cellular data service doesn't
> handle long sessions well.

I have a few Sprint EVDO cards. They go into standby when nothing is
actively going on and fire up within seconds when there is something to
do. I regularly use everything from SSH to streaming video without any
issues. I only notice the delay with SSH when I don't type anything for
a few minutes and it has to come active again, but I can leave it idle
for hours and it never drops.

As far as the OP goes, let them replace the cards if they think that's
the problem. You and I may suspect something else is up, but if that's
on their checklist, it is what it is.

~Seth

Charles_N_Wyble3 · April 8, 2009, 8:37pm

> Do they maintain a continuous data link in normal operation (like, say, connectivity for a LAN, or backhaul for a camera or some such), or do they request the data link when they need to send [whatever] (like a discrete SCADA system)? My (user only) experience is that cellular data service doesn't handle long sessions well.

Continuous operation. They have been working fine for some time. We have about 20 locations that aren't working, and over 200 that are working just fine.

Rob_Seastrom2 · April 9, 2009, 11:15am

Seth Mattinen <sethm@rollernet.us> writes:

> I have a few Sprint EVDO cards. They go into standby when nothing is
> actively going on and fire up within seconds when there is something to
> do. I regularly use everything from SSH to streaming video without any
> issues. I only notice the delay with SSH when I don't type anything for
> a few minutes and it has to come active again, but I can leave it idle
> for hours and it never drops.

Interesting. When I got my Sprint EVDO card (u727) a year and a half
ago, they were pretty nasty about gunning down (bidirectional spoofed
RST coming out of the middle of the network somewhere) any TCP
sessions that were idle for ten minutes or more. Quite repeatable and
verified on the downlow by People With Insight that this was in fact
expected behavior from boxes that were in the middle of the network
due to "politics" (unlike Verizon, Sprint appears to put no
restrictions on inbound connections to the evdo-host). Putting this:

ServerAliveInterval 60

in ~/.ssh/config was an effective work-around. I have not revisited
the issue to see if Sprint has corrected this behavior. Perhaps
budget constraints or customer complaints have caused Sprint to
revisit the necessity of having extraneous hardware in their network.

-r

Daniel_Senie2 · April 9, 2009, 2:31pm

We observe this same kind of behavior with firewalls in the path watching for dead sessions they can clean up. Appears they send RSTs to both end points when they decide a session has gone away, as that'll let end hosts figure it out sooner. Same workaround of turning on keep-alives once a minute solves this too. The behavior in the case of firewalls makes sense, as state tables have to be cleaned up eventually.
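
For programs that open their own sockets rather than relying on ssh, the once-a-minute keep-alive workaround might look roughly like the Python sketch below. It is only an illustration: `enable_keepalive` is a hypothetical helper name, the 60-second values simply mirror the interval mentioned in the thread, and the per-socket timer options (`TCP_KEEPIDLE` and friends) are applied only where the platform's `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=60, probes=3):
    """Enable TCP keepalives on sock and return the options actually applied.

    Hypothetical helper: the defaults mirror the "once a minute" workaround
    from the thread, not any carrier or firewall requirement.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": True}
    # These per-socket timers exist on Linux; other platforms may lack some
    # of them, so each one is set only if the constant is present.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s))
    s.close()
```

Where the per-socket timers are unavailable, only `SO_KEEPALIVE` is set and the system-wide keepalive defaults still apply.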

Steven_Bellovin · April 9, 2009, 2:55pm

I use a Verizon Wireless u727; before that, I used a PCMCIA card. I've
never had problems with drops on idle. *However* -- if there was a
packet from the wrong IP address, the older card would drop the
connection -- apparently, that behavior was required by the spec. (I
haven't checked if the newer one will do that.) So, if the
EVDO connection dropped while I had, say, an IMAP or ssh session open,
and I dialed back in, the next TCP packet would cause EVDO to drop
again... I finally "fixed" it by creating ipfilter rules in my ppp-up
script to block all "bad" packets from going out.

--Steve Bellovin, http://www.cs.columbia.edu/~smb
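
The decision behind the ppp-up filter rules described above can be sketched in Python as follows. This is not the actual ipfilter configuration, which the post does not show; `should_block` is a hypothetical name, the addresses come from the documentation range 192.0.2.0/24, and the sketch assumes a "bad" packet is simply one whose source address differs from the address assigned to the current PPP session.

```python
import ipaddress


def should_block(packet_src, current_local_ip):
    """Return True if an outbound packet carries a stale source address.

    Assumption for illustration: a packet is "bad" when its source is not
    the address assigned to the current PPP session.
    """
    return ipaddress.ip_address(packet_src) != ipaddress.ip_address(current_local_ip)


if __name__ == "__main__":
    current = "192.0.2.77"  # address handed out on the latest dial-in (made up)
    for src in ("192.0.2.10", "192.0.2.77"):
        print(src, "block" if should_block(src, current) else "pass")
```

An old IMAP or ssh session still bound to the previous address would fall into the "block" case, so its retransmissions never reach the EVDO link.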

Rob_Seastrom2 · April 9, 2009, 3:12pm

"Steven M. Bellovin" <smb@cs.columbia.edu> writes:

>> Seth Mattinen <sethm@rollernet.us> writes:
>>
>>> I have a few Sprint EVDO cards. They go into standby when nothing is
>>> actively going on and fire up within seconds when there is
>>> something to do. I regularly use everything from SSH to streaming
>>> video without any issues. I only notice the delay with SSH when I
>>> don't type anything for a few minutes and it has to come active
>>> again, but I can leave it idle for hours and it never drops.
>>
>> Interesting. When I got my Sprint EVDO card (u727) a year and a half
>> ago, they were pretty nasty about gunning down (bidirectional spoofed
>> RST coming out of the middle of the network somewhere) any TCP
>> sessions that were idle for ten minutes or more. Quite repeatable and
>> verified on the downlow by People With Insight that this was in fact
>> expected behavior from boxes that were in the middle of the network
>> due to "politics" (unlike Verizon, Sprint appears to put no
>> restrictions on inbound connections to the evdo-host). Putting this:
>>
>> ServerAliveInterval 60
>>
>> in ~/.ssh/config was an effective work-around. I have not revisited
>> the issue to see if Sprint has corrected this behavior. Perhaps
>> budget constraints or customer complaints have caused Sprint to
>> revisit the necessity of having extraneous hardware in their network.
>
> I use a Verizon Wireless u727; before that, I used a PCMCIA card. I've
> never had problems with drops on idle. *However* -- if there was a
> packet from the wrong IP address, the older card would drop the
> connection -- apparently, that behavior was required by the spec. (I
> haven't checked if the newer one will do that.) So, if the
> EVDO connection dropped while I had, say, an IMAP or ssh session open,
> and I dialed back in, the next TCP packet would cause EVDO to drop
> again... I finally "fixed" it by creating ipfilter rules in my ppp-up
> script to block all "bad" packets from going out.

Interesting. I never had that behavior exhibited on my old PCMCIA
card on Verizon or on my u727 on Sprint. What OS platform were
you on lappie-wise?

I've thought on a couple of occasions that a "geek bake-off" between
EVDO and 3G providers looking for technical jack moves on the
providers' part would make for a nice NANOG lightning talk. Sadly, I
haven't the time to devote to such an endeavor.

-r

Rob_Seastrom2 · April 9, 2009, 3:45pm

Daniel Senie <dts@senie.com> writes:

> We observe this same kind of behavior with firewalls in the path
> watching for dead sessions they can clean up. Appears they send RSTs
> to both end points when they decide a session has gone away, as
> that'll let end hosts figure it out sooner. Same workaround of turning
> on keep-alives once a minute solves this too. The behavior in the case
> of firewalls makes sense, as state tables have to be cleaned up
> eventually.

While I agree with you that the behavior makes perfect sense, I submit
that the controls are often set improperly (by default or due to
configuration by underskilled technicians) - that is to say, without
taking into account the likely behavior of TCP when the connection is
in fact still open. Consider the default keepalive interval on a
selection of operating systems:

FreeBSD - 7200 seconds:
root@clack [17] # sysctl -a | grep keepidle
net.inet.tcp.keepidle: 7200000
root@clack [18] #

MacOSX - 7200 seconds:
[Superfly:~] root# sysctl -a | grep keepidle
net.inet.tcp.keepidle: 7200000
[Superfly:~] root#

Windows XP - 7200 seconds:

(notice a pattern here?)

Seems to me that a well-engineered firewall will have enough memory in
it that (in the application for which it is specified, with
anticipated traffic levels) it doesn't have to be over-aggressive and
try cleaning up flows that haven't seen any traffic in less than, say,
two hours and ten minutes.

-r
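
The arithmetic behind that argument can be checked with a few lines of Python. This is a sketch under stated assumptions: `survives_idle` is a hypothetical name, the 7200000 ms figure is the `net.inet.tcp.keepidle` value shown above, and the middlebox timeouts tried (ten minutes, exactly two hours, two hours and ten minutes) are illustrative values taken from numbers mentioned in the thread.

```python
def survives_idle(keepidle_ms, middlebox_timeout_s):
    """True if the first TCP keepalive goes out before the middlebox expires the flow."""
    return keepidle_ms / 1000 < middlebox_timeout_s


# net.inet.tcp.keepidle from the sysctl output above, in milliseconds
DEFAULT_KEEPIDLE_MS = 7_200_000

if __name__ == "__main__":
    # Illustrative idle timeouts in seconds: 10 min, 2 h, 2 h 10 min
    for timeout_s in (600, 7200, 7800):
        print(timeout_s, survives_idle(DEFAULT_KEEPIDLE_MS, timeout_s))
```

With default host settings, an idle flow outlives only the middlebox whose timeout exceeds the two-hour keepalive idle time, which is why a margin such as the extra ten minutes matters.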

Steven_Bellovin · April 9, 2009, 4:28pm

> I use a Verizon Wireless u727; before that, I used a PCMCIA card.
> I've never had problems with drops on idle. *However* -- if there
> was a packet from the wrong IP address, the older card would drop
> the connection -- apparently, that behavior was required by the
> spec. (I haven't checked if the newer one will do that.) So, if
> the EVDO connection dropped while I had, say, an IMAP or ssh
> session open, and I dialed back in, the next TCP packet would cause
> EVDO to drop again... I finally "fixed" it by creating ipfilter
> rules in my ppp-up script to block all "bad" packets from going out.

> Interesting. I never had that behavior exhibited on my old PCMCIA
> card on Verizon or on my u727 on Sprint. What OS platform were
> you on lappie-wise?

I run NetBSD but I know that the problem also showed up on Linux -- a
friend who worked for an equipment vendor also saw it, and he checked
the actual EVDO specs.

We suspect the problem doesn't show up for Windows users because
Windows appears to terminate all connections with extreme prejudice
when the link goes away, so there won't be any TCP transmissions to
induce the failure.

> I've thought on a couple of occasions that a "geek bake-off" between
> EVDO and 3G providers looking for technical jack moves on the
> providers' part would make for a nice NANOG lightning talk. Sadly, I
> haven't the time to devote to such an endeavor.
>
> -r

--Steve Bellovin, http://www.cs.columbia.edu/~smb

Rob_Seastrom2 · April 9, 2009, 5:27pm

"Steven M. Bellovin" <smb@cs.columbia.edu> writes:

>> Interesting. I never had that behavior exhibited on my old PCMCIA
>> card on Verizon or on my u727 on Sprint. What OS platform were
>> you on lappie-wise?
>
> I run NetBSD but I know that the problem also showed up on Linux -- a
> friend who worked for an equipment vendor also saw it, and he checked
> the actual EVDO specs.
>
> We suspect the problem doesn't show up for Windows users because
> Windows appears to terminate all connections with extreme prejudice
> when the link goes away, so there won't be any TCP transmissions to
> induce the failure.

Didn't have the problem manifest itself on MacOSX (10.4, 10.5) either...

-r

Joe_Provo4 · April 9, 2009, 7:33pm

> Daniel Senie <dts@senie.com> writes:
>> We observe this same kind of behavior with firewalls in the path
>> watching for dead sessions they can clean up. Appears they send RSTs
>> to both end points when they decide a session has gone away, as
>> that'll let end hosts figure it out sooner. Same workaround of turning
>> on keep-alives once a minute solves this too. The behavior in the case
>> of firewalls makes sense, as state tables have to be cleaned up
>> eventually.

Ish. RFC 3360 argues against extraneous RSTs in general, in addition to some
specific cases (response to malformed or unknown TCP options, etc).

> While I agree with you that the behavior makes perfect sense, I submit
> that the controls are often set improperly (by default or due to
> configuration by underskilled technicians) - that is to say, without
> taking into account the likely behavior of TCP when the connection is
> in fact still open. Consider the default keepalive interval on a
> selection of operating systems:
>
> FreeBSD - 7200 seconds:
> root@clack [17] # sysctl -a | grep keepidle
> net.inet.tcp.keepidle: 7200000
> root@clack [18] #
>
> MacOSX - 7200 seconds:
> [Superfly:~] root# sysctl -a | grep keepidle
> net.inet.tcp.keepidle: 7200000
> [Superfly:~] root#
>
> Windows XP - 7200 seconds:
> TCP/IP and NBT configuration parameters for Windows XP - Windows Client | Microsoft Learn
>
> (notice a pattern here?)

You mean adherence to the minimum per Host Requirements (RFC 1122)?

> Seems to me that a well-engineered firewall will have enough memory in
> it that (in the application for which it is specified, with
> anticipated traffic levels) it doesn't have to be over-aggressive and
> try cleaning up flows that haven't seen any traffic in less than, say,
> two hours and ten minutes.

TCP vs application keepalives have been a religious topic for ages.
It would seem that generous host idle windows in the modern Internet
(increased speed, throughput, mobility, availability and hostility
since RFC 1122 was written) are a bit odd.

Joe, who thinks purposefully long-lived TCP applications should
certainly have their own keep-alives
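
An application-level keep-alive of the kind argued for here could be sketched in Python along these lines. Everything in it is an assumption for illustration: the `Heartbeat` class, its method names, and the 60-second default are hypothetical, and the sketch only decides when a keep-alive message is due; what that message looks like on the wire is left to the application protocol.

```python
import time


class Heartbeat:
    """Track idle time on a long-lived connection and say when a keep-alive is due.

    Hypothetical helper: it does no I/O itself; the caller sends whatever
    no-op message its protocol defines and then calls touch().
    """

    def __init__(self, interval_s=60.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_activity = clock()

    def touch(self):
        """Record that traffic was just sent or received."""
        self.last_activity = self.clock()

    def due(self):
        """True once the connection has been idle for at least interval_s."""
        return self.clock() - self.last_activity >= self.interval_s


if __name__ == "__main__":
    hb = Heartbeat(interval_s=60.0)
    print(hb.due())  # a fresh connection is not idle yet
```

In an event loop, the caller would check `due()` on each tick, send its protocol's no-op message when it returns True, and call `touch()` after any real traffic.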