Is there a method or tool(s) to prove network outages?

Notify_Me · December 1, 2013, 4:56pm

Hi Everyone

Please I have a very problematic radio link which goes out and back on
again every few hours.
The only way I know this is happening is from my gateway device: a Sophos
UTM that sends email anytime there's been an outage.

The ISP refuses to accept this as outage/instability proof, and I'm
wondering if there's something I can run behind the gateway UTM that can
provide output information over time.
They seem to be a primarily Windows+Cisco shop (as is common here in the
4th world). We are primarily Linux.
Is there some set of command incantations I can run whose output I can
collect and send to them (besides some sort of sustained ping)?

Thanks in advance!

Sina

Joel_Jaeggli · December 1, 2013, 5:19pm

Given a measurement target on the customer side and smokeping instance
on your side you can actively measure the availability/latency/loss
rates between them.

Dobbins_Roland · December 1, 2013, 5:20pm

Do you have wireless CPE within your span of administrative control?

Don_Bowman · December 1, 2013, 5:20pm

I'm a big fan of smokeping personally.
it will install on practically any Linux device, and i've even installed
it on ~$50 consumer NAS or router type devices (e.g. a Pogoplug NAS)
if it has enough RAM (128MB is dicey but works).

shows you loss + latency.

you set it up to ping e.g. the near and far end of the radio link, and then
maybe a few sentinel sites.

it can do ICMP and also TCP.
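
For readers who want to see the idea before installing anything, here is a rough Python sketch of the same kind of measurement: probe a few targets over TCP and report loss and median connect latency for each. It is not smokeping and does not reproduce its output. The function names, the default port, and the probe counts are assumptions made for illustration.

```python
import socket
import statistics
import time


def tcp_probe(host, port=80, timeout_s=2.0):
    """Return TCP connect time in milliseconds, or None if the connect fails."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.monotonic() - started) * 1000.0
    except OSError:
        # Covers timeouts, refused connections and name resolution errors.
        return None


def summarize(latencies_ms):
    """Return (loss_percent, median_ms) for a list of probe results."""
    if not latencies_ms:
        raise ValueError("no probe results to summarize")
    replies = [x for x in latencies_ms if x is not None]
    loss = 100.0 * (len(latencies_ms) - len(replies)) / len(latencies_ms)
    median = statistics.median(replies) if replies else None
    return loss, median


def probe_round(targets, count=20, pause_s=0.5):
    """Probe each named (host, port) target count times and summarize it."""
    results = {}
    for name, (host, port) in targets.items():
        samples = []
        for _ in range(count):
            samples.append(tcp_probe(host, port))
            time.sleep(pause_s)
        results[name] = summarize(samples)
    return results
```

A call might look like probe_round({"near end": ("192.0.2.1", 80), "far end": ("192.0.2.2", 80)}), where the addresses are placeholders from the documentation range and must be replaced with hosts that actually accept TCP connections on that port. Loss that shows up at the far end but not the near end points at the radio hop.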

Dobbins_Roland · December 1, 2013, 5:23pm

I think he's actually the end-customer, and he's saying that his upstream transit ISP won't accept non-RF-specific diags . . .

Joel_Jaeggli · December 1, 2013, 5:38pm

Given a measurement target on the customer side and smokeping instance on your side you can actively measure the availability/latency/loss
rates between them.

I think he's actually the end-customer, and he's saying that his upstream transit ISP won't accept non-RF-specific diags . . .

and if you don't control any of the air interfaces you don't get that.

Notify_Me · December 1, 2013, 5:39pm

No, I don't.

Notify_Me · December 1, 2013, 5:42pm

I'm actually halfway through trying to set up a smokeping appliance.

Andrew_D_Kirch1 · December 1, 2013, 6:40pm

Sina,

I'd recommend using Zenoss to monitor the remote end of the link at least with /Status/Ping. You'll get alerts when Zenoss can't ping across the link, and may be able to set up SNMP traps on your router for the link itself going down.

DISCLOSURE: I work for Zenoss, however I used Zenoss core long before they decided to pay me money.

Good luck with dealing with your ISP, it's _ALWAYS_ a pain in situations like this.

Andrew

Notify_Me · December 1, 2013, 6:57pm

Thanks a lot, I'll definitely consider it.

Warren_Bailey1 · December 1, 2013, 7:44pm

Ask them for a plot of your SNR (signal-to-noise ratio) and your RSL (receive signal level). In the RF realm, it's pretty difficult to fabricate receive power.

_Matt_Palmer · December 1, 2013, 7:50pm

I'm surprised nobody's mentioned the root question to answer before you go
off spending time setting up anything in particular: what *will* the ISP
accept (or be forced to accept) as outage/instability proof? Contracts are
your first line of defence, but it's nigh-on universal that they don't cover
these sorts of situations well enough. So you probably need to have a
discussion, as a follow-on from being told that your UTM's e-mails *aren't*
sufficient, to determine what *is* sufficient.

Once you've got that, only then can you evaluate appropriate methods of
gathering the necessary data to support a claim of an outage. I like the
*idea* of smokeping, but when gathering data on complete service loss (which
was my use case for it as well) I found its methods of collecting and
displaying that data to be very suboptimal and counter-intuitive.

For something small and once-off like this, I'd probably just break out my
text editor and script up something that would collect the relevant data and
process it into the acceptable form.

- Matt
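
A minimal sketch of what such a one-off script could look like in Python, assuming Linux with the iputils ping command (the -c and -W flags). It logs one timestamped CSV line per probe and collapses runs of failures into outage windows. The function names, the CSV layout, and the three-failure threshold are illustrative assumptions, not anything the ISP has agreed to accept.

```python
import subprocess
import time
from datetime import datetime, timezone


def ping_once(host, timeout_s=2):
    """Return True if one ICMP echo gets a reply (Linux iputils ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outage_intervals(samples, min_failures=3):
    """Collapse (timestamp, ok) samples into (start, end) outage windows.

    A window opens at the first failed probe and closes at the next
    successful one. Runs shorter than min_failures probes are ignored,
    so a single dropped packet is not reported as an outage.
    """
    outages, start, failures = [], None, 0
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            failures += 1
        else:
            if start is not None and failures >= min_failures:
                outages.append((start, ts))
            start, failures = None, 0
    if start is not None and failures >= min_failures:
        outages.append((start, None))  # still down when the log ends
    return outages


def monitor(host, log_path, interval_s=10, probes=360):
    """Probe host every interval_s seconds, appending one CSV line per probe."""
    samples = []
    with open(log_path, "a") as log:
        for _ in range(probes):
            ts = datetime.now(timezone.utc)
            ok = ping_once(host)
            samples.append((ts, ok))
            log.write(f"{ts.isoformat()},{host},{int(ok)}\n")
            log.flush()
            time.sleep(interval_s)
    return outage_intervals(samples)
```

Running monitor() against the first hop beyond the radio link, from a machine with a reasonably accurate clock, gives a raw log plus a short list of outage windows in UTC that can be reshaped into whatever form the provider agrees to look at.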

Notify_Me · December 1, 2013, 8:02pm

Hmm. Great points. Didn't think of that.

George_Herbert · December 1, 2013, 8:10pm

This.

They may not cooperate, in which case, you have to force proof down their throats.

I would go with the Zenoss (or Zabbix, or...) option - a free to use, professionally supported, professional grade commonly used monitoring package that would meet anyone's basic "credible tool" definition plus neat GUI to send a snapshot of the results.

Use it to perform various tests of the net - pings, HTTP GETs of some small target, starting pings with the next hop outside your premises and working outwards to the outside world. Don't overwhelm your net with tests, but test as often as needed to demonstrate an issue.

-george william herbert
george.herbert@gmail.com

William_Waites1 · December 1, 2013, 8:14pm

Is "every few hours" regular/cyclical? Does the radio link cross a
tidal body of water?

-w

Bandy_Rush1 · December 1, 2013, 8:20pm

if you do not control the rf end [0], then i assume the upstream
supplies it and is really selling you connectivity to behind the rf cpe.
so you should show you do not have that connectivity, the rf is a red
herring. use smokeping or any other tool from immediately behind the rf
cpe with a target of the first layer three hop beyond your network.

but, if the upstream is in solid denial and is not actually accepting
that they have a contractual obligation, measurement and technology are
not going to help you.

randy

Notify_Me · December 1, 2013, 8:25pm

It's cyclical, but I have not tried to graph/measure its repetition before now (when I noticed the emails filling up my inbox).
Body of tidal water... could be, but I wasn't involved in the installation so I can't actually tell where the antennas are pointing.

William_Waites1 · December 1, 2013, 10:02pm

This is speculation until you have measurements, but if this is the
case I'd wager you are having reflected signal interference off of the
water. The water acts like a mirror and as it moves up and down the
reflected signal will move in and out of phase with the main
signal. At certain points you'll get near complete cancellation and
the link will fail.

See section 4 here for some explanations, fig 5 and 6 for what you
could expect the graphs of signal strength, time, link capacity to
look like:

http://homepages.inf.ed.ac.uk/mmarina/papers/mobicom_winsdr08.pdf

But not having access to the RF part you can't measure this
directly. If you can get tide tables for a nearby location, what you
could do is say that signal strength is 1 if the link is working and 0
if it is not. Measure for a while then scatterplot that against the
level of the tide. If the measurements of 0 group tightly together
in a few spots then you know definitely what is happening. Perhaps
that plot together with a pointer to a nice academic paper would be
enough to convince the provider of what is happening.
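
As a rough sketch of that comparison in Python: the code below interpolates a tide table at each up/down sample and reports the fraction of "down" samples per tide-height bin, which is a tabular stand-in for the scatterplot. Times are assumed to be plain numbers such as Unix seconds, the linear interpolation between tide-table entries is a simplification, and the function names and the 0.25 m bin width are made up for illustration.

```python
from bisect import bisect_left


def tide_height_at(t, tide_table):
    """Linearly interpolate tide height at time t.

    tide_table is a list of (time, height_m) pairs sorted by time.
    Times outside the table are clamped to the first or last entry.
    """
    times = [row[0] for row in tide_table]
    i = bisect_left(times, t)
    if i == 0:
        return tide_table[0][1]
    if i == len(tide_table):
        return tide_table[-1][1]
    (t0, h0), (t1, h1) = tide_table[i - 1], tide_table[i]
    return h0 + (h1 - h0) * (t - t0) / (t1 - t0)


def outage_rate_by_tide(samples, tide_table, bin_m=0.25):
    """Return {bin_floor_m: fraction_of_samples_down} for (time, up) samples.

    up is 1 (or True) when the link worked and 0 (or False) when it did not.
    """
    totals, downs = {}, {}
    for t, up in samples:
        level = tide_height_at(t, tide_table)
        key = round((level // bin_m) * bin_m, 3)  # lower edge of the bin
        totals[key] = totals.get(key, 0) + 1
        downs[key] = downs.get(key, 0) + (0 if up else 1)
    return {k: downs[k] / totals[k] for k in sorted(totals)}
```

If the failures cluster in one or two narrow height bins while the other bins stay near zero, that is the tight grouping described above. If they are spread evenly across all bins, the tide is probably not the cause.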

What could you do about this?

If you are lucky and the interference does not complete a full cycle
from destructive to constructive and back with the largest amplitude
of the tides that you experience in that place, you could try moving
the antenna up or down. How much depends on the frequency and
distances involved but I'd try 25cm increments up to a couple of
meters if you can. You'll still get degradation but can hopefully
avoid the deep nulls that take the link out completely.
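
To get a feel for the scale involved, the textbook flat-surface two-ray model can be sketched in a few lines of Python. This is a simplification and not something measured on this link: it assumes a perfectly reflecting water surface (reflection coefficient of -1 at grazing incidence), ignores earth curvature and antenna patterns, and takes heights as measured above the water, so a rising tide lowers both of them. All numbers fed into it are hypothetical since the thread gives no frequency, distance, or heights.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def path_difference(h1_m, h2_m, dist_m):
    """Extra length of the water-reflected path over the direct path."""
    reflected = math.hypot(dist_m, h1_m + h2_m)
    direct = math.hypot(dist_m, h1_m - h2_m)
    return reflected - direct


def relative_field(h1_m, h2_m, dist_m, freq_hz):
    """Field strength relative to free space: 0 is a deep null, 2 a peak.

    Assumes a reflection coefficient of -1 (grazing incidence on water).
    """
    wavelength = C / freq_hz
    delta = path_difference(h1_m, h2_m, dist_m)
    return abs(2.0 * math.sin(math.pi * delta / wavelength))


def null_to_peak_shift(h_other_m, dist_m, freq_hz):
    """Approximate height change at one end that turns a null into a peak.

    Uses the small-angle form delta ~= 2*h1*h2/dist and asks for a
    half-wavelength change in delta.
    """
    return (C / freq_hz) * dist_m / (4.0 * h_other_m)
```

For a made-up 5.8 GHz link over 2 km with the far antenna 20 m above the water, null_to_peak_shift(20.0, 2000.0, 5.8e9) comes out at about 1.3 m, which is the same order as the increments suggested above. Longer paths or lower far-end antennas push that number up quickly.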

If you are able and willing to replace the end-site radios or antennas
with your own, and the link uses some sort of 2xN MIMO, you could
arrange vertical spacing between the antennas so that you have a good
signal at one antenna when the other one is experiencing a null. This
should get you on average half the best-case throughput the equipment
is capable of but it should get you that consistently. The actual
spacing depends on the distances and heights involved.

-w

Warren_Bailey1 · December 1, 2013, 11:26pm

I would hold off on considering Multipath as a problem until you see the
RSL. There is no reason to go to the worst case scenario. In addition to
that, there are some mitigation techniques we use (OFDM, XPIC, etc.) that
will help null out some multi path should that be the case. With that
being said, you should probably hire an RF Engineer rather than try to
attempt this yourself. If you guys are having path problems, talk to the
guy who designed the path. If there "wasn't" a "guy" who "designed" the
path - this is what you get.

//warren

Dobbins_Roland · December 2, 2013, 2:15am

That was my point - if the upstream won't accept ICMP pings, what's the likelihood he'll accept anything else, either?