Wireless STM-1 link

Hi all,

I'm encountering a problem with a wireless STM-1 link which has a switch
connected at each end.

The wireless link has Gigabit Ethernet interfaces and so do my switches.

When I ping between the 2 switches via that wireless link I'm getting a lot
of pings that are lost.

The wireless link is not saturated, but I'm thinking it could have something
to do with the gigabit interfaces when the link itself only carries 155Mbps?

All ideas welcome.

Regards,

Rens

Sounds like this might be an Ethernet negotiation problem.

All the interfaces are forced to 1Gbps and full duplex.

Maybe I should give some extra info.
All the traffic seems to pass OK via that link, but I have often seen OSPF
adjacencies go down/up. I suspect that the HELLO packets that pass via that
link are being dropped.

That's why I started to look a little deeper and do some ping tests.

I'm assuming that you have checked all of the wireless parameters (noise floor, signal strength, jitter, etc)? I've seen behavior like this on links where the noise floor has risen to the point where the true signal cannot be distinguished from background noise.

Josh

Rens wrote:

Yes all the radio RF levels are 100% ok.

Rens,

This does not sound like the symptoms of what I want to write about, but it is
something you need to consider in any case:

When you run sub-rate links (e.g. a 1GE interface with really 155Mbps as the
service) you need to make sure that you do not try to push more traffic than
the link can take.
This is mostly relevant for traffic bursts, which happen all the time with
IP traffic. So even if on average you do not use the bandwidth, you still
have short bursts whenever you start a transaction (like a file transfer
etc).
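
To put rough numbers on the burst problem, here is a small sketch. The buffer size is purely an assumption for illustration (it is not a figure from this thread or from any vendor), and 155Mbps is used as the approximate service rate.

```python
# Rough burst arithmetic for a 1GE port feeding a ~155 Mbps radio link.
LINE_RATE_BPS = 1_000_000_000   # Gigabit Ethernet ingress
LINK_RATE_BPS = 155_000_000     # approximate STM-1 service rate
BUFFER_BYTES = 512 * 1024       # ASSUMED radio ingress buffer, hypothetical


def time_to_overflow(buffer_bytes, in_bps, out_bps):
    """Seconds a burst arriving at in_bps can last before the buffer is full."""
    if in_bps <= out_bps:
        return float("inf")  # the buffer never fills
    return buffer_bytes * 8 / (in_bps - out_bps)


def burst_bytes_absorbed(buffer_bytes, in_bps, out_bps):
    """Largest burst (bytes) at in_bps that passes without a drop."""
    return time_to_overflow(buffer_bytes, in_bps, out_bps) * in_bps / 8


t = time_to_overflow(BUFFER_BYTES, LINE_RATE_BPS, LINK_RATE_BPS)
print(f"buffer fills after {t * 1000:.2f} ms of line-rate burst")
print(f"largest clean burst: {burst_bytes_absorbed(BUFFER_BYTES, LINE_RATE_BPS, LINK_RATE_BPS):.0f} bytes")
```

With a buffer of that assumed size, a burst only has to last a few milliseconds at line rate before packets start to drop, even though the average utilization looks low.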

In order to avoid packets being dropped due to this burst on the link, the
1GE equipment before the link should be doing egress shaping to the rate
of the link (sometimes it is even good to choose a rate slightly lower than
the actual rate).
This would make sure that the network equipment manages the packet drops (if
you have a child QOS policy) and you do not get random tail drops of the
burst.

This means that you need to choose the right network device that actually
supports egress shaping. Be aware that many L2/L3 switches do not support
this.
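
As a sketch of what egress shaping does, here is a minimal token bucket in Python. It is not how any particular switch implements it; the 150Mbps rate and the 3000 byte burst allowance are assumed values picked only for the illustration, and the function name `shape` is hypothetical.

```python
def shape(arrivals, rate_bps, burst_bytes):
    """Token-bucket shaper sketch.

    arrivals: list of (time_s, size_bytes), sorted by time, each size <= burst_bytes.
    Returns the departure time of each packet after shaping.
    """
    rate = rate_bps / 8.0          # token refill rate in bytes per second
    tokens = float(burst_bytes)    # the bucket starts full
    clock = 0.0                    # time at which `tokens` was last updated
    departures = []
    for t, size in arrivals:
        start = max(t, clock)      # cannot leave before arriving or before the previous packet
        tokens = min(burst_bytes, tokens + (start - clock) * rate)
        clock = start
        if tokens < size:
            clock += (size - tokens) / rate   # wait until enough tokens have accumulated
            tokens = float(size)
        tokens -= size
        departures.append(clock)
    return departures


# 100 packets of 1500 bytes arriving back to back at 1 Gbps (12 microseconds apart).
arrivals = [(i * 12e-6, 1500) for i in range(100)]
departures = shape(arrivals, rate_bps=150_000_000, burst_bytes=3000)
gap = departures[-1] - departures[-2]
print(f"steady-state spacing: {gap * 1e6:.1f} us per 1500-byte packet")
```

The burst arrives at gigabit speed but leaves spaced out at the shaped rate, so the queueing (and any dropping) happens in the device doing the shaping, where a QoS policy can decide what gets dropped, rather than in the radio.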

Arie

All the interfaces are forced to 1Gbps and full duplex.

This takes the interface out of spec, IIRC. Try with auto-negotiation
enabled.

I have tried both actually, forced and auto, same issue.

What's the utilization of the link at the time that you're seeing problems?

Between 20 & 80 Mbps. There is no real relation between the problem and the
time of day or higher/lower traffic.

I thought that with 1000T, you need to keep autonegotiation in place:

  http://etherealmind.com/2008/07/15/ethernet-autonegotiation-works-why-how-standard-should-be-set/

    "A major problem is that many people are also hard setting
    Gigabit Ethernet , and this is causing major problems. Gigabit
    Ethernet must have auto-negotiation ENABLED to allow negotiation
    of master / slave PHY relationship for clocking at the physical
    layer. Without negotiation the line clock will not establish
    correctly and physical layers problems can result."

Further:

  http://en.wikipedia.org/wiki/Autonegotiation

    "The debatable portions of the autonegotiation specifications were
    eliminated by the 1998 version of IEEE 802.3. In 1999, the negotiation
    protocol was significantly extended by IEEE 802.3ab, which specified the
    protocol for gigabit Ethernet, making autonegotiation mandatory for
    1000BASE-T gigabit Ethernet over copper."

Note the 'mandatory'...

Brian Reichert wrote:

I thought that with 1000T, you need to keep autonegotiation in place.

I'm in the "it's not 1996 anymore, let autonegotiation do its
job" camp. I occasionally see folks who religiously "lock down"
all ports only to create the very duplex mismatches they are trying
to avoid. Engineers, equipment, port positions and operating systems
can change over time, defeating even the best laid plans for total
port control.

- Kevin

Seems everyone has focused on GE as the problem. You can quickly rule that
out by looking at interface error counters and doing PING tests from the
wireless router/device to something on the local network on both sides. If
OSPF is flapping because of missed HELLO packets then I'm thinking you have
a problem with either saturation on the link or actual wireless issues.
When PING does work what do the times look like? I'd look at static routing
for a bit (if practical) or changing your OSPF HELLO intervals to see if
that does anything. Here's a good link on troubleshooting
OSPF adjacency changes:
http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094050.shtml
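
To get a feel for how much loss it takes to miss enough HELLOs, here is a back-of-the-envelope sketch. It assumes the common 10 second hello / 40 second dead timers (the actual timers on this link are not stated in the thread) and assumes losses are independent, which real radio fades usually are not.

```python
HELLO_INTERVAL_S = 10   # ASSUMED: common OSPF default on broadcast networks
DEAD_INTERVAL_S = 40    # ASSUMED: common default, 4 x hello


def p_dead_timer_expiry(loss_rate, hello_s=HELLO_INTERVAL_S, dead_s=DEAD_INTERVAL_S):
    """Probability that every hello in one dead interval is lost,
    assuming each hello is lost independently with probability loss_rate."""
    hellos_per_window = dead_s // hello_s
    return loss_rate ** hellos_per_window


for loss in (0.01, 0.05, 0.20, 0.50):
    p = p_dead_timer_expiry(loss)
    print(f"loss {loss:4.0%}: P(all hellos in one window lost) = {p:.2e}")
```

Under those assumptions, purely random loss has to be very high before an adjacency drops, so flapping adjacencies may point more towards outages lasting tens of seconds than towards a steady trickle of lost pings.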

Kenny Sallee wrote:

Seems everyone has focused on GE as the problem.
  
I'd like to second the above. Wireless can, and often does, suffer from issues that other media such as copper and fiber do not, and you need to be looking closely at the device's RF statistics (combined with your own monitoring of link RSSI, error blocks, retransmissions, and others... you are monitoring and graphing this, yes?). Some of the variations you can expect in wireless include -

    Interference (if using unlicensed band gear - do NOT assume your little corner of the world doesn't have anyone else using the band occasionally!)
    Thermal inversion fade
    Water build up - especially inside of antennas and antenna elements. This can take your -36 RSSI, make it drop to -86, and then all of a sudden bring it back, in the space of 30 seconds. This can be the hardest problem to find - look at your connectors and the seal up job; anywhere they would have had to seal is a possible point of penetration.
    Birds, trucks, anything causing occasional multipath reflections or blockage between the two sides

Also, it is my direct experience that wireless devices from all manufacturers are more bug ridden and usually have far more exotic corner cases where the gear just does the wrong thing occasionally. Corrupt frames at the RF layer may not be detected due to various MAC layer deficiencies, with the result being incorrect reassembly and framing of the junk as an Ethernet frame, even with a valid FCS on the frame but corrupt junk in the packet itself. Sometimes the RF device's own bridging tables get corrupted as a result, causing you to lose connectivity while bridge entries are relearned. There's all kinds of stuff that can go wrong here that is not your ordinary everyday Cisco 4-byte ASN bug variety.

My only advice is: be suspicious and be a good detective.
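
If you are already logging RSSI, a short script can flag the kind of sudden fade described above. This is only a sketch: the sample data, the 20 dB threshold, the window of 3 samples and the function name `find_fades` are all made up for the illustration.

```python
def find_fades(samples, drop_db=20.0, window=3):
    """samples: list of (timestamp_s, rssi_dbm) in time order.

    Returns the timestamps where RSSI is at least drop_db below the best
    value seen in the preceding `window` samples.
    """
    fades = []
    for i in range(window, len(samples)):
        recent_best = max(rssi for _, rssi in samples[i - window:i])
        t, rssi = samples[i]
        if recent_best - rssi >= drop_db:
            fades.append(t)
    return fades


# Hypothetical 10-second polling data with one short deep fade.
samples = [(0, -36), (10, -37), (20, -36), (30, -86), (40, -85), (50, -36), (60, -37)]
print(find_fades(samples))
```

Lining those timestamps up against the OSPF adjacency change log is one way to see whether the flaps follow the RF or not.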

Mike-

I totally agree with everything that Mike has posted here... one thing I
wanted to add is that a wireless link is only as good as its
engineering. We have many rock solid wireless links in use here - with
proper engineering and ongoing maintenance we very rarely have issues.
We do have some links that were not engineered to proper levels
(sometimes where a business decision overrode a technical decision for
example) and they do have blips every so often. Maintenance is so
important after a link is established as stuff breaks, wears down,
leaks, and moves.

Paul

When I do a lot of pings with a small packet size I get drops.

I think this is because of the flow control that I activated: the link
can't handle packets arriving this fast and drops them.

This at least is what the vendor says => dropping the low priority ping
packets is normal behavior.

I have the ability to enable 802.1p on the link. Is there a way to
prioritize the OSPF hello packets with this?
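
For reference, a sketch of how the markings relate. OSPF routers normally send their packets with IP precedence 6 (Internetwork Control, i.e. DSCP CS6), and a common convention is to derive the 802.1p priority from the top three DSCP bits. Whether your switches and the radio actually do that mapping depends on their configuration, and the frames have to be 802.1Q tagged to carry a priority at all. The VLAN ID 100 below is hypothetical.

```python
OSPF_IP_PROTOCOL = 89   # IP protocol number assigned to OSPF
DSCP_CS6 = 48           # DSCP equivalent of IP precedence 6 (Internetwork Control)


def dscp_to_pcp(dscp):
    """Map a 6-bit DSCP value to a 3-bit 802.1p priority using its top three bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp >> 3


def vlan_tci(pcp, vlan_id, dei=0):
    """Build the 16-bit 802.1Q Tag Control Information field: PCP(3) | DEI(1) | VID(12)."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must be in 0..7")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must be in 0..4095")
    return (pcp << 13) | (dei << 12) | vlan_id


pcp = dscp_to_pcp(DSCP_CS6)
print(f"OSPF (IP protocol {OSPF_IP_PROTOCOL}) at CS6 -> 802.1p priority {pcp}")
print(f"TCI for hypothetical VLAN 100: 0x{vlan_tci(pcp, 100):04X}")
```

So if the tagging device copies precedence into 802.1p and the radio honours priority 6 ahead of best-effort traffic, the hellos should be queued ahead of the pings.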