Lucent GBE (4 x VC4) clues needed

(oops technical question in nanog, wearing my asbestos suit)

Consider this topology

GSR - 3750 --(GE over 4xVC4) - NSE100 - NSE100 --(GE over 4xVC4) -- 3550 - GSR

All other fibres are dark fibres, except marked.

When we ping either NSE100 <-> GSR leg with no background traffic,
there is no packet loss. With even a few Mbps, let's say 10Mbps, of
background traffic we get 1-5% packet loss on 1500-byte packets, and
a bit less packet loss on small packets. As background traffic
increases, packet loss quickly increases.

We tried replacing (GSR - 3750) with a 7600, but the same issue
persisted.

We've measured both Lucent GBE legs with a loop at the far end,
pushing tests from EXFO and Smartbits gear through the loop;
no errors can be detected in the RFC tests.

There isn't very much that can be configured in the Lucent, and we've
tried pretty much every setting. We've tried setting autonegotiation
on and off in every piece of gear in the path, without any change to
the observed behaviour. We've also tried using 1xVC4, without any
change to the behaviour. All VC4s in a given leg use the same path.
Even though we test the packet loss by pinging from router link to
router link, the same packet loss is experienced for transit traffic
also. We've tried turning PXF off in the NSE100. Packets between
NSE100 <-> NSE100 over dark fibre are not lost.

We're pretty much utterly without clues. All I can think of is
some obscure IFG issue, that is, the NSE100 would have less than
perfect timing for IFG, which would confuse the Lucent regarding
what is part of which frame. Does stuff like this really
happen?

The NSE100 drops bad IP packets in PXF and there is only a shared
counter, so I can't tell if I get CRC errors for IP, I just
lose the packets. But IS-IS is not handled in PXF, and
I get %CLNS-4-LSPCKSUM and %CLNS-3-BADPACKET messages
over both Lucent legs, but not between the NSE100s.
So I assume the packets are not dropped, but broken.

I swear next time I'll complain about some political issue,
thanks,

Silly question (considering that you stated that IS-IS, which is not
handled by PXF, is borked also) - but did you try disabling PXF?

There's a reason why Cisco discontinued every product that "features"
it. It's broken.

> traffic also. We've tried to turn PXF off in NSE100. Packets

> Silly question (considering that you stated that IS-IS is borked also,
> which is not handled by PXF - but did you try disabling PXF?

Not a silly question at all, it was just a longer mail than many
people care to read (including me).

> There's a reason why Cisco discontinued every product that "features"
> it. It's broken.

It's not broken, it's just Cisco's name for an NPU; two PXFs don't
necessarily have anything in common, apart from being NPUs. In
essence, CRS-1 uses NPUs afaik; of course Cisco doesn't call them
PXF, due to bad publicity. A cooler word for NPU-style design is
probably cell processor, makes me feel warm already about my NSE100s.
Yes, you can design a broken NPU, NSE-1 was a good example of that :).

Thanks,

Saku Ytti wrote:

> (oops technical question in nanog, wearing my asbestos suit)
>
> Consider this topology
>
> GSR - 3750 --(GE over 4xVC4) - NSE100 - NSE100 --(GE over 4xVC4) -- 3550 - GSR
>
> All other fibres are dark fibres, except marked.
>
> When we ping either NSE100 <-> GSR leg, when there is no background traffic
> there is no packet loss. If there is even few Mbps, lets say 10Mbps of
> background traffic we get 1-5% packet loss on 1500 bytes, and bit
> less packet loss on small packets. As background traffic increases
> packet loss quickly increases.
>
> [SNIP]
>
> There isn't very much that can be configured in the Lucent, and we've
> tried pretty much every setting. We've tried to set autonego on
> and off in every gear in the path, without any changes to observed
> behaviour.

Did you try power cycling the Lucents after changing the auto-neg
settings? I've seen some broken autoneg implementations in the past
on managed media converters that didn't change settings immediately.
It's worth a shot as you seem to be all out of other ideas ;)

Sam

I brought the adjacent ports in the IP gear down and up. We could
verify from the management interface of the Lucent that
autonegotiation wasn't performed after the down/up, while before the
down/up we could observe that autonegotiation was marked as done even
though we had configured the Cisco interfaces as 'force-up'. So
clearly it needed to see link down/up.
We didn't power cycle the Lucent, as it would mean bringing down tens
of 10G waves. But taking the GBE module out/in would have been an
option (three countries are involved, so a bit inconvenient, but
possible). Country A - Country B is one Lucent leg.
Country B - Country C is the other Lucent leg.

Anyhow, thanks for the thoughts, any help I can get is much
appreciated :). Of course we have full support agreements
with both vendors, which we will probably have to try sooner
or later, but it'll be a long battle over whose problem it
really is.

This should have been Nortel GBE, not Lucent, my bad.

Anyhow, for the sake of the archive I just wanted to report that it's
the Nortel 4xVC4 that corrupts packets. It mostly seems to corrupt the
source MAC, and always the same bits; that is, any L2 device will
learn mostly the same MAC with a few different vendor codes. We can
also see this in wireshark on a fibre splitter. (It's not limited
strictly to the source MAC, but it's not random by any means.)
It's not broken hardware (unless by design), as it can be seen on both
of the production legs and we've recreated the same problem in the lab.
Most likely a software issue in the Nortel.
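
For anyone chasing something similar from captures, a rough sketch of
how one might tally which bit positions keep flipping in the source
MAC. This is only an illustration: the MAC pairs below are made up
(hypothetical), not values from our captures, and the function names
are my own. In practice the pairs would come from the expected
address and what shows up on the splitter.

```python
from collections import Counter


def mac_to_int(mac: str) -> int:
    """Parse a MAC such as 00:11:22:33:44:55 into a 48-bit integer."""
    return int(mac.replace(":", "").replace("-", "").replace(".", ""), 16)


def flipped_bits(expected: str, observed: str) -> list[int]:
    """Return the bit positions (0 = least significant bit of the last
    octet) where the observed MAC differs from the expected one."""
    diff = mac_to_int(expected) ^ mac_to_int(observed)
    return [bit for bit in range(48) if (diff >> bit) & 1]


def tally_flips(pairs) -> Counter:
    """Count how often each bit position is flipped across all pairs."""
    counts = Counter()
    for expected, observed in pairs:
        counts.update(flipped_bits(expected, observed))
    return counts


# Hypothetical (expected, observed) pairs, invented for illustration.
SAMPLES = [
    ("00:11:22:33:44:55", "00:11:26:33:44:55"),
    ("00:11:22:33:44:55", "00:51:22:33:44:55"),
    ("00:aa:bb:cc:dd:ee", "00:aa:bf:cc:dd:ee"),
]

if __name__ == "__main__":
    for bit, count in tally_flips(SAMPLES).most_common():
        print(f"bit {bit}: flipped {count} time(s)")
```

If the corruption is systematic rather than random, a handful of bit
positions should dominate the tally.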

> Consider this topology
>
> GSR - 3750 --(GE over 4xVC4) - NSE100 - NSE100 --(GE over 4xVC4) -- 3550 - GSR

> This should have been Nortel GBE, not Lucent my bad.

My first guess was right, it was a Lucent system after all.

We've now solved the issue. The problem is in the Lucent GBE card,
hardware revision S1:7, which is broken by design. S1:3 and S1:6
work, and we should be able to test S1:8 soon, but we expect
it to work also.

The symptoms were that it flipped bits (not randomly, but we
couldn't figure out why certain positions saw bit flips) and
then calculated a new, correct CRC for the ethernet frame after
it had flipped the bit.
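
To illustrate why this was so hard to spot, here is a minimal sketch
of the failure mode, not a model of the actual card: a bit is flipped
and the Ethernet FCS is regenerated afterwards, so the frame passes
the CRC check at the next hop and only a higher-layer checksum (here
the IPv4 header checksum) can notice. The frame contents, addresses
and function names are made up for the example.

```python
import struct
import zlib


def fcs(body: bytes) -> bytes:
    """Ethernet FCS: CRC-32 of the frame body, least significant byte first."""
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)


def fcs_ok(frame: bytes) -> bool:
    """True if the trailing 4 bytes match the CRC-32 of the rest."""
    return fcs(frame[:-4]) == frame[-4:]


def inet_checksum(data: bytes) -> int:
    """16-bit ones' complement Internet checksum (as used by IPv4)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ip_header_ok(header: bytes) -> bool:
    """A valid IPv4 header checksums to zero."""
    return inet_checksum(header) == 0


def build_frame() -> bytes:
    """Build a hypothetical 64-byte Ethernet frame carrying an IPv4 header."""
    dst = bytes.fromhex("020000000002")  # made-up locally administered MACs
    src = bytes.fromhex("020000000001")
    payload = b"x" * 26
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 0, 0,
                         64, 1, 0, bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
    header = header[:10] + struct.pack("!H", inet_checksum(header)) + header[12:]
    body = dst + src + b"\x08\x00" + header + payload
    return body + fcs(body)


def corrupt_and_refcs(frame: bytes, byte_index: int, bit: int) -> bytes:
    """Flip one bit in the frame body, then recompute the FCS over the result."""
    body = bytearray(frame[:-4])
    body[byte_index] ^= 1 << bit
    return bytes(body) + fcs(bytes(body))


if __name__ == "__main__":
    good = build_frame()
    bad_ip = corrupt_and_refcs(good, 14 + 8, 0)  # flip a bit in the TTL
    bad_mac = corrupt_and_refcs(good, 6, 2)      # flip a bit in the source MAC
    for name, frame in (("good", good), ("bad_ip", bad_ip), ("bad_mac", bad_mac)):
        print(name, "FCS ok:", fcs_ok(frame),
              "IP header ok:", ip_header_ok(frame[14:34]))
```

Note that the flip in the source MAC passes both checks in this
sketch, since nothing above the Ethernet FCS covers the MAC addresses.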