question concerning traceroute?

I am trying to troubleshoot a latency issue for some of our networks, and was wondering about this… Knowing that routing isn’t always symmetrical, is it possible for a traceroute to traverse a different reverse path than the path that it took to get there? …or will it provide a trace of the path the packet took to reach the destination?

According to the definition, it should take the same path, but are there any other cases that I should be aware of?

Darrell

Darrell Carley wrote:

> I am trying to troubleshoot a latency issue for some of our networks,
> and was wondering about this. Knowing that routing isn't always
> symmetrical, is it possible for a traceroute to traverse a different
> reverse path than the path that it took to get there? ...or will it
> provide a trace of the path the packet took to reach the destination?
> According to the definition, it should take the same path, but are there
> any other cases that I should be aware of?

a traceroute shows the outbound route. it's possible for the probe
packets to follow one path and the returning icmp packets to take another
path. a looking glass in the AS you're tracing to is a good way to see
what the return path is...
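
for example (hostnames and addresses below are made up for illustration), a plain

  traceroute 192.0.2.50

from your side only shows the forward hops. to see the return leg, find a
looking glass or route server in the destination AS and run the equivalent
back toward your own prefix, e.g.

  traceroute 198.51.100.7

then compare the two hop lists side by side.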

marty

> According to the definition, it should take the same path, but are there
> any other cases that I should be aware of?

According to the definition, it is going to show you the path the packets
took from you to the destination, not from the destination back.

Alex

alex@yuriev.com wrote:

> > According to the definition, it should take the same path, but are there
> > any other cases that I should be aware of?

> According to the definition, it is going to show you the path the packets
> took from you to the destination, not from the destination back.

Unless you did "-g",

Arnold

Not correct. -g specifies loose source routing on the way *there*, not back.

Alex

Alex,

  I think the intention was to indicate that you can
traceroute -g <remote-router-before-host> <your-local-ip>

  to get the path to and back. -g requires an argument, obviously.
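
  for instance (using made-up documentation addresses), if your box is
  198.51.100.7 and the last router before the far-end host is 192.0.2.1,
  something like

    traceroute -g 192.0.2.1 198.51.100.7

  should list the hops out to 192.0.2.1 and then the hops back toward you --
  assuming, of course, that nothing along the way filters loose source
  routing.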

  - Jared

There used to be an old flag you could set on an ICMP_ECHO request to record the path the echo reply takes back (ping -R or -r?), but apparently it's not used much anymore. Probably just as well… it could only hold ~8 hops…
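
If you want to play with it anyway (host name made up), it's just

  ping -R host.example.net

on most BSD/Linux ping implementations. The record-route option only has
room for nine addresses in the IP header, and a lot of routers ignore or
strip it, so it rarely tells you much on longer paths.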

Andy

> > > > According to the definition, it should take the same path, but are there
> > > > any other cases that I should be aware of?
> > >
> > > According to the definition, it is going to show you the path the packets
> > > took from you to the destination, not from the destination back.
> > >
> >
> > Unless you did "-g",
>
> Not correct. -g specifies loose source routing on the way *there*, not back.

>   I think the intention was to indicate that you can
> traceroute -g <remote-router-before-host> <your-local-ip>
>
>   to get the path to and back. -g requires an argument, obviously.

That, obviously, is correct.

  However, the remote IP in this case is your local IP, so you are
still getting a path to the destination.
  
  Even more importantly, LSR relies on every router on a forward path
between <your-local-ip> and <remote-router-before-host> allowing LSR, which
is an invalid assumption.

Thanks,
Alex

at Thursday, October 17, 2002 3:58 PM, alex@yuriev.com
<alex@yuriev.com> was seen to say:

> > Unless you did "-g",
>
> Not correct. -g specifies loose source routing on the way *there*,
> not back.

No, you can get both if you ping *yourself* with the actual destination
as -g. This gives you both legs of the trip.

> I am trying to troubleshoot a latency issue for some of our networks,
> and was wondering about this. Knowing that routing isn't always
> symmetrical, is it possible for a traceroute to traverse a different
> reverse path than the path that it took to get there? ...or will it
> provide a trace of the path the packet took to reach the destination?
> According to the definition, it should take the same path, but are there
> any other cases that I should be aware of?

Something else to be aware of is the effect of ECMP on traceroutes -- where
the source/dest IP (among other hash inputs) can impact which of several
parallel equal-cost paths you take through a backbone. ECMP is fairly common,
so I would suspect a fairly large percentage of paths are subject to it.
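
You can often see this directly in a traceroute when probes for the same
TTL come back from different routers (the output below is made up):

  5  core1.example.net (192.0.2.10)  11.8 ms  core2.example.net (192.0.2.14)  12.1 ms  12.0 ms
  6  edge1.example.net (192.0.2.22)  13.4 ms  13.2 ms  13.5 ms

Classic traceroute increments the UDP destination port with each probe, so
successive probes can hash onto different parallel links even within a
single run.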

-Lane

Anyone have any idea what really happened :

http://www.boston.com/dailyglobe2/330/science/Got_paper_+.shtml

<snip>
It was too late. Somewhere in the web of copper wires and glass fibers that
connects the hospital's two campuses and satellite offices, the data was stuck
in an endless loop. Halamka's technicians shut down part of the network to
contain it, but that created a cascade of new problems.

The entire system crashed, freezing the massive stream of information -
prescriptions, lab tests, patient histories, Medicare bills - that shoots
through the hospital's electronic arteries every day, touching every aspect of
care for hundreds of patients.
...
The crisis had nothing to do with the particular software the researcher was
using. The problem had to do with a system called ''spanning tree protocol,''
which finds the most efficient way to move information through the network and
blocks alternate routes to prevent data from getting stuck in a loop. The large
volume of data the researcher was uploading happened to be the last drop that
made the network overflow.

Regards
Marshall Eubanks

Hmm, well, until the comment about STP it sounded like the guy did something
stupid on a program/database on a mainframe..

I can't see how STP could do this or require that level of DR. Perhaps it's just
the scapegoat for the Doc's mistake, which he didn't want to admit!

STeve

If it's anything like any other layer-2 IT network meltdown I've seen, it'll be some combination of:

  + no documentation on what the network looks like, apart from a large
    yellow autocad diagram which was stapled to the wall in the basement
    wiring closet in 1988

  + a scarcity of diagnostic tools, and no knowledge of how to use the
    ones that do exist

  + complete ignorance of what traffic flows when the network is not
    broken

  + a cable management standard that was first broken in 1988 and has
    only been used since to pad out RFPs

  + consideration to network design which does not extend beyond the
    reassuring knowledge that the sales guy who sold you the hardware
    is a good guy, and will look after you

  + random unauthorised insertion of hubs and switches into the fabric
    by users who got fed up of waiting eight months to get another
    ethernet port installed in their lab

  + customers who have been trained by their vendors to believe that
    certification is more important than experience

  + customers who believe in the cost benefit of a large distributed
    layer-2 network over a large distributed (largely self-documenting)
    layer-3 network.

Just another day at the office.

Joe

Sure, which is why

"Within a few hours, Cisco Systems, the hospital's network provider, was loading
thousands of pounds of network equipment onto an airplane in California, bound "

seems somewhat excessive! :)

and

"The crisis began on a Wednesday afternoon, Nov. 13, and lasted nearly four
days"

sounds like an opportunity for any consultants on NANOG who have half a clue
about how to set up a LAN!

Steve

Anyone have any idea what really happened :
http://www.boston.com/dailyglobe2/330/science/Got_paper_+.shtml

I know someone who worked on it, but I've avoided asking what
really happened so I don't freak out the day the ambulance drives
me up to their emergency room :) The other day, I did forward the article
over to our medical school in the hopes that they might "check" their
network for similar "issues" before something happens :)

I don't know which scares me more: that the hospital messed up spanning-tree
so badly (which means they likely had it turned off) that it imploded
their entire network. Or that it took them 4 days to figure it out.
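
(For what it's worth, a quick sanity check on most Cisco switches is
something along the lines of

  show spanning-tree summary

which tells you whether spanning tree is actually running and in what mode;
exact commands vary by platform and software version.)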

Eric :)

If it's anything like a former employer I used to work for, it's possible the physical wiring plant is owned/managed by the telco group which jealously guards its infrastructure from the networking group.

A subnet I used to work on was dropped dead for a day when a telco-type punched a digital phone down into the computer network, causing a broadcast storm. It took half a day just to get the wiring map, then another half day to track down the offending port because the tech in the network group dispatched to solve the problem did not have a current network map.

The subnet in question contained a unix cluster with cross-mounted file systems that processed CAT scans for brain trauma research. The sysadmin of that system told me that they lost a week's worth of research because of that cock-up.

Hospitals are very soft targets network-wise, with hundreds, if not thousands of nodes of edge equipment unmanned for hours long stretches. On a regular basis, I saw wiring closets propped open and used as storage space for other equipment.

Track down a pair of scrubs, and you can walk just about anywhere in a hospital without being challenged as long as you look like you know where you are going and what you are doing.

Ten years later, there are still routers there that I can log into as the passwords have never been changed because the administrators of them were reorganized out or laid off and the equipment was orphaned.

Minimal social engineering plus a weak network security infrastructure is a disaster waiting to happen for any major medical facility.

Marshall,

"It was Dr. John Halamka, the former emergency-room physician who runs
Beth Israel Deaconess Medical Center's gigantic computer network"

It appears what really happened is that they put an emergency room doctor
in charge of a critical system in which he, in all likelihood, had
limited training. In the medical system, he was trusted because he was
a doctor. The sad thing about this is that there seems to be no
realization that having experienced networking folks in this job might
have averted a situation that could have been (almost certainly
was?) deleterious to patient care.

We all know folks who are unemployed thanks to the telecom meltdown, so
it's not like this institution couldn't have hired a competent network
engineer on the cheap.

Sorry for the rant - I just hate to see the newspaper missing the point
here. They didn't have one quote from an actual networking expert. It does
look like Cisco took the opportunity to sell them some stuff - looks like
someone got something out of this - too bad it wasn't the patients :)

- Dan

Radia Perlman lives only a few miles away - they could have asked her for a quote :)

However, I would not be too harsh towards Dr. John - it is common practice in specialty organizations to put a member of the club in charge of every department, even if most of the decisions are actually made by the staff, as he or she is supposed to better understand the needs (and lingo) of the organization.

In the military, for example, an Officer is always in charge of an installation -
literally the CO - even if he and his aide are the _only_ military personnel stationed there - which happens sometimes with highly technical activities.

So I would not assume that the good Doctor is actually the one configuring the network.

I wonder if Cisco will be moving them from an enormous flat Layer 2 network to
a more sensible Layer 3 IP network.

Marshall

Thus spake "Eric Gauthier" <eric@roxanne.org>

> Anyone have any idea what really happened :
> http://www.boston.com/dailyglobe2/330/science/Got_paper_+.shtml

I can't speak to exactly what happened because of NDA, but I think I can
help NANOGers understand the environment and why this happens in general.

> I know someone who worked on it, but I've avoided asking what
> really happened so I don't freak out the day the ambulance drives
> me up to their emergency room :) The other day, I did forward the article
> over to our medical school in the hopes that they might "check" their
> network for similar "issues" before something happens :)

I see a lot of Fortune 500 networks in my job, and I'd say at least 75% of
them are in the same state: a house of cards standing only because new cards
are added so slowly. Any major event, whether a new bandwidth-hungry
application or a parity error in a router, can bring the whole thing down,
and there's no way to bring it back up again in its existing state.

No matter how many powerpoint slides you send to the CIO, it's always a
complete shock when the company ends up in the proverbial handbasket and
you're looking at several days of downtime to do 4+ years of maintenance and
design changes. And, what's worse, nobody learns the lesson and this
repeats every 2-5 years, with varying degrees of public visibility.

This is a bit of culture shock for most ISPs, because an ISP exists to serve
the network, and proper design is at least understood, if not always adhered
to. In the corporate world, however, the network and support staff are an
expense to be minimized, and capital or headcount is almost never available
to fix things that are "working" today.

> I don't know which scares me more: that the hospital messed up
> spanning-tree so badly (which means they likely had it turned off) that
> it imploded their entire network. Or that it took them 4 days to figure
> it out.

It didn't take 4 days to figure out what was wrong -- that's usually
apparent within an hour or so. What takes 4 days is having to reconfigure
or replace every part of the network without any documentation or advance
planning.

My nightmares aren't about having a customer crater like this -- that's an
expectation. My nightmare is when it happens to the entire Fortune 100 on
the same weekend, because it's only pure luck that it doesn't.

S

Thus spake "Daniel Golding" <dgold@FDFNet.Net>

> It appears what really happened is that they put an emergency room doctor
> in charge of a critical system in which he, in all likelihood, had
> limited training. In the medical system, he was trusted because he was
> a doctor. The sad thing about this is that there seems to be no
> realization that having experienced networking folks in this job might
> have averted a situation that could have been (almost certainly
> was?) deleterious to patient care.

I think it's safe to say there was competent staff involved before the
incident and everyone knew exactly how bad the network was and how likely a
failure was. It's very rare for people to not know exactly how bad off they
are.

The question is whether management considers this worth spending resources,
either money or manpower, to fix.

S