largest OSPF core

I'm just curious - what is the largest OSPF core (in terms of number of routers) out there?

You don't expect anyone to actually admit to something like this? :slight_smile:

Nick

Subject: Re: largest OSPF core

> I'm just curious - what is the largest OSPF core (in terms of number of routers) out there?

> You don't expect anyone to actually admit to something like this? :slight_smile:

For giggles:

Network World April 9, 1990 (page 59):

"There is no practical limit to the number of interconnected networks OSPF and Dual Intermediate System-to-Intermediate System can support"...

"From the onset, OSPF was intended to be short-term, for IP-only"..

"Dual routing is intended to be more of a long-term solution because there will be very few pure OSI or TCP/IP routing environments in the future."

Well, they were half-right. :wink:

In a message written on Thu, Sep 02, 2010 at 03:20:05PM +0300, lorddoskias wrote:

I'm just curious - what is the largest OSPF core (in terms of number
of routers) out there?

I'll admit to having seen a network with over 400 devices in an
OSPF area 0, didn't design it, and in the end didn't get to work
on it.

Far as I know worked just fine though, no issues reported. How
well your IGP scales depends a lot more on what you put in it, and
how dynamic your network situation is than the protocol or number
of devices.

I think it really depends on what your network topology looks like.
If you have a top-down design with a star topology that limits the
number of connections to individual routers, it may scale well.
But if you connect every router to every other router in a full mesh,
it will be a problem during interface flapping or similar events.

Alex
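A quick back-of-envelope on Alex's point (a sketch; the topology sizes are illustrative): a full mesh of N routers maintains N(N-1)/2 adjacencies, each one state to track and a flooding path to exercise on a flap, while a hub-and-spoke design maintains only N-1.

```python
# Adjacency counts for full-mesh vs. star (hub-and-spoke) OSPF topologies.
# Each adjacency is state to maintain and a flooding path during a flap.

def full_mesh_adjacencies(n: int) -> int:
    """Every router peers with every other: n*(n-1)/2 adjacencies."""
    return n * (n - 1) // 2

def star_adjacencies(n: int) -> int:
    """One hub, n-1 spokes: n-1 adjacencies."""
    return n - 1

for n in (10, 100, 400):
    print(n, full_mesh_adjacencies(n), star_adjacencies(n))
# At 400 routers the full mesh maintains 79,800 adjacencies versus
# 399 for the star -- a 200x difference in flooding fan-out.
```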

In a message written on Thu, Sep 02, 2010 at 03:20:05PM +0300, lorddoskias wrote:

I'm just curious - what is the largest OSPF core (in terms of number
of routers) out there?

The stability of the topology plays the most prominent role, but it wouldn't surprise me if an OSPF network largely comprised of router LSAs (no redistribution), using today's hardware, could easily scale to 1000 nodes in an area. Some newer techniques may affect this scale in either direction (i.e., subsecond hellos and fast convergence would negatively impact scale on some platforms, while using demand circuit emulation on p2p links would impact scale positively).

YMMV...

C


Technology prognosticators shouldn't try their hands in Vegas. Just saying.

With respect to these OSPF questions, how many people are running two OSPF processes on each router (v4 and v6) to support dual stack rather than migrating (or just enjoying their existing) ISIS (OSI) implementations?

You left out the option of using OSPFv3 to do both v4 and v6. It works on Juniper and Foundry at least.

Owen


Thank you. Apparently Cisco supports it (or something like it) too.

Deepak
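For reference, a sketch of what a single-process, dual-address-family OSPFv3 deployment (RFC 5838) looks like in IOS-style syntax. This is illustrative only: the commands, and whether your platform supports IPv4 over OSPFv3 at all, vary by vendor and release; Junos exposes the equivalent as an `ospf3 realm ipv4-unicast` configuration. Check your platform's documentation before relying on it.

```
! Illustrative sketch only -- syntax and support vary by platform/release.
! One OSPFv3 process carrying both IPv4 and IPv6 (RFC 5838 address families).
router ospfv3 1
 router-id 192.0.2.1
 address-family ipv4 unicast
 address-family ipv6 unicast
!
interface GigabitEthernet0/0
 ospfv3 1 ipv4 area 0
 ospfv3 1 ipv6 area 0
```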

Seems silly to migrate your existing OSPFv2 to an extra instance of
OSPFv3, leaving 2 separate OSPFv3 instances. Why not just stick with
your existing OSPFv2 and add OSPFv3 for IPv6? Or if you want to
migrate your IPv4 IGP, go directly to IS-IS so you can have a single
link-state database, single process, etc. for both IPv4 and IPv6.

Presuming OSPF and IS-IS SPF costs are fairly similar, the following
page from "The Complete IS-IS Routing Protocol" (really quite a good
book; a bit of a shame that there are occasional minor errors that
better technical editing would have picked up) shows that relatively
modern (although a number of years old now) routers can perform SPF
calculations on link-state databases with 10,000 routers and 25,000
links in less than a second. From that, it would seem that areas /
levels are obsolete for most networks for the purposes of reducing SPF
calculation time. They are still possibly useful for route aggregation,
although if BGP is carrying nearly all your routes, that may not be
that useful either.

http://books.google.com.au/books?id=NxIadsCKZxMC&lpg=PP1&dq="IS-IS"&pg=PA481#v=onepage&q&f=false
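To get a feel for raw SPF cost at that scale, here is a sketch using a synthetic random topology of comparable size. This is not a router implementation (real SPF runs also install routes, flood LSPs, etc.), and the topology is invented for illustration, but even interpreted Python typically completes the computation in a fraction of a second on current hardware; compiled router code is far faster.

```python
import heapq, random, time

def dijkstra(adj, src):
    """Plain binary-heap Dijkstra over an adjacency list: O(E log V)."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Synthetic topology: 10,000 routers, ~25,000 point-to-point links,
# kept connected via a random spanning tree plus random extra links.
random.seed(1)
n, extra = 10_000, 15_000
adj = {i: [] for i in range(n)}

def link(a, b):
    w = random.randint(1, 10)             # arbitrary link metric
    adj[a].append((b, w))
    adj[b].append((a, w))

for i in range(1, n):
    link(i, random.randrange(i))          # spanning tree: n-1 links
for _ in range(extra):
    a, b = random.sample(range(n), 2)     # extra random links
    link(a, b)

t0 = time.perf_counter()
dist = dijkstra(adj, 0)
print(f"SPF over {n} nodes: {time.perf_counter() - t0:.3f}s, "
      f"{len(dist)} nodes reached")
```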

The stability of the topology plays the most prominent role, but it
wouldn't surprise me if an OSPF network largely comprised of router
LSAs (no redistribution), using today's hardware, could easily scale
to 1000 nodes in an area.

i believe the original poster asked about actual operating deployment,
not theory.

and, i suspect one wants to know about full mesh under real load, i.e.
topology change, which can be exciting when one gets to a network of
significant size.

randy


Randy,

Fair enough. 7 years ago, I was privy to an OSPF backbone of 300 or so routers supporting a BGP overlay. No NBMA, passive broadcast subnetworks, all running on systems without the capacity to offload adjacency maintenance onto linecards. I'd argue that this type of network is also uninteresting from a NANOG viewership POV.

I also operated a network that supported over 70 OSPF VRF instances on a single PE. CPU loads were higher, but we didn't observe intractable workloads. And this was with a 500 route limit per VRF, with who knows what kinds of messiness running in those VRFs. (and yea there were sham links and router LSAs flying around!!) :slight_smile:

There are many variables, and several studies have tried to capture, algorithmically and in terms of computational complexity, a formulaic approach to determining the boundaries of OSPF network scalability. Admittedly, these approaches can be very approximate in nature. But the point stands.

Stable topologies absent large, frequent, compulsively updated data can scale extremely well. Unstable topologies with lots of leaf data (20,000 type 5 LSAs, for example) don't.

The most interesting point to make, however, is how much legacy thinking in this area continues to be stranded in a rut that emerged 15 years ago. It is not uncommon to hear network folks cringe at the thought of an OSPF area exceeding 100 routers. Really? When simulations using testing tools show that properly tuned OSPF implementations (with ISPF, PRC, etc.) comprising 1000 routers can run full SPFs in 500 ms?

That said, my experience, as stated above, is that 300 routers is completely workable.

Cheers
Chris

In a message written on Thu, Sep 02, 2010 at 03:20:05PM +0300, lorddoskias wrote:

I'm just curious - what is the largest OSPF core (in terms of number
of routers) out there?

I'll admit to having seen a network with over 400 devices in an
OSPF area 0, didn't design it, and in the end didn't get to work
on it.

I know of a large enterprise with ~4k devices in area-0, according to
their vendor^H^H^H^H^Hdesigner that was all perfectly fine.

Far as I know worked just fine though, no issues reported. How
well your IGP scales depends a lot more on what you put in it, and
how dynamic your network situation is than the protocol or number
of devices.

I think the only reason the one I saw worked at all was it was
relatively stable. If things happened though (like say the code-red
incident in ... whenever that was) the network turned into a steaming
pile of fail.

Really, not a good plan; of course, as Leo says, ISIS probably gets
super unhappy if a large percentage of interfaces start to go
bouncey-bouncey.

-Chris

In a message written on Thu, Sep 02, 2010 at 09:40:39PM -0400, Christian Martin wrote:

The most interesting point to make, however, is how much legacy
thinking in this area continues to be stranded in a rut that emerged
15 years ago. It is not uncommon to hear network folks cringe at
the thought of an OSPF area exceeding 100 routers. Really? When
simulations using testing tools show that properly tuned OSPF
implementations (with ISPF, PRC, etc.) comprising 1000 routers can
run full SPFs in 500 ms?

I do think a lot of the thinking is out of date. I strongly agree
that all the references I know of about scaling are based on the
CPU and RAM limitations of devices made in the 1990's. Heck, a
"branch" router today probably has more CPU than a backbone device
of that era.

The larger issue though is that as an industry we are imprecise.
If you talk about when a routing protocol /fails/, that is, when it
can't process the updates with the available CPU before the session
times out, you're probably talking about a network of 250,000 routers
on a low-end device. Seriously, how large does a network need to be
to keep OSPF or ISIS CPU-busy for 10-20 seconds? Huge!

Rather, we have scaling based on vague, often unstated rules. One
vendor publishes a white paper based on devices running only the
IGP and a stated convergence time of 500ms. Another will assume
the IGP gets no more than 50% of the CPU, and must converge in
100ms.

Also, how many people have millisecond-converging IGPs, but routers
with old CPUs, so BGP takes 3-5 MINUTES to converge? Yes, for some
people that's good enough, if you have lots of internal VoIP or other
such things; but if 99% of your traffic is off-net it really doesn't
matter: you're waiting on BGP.

Lastly, the largest myth I see in IGP design is that you can't
redistribute connected or static routes into your IGP, that those
must go into BGP so the IGP only has to deal with loopbacks. As far
as I understand, the computational complexity of OSPF and IS-IS
depends solely on the number of links running the protocol, so having
these prefixes in or out makes no difference in that sense. It does
increase the amount of data that needs to be transferred by the IGP,
which does slow it a bit, but with modern link speeds and CPUs it's
really a non-issue.

I'm not saying it's "smart" to redistribute connected and static,
it really does depend on your environment. However there seems to
be a lot of folks who automatically assume the network is broken
if it has such things in the IGP, and that's just silly. Plenty
of networks have that data in the IGP and deliver excellent routing
performance.

Fortunately we've gotten to the point where 95% of networks
don't have to worry about these things; it works however you want
to do it. However, for the 5% who need to care, almost none of them
have engineers who actually understand the programming behind the
protocols. How many network architects could write an OSPF
implementation, or truly understand their box's architecture?

I'm just curious - what is the largest OSPF core (in terms of number of
routers) out there?

You don't expect anyone to actually admit to something like this? :slight_smile:

Of course I do -- 'tis much better for your reputation to have wrangled a poorly designed, ugly network under control than to have only worked at places with smooth sailing.... I *don't* expect the owner / designers of these to come forward, rather those who inherited a pile of choss to share war stories... :stuck_out_tongue:

I worked on a network that had >350 routers in an (non-zero) area. Now, ~350 routers in an area doesn't sound *that* impressive, but on average these devices had 6 interfaces in OSPF, and many of these links were of the form:

[router A]-- {GRE} --- [firewall]-- {GRE in IPSEC} -------[Internet]------- {GRE in IPSEC} ---[firewall]---{GRE} --- [router B]

Routers A and B would form an OSPF adjacency. Much of this was an overlay network (over the Internet), so the firewalls would build IPSec tunnels. Of course, said firewalls would not pass OSPF, so we had to build GRE tunnels between routers A and B and run OSPF over those -- traffic would enter the router, get encapsulated in GRE, and then the GRE would be encapsulated in IPSec and tossed into the void....
In other places (in the same OSPF area) we would purchase parallel T1 / E1s that we would run MLPPP over, and / or plain DS3s.
Oh, did I mention that network was primarily to support international call centers that had been outsourced to wherever was *really* cheap, and that many places with very cheap labor have very poor infrastructure? It was not uncommon to have interfaces that would bounce 5 or 10 times a day*....

W

*: And yes, we did 'ave to get up out of shoebox at twelve o'clock at night and lick road clean wit' tongue. We had two bits of cold gravel, worked twenty-four hours a day at mill for sixpence every four years, and when we got home our Dad would slice us in two wit' bread knife.

I have seen (as a consultant, not operator) a production SP network that had over 800 routers in the backbone area. The LSDB was rather small as the network only carried links and loopbacks for P/PE routers. All other prefixes were in MP-BGP.

Truman