I'm in the process of deploying an anycast DNS service internally. We're
on a pretty provider-like network, where we run MPLS to provide several
network overlays for different services. iBGP is used to distribute
routing information, and IS-IS is used as the IGP. In one of the VRFs we
would like to place name servers using a common IP address. To get speedy
network updates when outages occur we'll be using OSPF on the name servers
to inject the routes into the IGP. The PE router then redistributes the
route into the right VRF. (The name server's OSPF process is not aware of
MPLS; it just talks to a router.)
So far so good. This works.
Trouble is, we find that the (untweaked) costs and metrics are such that
all nodes look equal. The last-resort tie-breaker (peer router ID) gets
invoked and all traffic goes to one single instance. Of course, when that
instance falls off the net, recalculation takes place and another node
steps in, but I'd like true path lengths (IGP hop count) to influence more
than iBGP (route-reflector-style) selection.
Any clues?
Oh, all-Cisco, all ASR1000 series. All links GE. ~90 routers in IGP.
Since you mention route-reflector route selection - are you already using per-VRF, per-PE route distinguishers for that L3VPN instance?
If not, I'd recommend doing so - this will cause your RR to see all paths as unique routes and distribute all of them (instead of just the best one from the RR's perspective) to the RR clients. As a result, all PEs will always have all paths for this particular prefix (and can then make the best-path decision based on the local IGP metric to the respective BGP next hops).
Doing that can also significantly improve reconvergence times for certain failure scenarios (e.g. ingress PE failure), as PEs can start using alternative paths (already available in the local BGP RIB) as soon as the IGP next hop for the failed PE is invalidated, and do not need to wait for BGP RR reconvergence.
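To make the effect of per-PE route distinguishers concrete, here is a minimal Python sketch of the selection logic. It is only a toy model, not router behaviour: the prefix, RD values, PE names and IGP metrics are all made up for illustration, and the RR's final router-ID tie-break is approximated by picking the lowest next-hop name.

```python
from collections import defaultdict

# Hypothetical anycast prefix; not taken from the original post.
PREFIX = "10.53.0.1/32"


def rr_advertise(paths):
    """Toy route reflector without add-path: one best path per (RD, prefix).

    All candidates are assumed to tie from the RR's viewpoint, so the
    lowest next-hop name stands in for the final router-ID tie-break.
    """
    by_nlri = defaultdict(list)
    for rd, prefix, next_hop in paths:
        by_nlri[(rd, prefix)].append(next_hop)
    return [(rd, prefix, min(hops)) for (rd, prefix), hops in sorted(by_nlri.items())]


def pe_select(received, igp_metric):
    """Ingress PE: choose the received path with the lowest IGP metric to its next hop."""
    return min((nh for _, _, nh in received), key=lambda nh: igp_metric[nh])


# Same RD on every PE: the RR sees a single NLRI with three competing paths.
shared = [("65000:100", PREFIX, pe) for pe in ("pe1", "pe2", "pe3")]
# Unique RD per PE: the RR sees three distinct NLRIs and reflects all of them.
unique = [(f"65000:10{i}", PREFIX, f"pe{i}") for i in (1, 2, 3)]

# Assumed IGP metrics from one ingress PE to each egress PE.
igp_from_pe9 = {"pe1": 40, "pe2": 30, "pe3": 10}

print(pe_select(rr_advertise(shared), igp_from_pe9))  # pe1 (only path the RR reflected)
print(pe_select(rr_advertise(unique), igp_from_pe9))  # pe3 (nearest by IGP metric)
```

With a shared RD the ingress PE never learns about pe3, so its local IGP metric cannot influence the choice. With unique RDs it holds all three paths and picks the closest one.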
Problem solved - what I did not mention (shame on me) was that there are
two islands of IGP (growing pains...) redistributing into each other... The
metric in that redistribution was too low, resulting in artificially
"cheap" paths to the wrong places.
Thanks to all who made me think this through a second time and solve it.
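For anyone hitting the same thing, the redistribution problem can be sketched with a little arithmetic in Python. This is a simplified additive model with invented numbers, not the actual metrics from this network: it assumes a router sees a redistributed route at its own cost to the border router plus the seed metric set at the redistribution point.

```python
def apparent_cost(cost_to_border, seed_metric):
    """Cost seen for a route redistributed from the other IGP island.

    Simplified assumption: local cost to the border router plus the seed
    metric assigned at redistribution. The real cost beyond the border
    is hidden from the receiving island.
    """
    return cost_to_border + seed_metric


# All values below are hypothetical.
local_cost = 30        # true cost to the instance inside the router's own island
cost_to_border = 10    # cost from this router to the border router
remote_true_cost = 50  # real cost from the border to the remote instance

too_low = apparent_cost(cost_to_border, seed_metric=1)
realistic = apparent_cost(cost_to_border, seed_metric=remote_true_cost)

print(too_low, too_low < local_cost)      # 11 True: remote instance wrongly looks closer
print(realistic, realistic < local_cost)  # 60 False: local instance wins as it should
```

Under this model a seed metric that roughly reflects the real cost on the far side keeps the remote instance from looking cheaper than the local one.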