I have a very peculiar situation here that I seem to have difficulty
explaining in a way that people can understand. I just got off the phone
with the Juniper devs after about 4 hours with no result. They understand the
problem but can't seem to think of a working solution (the last attempt led to
the primary firewall hard crashing and then failing over after a commit,
which also makes me wonder what made the primary crash and not the
secondary). I am wondering if there is anyone "creative" on the list who
has encountered and worked around this problem before...
Here goes *sigh*
ISP1 - 1.1.1.0/24
ISP2 - 2.2.2.0/24
ISP1 is the default gateway; ISP2 is a backup provider, but one which is always
active. Client comes in on ISP1's link, traffic goes back out on ISP1's link.
Client comes in on ISP2's link (non default gateway) but for some reason,
the packets seem to be going back out through the link for ISP1.
So look at it this way:
A SYN comes from a client at 3.3.3.3 aimed at 2.2.2.2, and the packet is
received by the firewall. The firewall sends a SYN/ACK, but the firewall at
1.1.1.1 sees it in tcpdump; the firewall at 2.2.2.1 never sees it.
Here's a log snippet (I can send you more if you need):
You will see that the orig and out zones are the same zone, however this was
a last ditch effort (putting both interfaces into one zone, effectively
creating a swamp).
Our current (non-preferred) solution is to put match-all rules on our
Catalyst 6513s and put both providers into a swamp; the switch then
intercepts packets that are destined for the wrong interface and sends
them out the right one based on a bunch of boolean logic.
We've tried setting up a virtual instance on the offending interface and a
firewall filter, but this had little to no effect (at one point it stopped
passing the packets to the end machine altogether). We're using small SRX
650s. Why do we want to do it this way, you ask? In the event of a BGP
session failure we need to be able to use our statically routed IPs and rely
on someone else.
With the default gateway, that is the behaviour I would expect--I don't
see how the router could do otherwise. (This assumes that source
routing is not being used.)
Not sure if I understand you correctly, but the default route is the route
that a packet follows when there is no more specific route for the
destination. So if there is no route covering the client's address (3.3.3.3)
in the route table, the reply will follow the default route (ISP1).
So, this behavior is correct. Just for troubleshooting purposes, install
a static route like:
set routing-options static route 3.3.3.0/24 next-hop
<the-correct-gateway-address> (ISP2)
If this works fine, then verify the route table. Are you using BGP to
receive such routing info? If you are not filtering the updates, maybe
the sender is.
Not sure if I understand you correctly, but the default route is the route
that a packet follows when there is no more specific route for the
destination. So if there is no route covering the client's address (3.3.3.3)
in the route table, the reply will follow the default route (ISP1).
So, this behavior will be correct if a route for 3.3.3.0/24 is not
installed. Just for troubleshooting purposes, install a static route
like:
set routing-options static route 3.3.3.0/24 next-hop
<the-correct-gateway-address> (ISP2)
If this works fine, then verify the route table. Are you using BGP to
receive such routing info? If you are not filtering the updates, maybe
the sender is. Verify the received routes using "show route
receive-protocol bgp x.x.x.x" (x.x.x.x is the BGP
neighbor)
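
To make the longest-prefix-match point above concrete, here is a minimal Python sketch. It is only a toy model of a route lookup, not how Junos implements its FIB; the table contents and the labels "ISP1" and "ISP2" are assumptions taken from the example addresses in this thread.

```python
import ipaddress

def lookup(routes, destination):
    """Return the next hop of the most specific route covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The longest prefix (largest prefix length) wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]

# Hypothetical table: only a default route, pointing at ISP1.
routes = {"0.0.0.0/0": "ISP1"}
before = lookup(routes, "3.3.3.3")   # reply to the client follows the default

# Add the troubleshooting static route suggested above.
routes["3.3.3.0/24"] = "ISP2"
after = lookup(routes, "3.3.3.3")    # the more specific route now wins

print(before, after)
```

Note that the lookup only ever sees the destination address of the reply; nothing in it knows which link the original SYN arrived on.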
Wow, very fast responses, Thanks Larry Sheldon and Ricardo Tavares!
Not sure if I understand you correctly, but the default route is the route
that a packet follows when there is no more specific route for the
destination. So if there is no route covering the client's address (3.3.3.3)
in the route table, the reply will follow the default route (ISP1).
Yes I believe that would be the default if the session was initiated on the
inside, but if it comes from outside on a particular interface which is not
the default route, why would the router then send the packet out another
interface? Should the device not route session-based traffic according to
where it originated?
So, this behavior will be correct if next-hop for 3.3.3.0/24 is not
installed. Just for troubleshooting purpose install a static route
like:
set routing-options static route 3.3.3.0/24 next-hop
<the-correct-gateway-address> (ISP2)
Yes sir, this works, but when you change the static route to point
0.0.0.0/0 to the next hop on the virtual router for the particular
interface (ISP2), traffic starts going over the interface for ISP1 again.
I also set qualified-next-hop for ISP2 in the main routing table, to no avail.
If this works fine then verify the route table, are you using BGP to
receive such routing info? If you are not filtering the update maybe
the sender is. Verify the received routes using "show route
receive-protocol bgp x.x.x.x" (x.x.x.x is the BGP
neighbor)
Yes sir, I have also gone to the extent of deactivating BGP and using only
static routes.
Wow, very fast responses, Thanks Larry Sheldon and Ricardo Tavares!
Not sure if I understand you correctly, but the default route is the route
that a packet follows when there is no more specific route for the
destination. So if there is no route covering the client's address (3.3.3.3)
in the route table, the reply will follow the default route (ISP1).
Yes I believe that would be the default if the session was initiated on the
inside, but if it comes from outside on a particular interface which is not
the default route, why would the router then send the packet out another
interface? Should the device not route session-based traffic according to
where it originated?
Nope, forwarding decisions are made on the basis of the FIB.
If the stateful filtering policy and the configuration of the forwarding plane are not congruent, then packets will be out of state and likely discarded by your policy.
If the route announcement is coming from the BGP neighbor, you need to
verify that the next hop indicated for this route is itself reachable by
the router. If the router cannot recursively resolve how to reach the
next hop, then the announced route will not be available. The BGP
sender must set the next hop to a reachable address; sometimes this
is achieved by the sender using next-hop-self in the export
policy, but there are other situations where the next hop is
unreachable.
If the sender is using one specific address as the next hop for
all the announced routes, you will just need a static route pointing to the
gateway for that next hop. If the BGP session goes down for some
reason, then the default route will apply and the redundancy through ISP1
will work fine.
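
The recursive next-hop check described above can be sketched in Python as follows. This is a simplified illustration, not router code: the BGP next-hop address 10.255.0.1 is hypothetical, 2.2.2.1 is assumed to be the ISP2 gateway, and the toy table deliberately has no default route (whether a default route may resolve a BGP next hop depends on the platform and its configuration).

```python
import ipaddress

def resolve(table, address):
    """Longest-prefix match; returns the matching (network, next_hop) or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best

def bgp_route_usable(table, bgp_next_hop):
    """A BGP route is usable only if its next hop resolves via another route."""
    return resolve(table, bgp_next_hop) is not None

# Hypothetical non-BGP routes: just the connected ISP2 subnet.
table = {"2.2.2.0/24": "direct"}
BGP_NEXT_HOP = "10.255.0.1"  # hypothetical next hop announced by the sender

usable_before = bgp_route_usable(table, BGP_NEXT_HOP)

# Static route toward the sender's next hop via the ISP2 gateway (assumed 2.2.2.1).
table["10.255.0.1/32"] = "2.2.2.1"
usable_after = bgp_route_usable(table, BGP_NEXT_HOP)

print(usable_before, usable_after)
```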
I'm close to the edge of what I know (or remember--I've been inactive
for a while) but when a packet arrives on an interface, the routing
engine has to decide where to poke it based on what is in the
packet--there is no information as to where it came from, or to what it
is a response. If it isn't in the IP header, it isn't available for
routing decisions. ("Policy routing" can provide additional data, as
can source routing. But one requires a human being to provide the rule,
and the other requires somebody or something else outside the router to
calculate the route. I don't think anybody much allows source routing
anymore.)
Wouldn't simply configuring source NAT on firewall 2.2.2.1 resolve the
problem gracefully? When connection requests come in through ISP2,
source NAT the incoming traffic's source IP to an IP on the firewall's
inside interface. That way, when the server replies, firewall 2.2.2.1 is
guaranteed to receive the reply, because the return traffic won't follow
the default route to ISP1.
That would seem to be a good resolution (Firewall/NAT). Aside from that, perhaps a load
balancer for each segment might help?
One question that comes to mind is why (if ISP2 is a backup) would valid traffic
be using that route?
Unless maybe you're load balancing using DNS round robin to hit the second IP space, or load balancing
the 2 ISPs?
Another "maybe" fix would be to multi-home the application to that segment, i.e. 2 NICs on the
server, one on the primary network, the other on the secondary, with appropriate default gateways. Of course,
since there is little information on the infrastructure here, this may not be possible.
I suppose if one were to get really detailed about this, you could look into reverse routing using MAC, but
theoretically that would/could open a whole other set of issues.
As others have pointed out typically interfaces are not kept track of in
state tables. Having said that, I've worked in the past with the ScreenOS
based SSG platforms that do this. So if you're coming from an SSG
background this makes sense.
These devices seem to keep track of source interface in their state tables.
For example, I've worked on a one-armed load balancer with no source NAT,
such that one would typically require some policy-based routing to get the
traffic back to the LB to have the destination NAT handled. However,
with a Juniper SSG as the router, its state tables kept track of the
interfaces and routed traffic correctly without any policy-based routing
required. When I took over administration of that environment I spent some
time trying to figure out how the routing worked, since there was no
configuration such as policy-based routes that would make sense.
Having said that, if the JunOS-based SRX platform does not do session
tracking in the same way as the SSG platform, it would seem that the most
reasonable solution would be to NAT the traffic, as has already been pointed
out.
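
The difference between the two behaviours discussed above can be illustrated with a toy Python model. It is an assumption-laden sketch, not the ScreenOS or JunOS implementation: the interface names "isp1" and "isp2" and the `pin_to_ingress` switch are hypothetical.

```python
import ipaddress

# Hypothetical FIB: only a default route out of the ISP1 interface.
FIB = {"0.0.0.0/0": "isp1"}

def fib_egress(destination):
    """Pick the egress interface by longest-prefix match on the destination."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, interface in FIB.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    return best[1] if best else None

sessions = {}

def record_inbound(client, server, ingress_if):
    """Remember which interface the first packet of a session arrived on."""
    sessions[(client, server)] = ingress_if

def reply_egress(client, server, pin_to_ingress):
    """Choose where the reply leaves: the session's ingress, or the FIB result."""
    ingress_if = sessions.get((client, server))
    if pin_to_ingress and ingress_if is not None:
        return ingress_if
    return fib_egress(client)

record_inbound("3.3.3.3", "2.2.2.2", "isp2")
fib_based = reply_egress("3.3.3.3", "2.2.2.2", pin_to_ingress=False)
pinned = reply_egress("3.3.3.3", "2.2.2.2", pin_to_ingress=True)

print(fib_based, pinned)
```

With plain FIB forwarding the reply leaves via the default route, which matches the symptom reported at the top of the thread; a device that pins replies to the recorded ingress interface would send it back out the ISP2 link.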
ISP1 is the default gateway; ISP2 is a backup provider, but one which is always
active. Client comes in on ISP1's link, traffic goes back out on ISP1's link.
Client comes in on ISP2's link (non default gateway) but for some reason,
the packets seem to be going back out through the link for ISP1.
Your BGP config with ISP2 is probably not ideal. This has led to packets coming in via ISP2 despite the fact that you prefer to use ISP1. Often, people only use AS prepending to alter traffic patterns. However, if a packet finds its way to your directly connected ISP, an AS prepend is not enough: the packet will be sent to you based on local preference. Setting appropriate communities with your ISP can often override this preference, so that they will send the packet towards your ISP1 instead of directly to you.
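
Why prepending alone does not help here can be shown with a heavily simplified Python sketch of BGP best-path selection as seen from ISP2. Only the first two comparison steps are modeled (higher local preference first, then shorter AS path); the AS numbers, local-preference values, and path labels are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Path:
    via: str
    local_pref: int
    as_path: tuple

def best_path(paths):
    """Higher local preference wins; AS-path length only breaks ties."""
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path)))

# ISP2's view of your prefix. 65001 stands in for your AS, 64500 for ISP1.
direct = Path("direct customer link", local_pref=200,
              as_path=(65001, 65001, 65001, 65001))  # prepended three times
via_isp1 = Path("via ISP1", local_pref=100, as_path=(64500, 65001))

best_before = best_path([direct, via_isp1])

# A community that asks ISP2 to lower local preference on the direct route.
direct_lowered = Path("direct customer link", local_pref=50,
                      as_path=direct.as_path)
best_after = best_path([direct_lowered, via_isp1])

print(best_before.via, "/", best_after.via)
```

In this model the prepended path still wins while its local preference is higher, and loses only once the local preference drops below that of the path via ISP1.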
Yes I believe that would be the default if the session was initiated on the
inside, but if it comes from outside on a particular interface which is not
the default route, why would the router then send the packet out another
interface? Should the device not route session-based traffic according to
where it originated?
To my knowledge, routers don't generally route based on session. They maintain flow information in some cases, but you learn quickly that it's a one-way record, and the corresponding return flow may take a different path.
There are exceptions, and perhaps some Junipers even support more oddball session-based routing, though my M120 and Cisco don't seem to.
When you set up your virtual instance, was your ISP2 interface a member of that instance? I have a working setup that ran on a J-series running 9.6 something.
This is a Juniper guide I used but it was a little bit different and didn't work for me.
--Routing instance:
routing-instances {
    DSL {
        instance-type virtual-router;
        # Most important part: the ISP2 interface must be a member to get the correct incoming context.
        interface ge-0/0/1.0;
        routing-options {
            static {
                # ISP2 next hop
                route 0.0.0.0/0 next-hop 172.24.1.1;
            }
        }
    }
}
--Firewall filter:
firewall {
    filter DSL {
        term DSL {
            from {
                address {
                    # This is the address that will go into the virtual router (ISP2 addresses should go here).
                    192.0.1.210/32;
                }
            }
            then {
                count source-route;
                log;
                routing-instance DSL;
            }
        }
        # Everything else uses the default routing table.
        term default {
            then {
                count default-counter;
                log;
                accept;
            }
        }
    }
}
Jensen Tyler
Network Engineer
Fiberutilities Group, LLC
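
As a rough illustration of what the filter and routing instance above are meant to do, here is a small Python model. It is a sketch under stated assumptions, not a description of Junos internals: "isp1-gateway" is a hypothetical placeholder for the default table's next hop, and the match mirrors the filter's `address` condition, which matches either the source or the destination address.

```python
import ipaddress

# Two routing tables: the main one and the DSL virtual router from the config.
TABLES = {
    "inet.0": {"0.0.0.0/0": "isp1-gateway"},  # hypothetical ISP1 next hop
    "DSL": {"0.0.0.0/0": "172.24.1.1"},       # ISP2 next hop from the config
}
FILTER_ADDRESS = ipaddress.ip_network("192.0.1.210/32")

def longest_match(table, destination):
    """Return the next hop of the most specific route covering destination."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

def forward(source, destination):
    """Pick the table the way the two filter terms do, then do the lookup."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    if src in FILTER_ADDRESS or dst in FILTER_ADDRESS:
        table_name = "DSL"     # term DSL: hand the packet to the instance
    else:
        table_name = "inet.0"  # term default: normal routing
    return table_name, longest_match(TABLES[table_name], destination)

print(forward("192.0.1.210", "3.3.3.3"))
print(forward("192.0.1.50", "3.3.3.3"))
```

The point of the model is that the table is chosen from the packet's addresses before the destination lookup happens, so replies sourced from the ISP2-facing address can follow a different default route than everything else.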
Yes sir that's what I thought too. The packets are being NATted (and I also
used a bit of DNAT for port forwarding to test the theory) but the result is
the same.
Interesting questions; my answers are below each one:
That would seem to be a good resolution (Firewall/NAT) . Aside from that,
perhaps a load
balancer for each segment might help?
Possibly but this will cost money to implement and there is no guarantee
that it will work.
One question that comes to mind is why (if ISP2 is a backup) would valid
traffic
be using that route?
Several reasons, but the two main reasons are:
1. Some clients might find one path faster than another (e.g. vpn.example.com vs vpn2.example.com). If they are on the same provider then
chances are that they will have better remote access that way.
2. If BGP fails we want all of our statically routed IP addresses to work
too, this is our solution to be able to guarantee connectivity to payment
processors (so quite important to ensure that we can make money)
Unless maybe you're load balancing using DNS round robin to hit the
second IP space, or load balancing
the 2 ISPs?
No round robin... This is the last resort if BGP fails
Another "maybe" resolve would be to multi-home the application to that
segment, i.e. 2 nics on the
server, one on the primary network, the other on the secondary with
appropriate Def.GWs, of course
since there is little information on the infrastructure here this may not
be possible.
I suppose if one were to get really detailed about this, you could look
into reverse routing using MAC, but
theoretically that would/could open a whole other set of issues.
I can go extremely detailed offlist but there would be far too much
information to post to NANOG otherwise, and it would probably just annoy
people and result in flaming more than anything.
As others have pointed out typically interfaces are not kept track of in
state tables. Having said that, I've worked in the past with the ScreenOS
based SSG platforms that do this. So if you're coming from an SSG
background this makes sense.
Yes sir I have used SSG for several years but mainly used BSD for the last
decade and most recently OpenBSD. There is an easy fix for this on PF for
OpenBSD and that is to tag the packets from each provider (as in not using
802.1q but a specific function in PF). This works extremely well