Anycast applicable to RADIUS server farm?

Hi,

We have a RADIUS server farm with an L4 switch installed
in front of the servers. Incoming AAA packets are
distributed by the L4 switch to the different servers.

Over the past few days we had a couple of problems with the L4
switch that degraded our service badly. Would it be
possible to implement an IPv4 anycast architecture for the
RADIUS server farm? Would that cause any problems with the AAA
procedure?

Any advice will be highly appreciated.

Joe

Date: Mon, 8 May 2006 12:07:13 +0800 (CST)
From: Joe Shen

> Would it be possible to implement an IPv4 anycast architecture for the
> RADIUS server farm?

Yes.

> Would that cause any problems with the AAA procedure?

UDP is anycast-friendly. Your biggest problems are likely to be
authentication database replication/synchronization and merging
accounting records... i.e., nothing really different from standard
RADIUS deployments.

Try ECMP if you want load balancing without the L4-ish gear. This
implies routers between the NASes and RADIUS boxen, but you _did_
specify anycast. ;-)
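
A minimal Python sketch of the per-flow hashing idea behind ECMP, for
illustration only. Real routers use vendor-specific hash functions and
inputs; the SHA-256 hash, the function name ecmp_next_hop, the server
names and the addresses below are all assumptions made for this example.

```python
import hashlib


def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, next_hops):
    """Pick one next hop per flow by hashing the flow tuple.

    Hypothetical scheme: actual router hashing differs by vendor.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Reduce the first 4 bytes of the digest to an index into next_hops.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Three RADIUS servers announcing the same anycast address (made-up names).
servers = ["radius-a", "radius-b", "radius-c"]

# The same NAS flow hashes to the same server every time.
first = ecmp_next_hop("10.0.0.1", "192.0.2.10", 1645, 1812, servers)
again = ecmp_next_hop("10.0.0.1", "192.0.2.10", 1645, 1812, servers)
```

As long as the set of next hops is unchanged, every packet of one flow
lands on the same server. When a server is withdrawn, the modulo step
may move existing flows to a different server.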

Load balancing is trickier when RADIUS servers and NASes live on the
same network segment. You'll need something a la Windows Advanced
Server or distributed 802.3ad. I know of no turn-key implementation of
the latter; I played around with it a few years back, but the project
was shelved before completion. Several modern *ix flavors include
rudimentary 802.3ad support, so implementation should be easier these
days.

(Note that MAC-based technology strays away from "anycast" in the sense
that it operates at L2 instead of L3.)

HTH,
Eddy

Hello Joe -

Can you indicate in more detail what the problems were with the L4 switch?

If the loadbalancing is done by source/destination IP address pairs, you can have problems when a target goes down: all of its source/destination IP address pairs get switched to another target, which then gets into difficulty as well, and you end up with a cascading failure. It is generally preferable to have the loadbalancing done on a weighted per-packet basis, ideally distributed according to round-trip times.
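
The cascading effect can be shown with a little arithmetic. This is only a sketch under simple assumptions: four targets with made-up names and a made-up load of 300 requests per second each, and a failover rule that moves all of a failed target's pairs to one other target.

```python
def failover_to_next(loads, failed):
    """Move all of the failed target's load to the next target in the list."""
    result = dict(loads)
    names = list(loads)
    moved = result.pop(failed)
    neighbour = names[(names.index(failed) + 1) % len(names)]
    result[neighbour] += moved
    return result


def spread_evenly(loads, failed):
    """Share the failed target's load equally among the survivors."""
    result = dict(loads)
    moved = result.pop(failed)
    share = moved / len(result)
    for name in result:
        result[name] += share
    return result


# Hypothetical steady-state load (requests per second) per target.
loads = {"s1": 300, "s2": 300, "s3": 300, "s4": 300}

pinned = failover_to_next(loads, "s1")  # s2 now carries 600
spread = spread_evenly(loads, "s1")     # each survivor carries 400
```

With pair-based switching one survivor sees its load double, which is what can push it over next. Per-packet distribution raises every survivor by only a third in this example.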

Also note that you can only do per-packet loadbalancing with simple RADIUS. Mechanisms like EAP that require multiple exchanges of RADIUS requests typically require state to be maintained in the single RADIUS server that is processing the entire EAP sequence.
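
A rough Python sketch of that distinction: single-shot requests go round-robin, while a multi-round conversation stays pinned to one server. How a real loadbalancer identifies a conversation is not covered here; the Dispatcher class and the conversation key (NAS address plus a station identifier) are assumptions for illustration.

```python
import itertools


class Dispatcher:
    """Round-robin for single-shot requests, sticky for multi-round ones."""

    def __init__(self, servers):
        self._next_server = itertools.cycle(servers)
        self._pinned = {}  # conversation key -> server

    def route(self, conversation_key=None, last_round=False):
        if conversation_key is None:
            # Simple RADIUS request: any server can answer it.
            return next(self._next_server)
        server = self._pinned.get(conversation_key)
        if server is None:
            # First round of a conversation: choose a server and pin it.
            server = next(self._next_server)
            self._pinned[conversation_key] = server
        if last_round:
            # Conversation finished: forget the pin.
            del self._pinned[conversation_key]
        return server


dispatcher = Dispatcher(["radius-a", "radius-b", "radius-c"])
eap_key = ("10.0.0.1", "00-11-22-33-44-55")  # hypothetical key
eap_servers = [dispatcher.route(eap_key) for _ in range(4)]
simple_servers = [dispatcher.route() for _ in range(3)]
```

All four rounds of the pinned conversation reach the same server, while the three simple requests are spread over three servers.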

regards

Hugh

NB:

Have you read the reference manual ("doc/ref.html")?
Have you searched the mailing list archive (www.open.com.au/archives/radiator)?
Have you had a quick look on Google (www.google.com)?
Have you included a copy of your configuration file (no secrets),
together with a trace 4 debug showing what is happening?

> Can you indicate in more detail what the problems were with the L4
> switch?

We separate our RADIUS servers into two farms, each
with an L4 switch in front. Our understanding is that the
RADIUS authentication and accounting information for a
PPPoE session should be processed by the same RADIUS
server. So, although the L4 switch provides a single IP address
for BRAS configuration, each BRAS is mapped to a specific real
server IP on the L4 switch. This causes the following problems:

1) Load is not balanced automatically but by manual
estimation; some servers carry twice the load of
others.

2) The L4 switch becomes a bottleneck for service
availability. In past years the L4 switch has caused several
service failures. Just last Friday, the L4 switch
stopped responding to any network packets while its
Ethernet interface still appeared to be up.

3) As the L4 switch is the only entrance to a
server farm, a DoS attack or a software
bug on it will surely degrade security. A farm
using ECMP, by contrast, relies on the whole server group
to resist a DoS attack.

4) Maintenance is somewhat costly. Any maintenance,
whether on a RADIUS server or on the L4 switch, needs a
scheduled time window.

5) Service protection is hard (the 'cascading' failure
you mentioned). As there are two server farms, if one
farm fails it takes ten or more minutes to migrate
its RADIUS traffic to the other farm. This is
unacceptable.

So we are looking for a more scalable, reliable,
secure and automatic multi-farm RADIUS solution.

Joe

> Could it be any problem with AAA procedure?

> UDP is anycast-friendly. Your biggest problems are likely to be
> authentication database replication/synchronization and merging
> accounting records... i.e., nothing really different from standard
> RADIUS deployments.

What I have trouble understanding is:

1) Is it required to route traffic from a specific
BRAS to exactly one server if the database behind the
RADIUS servers is synchronized periodically?

2) There are two farms, each with several servers. As
the number of equal-cost paths supported by Cisco/Juniper
routers is limited (<= 8 or 16), we cannot merge those
servers into one farm. Is there any way to balance load
between two or more farms automatically?

> Load balancing is trickier when RADIUS servers and NASes live on the
> same network segment. You'll need something a la Windows Advanced
> Server or distributed 802.3ad. I know of no turn-key implementation of
> the latter;

Do you mean aggregating the interfaces of several servers
into one 802.3ad trunk? I think that even if the NASes and
RADIUS servers live on the same Ethernet segment, OSPF/IS-IS
could still establish equal-cost paths.

thanks

Joe

Joe Shen wrote:

> > Can you indicate in more detail what the problems
> > were with the L4 switch?
>
> We separate our RADIUS servers into two farms, each
> with an L4 switch in front. Our understanding is that the
> RADIUS authentication and accounting information for a
> PPPoE session should be processed by the same RADIUS
> server.

I don't think that's true. If the authentication RADIUS server fails to respond, authentication and accounting will then go to the next configured server.

> So, although the L4 switch provides a single IP address
> for BRAS configuration, each BRAS is mapped to a specific real
> server IP on the L4 switch. This causes the following problems:
>
> 1) Load is not balanced automatically but by manual
> estimation; some servers carry twice the load of
> others.

See if you can extract the load from the RADIUS servers using SNMP or something similar, and make your L4 switch use it.

> 2) The L4 switch becomes a bottleneck for service
> availability. In past years the L4 switch has caused several
> service failures. Just last Friday, the L4 switch
> stopped responding to any network packets while its
> Ethernet interface still appeared to be up.

Add a couple of the actual server IPs to the list of AAA servers the NASes use.

> 3) As the L4 switch is the only entrance to a
> server farm, a DoS attack or a software
> bug on it will surely degrade security. A farm
> using ECMP, by contrast, relies on the whole server group
> to resist a DoS attack.

Your firewalls should be protecting your RADIUS servers from DoS, unless you really expect the world to communicate with them. Spoofed sources, however, could be hard to protect against.

> 4) Maintenance is somewhat costly. Any maintenance,
> whether on a RADIUS server or on the L4 switch, needs a
> scheduled time window.
>
> 5) Service protection is hard (the 'cascading' failure
> you mentioned). As there are two server farms, if one
> farm fails it takes ten or more minutes to migrate
> its RADIUS traffic to the other farm. This is
> unacceptable.

Let the NAS do it. NASes fail over much faster than that.

Whatever you choose, try to build the NAS's ability to fail over between RADIUS servers into your redundancy plan.

Hello Joe -

> > Can you indicate in more detail what the problems
> > were with the L4 switch?
>
> We separate our RADIUS servers into two farms, each
> with an L4 switch in front. Our understanding is that the
> RADIUS authentication and accounting information for a
> PPPoE session should be processed by the same RADIUS
> server. So, although the L4 switch provides a single IP address
> for BRAS configuration, each BRAS is mapped to a specific real
> server IP on the L4 switch. This causes the following problems:

Normal RADIUS does not require authentication and accounting for a single session to go to the same RADIUS server.

> 1) Load is not balanced automatically but by manual
> estimation; some servers carry twice the load of
> others.

You should use a loadbalancer that can distribute RADIUS requests on a per-request basis according to round-trip times, which will be a reasonable indication of server load, i.e. the fastest round-trip time will be from the least-loaded server.
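
One possible way to do this, sketched in Python. The class name RttBalancer, the smoothing factor, the server names and the sample round-trip times are assumptions for the example and are not taken from any particular loadbalancer product.

```python
import random


class RttBalancer:
    """Choose a server per request, weighted by inverse smoothed RTT."""

    def __init__(self, servers, alpha=0.2):
        self.alpha = alpha  # weight given to the newest RTT sample
        self.rtt = {server: None for server in servers}

    def record(self, server, rtt_seconds):
        """Fold a measured round-trip time into the running average."""
        previous = self.rtt[server]
        if previous is None:
            self.rtt[server] = rtt_seconds
        else:
            self.rtt[server] = (
                (1 - self.alpha) * previous + self.alpha * rtt_seconds
            )

    def pick(self, rng=random):
        """Return a server; faster servers are chosen more often."""
        # Try any server that has no measurement yet first.
        for server, rtt in self.rtt.items():
            if rtt is None:
                return server
        names = list(self.rtt)
        weights = [1.0 / max(self.rtt[name], 1e-6) for name in names]
        return rng.choices(names, weights=weights, k=1)[0]


balancer = RttBalancer(["radius-a", "radius-b"])
balancer.record("radius-a", 0.010)  # lightly loaded: 10 ms
balancer.record("radius-b", 0.100)  # heavily loaded: 100 ms
rng = random.Random(0)
picks = [balancer.pick(rng) for _ in range(1000)]
```

In this run the 10 ms server gets roughly ten times the weight of the 100 ms server, so it receives most of the requests while the slower one still gets enough traffic to keep its measurement current.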

> 2) The L4 switch becomes a bottleneck for service
> availability. In past years the L4 switch has caused several
> service failures. Just last Friday, the L4 switch
> stopped responding to any network packets while its
> Ethernet interface still appeared to be up.

I suggest you find a better loadbalancer. Contact me off list if you need suggestions.

> 3) As the L4 switch is the only entrance to a
> server farm, a DoS attack or a software
> bug on it will surely degrade security. A farm
> using ECMP, by contrast, relies on the whole server group
> to resist a DoS attack.

You should design your system with two loadbalancers, and configure your NAS equipment to use one as primary and the other as secondary. You should configure half of your NAS equipment to use loadbalancer A as primary, and the other half of your NAS equipment to use loadbalancer B as primary (and the converse for secondary).
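
As a small illustration of that split, the following Python sketch alternates the primary and secondary loadbalancer across a list of NAS devices. The function name and the device and loadbalancer names are made up for the example.

```python
def assign_balancers(nas_names, balancer_a, balancer_b):
    """Map each NAS to a (primary, secondary) pair, alternating A and B."""
    plan = {}
    for position, nas in enumerate(sorted(nas_names)):
        if position % 2 == 0:
            plan[nas] = (balancer_a, balancer_b)
        else:
            plan[nas] = (balancer_b, balancer_a)
    return plan


# Hypothetical device names.
plan = assign_balancers(
    ["bras1", "bras2", "bras3", "bras4"], "lb-a", "lb-b"
)
primaries_on_a = sum(1 for primary, _ in plan.values() if primary == "lb-a")
```

Here two of the four NAS devices use lb-a as primary and two use lb-b, and every device has the other loadbalancer as its secondary.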

> 4) Maintenance is somewhat costly. Any maintenance,
> whether on a RADIUS server or on the L4 switch, needs a
> scheduled time window.

A design as above will have no single point of failure.

> 5) Service protection is hard (the 'cascading' failure
> you mentioned). As there are two server farms, if one
> farm fails it takes ten or more minutes to migrate
> its RADIUS traffic to the other farm. This is
> unacceptable.

If you set the RADIUS timeouts and retries on your NAS equipment sensibly, depending on what end-user devices are being used (PC modems, DSL modems, GPRS WAP phones, mail servers, web servers ...), any outage should have an almost imperceptible impact.
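
A back-of-the-envelope Python calculation of how long a NAS might wait on a dead primary before moving to its secondary. It assumes a simple model (one initial attempt plus a fixed number of retries, each waiting the full timeout, no backoff), and the 3 second timeout and 2 retries are example values only, not recommendations; actual NAS behaviour varies by vendor.

```python
def worst_case_failover_seconds(timeout_s, retries):
    """Time spent on an unresponsive server before trying the next one.

    Assumed model: one initial attempt plus `retries` retransmissions,
    each waiting the full timeout, with no backoff.
    """
    return timeout_s * (1 + retries)


# Example values only.
delay = worst_case_failover_seconds(timeout_s=3, retries=2)  # 9 seconds
```

Under these assumptions a login attempt is delayed by seconds rather than by the ten or more minutes of a manual migration between farms.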

> So we are looking for a more scalable, reliable,
> secure and automatic multi-farm RADIUS solution.

hope that helps

regards

Hugh




