The free DHCP solution, ISC, seems to be having scaling issues (i.e.
handling only about 200 DHCPDISCOVER and 20 DHCPRENEW requests), and I
was wondering if anyone had any open source suggestions of solutions
that could scale much better?
(Ideally, I could find a free version of a solution like Nominum, but
I know that's asking for much.)
Where do you get that ISC DHCPD only handles 200 DHCPDISCOVER / 20 DHCPRENEW
requests? That doesn't sound right, so I wonder what you are measuring.
Is this the number of answers per second that your installation of ISC
DHCPD is successfully providing?
The architecture of the environment matters as well, not just which
software is performing the DHCP task.
How many I/Os + fsync()'s per second can the storage on this DHCP server
handle, if it does only 20 renews?
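One rough way to answer that question is to time small appended writes, each followed by fsync(), on the filesystem that holds dhcpd.leases. This is only a sketch: the function name, the record contents, and the count are made up for illustration, and it measures the storage, not dhcpd itself.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, count=200, record=b"lease 192.0.2.1 { }\n"):
    """Append `count` small records, fsync after each, and return writes/second.

    `directory` should be on the same filesystem as the leases file.
    The record is a placeholder, not a real dhcpd.leases entry.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, record)
            os.fsync(fd)  # force the write to stable storage before continuing
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    # Guard against a zero interval on very coarse clocks.
    return count / max(elapsed, 1e-9)


if __name__ == "__main__":
    print(f"{fsyncs_per_second('.'):.0f} fsync'd writes per second")
```

If the number this reports is in the low hundreds, the disk, not the DHCP software, is the likely bottleneck.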
We were seeing similar issues with low lease rates; we moved the dhcpd.leases
file to a ramdisk and went from ~200 leases per second to something like 8,000
leases per second.
You are doing something wrong:
* turn off ping-check
* use a proper RAID controller with battery backup (because ISC dhcpd does an fsync every time it writes to dhcpd.leases)
* ...
* profit
Yes, blame RFC2131's requirement that a DHCP server ensure that any
lease is committed to persistent storage strictly before the server is
allowed to send the response to the request; a fully compliant DHCP server
with sufficient traffic is bound by the disk I/O rate of the underlying
storage backing its database.
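To make the bound concrete, here is a back-of-the-envelope sketch. It assumes one serialized fsync per lease and nothing else on the critical path; the latency figures are illustrative guesses, not measurements of any particular hardware.

```python
def max_leases_per_second(fsync_latency_ms, fsyncs_per_lease=1):
    """Upper bound on lease commits per second when every lease must wait
    for `fsyncs_per_lease` serialized fsync() calls of the given latency."""
    return 1000.0 / (fsync_latency_ms * fsyncs_per_lease)


if __name__ == "__main__":
    # Hypothetical latencies, chosen only to illustrate the arithmetic.
    for label, latency_ms in [("5 ms per fsync", 5.0),
                              ("0.125 ms per fsync", 0.125)]:
        print(f"{label}: at most {max_leases_per_second(latency_ms):.0f} leases/s")
```

Under those assumptions, a few milliseconds per fsync caps the server at a few hundred leases per second, regardless of CPU.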
I do not recommend use of a RAMDISK; it's safer to bend the rule than break it
entirely. A safer way is probably to use a storage system with a battery-backed
NVRAM cache that you configure to ignore fsync() and lie to the DHCP server
application, allowing the storage system to aggregate the I/O.
Of course, committing to a RAMDISK tricks the DHCP server software.
The danger is that if your DHCP server suffers an untimely reboot, you
will have no transactionally safe record of the leases issued when the
replacement comes up or the DHCP server completes its reboot cycle.
As a result, you can generate conflicting IP address assignments, unless you:
(a) Have an extremely short max lease duration (which can increase
DHCP server load), or
(b) Have a policy of pinging before assigning an IP, which limits DHCP server
performance and is not foolproof.
We've recently set up ISC DHCPd with failover for lease information, and
LDAP as a configuration source (mostly because of our need for
dynamically adding DHCP reservations for cable modems, etc) -- we don't
have any performance issues thus far, but I'd imagine that in a failover
environment it might be safe to consider a ramdisk for leases.
Obviously breaks RFC2131, but...
Use an SSD; all the cool kids with monolithic databases and TPC-C style workloads are doing it, and since your storage requirements are negligible it ought to be fairly cheap.
Bandwidth
  Sustained sequential read: up to 250 MB/s
  Sustained sequential write: up to 170 MB/s
Read latency: 75 microseconds
I/O Per Second (IOPS)
  Random 4KB Reads: >35,000 IOPS
  Random 4KB Writes: >3,300 IOPS
Good luck buying X25-Es; they're out of production and all gone from
the supply chain. The replacement 710 and 720 models have an ETA of
late August at the moment.
Micron has some large-cap SLC drives in the chain for
September/October/ish timeframes.
Ramdisk with rsync or rdiff-backup to spinning storage will do just fine.
I think a lot of this depends on the target audience of your server.
It sounds like he's in a commercial WAN environment, which of course is what
those rules were written for. But I can't tell you how many service calls I
have to take because of address conflicts on home LANs behind consumer
routers... which don't generally cache the assignments at all, IME.
What I hate is my cable provider re-numbering without winding down
the lease time first. Waking up in the morning to a lease that says
it's still got 18+ hours to go and no net shouldn't happen. If the
DHCP server has said the address is good for 24 hours it should be
good for 24 hours.
I know first level support will say to reboot, which forces a renew
which fails, but one shouldn't have to reboot for a renumber event.
Run the old and new spaces in parallel for 24 hours.
SSDs can be a good alternative these days as well. Some of them have gotten
to be quite fast. Sure, you'll have to replace them more often than spinning media,
but the write times can be quite a bit better.
Actually, the scale of writes associated with this application is unlikely to significantly impact the service life of an SLC NAND SSD with a solid block shadowing/wear leveling implementation. Back in 2007 I was convinced that we could improve on the reliability of our network appliances with industrial 2.5" SATA and enterprise SAS disks, and the situation has only improved since.
If you're just fighting IOPS, another compromise might be using a ramdisk,
and then committing that data to storage every x seconds.
Yes, you might be breaking the RFC, but depending on what it's used for, you
could probably commit every 3-5 seconds without much penalty and limit your
data loss potential on server failure.
Or as others have said... some sort of SSD/cache solution.
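The ramdisk-plus-periodic-commit compromise could be sketched roughly as below. This is a minimal illustration, not a tested production tool: the paths are hypothetical, it assumes a POSIX system, and it does not deal with catching a partially written final record if dhcpd is mid-write when the copy happens.

```python
import os
import shutil
import time


def snapshot(src, dst):
    """Copy `src` to `dst` so that `dst` is always a complete old or new copy."""
    tmp = dst + ".tmp"
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # data is on stable storage before the rename
    os.replace(tmp, dst)  # atomic rename over the previous snapshot
    # fsync the containing directory so the rename itself is durable (POSIX).
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def run(src, dst, interval_seconds=5.0, iterations=None):
    """Snapshot every `interval_seconds`; loop forever if `iterations` is None."""
    done = 0
    while iterations is None or done < iterations:
        snapshot(src, dst)
        done += 1
        time.sleep(interval_seconds)


if __name__ == "__main__":
    # Hypothetical locations: leases on a ramdisk, snapshot on real disk.
    run("/dev/shm/dhcpd.leases", "/var/lib/dhcp/dhcpd.leases.snapshot")
```

With a 5 second interval, a crash loses at most about 5 seconds of lease history, at the cost of one fsync per interval instead of one per lease.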