0001441: make DHCP servers HA-ready - MantisBT
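ID: 0001441
Project: Endian Firewall
Category: High Availability
View Status: public
Date Submitted: 2008-11-10 12:04
Last Update: 2009-10-27 12:01
Reporter: peter-endian
Assigned To: peter-endian
Priority: normal
Severity: major
Reproducibility: have not tried
Status: closed
Resolution: fixed
Product Version: 2.2-rc3
Fixed in Version: 2.3
Summary: make DHCP servers HA-ready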
Description: When the slave takes over the master's job and a DHCP server is running on the master, the slave knows nothing about the master's leases and reassigns IP addresses to different hosts, which leads to IP conflicts.
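A toy model of the failure described above, assuming each server keeps its leases in a plain dict (IP address -> MAC) and hands out the lowest free address. Real dhcpd allocation is more involved; the pool and MAC addresses are made up for illustration.

```python
import ipaddress

# Hypothetical five-address pool: 192.168.21.10 .. 192.168.21.14.
POOL = [ipaddress.ip_address("192.168.21.10") + i for i in range(5)]


def allocate(leases, mac):
    """Hand out the lowest free address in POOL and record it in leases.

    leases maps IP address -> MAC address and stands in for a lease database.
    """
    for ip in POOL:
        if ip not in leases:
            leases[ip] = mac
            return ip
    raise RuntimeError("pool exhausted")


# The master leases the first address to host A.
master_leases = {}
ip_a = allocate(master_leases, "aa:aa:aa:aa:aa:aa")

# The slave takes over without knowing the master's leases ...
slave_leases = {}
ip_b = allocate(slave_leases, "bb:bb:bb:bb:bb:bb")

# ... whereas a slave holding a synchronized copy avoids the clash.
synced_leases = dict(master_leases)
ip_c = allocate(synced_leases, "bb:bb:bb:bb:bb:bb")

print(ip_a, ip_b, ip_c)  # 192.168.21.10 192.168.21.10 192.168.21.11
```

Host B ends up with the address host A still holds, which is exactly the IP conflict the failover protocol's lease synchronization is meant to prevent.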

Solution:
Make the DHCP server HA-ready as described here:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.csm16010.admin.doc/am7ad_dhcpfaillnx.html
Tags: heavy, needsfix
Issue History
2008-11-10 12:04  peter-endian  New Issue
2008-11-10 12:04  peter-endian  Assigned To => peter-endian
2008-11-10 12:04  peter-endian  Tag Attached: needsfix
2008-11-12 11:56  peter-endian  Tag Attached: heavy
2008-11-20 21:10  peter-endian  Note Added: 0001817
2008-11-21 12:40  peter-endian  Note Added: 0001820
2008-11-24 11:07  peter-endian  Note Added: 0001824
2008-11-24 11:31  peter-endian  Note Added: 0001825
2008-12-17 12:12  peter-endian  Note Added: 0001884
2008-12-17 21:43  peter-endian  Note Added: 0001891
2008-12-23 14:00  peter-endian  Note Added: 0001909
2008-12-23 21:31  peter-endian  Status: new => resolved
2008-12-23 21:31  peter-endian  Fixed in Version => 2.3
2008-12-23 21:31  peter-endian  Resolution: open => fixed
2009-10-27 12:01  peter-endian  Status: resolved => closed

Notes
(0001817)
peter-endian   
2008-11-20 21:10   
The problem can be solved with the approach described in the link above; however, it is not as easy as it looks.
There are 5 main problems:

1) Only 2 DHCP servers can take part in failover, not more.
2) The VRRP backup / DHCP secondary needs to define a DHCP subnet
   for the management network; otherwise the daemon will not start.
3) The DHCP failover protocol is in reality a DHCP load balancer: the
   secondary is designed to answer some of the requests as well. In our
   case this is not possible *as it is now*, because the secondary is not
   in the master's subnets: the VIP is on the master, and we assign the
   secondary no RIP from the virtual subnet, only one from the management
   network.
4) The DHCP server becomes operational only after the first handshake
   between the primary and the secondary. DHCP cannot run while only the
   master is configured and the slave is not installed yet. After a single
   contact between the two, i.e. after the initial synchronization of the
   leases, both keep working even when the other is not reachable.
5) Both machines need to have exactly the same time.

Possible solutions:
1) We need to specify that only 2 DHCP servers are supported, a primary and
   a secondary. On clusters with more nodes, the other nodes will not run a
   DHCP secondary. Anything else is simply not possible, a workaround would
   be too complicated, and in the end it makes little sense to have more
   than 2.

4) We need to communicate this. I think it is acceptable that both nodes of
   an HA installation must be configured before DHCP can become operational.

5) This only needs to be documented in the manual. It is nothing very surprising.

2) Two solutions:
   a) patch away that check
   b) always restart the DHCP server on a master/backup transition, adding or
      removing that subnet definition

3) It would be great if both servers could answer. However, I also managed
   to get it working with one server not reachable by the client: the
   primary answers after a short (2 s) timeout. Warning messages appear and
   it is slower, but in the end it works.

   Found another possibility. *If it works*, that would be much better:
   when the DHCP servers are started at a time when the machine holds the
   VIP, the DHCP server is able to respond to requests even after the VIP
   has been removed from the device.
   That means a VRRP backup would also be able to answer, and the load
   balancer would work as designed.

   But in *this* configuration it would be a nasty hack that would surely
   cause problems. I would patch the server to add a configuration option
   that allows it to listen on an interface and answer requests for subnets
   the machine does not actually hold. Some sort of virtual DHCP.
   This, however, also requires solution 2b.

More tests needed...
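Solution 2b could be driven by something like the following sketch. It assumes the VRRP daemon reports its state as the strings "MASTER" and "BACKUP" and that the caller supplies a restart callback; DhcpTransitionHandler, render_config and subnet_decl are hypothetical names, and the networks are borrowed from the shared-network example in the next note.

```python
import ipaddress


def subnet_decl(network, body):
    """Render one dhcpd.conf subnet declaration for a CIDR network string."""
    net = ipaddress.ip_network(network)
    return f"subnet {net.network_address} netmask {net.netmask} {{\n{body}}}\n"


def render_config(zone_networks, management_network, state):
    """Zone subnets always; the management subnet only while in BACKUP state."""
    parts = [subnet_decl(n, "    # pool with failover peer goes here\n")
             for n in zone_networks]
    if state == "BACKUP":
        parts.append(subnet_decl(management_network, "    deny unknown-clients;\n"))
    return "".join(parts)


class DhcpTransitionHandler:
    """Regenerate the config on each VRRP transition; restart only on change."""

    def __init__(self, zone_networks, management_network, restart):
        self.zone_networks = zone_networks
        self.management_network = management_network
        self.restart = restart  # hypothetical callable taking the new config text
        self.current = None

    def on_transition(self, state):
        config = render_config(self.zone_networks, self.management_network, state)
        if config != self.current:
            self.current = config
            self.restart(config)


restarts = []
handler = DhcpTransitionHandler(["192.168.21.0/24"], "192.168.178.0/24",
                                restarts.append)
handler.on_transition("BACKUP")  # adds the management subnet -> restart
handler.on_transition("BACKUP")  # unchanged -> no restart
handler.on_transition("MASTER")  # removes it again -> restart
print(len(restarts))  # 2
```

Comparing the rendered text before restarting keeps repeated notifications for the same state from bouncing the daemon needlessly.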
(0001820)
peter-endian   
2008-11-21 12:40   
Problem 2) can be solved using a shared-network declaration.

For example:

shared-network "lala" {
    subnet 192.168.21.0 netmask 255.255.255.0 {
        pool {
            failover peer "slave";
            deny dynamic bootp clients;
            range 192.168.21.10 192.168.21.10;
            option subnet-mask 255.255.255.0;
            option domain-name "localdomain";
            option routers 192.168.21.1;
            # "wpad" is not a built-in dhcpd option; it must be declared
            # beforehand, e.g. with: option wpad code 252 = text;
            option wpad "http://192.168.21.1/proxy.pac";
            option domain-name-servers 192.168.21.1;
            default-lease-time 3600;
            max-lease-time 7200;
        }
    }
    subnet 192.168.178.0 netmask 255.255.255.0 {
        deny unknown-clients;
    }
}
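A simplified model of why the shared-network helps with problem 2: dhcpd only serves on interfaces whose address is covered by a subnet declaration, and refuses to run when no interface qualifies. The interface name br0 and the backup's address 192.168.178.2 are assumptions for illustration.

```python
import ipaddress


def usable_interfaces(interfaces, declared_networks):
    """Return names of interfaces that have an address inside a declared subnet.

    interfaces maps interface name -> list of "address/prefix" strings.
    """
    nets = [ipaddress.ip_network(n) for n in declared_networks]
    usable = []
    for name, addrs in interfaces.items():
        ips = [ipaddress.ip_interface(a).ip for a in addrs]
        if any(ip in net for ip in ips for net in nets):
            usable.append(name)
    return usable


# The VRRP backup holds only its management address; the VIP is on the master.
backup = {"br0": ["192.168.178.2/24"]}

print(usable_interfaces(backup, ["192.168.21.0/24"]))                      # []
print(usable_interfaces(backup, ["192.168.21.0/24", "192.168.178.0/24"]))  # ['br0']
```

With only the zone subnet declared, the backup has nothing to listen on; declaring the management subnet in the same shared-network gives it a matching interface without handing out management addresses (deny unknown-clients).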
(0001824)
peter-endian   
2008-11-24 11:07   
The slave needs to synchronize its time via NTP with the master; otherwise the time gap is too large and DHCP failover will refuse the connection between the 2 machines.
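To check how large the gap actually is, the standard NTP on-wire calculation (RFC 5905) derives the slave's clock offset from the master out of four timestamps. The sketch uses made-up timestamps with the slave running 5 s behind; MAX_SKEW is a hypothetical tolerance, since the limit dhcpd enforces is not stated here.

```python
MAX_SKEW = 1.0  # hypothetical tolerance in seconds, not a dhcpd constant


def offset_and_delay(t1, t2, t3, t4):
    """NTP clock offset and round-trip delay.

    t1: request sent (slave clock),  t2: request received (master clock),
    t3: reply sent (master clock),   t4: reply received (slave clock).
    A positive offset means the master's clock is ahead of the slave's.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


offset, delay = offset_and_delay(100.00, 105.01, 105.02, 100.03)
print(f"offset={offset:.2f}s delay={delay:.2f}s")  # offset=5.00s delay=0.02s
print("in sync" if abs(offset) <= MAX_SKEW else "slave clock needs syncing")
```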
(0001825)
peter-endian   
2008-11-24 11:31   
3) definitely works.

Patching dhcpd so that an interface and a source IP can be configured per subnet definition would solve the problem.
Every subnet definition already holds an "interface" field and an "interface-address" field, which store exactly this information. On startup, this information is read from the NICs. This behaviour should be changed so that the information can be set manually, regardless of the system configuration.
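The proposed behaviour change, modelled in Python rather than dhcpd's own C code: a manually configured address wins, and only otherwise is the address looked up on the NICs. SubnetDecl and resolve_address are hypothetical names (the fields mirror the "interface" and "interface-address" fields mentioned above), and the NIC data is a simulated dict.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SubnetDecl:
    network: str
    interface: Optional[str] = None          # manually configured interface
    interface_address: Optional[str] = None  # manually configured source IP


def resolve_address(subnet: SubnetDecl, nics: Dict[str, List[str]]) -> str:
    """Use the configured address if set; otherwise look it up on the NICs."""
    if subnet.interface_address is not None:
        return subnet.interface_address
    net = ipaddress.ip_network(subnet.network)
    for name, addrs in nics.items():
        if subnet.interface is not None and name != subnet.interface:
            continue
        for addr in addrs:
            if ipaddress.ip_address(addr) in net:
                return addr
    return "0.0.0.0"  # nothing suitable found on the NICs


# On the slave, br2 is up but carries no address in the zone subnet.
slave_nics = {"br2": [], "eth0": ["192.168.177.2"]}

print(resolve_address(SubnetDecl("192.168.22.0/24", "br2"), slave_nics))
print(resolve_address(SubnetDecl("192.168.22.0/24", "br2", "192.168.22.1"), slave_nics))
```

The first call falls back to 0.0.0.0, matching what the next note observes on an unconfigured zone; the second answers with the configured VIP.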
(0001884)
peter-endian   
2008-12-17 12:12   
The "interface" statement already exists; it is only undocumented.
By adding the interface statement to every subnet definition, pointing it to the respective zone's bridge interface, the slave DHCP server also listens on the unconfigured zone, but responds with IP address 0.0.0.0, since that interface has no IP address.

By setting server-identifier to the zone's VIP, the DHCP server answers with that IP address instead of 0.0.0.0.

So no patch is needed; it just has to be configured correctly.

Example:

failover peer "slave" {
    secondary;
    address 192.168.177.2;
    port 519;
    peer port 519;
    peer address 192.168.177.1;
    max-response-delay 3;
    max-unacked-updates 3;
}

subnet 192.168.22.0 netmask 255.255.255.0 {
    interface br2;
    server-identifier 192.168.22.1;
    pool {
        failover peer "slave";
        deny dynamic bootp clients;
        range 192.168.22.10 192.168.22.254;
        option subnet-mask 255.255.255.0;
        option domain-name "localdomain";
        option routers 192.168.22.1;
        option wpad "http://192.168.22.1/proxy.pac";
        option domain-name-servers 192.168.22.1;
        default-lease-time 30;
        max-lease-time 72;
    }
}

shared-network green {
    subnet 192.168.20.0 netmask 255.255.255.0 {
        pool {
            failover peer "slave";
            deny dynamic bootp clients;
            range 192.168.20.10 192.168.20.254;
            option subnet-mask 255.255.255.0;
            option domain-name "localdomain";
            option routers 192.168.20.1;
            option wpad "http://192.168.20.1/proxy.pac";
            option domain-name-servers 192.168.20.1;
            default-lease-time 3600;
            max-lease-time 7200;
        }
    }
    subnet 192.168.177.0 netmask 255.255.255.0 {
        deny unknown-clients;
    }
}
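Since the next step was adapting the dhcp.conf template, here is a sketch of how the per-zone part of the example above could be generated. render_zone_subnet and its parameters are hypothetical; only the dhcpd statements are taken from the example, and options such as domain-name and wpad are left out for brevity.

```python
import ipaddress


def render_zone_subnet(network, interface, vip, first, last, peer="slave"):
    """Render a failover subnet bound to a zone bridge, answering as the VIP."""
    net = ipaddress.ip_network(network)
    for addr in (vip, first, last):
        if ipaddress.ip_address(addr) not in net:
            raise ValueError(f"{addr} is not inside {net}")
    return (
        f"subnet {net.network_address} netmask {net.netmask} {{\n"
        f"    interface {interface};\n"
        f"    server-identifier {vip};\n"
        f"    pool {{\n"
        f'        failover peer "{peer}";\n'
        f"        deny dynamic bootp clients;\n"
        f"        range {first} {last};\n"
        f"        option subnet-mask {net.netmask};\n"
        f"        option routers {vip};\n"
        f"        option domain-name-servers {vip};\n"
        f"    }}\n"
        f"}}\n"
    )


print(render_zone_subnet("192.168.22.0/24", "br2", "192.168.22.1",
                         "192.168.22.10", "192.168.22.254"))
```

Validating that the VIP and range lie inside the zone network catches template mistakes before dhcpd does.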
(0001891)
peter-endian   
2008-12-17 21:43   
Adapted the dhcp.conf template.

- the restart script needs the HA_* values and DHCP_CLIENT
- shouldStart() needs to stop DHCP if the node is >= 2
- the slave's NTP should perhaps sync from the master
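A minimal sketch of the shouldStart() rule as a standalone Python function. The setting names DHCP_ENABLED, HA_ENABLED and HA_NODE and the value "on" are assumptions standing in for the real HA_* settings; the note does not say whether node numbering starts at 0 or 1, so the sketch applies the ">= 2" rule literally.

```python
from typing import Mapping


def should_start(settings: Mapping[str, str]) -> bool:
    """Decide whether this node runs dhcpd (hypothetical setting names)."""
    if settings.get("DHCP_ENABLED") != "on":
        return False
    if settings.get("HA_ENABLED") != "on":
        return True  # standalone firewall: no failover constraints
    # Per the note: stop DHCP on nodes >= 2, leaving only the failover pair.
    return int(settings.get("HA_NODE", "0")) < 2


print(should_start({"DHCP_ENABLED": "on", "HA_ENABLED": "on", "HA_NODE": "1"}))  # True
print(should_start({"DHCP_ENABLED": "on", "HA_ENABLED": "on", "HA_NODE": "2"}))  # False
```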
(0001909)
peter-endian   
2008-12-23 14:00   
The master should redirect the slave's NTP connections to the master's NTP server.
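One way the master could do this is an iptables REDIRECT rule in the nat table. This sketch only builds the command; it assumes the slave's NTP traffic is routed through the master (otherwise PREROUTING never sees it) and that the slave's address is 192.168.177.2, as in the failover example above. ntp_redirect_rule is a hypothetical helper.

```python
import ipaddress
from typing import List


def ntp_redirect_rule(slave_ip: str) -> List[str]:
    """Build an iptables command that makes the master answer the slave's NTP.

    REDIRECT rewrites the destination to the master itself, so the slave's
    queries reach the master's local NTP server whatever server they targeted.
    """
    ipaddress.ip_address(slave_ip)  # raises ValueError on a malformed address
    return ["iptables", "-t", "nat", "-A", "PREROUTING",
            "-s", slave_ip, "-p", "udp", "--dport", "123",
            "-j", "REDIRECT", "--to-ports", "123"]


rule = ntp_redirect_rule("192.168.177.2")
print(" ".join(rule))
# On the master this could be applied with subprocess.run(rule, check=True).
```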