0001441: make DHCP servers HA-ready - MantisBT Endian Bugtracker
Endian Issue Tracker













View Issue Details
ID: 0001441
Project: Endian Firewall
Category: High Availability
View Status: public
Date Submitted: 2008-11-10 12:04
Last Update: 2009-10-27 12:01
Reporter: peter-endian
Assigned To: peter-endian
Priority: normal
Severity: major
Reproducibility: have not tried
Status: closed
Resolution: fixed
Product Version: 2.2-rc3
Fixed in Version: 2.3
Summary: 0001441: make DHCP servers HA-ready
Description: When the slave takes over the master's job and a DHCP server was running on the master, the slave knows nothing about the master's leases and reassigns IP addresses to different hosts, which leads to IP conflicts.

Solution:
Make the dhcp server HA-ready as described here:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.csm16010.admin.doc/am7ad_dhcpfaillnx.html
Tags: heavy, needsfix

- Notes
(0001817)
peter-endian (administrator)
2008-11-20 21:10

The problem can be solved with the approach from the link above, but it is not as easy as it looks.
There are 5 main problems:

1) DHCP failover supports only 2 servers, not more.
2) The VRRP backup / DHCP secondary needs to define a DHCP subnet
   for the management network, otherwise the daemon will not start.
3) The DHCP failover protocol is in reality a DHCP load balancer: the
   secondary is meant to answer some of the requests as well. In our case
   this is not possible *as it is now*, because the secondary is not in the
   master's subnets: the VIP is on the master, and we assign no RIP
   from the virtual subnet, only one from the management network.
4) The DHCP server becomes operational only after the first handshake between
   the primary and the secondary. DHCP cannot run while only the master is
   configured and the slave is not installed yet. Once the two have been in
   contact and the leases have been synchronized initially, each keeps
   working even when the other is not reachable.
5) Both machines need to have exactly the same time.

Possible solutions:
1) We need to define that there can be only 2 DHCP servers, a primary and a
   secondary. On clusters with more nodes, the other nodes will not run a
   DHCP secondary. It is simply not possible, too complicated to work
   around, and in the end it makes little sense to have more than 2.

4) We need to communicate this. I think it is acceptable that both nodes of
   an HA installation must be configured before DHCP can become
   operational.

5) This only needs to be written in the manual. It is nothing very surprising.

2) 2 solutions:
   a) patch away that check
   b) always restart the DHCP server on a master/backup transition,
      adding/removing that subnet definition

3) It would be great if both servers could answer. However, I also managed
   to make it work with one server not reachable by the client: the primary
   answers after a short (2s) timeout. Warning messages appear and it is
   slower, but in the end it works.

   I found another possibility. *If it works*, it would be much better:
   when the DHCP server is started while the machine holds the VIP,
   it remains able to respond to requests even after the VIP is
   removed from the device.
   That means a VRRP backup would also be able to answer, and the load
   balancer would work as designed.

   But in *this* configuration it would be a nasty hack which will surely
   cause problems. I would patch the server to add a configuration
   option which allows the server to listen on an interface and answer
   requests for subnets which the machine does not actually hold. Some sort
   of virtual DHCP.
   This, however, also needs solution 2b.

More tests needed...
(0001820)
peter-endian (administrator)
2008-11-21 12:40

2) can be solved using shared-network

for example:

shared-network "lala" {
        subnet 192.168.21.0 netmask 255.255.255.0 {
                pool {
                        failover peer "slave";
                        deny dynamic bootp clients;
                        range 192.168.21.10 192.168.21.10;
                        option subnet-mask 255.255.255.0;
                        option domain-name "localdomain";
                        option routers 192.168.21.1;
                        option wpad "http://192.168.21.1/proxy.pac";
                        option domain-name-servers 192.168.21.1;
                        default-lease-time 3600;
                        max-lease-time 7200;
                }
        }
        subnet 192.168.178.0 netmask 255.255.255.0 {
                deny unknown-clients;
        }
}
(0001824)
peter-endian (administrator)
2008-11-24 11:07

The slave needs to synchronize NTP with the master, otherwise the time gap is too large and DHCP failover will refuse the connection between the 2 machines.
(0001825)
peter-endian (administrator)
2008-11-24 11:31

3) definitely works

Patching dhcpd so that an interface and a source IP can be configured per subnet definition would solve the problem.
Every subnet definition already holds a field "interface" and a field "interface-address", which store exactly that information. On startup the information is read from the NICs. This behaviour should be changed so that the information can be set manually, regardless of the system configuration.
(0001884)
peter-endian (administrator)
2008-12-17 12:12

The "interface" statement already exists; it is only undocumented.
By adding the interface statement to every subnet definition and pointing it at the respective zone's bridge interface, the slave dhcpd also listens on the unconfigured zone, but responds with IP address 0.0.0.0, since that interface has no IP address.

By setting server-identifier to the zone's VIP, the DHCP server answers with that IP address instead of 0.0.0.0.

So no patch is needed, just configure it correctly.

example:

failover peer "slave" {
        secondary;
        address 192.168.177.2;
        port 519;
        peer port 519;
        peer address 192.168.177.1;
        max-response-delay 3;
        max-unacked-updates 3;
}

subnet 192.168.22.0 netmask 255.255.255.0 {
        interface br2;
        server-identifier 192.168.22.1;
        pool {
                failover peer "slave";
                deny dynamic bootp clients;
                range 192.168.22.10 192.168.22.254;
                option subnet-mask 255.255.255.0;
                option domain-name "localdomain";
                option routers 192.168.22.1;
                option wpad "http://192.168.22.1/proxy.pac";
                option domain-name-servers 192.168.22.1;
                default-lease-time 30;
                max-lease-time 72;
        }
}

shared-network green {
        subnet 192.168.20.0 netmask 255.255.255.0 {
                pool {
                        failover peer "slave";
                        deny dynamic bootp clients;
                        range 192.168.20.10 192.168.20.254;
                        option subnet-mask 255.255.255.0;
                        option domain-name "localdomain";
                        option routers 192.168.20.1;
                        option wpad "http://192.168.20.1/proxy.pac";
                        option domain-name-servers 192.168.20.1;
                        default-lease-time 3600;
                        max-lease-time 7200;
                }
        }
        subnet 192.168.177.0 netmask 255.255.255.0 {
                deny unknown-clients;
        }
}
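
The example above shows only the secondary's side of the failover peer declaration. As a rough sketch of how a config template could emit the matching declaration for either node, the Python function below renders both variants. In ISC dhcpd, the mclt and split statements belong only in the primary's declaration. The function name render_failover_peer and the values mclt 1800 and split 128 are illustrative assumptions, not taken from this issue.

```python
def render_failover_peer(role, local_addr, peer_addr,
                         name="slave", port=519, mclt=1800, split=128):
    """Return a dhcpd.conf 'failover peer' declaration for one node.

    Hypothetical helper: the mclt and split defaults are placeholder
    values chosen for illustration, not values from the issue.
    """
    if role not in ("primary", "secondary"):
        raise ValueError("role must be 'primary' or 'secondary'")
    indent = " " * 8
    lines = [
        'failover peer "%s" {' % name,
        "%s%s;" % (indent, role),
        "%saddress %s;" % (indent, local_addr),
        "%sport %d;" % (indent, port),
        "%speer port %d;" % (indent, port),
        "%speer address %s;" % (indent, peer_addr),
        "%smax-response-delay 3;" % indent,
        "%smax-unacked-updates 3;" % indent,
    ]
    if role == "primary":
        # mclt and split are configured on the primary only.
        lines.append("%smclt %d;" % (indent, mclt))
        lines.append("%ssplit %d;" % (indent, split))
    lines.append("}")
    return "\n".join(lines)
```

Calling render_failover_peer("secondary", "192.168.177.2", "192.168.177.1") reproduces the declaration from the example; swapping the two addresses and passing "primary" gives the counterpart for the master.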
(0001891)
peter-endian (administrator)
2008-12-17 21:43

Adapted the dhcp.conf template.

- the restart script needs the HA_* values and DHCP_CLIENT
- shouldStart() needs to stop dhcp if the node is >= 2
- the slave's ntp should maybe sync from the master
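
The shouldStart() rule from the list above could look roughly like the sketch below. It is not the actual Endian code. The parameter names are hypothetical, and it assumes that the two nodes allowed to run dhcpd (primary and secondary) carry node numbers below 2.

```python
def should_start(dhcp_enabled, node_id):
    """Decide whether dhcpd should run on this HA node.

    Assumption: node numbers below 2 are the failover primary and
    secondary; every further node (>= 2) must not run dhcpd, because
    DHCP failover supports only two servers.
    """
    if not dhcp_enabled:
        return False
    return node_id < 2
```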
(0001909)
peter-endian (administrator)
2008-12-23 14:00

The master should redirect the slave's NTP connections to the master's NTP server.

- Issue History
Date Modified Username Field Change
2008-11-10 12:04 peter-endian New Issue
2008-11-10 12:04 peter-endian Assigned To => peter-endian
2008-11-10 12:04 peter-endian Tag Attached: needsfix
2008-11-12 11:56 peter-endian Tag Attached: heavy
2008-11-20 21:10 peter-endian Note Added: 0001817
2008-11-21 12:40 peter-endian Note Added: 0001820
2008-11-24 11:07 peter-endian Note Added: 0001824
2008-11-24 11:31 peter-endian Note Added: 0001825
2008-12-17 12:12 peter-endian Note Added: 0001884
2008-12-17 21:43 peter-endian Note Added: 0001891
2008-12-23 14:00 peter-endian Note Added: 0001909
2008-12-23 21:31 peter-endian Status new => resolved
2008-12-23 21:31 peter-endian Fixed in Version => 2.3
2008-12-23 21:31 peter-endian Resolution open => fixed
2009-10-27 12:01 peter-endian Status resolved => closed

Copyright © 2005-2008 Endian, SRL. All rights reserved.

