0003015: Uplink failover doesn't seem to work. Failed uplink doesn't come back online. - MantisBT
ID: 0003015
Project: Endian Firewall
Category: Network related (VPN, uplinks)
View Status: public
Date Submitted: 2010-06-18 01:52
Last Update: 2011-02-11 11:36
Automatic failover with two PPPoE WAN uplinks doesn't seem to work. Unplugging the main uplink causes a complete disconnection even though the backup link is up. After plugging the line back in, the main uplink cannot re-establish its connection until a reboot.

Failover after manually disabling main uplink works fine.

See "Additional Information" for test description and attached screenshots for config.

Additional Information

Test 1:

1. Rebooted EFW. Both uplinks are up.
2. Went to /Network/Interfaces/Uplink Editor and deactivated the main uplink.
3. Main uplink went to "INACTIVE" on dashboard.
4. Backup interface took over. Everything is working fine.

Test 2:

1. Rebooted EFW. Both uplinks are up.
2. Pulled the DSL plug (between DSL modem and wall socket).
3. Main uplink went to "CONNECTING" on dashboard.
4. Backup interface DOES NOT take over. No internet connection through the EFW possible.

In this state the main uplink switches back and forth between "CONNECTING" and "INACTIVE". The backup uplink stays "UP".
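
The behaviour expected in Test 2 can be written down as a small sketch. This is only an illustration of the expectation described in this report, not Endian's actual uplink code: the function name `pick_active_uplink` and the list-of-tuples input are hypothetical, while the status strings are the ones shown on the dashboard.

```python
def pick_active_uplink(uplinks):
    """Return the name of the first uplink whose status is "UP".

    `uplinks` is a list of (name, status) tuples in priority order,
    main uplink first. Returns None if no uplink is up.
    (Hypothetical helper, not part of EFW.)
    """
    for name, status in uplinks:
        if status == "UP":
            return name
    return None


# State observed in Test 2: main is stuck connecting, backup is up.
test2_state = [("main", "CONNECTING"), ("backup", "UP")]
print(pick_active_uplink(test2_state))
```

Under this expectation, traffic should move to the backup uplink whenever the main uplink is in any state other than "UP", which is what happens in Test 1 but not in Test 2.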

And here comes the worst issue: when I reconnect the wire that I unplugged before, the main uplink switches between "CONNECTING" and "DEAD", but won't come back up. The log shows this (repeating):

Jun 18 03:16:02 pppd[20281] Plugin loaded.
Jun 18 03:16:02 pppd[20281] RP-PPPoE plugin version 3.3 compiled against pppd 2.4.4
Jun 18 03:16:02 pppd[20281] pppd 2.4.4 started by root, uid 0
Jun 18 03:16:02 pppd[20281] PPP session is 6613
Jun 18 03:16:02 pppd[20281] Using interface ppp1
Jun 18 03:16:02 pppd[20281] Connect: ppp1 <--> eth4
Jun 18 03:16:03 pppd[20281] CHAP authentication succeeded
Jun 18 03:16:03 pppd[20281] CHAP authentication succeeded
Jun 18 03:16:03 pppd[20281] peer from calling number 00:90:1A:42:8A:BE authorized
Jun 18 03:16:03 pppd[20281] local IP address
Jun 18 03:16:03 pppd[20281] remote IP address
Jun 18 03:16:03 pppd[20281] primary DNS address
Jun 18 03:16:03 pppd[20281] secondary DNS address
Jun 18 03:16:09 pppd[20288] Terminating on signal 15
Jun 18 03:16:09 pppd[20288] Connect time 0.1 minutes.
Jun 18 03:16:09 pppd[20288] Sent 120 bytes, received 40 bytes.
Jun 18 03:16:09 pppd[20288] Connection terminated.
Jun 18 03:16:09 pppd[20288] Exit.
Jun 18 03:16:10 pppd[20762] Plugin loaded.
Jun 18 03:16:10 pppd[20762] RP-PPPoE plugin version 3.3 compiled against pppd 2.4.4
Jun 18 03:16:10 pppd[20762] pppd 2.4.4 started by root, uid 0
Jun 18 03:16:45 pppd[20762] Timeout waiting for PADO packets
Jun 18 03:16:45 pppd[20762] Unable to complete PPPoE Discovery
Jun 18 03:16:45 pppd[20762] Exit.

Only a reboot brings me back online. The backup link showed "UP" all the time.
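
To make the repeating pattern in the log easier to spot, here is a minimal sketch that groups pppd lines by process ID and flags the three events visible in the excerpt above: successful authentication, termination by a signal, and the PADO timeout. The function name `summarize_pppd_log` and the line format it assumes (exactly as pasted in this report) are assumptions for illustration.

```python
import re
from collections import OrderedDict

# Matches lines such as "Jun 18 03:16:02 pppd[20281] Plugin loaded."
LINE_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) pppd\[(\d+)\] (.*)$")


def summarize_pppd_log(lines):
    """Group pppd log lines by PID and record which events were seen."""
    sessions = OrderedDict()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a pppd line
        _timestamp, pid, message = match.groups()
        info = sessions.setdefault(
            pid,
            {"authenticated": False, "killed": False, "pado_timeout": False},
        )
        if "authentication succeeded" in message:
            info["authenticated"] = True
        elif message.startswith("Terminating on signal"):
            info["killed"] = True
        elif "Timeout waiting for PADO packets" in message:
            info["pado_timeout"] = True
    return sessions


# A few lines copied from the log excerpt in this report.
sample = [
    "Jun 18 03:16:03 pppd[20281] CHAP authentication succeeded",
    "Jun 18 03:16:09 pppd[20288] Terminating on signal 15",
    "Jun 18 03:16:45 pppd[20762] Timeout waiting for PADO packets",
]

for pid, info in summarize_pppd_log(sample).items():
    print(pid, info)
```

Run against the full log, a summary like this would show whether every attempt follows the same sequence: a session that authenticates, is terminated by signal 15 a few seconds later, and is followed by a discovery attempt that times out.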

This one might be a duplicate of case 0002213: "Endian Firewall not automatically change default route to the secondary uplink".
Attached Files: failover.jpg (159,316) 2010-06-18 01:52
Issue History
2010-06-18 01:52  akurz  New Issue
2010-06-18 01:52  akurz  File Added: failover.jpg
2010-06-20 00:11  akurz  Note Added: 0004541
2010-06-23 23:15  akurz  Note Added: 0004556
2010-07-02 05:03  rjeeves33  Note Added: 0004576
2010-07-02 05:03  rjeeves33  Note Edited: 0004576
2010-09-20 17:51  peter-endian  Note Added: 0004770
2010-09-20 17:51  peter-endian  Status: new => feedback
2010-09-20 20:19  akurz  Note Added: 0004775
2010-09-23 10:25  peter-endian  Status: feedback => acknowledged
2010-09-23 10:25  peter-endian  Target Version: => future
2010-11-24 13:59  luca-endian  Tag Attached: purple
2011-01-26 11:40  peracchi  Note Added: 0005528
2011-01-31 09:23  luca-endian  Customer Occurencies: => 10+
2011-02-01 09:18  lorenzo-endian  Status: acknowledged => new
2011-02-01 09:18  lorenzo-endian  Assigned To: => peter-endian
2011-02-01 09:18  lorenzo-endian  Status: new => acknowledged
2011-02-08 21:02  peter-endian  Status: acknowledged => confirmed
2011-02-11 11:36  peter-endian  Status: confirmed => resolved
2011-02-11 11:36  peter-endian  Fixed in Version: => 2.5
2011-02-11 11:36  peter-endian  Resolution: open => fixed

2010-06-20 00:11  akurz  (Note 0004541)
Update: I got it working ONCE. Out of sheer desperation I set up a policy route as follows:

Source: GREEN, Destination: ANY, Service/Port: ANY/ANY, Route via: Main uplink, Use backup link if uplink fails: checked

- switched off uplink manually -> failover works after about 10 sec.
- switched main uplink back on -> routing switched over to main
- pulled cable from main uplink DSL modem -> failover works after about 30 sec.
- plugged cable back in -> main uplink comes back up and traffic goes through it

Perfect so far. Only when I tried the same thing about an hour later it didn't work and I haven't got it working since then. I haven't changed a thing between the two tests.

No idea if it worked once because of the policy rule I've set up or if it was pure coincidence.

During my tests I also found that at one point traffic from the ORANGE interface suddenly went through the main uplink, which it is not allowed to do (set by policy route). After a reboot ORANGE traffic went through the correct uplink again and always did ever since.

Something seems to be real wrong there.
2010-06-23 23:15  akurz  (Note 0004556)
Nobody interested???

One more update: as a workaround I have now removed the DSL modems and replaced them with small DSL routers, which are now taking care of the connections. The interfaces are set to "Ethernet DHCP".

With these settings everything seems to work all right (so far), but having two cheap router boxes in front of the EFW doesn't exactly make me sleep better. In my opinion the failover feature is one of the biggest differentiators from any other free firewall, but if it doesn't work with PPPoE, which is still one of the most widely used connection protocols, it will scare people away. It's also quite disturbing that this request has sat here for 6 days without seeming to get any attention, while other (from my perspective very minor) stuff is dealt with.

I am still willing to help fixing this issue by testing all scenarios possible with my equipment, but if it takes too long I'll live with the workaround for the time being and keep on looking around for alternatives. I know, I'm just one guy, but think about this as an example. There may be hundreds of people with the same issue who simply don't tell anybody. They just walk away. Wouldn't it be a pity to erode the (already not too big) community this way?

2010-07-02 05:03  rjeeves33  (Note 0004576)
I'm interested, mate. I can reproduce the PPPoE failover problem too. Such a shame, as that's one of the attractions of the Endian device. I see this was an issue that got resolved back in an RC of 2.2. Looks like it's back.

2010-09-20 17:51  peter-endian  (Note 0004770)
Hey guys.
Thanks for the detailed report and tests.

Please understand that we can't jump on every new bug report, since we are sometimes busy with other issues that, to us, are usually the bigger show stoppers. We do what we can to fix all problems, but lately, due to heavy cleanups and further development, we have had to check the bug tracker periodically rather than constantly.

Could you please confirm that your provider isn't refusing you if you try to reconnect too promptly? Most providers have hammering prevention.
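
One way to rule out hammering prevention is to space the reconnect attempts out. The following is a generic sketch of an exponential backoff schedule, not how EFW or pppd schedules its retries; the function name `reconnect_delays` and all default values are hypothetical.

```python
def reconnect_delays(base=5.0, factor=2.0, cap=300.0, attempts=6):
    """Return a list of wait times in seconds between reconnect attempts.

    Each delay is `factor` times the previous one, starting at `base`
    and never exceeding `cap`. All defaults are illustrative only.
    """
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


print(reconnect_delays())
```

If the main uplink recovers once the attempts are spaced further apart, that would point toward the provider rejecting rapid reconnects; if it still fails, the cause is more likely on the firewall side.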

If this is not the case, there is probably a problem with DNS: at the moment the uplinksdaemon checks the hosts, probably no DNS resolver is reachable (although that should not be the case).
-> please try to disable host checking

I hope this bug did not return (from 2.2), especially since we did not really change the uplink code.

2010-09-20 20:19  akurz  (Note 0004775)
Hi Peter,

I haven't checked into this issue any further as I found a workaround, as stated in this thread.

I believe the fact that those cheap routers don't seem to have any issues maintaining the connections proves that there's nothing wrong with the way the connections are set up (which are with two different providers btw.). The flaw is more likely to be found somewhere around DSL modem / PPPoE handling on the Endian box.

BTW: a dropped connection with any of my providers can be restored without delay. That is, as long as there's no other issue with the connection/provider/line, which is the case most of the time when failover is supposed to trigger.

Also BTW: I understand you have other stuff to deal with, but this case was opened almost exactly 3 months ago. One can hardly speak about "jumping" on anything after this duration ;)

2011-01-26 11:40  peracchi  (Note 0005528)
Hi everybody! :)

Any progress on this issue?

More than 6 months have passed and for me this bug is a big show stopper... :P

Best regards,