0003015: Uplink failover doesn't seem to work. Failed uplink doesn't come back online. - MantisBT Endian Bugtracker
Endian Issue Tracker

ID: 0003015
Project: Endian Firewall
Category: Network related (VPN, uplinks)
View Status: public
Date Submitted: 2010-06-18 01:52
Last Update: 2011-02-11 11:36
Assigned To: peter-endian
Platform / OS / OS Version: (none given)
Product Version: 2.4
Target Version: future
Fixed in Version: 2.5
Summary: 0003015: Uplink failover doesn't seem to work. Failed uplink doesn't come back online.
Description: Automatic failover with two PPPoE WAN uplinks doesn't seem to work. Unplugging the main uplink causes complete disconnection even though the backup link is up. After plugging the line back in, the main uplink cannot re-establish the connection until a reboot.

Failover after manually disabling main uplink works fine.

See "Additional Information" for test description and attached screenshots for config.
Additional Information:

Test 1:

1. Rebooted EFW. Both uplinks are up.
2. Went to /Network/Interfaces/Uplink Editor and deactivated the main uplink.
3. Main uplink went to "INACTIVE" on dashboard.
4. Backup interface took over. Everything is working fine.

Test 2:

1. Rebooted EFW. Both uplinks are up.
2. Pulled the DSL plug (between DSL modem and wall socket).
3. Main uplink went to "CONNECTING" on dashboard.
4. Backup interface DOES NOT take over. No internet connection through the EFW possible.

In this state the main uplink switches back and forth between "CONNECTING" and "INACTIVE". The backup uplink stays "UP".

And here comes the worst issue: when I re-connect the wire that I unplugged before, the main uplink switches between "CONNECTING" and "DEAD", but won't come back up. The log shows this (repeating):

Jun 18 03:16:02 pppd[20281] Plugin loaded.
Jun 18 03:16:02 pppd[20281] RP-PPPoE plugin version 3.3 compiled against pppd 2.4.4
Jun 18 03:16:02 pppd[20281] pppd 2.4.4 started by root, uid 0
Jun 18 03:16:02 pppd[20281] PPP session is 6613
Jun 18 03:16:02 pppd[20281] Using interface ppp1
Jun 18 03:16:02 pppd[20281] Connect: ppp1 <--> eth4
Jun 18 03:16:03 pppd[20281] CHAP authentication succeeded
Jun 18 03:16:03 pppd[20281] CHAP authentication succeeded
Jun 18 03:16:03 pppd[20281] peer from calling number 00:90:1A:42:8A:BE authorized
Jun 18 03:16:03 pppd[20281] local IP address
Jun 18 03:16:03 pppd[20281] remote IP address
Jun 18 03:16:03 pppd[20281] primary DNS address
Jun 18 03:16:03 pppd[20281] secondary DNS address
Jun 18 03:16:09 pppd[20288] Terminating on signal 15
Jun 18 03:16:09 pppd[20288] Connect time 0.1 minutes.
Jun 18 03:16:09 pppd[20288] Sent 120 bytes, received 40 bytes.
Jun 18 03:16:09 pppd[20288] Connection terminated.
Jun 18 03:16:09 pppd[20288] Exit.
Jun 18 03:16:10 pppd[20762] Plugin loaded.
Jun 18 03:16:10 pppd[20762] RP-PPPoE plugin version 3.3 compiled against pppd 2.4.4
Jun 18 03:16:10 pppd[20762] pppd 2.4.4 started by root, uid 0
Jun 18 03:16:45 pppd[20762] Timeout waiting for PADO packets
Jun 18 03:16:45 pppd[20762] Unable to complete PPPoE Discovery
Jun 18 03:16:45 pppd[20762] Exit.

Only a reboot brings me back online. The backup link showed "UP" all the time.
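
To make the repeating pattern in the log above easier to spot, here is a minimal sketch that groups the pppd lines by PID and labels how each run ended. The function name, the regular expression, and the outcome labels are illustrative assumptions and are not part of Endian Firewall; the matched message strings are copied from the excerpt. PIDs are taken as they appear in the excerpt, where session setup and termination are logged under different PIDs.

```python
import re

# Matches lines in the shape shown in the excerpt, for example:
#   "Jun 18 03:16:45 pppd[20762] Timeout waiting for PADO packets"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\spppd\[(\d+)\]\s(.*)$")


def classify_pppd_runs(lines):
    """Group log lines by pppd PID and label each run's outcome.

    Returns a dict mapping PID (as a string) to one of the hypothetical
    labels: discovery_timeout, killed_by_sigterm, session_established,
    unknown. Lines that do not look like pppd log lines are ignored.
    """
    runs = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        _timestamp, pid, message = match.groups()
        runs.setdefault(pid, []).append(message)

    outcomes = {}
    for pid, messages in runs.items():
        text = "\n".join(messages)
        if "Timeout waiting for PADO packets" in text:
            outcomes[pid] = "discovery_timeout"
        elif "Terminating on signal 15" in text:
            outcomes[pid] = "killed_by_sigterm"
        elif "authentication succeeded" in text:
            outcomes[pid] = "session_established"
        else:
            outcomes[pid] = "unknown"
    return outcomes


if __name__ == "__main__":
    sample = [
        "Jun 18 03:16:03 pppd[20281] CHAP authentication succeeded",
        "Jun 18 03:16:09 pppd[20288] Terminating on signal 15",
        "Jun 18 03:16:45 pppd[20762] Timeout waiting for PADO packets",
    ]
    for pid, outcome in classify_pppd_runs(sample).items():
        print(pid, outcome)
```

Run against the full excerpt, this would show a session that authenticates, a termination by signal 15 about six seconds later, and then a discovery attempt that times out waiting for PADO.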

This one might be a duplicate of case 0002213: "Endian Firewall not automatically change default route to the secondary uplink".
Attached Files: failover.jpg (159,316 bytes) 2010-06-18 01:52

-  Notes
akurz (reporter)
2010-06-20 00:11

Update: I got it working ONCE. Out of sheer desperation I set up a policy route as follows:

Source: GREEN, Destination: ANY, Service/Port: ANY/ANY, Route via: Main uplink, Use backup link if uplink fails: checked

- switched off uplink manually -> failover works after about 10 sec.
- switched main uplink back on -> routing switched over to main
- pulled cable from main uplink DSL modem -> failover works after about 30 sec.
- plugged cable back in -> main uplink comes back up and traffic goes through it

Perfect so far. Only when I tried the same thing about an hour later it didn't work and I haven't got it working since then. I haven't changed a thing between the two tests.

No idea if it worked once because of the policy rule I've set up or if it was pure coincidence.
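
The behaviour expected from the policy route described in this note ("Route via: Main uplink, Use backup link if uplink fails") can be sketched as a small selection function. This is only an illustration of the expected outcome, not Endian's implementation; the names `UplinkState` and `select_uplink` are hypothetical, while the four state names are the ones shown on the dashboard in this report.

```python
from enum import Enum


class UplinkState(Enum):
    # Dashboard states mentioned in this report.
    UP = "UP"
    CONNECTING = "CONNECTING"
    INACTIVE = "INACTIVE"
    DEAD = "DEAD"


def select_uplink(main, backup):
    """Return "main", "backup", or None for the uplink that should carry traffic.

    Assumption: any state other than UP counts as failed, so a main
    uplink that is stuck in CONNECTING should hand over to the backup.
    """
    if main is UplinkState.UP:
        return "main"
    if backup is UplinkState.UP:
        return "backup"
    return None
```

Under this assumption, the state seen in Test 2 (main "CONNECTING", backup "UP") should select the backup uplink, which is not what was observed.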

During my tests I also found that at one point traffic from the ORANGE interface suddenly went through the main uplink, which it is not allowed to do (set by policy route). After a reboot ORANGE traffic went through the correct uplink again and always did ever since.

Something seems to be real wrong there.
akurz (reporter)
2010-06-23 23:15

Nobody interested???

One more update: as a workaround I have now removed the DSL modems and replaced them with small DSL routers, which are now taking care of the connections. The interfaces are set to "Ethernet DHCP".

With these settings everything seems to work all right (so far), but having two cheap router boxes in front of the EFW doesn't exactly make me sleep better. In my opinion the failover feature is one of the biggest differentiators from any other free firewall, but if it doesn't work with PPPoE - which is still one of the most widely used connection protocols - it will scare people away. It's also quite disturbing that this request has sat here for 6 days and doesn't seem to get any attention, while other (from my perspective very minor) stuff is dealt with.

I am still willing to help fix this issue by testing all scenarios possible with my equipment, but if it takes too long I'll live with the workaround for the time being and keep looking around for alternatives. I know, I'm just one guy, but think about this as an example. There may be hundreds of people with the same issue who simply don't tell anybody. They just walk away. Wouldn't it be a pity to erode the (already not too big) community this way?

rjeeves33 (reporter)
2010-07-02 05:03
edited on: 2010-07-02 05:03

I'm interested, mate. I can reproduce the PPPoE failover problem as well. Such a shame, as that's one of the attractions of the Endian device. I see this was an issue that got resolved back in an RC of 2.2. Looks like it's back.

peter-endian (administrator)
2010-09-20 17:51

Hey guys.
Thanks for the detailed report and tests.

Please understand that we can't jump on each new bug report, since we are sometimes busy with other stuff which to us is mostly the bigger show stopper. We do what we can to fix all problems, but lately, due to heavy cleanups and further development, we have had to check the bug tracker periodically rather than constantly.

Could you please confirm that your provider isn't refusing you if you try to reconnect too promptly? Most providers have hammering prevention.
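
In the log excerpt in the report, a new pppd process starts about one second after the previous one exits. If a provider does reject rapid redials, one common way to avoid that is to space the attempts out with a capped exponential backoff. The sketch below only illustrates that idea; the function name and the default values are assumptions and do not describe how Endian Firewall schedules its reconnects.

```python
def reconnect_delays(base=5.0, factor=2.0, cap=300.0, attempts=6):
    """Return the wait in seconds before each successive redial attempt.

    Each delay is `factor` times the previous one, starting at `base`
    and never exceeding `cap`. All defaults are illustrative only.
    """
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


if __name__ == "__main__":
    print(reconnect_delays())          # [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
    print(reconnect_delays(cap=60.0))  # [5.0, 10.0, 20.0, 40.0, 60.0, 60.0]
```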

If this is not the case, there is probably a problem with DNS: at the moment the uplinksdaemon checks the hosts, there is probably no DNS resolver reachable (this should not be the case, however).
-> please try to disable host checking
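
As a side note on the DNS point: a check target given as a hostname can only be probed after a successful lookup, while an IP literal can be probed even when no resolver is reachable. The sketch below separates the two kinds of targets. The function names are hypothetical and the example addresses are documentation placeholders; nothing here reflects the actual uplinksdaemon code or its configuration format.

```python
import ipaddress


def needs_dns(target):
    """Return True if `target` is a hostname rather than an IP literal."""
    try:
        ipaddress.ip_address(target)
        return False
    except ValueError:
        return True


def split_check_hosts(targets):
    """Split check targets into (ip_literals, hostnames), keeping order."""
    ip_literals = [t for t in targets if not needs_dns(t)]
    hostnames = [t for t in targets if needs_dns(t)]
    return ip_literals, hostnames


if __name__ == "__main__":
    print(split_check_hosts(["192.0.2.1", "example.org"]))
```

If every configured check host falls into the second list, a resolver outage alone could make the uplink look unreachable, which is one reason disabling host checking is a useful test.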

I hope this bug has not returned (from 2.2), especially since we did not really change the uplink code.
akurz (reporter)
2010-09-20 20:19

Hi Peter,

I haven't checked into this issue any further as I found a workaround, as stated in this thread.

I believe the fact that those cheap routers don't seem to have any issues maintaining the connections proves that there's nothing wrong with the way the connections are set up (which are with two different providers btw.). The flaw is more likely to be found somewhere around DSL modem / PPPoE handling on the Endian box.

BTW: a dropped connection with any of my providers can be restored without delay. That's if there's no other issue with the connection/provider/line, which is the case most of the time when failover is supposed to trigger.

Also BTW: I understand you have other stuff to deal with, but this case was opened almost exactly 3 months ago. One can hardly speak about "jumping" on anything after this duration ;)

peracchi (reporter)
2011-01-26 11:40

Hi everybody! :)

Any progress on this issue?

More than 6 months have passed and for me this bug is a big show stopper... :P

Best regards,

- Issue History
Date Modified Username Field Change
2010-06-18 01:52 akurz New Issue
2010-06-18 01:52 akurz File Added: failover.jpg
2010-06-20 00:11 akurz Note Added: 0004541
2010-06-23 23:15 akurz Note Added: 0004556
2010-07-02 05:03 rjeeves33 Note Added: 0004576
2010-07-02 05:03 rjeeves33 Note Edited: 0004576
2010-09-20 17:51 peter-endian Note Added: 0004770
2010-09-20 17:51 peter-endian Status new => feedback
2010-09-20 20:19 akurz Note Added: 0004775
2010-09-23 10:25 peter-endian Status feedback => acknowledged
2010-09-23 10:25 peter-endian Target Version => future
2010-11-24 13:59 luca-endian Tag Attached: purple
2011-01-26 11:40 peracchi Note Added: 0005528
2011-01-31 09:23 luca-endian Customer Occurencies => 10+
2011-02-01 09:18 lorenzo-endian Status acknowledged => new
2011-02-01 09:18 lorenzo-endian Assigned To => peter-endian
2011-02-01 09:18 lorenzo-endian Status new => acknowledged
2011-02-08 21:02 peter-endian Status acknowledged => confirmed
2011-02-11 11:36 peter-endian Status confirmed => resolved
2011-02-11 11:36 peter-endian Fixed in Version => 2.5
2011-02-11 11:36 peter-endian Resolution open => fixed
