0000948: DNS not working - MantisBT
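ID: 0000948
Project: Endian Firewall
Category: Network related (VPN, uplinks)
View Status: public
Date Submitted: 2008-06-13 21:21
Last Update: 2008-10-08 13:30
Reporter: jsalgado
Assigned To: peter-endian
Priority: urgent
Severity: major
Reproducibility: random
Status: closed
Resolution: fixed
Product Version: 2.2-rc1
Target Version: 2.2-rc3
Fixed in Version: 2.2-rc3
Summary: 0000948: DNS not working
Description: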
For several days now we have been seeing DNS resolution suddenly stop. We still have internet access and can ping IP addresses without any problem, but we cannot ping www.google.com.

If we point our clients at external DNS servers instead of the Endian box, everything works fine.

I have checked and the DNS service is up, even though we are obviously unable to get an IP for the internal hostnames defined on the Endian box.
We have now tested on 3 or 4 different boxes.

It just happens, and then sometimes it suddenly starts working again without any reboot or tweak. It happens more often when we have just installed a new box.

Sometimes the Endian box fails in the evening and is still failing the next morning. On one of the boxes a reboot or two fixes the problem.

On one box yesterday it was still failing after 6 reboots; we went to lunch, and when we returned DNS resolution was suddenly working.

I think the severity is major because users think the box or the link is down: they are unable to browse.

I have also checked with the DNS proxy enabled, and via Endian's HTTP proxy, and it behaves the same: no DNS resolution, with or without the HTTP or DNS proxy.
No tags attached.
related to 0000923 (closed, peter-endian): setxtaccess: sometimes it does not create rules because of the iptablesxtaccess cache file.
related to 0001039 (closed, peter-endian): iptables has a race condition when it will be started in parallel
Issue History
Date Modified | Username | Field | Change
2008-06-13 21:21 | jsalgado | New Issue |
2008-06-13 21:21 | jsalgado | Status | new => assigned
2008-06-13 21:21 | jsalgado | Assigned To | => peter-endian
2008-06-18 08:01 | peter-endian | Note Added: 0001321 |
2008-06-18 08:01 | peter-endian | Status | assigned => feedback
2008-06-18 19:36 | jsalgado | Note Added: 0001332 |
2008-06-23 22:36 | peter-endian | Target Version | => 2.2-rc2
2008-06-23 22:40 | peter-endian | Note Added: 0001347 |
2008-06-24 04:41 | jsalgado | Note Added: 0001349 |
2008-06-24 04:44 | jsalgado | Note Edited: 0001349 |
2008-06-24 09:57 | peter-endian | Note Added: 0001351 |
2008-06-25 02:50 | jvodan | Note Added: 0001356 |
2008-06-25 12:56 | peter-endian | Relationship added | related to 0000923
2008-06-25 17:02 | jsalgado | Note Added: 0001368 |
2008-06-25 21:47 | jsalgado | Note Added: 0001371 |
2008-06-26 00:37 | jvodan | Note Added: 0001372 |
2008-06-30 10:18 | peter-endian | Relationship added | related to 0001039
2008-07-24 16:05 | ra-endian | Target Version | 2.2-rc2 => 2.2-rc3
2008-07-28 10:49 | peter-endian | Priority | normal => urgent
2008-08-04 17:22 | peter-endian | Status | feedback => resolved
2008-08-04 17:22 | peter-endian | Fixed in Version | => 2.2-rc3
2008-08-04 17:22 | peter-endian | Resolution | open => fixed
2008-10-08 13:30 | peter-endian | Status | resolved => closed

Notes
(0001321)
peter-endian   
2008-06-18 08:01   
we face no such problems on any of our installations. maybe it is related to something in your setup.

can you add "log-queries" to /var/efw/dnsmasq/dnsmasq.custom.tmpl
and then
restartdnsmasq.py

afterwards you will have more dnsmasq debug messages in /var/log/messages
so maybe you can sort out the problem.

possible causes:
- an internal client is flooding the resolver with too many dns requests
- one of the resolvers is not working properly (you should see that in the logs then)
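
Once query logging is on, both suspected causes can be checked from the log. The following is only a rough sketch: it assumes dnsmasq writes lines of the form "query[A] name from client" and "forwarded name to server" to /var/log/messages, which may differ between dnsmasq versions. The sample lines, host name "efw" and addresses are invented for illustration and are not taken from a real Endian box.

```python
import re
from collections import Counter

# Assumed line shapes for dnsmasq "log-queries" output (may vary by version):
#   ... dnsmasq[1234]: query[A] www.google.com from 192.168.0.10
#   ... dnsmasq[1234]: forwarded www.google.com to 8.8.8.8
QUERY_RE = re.compile(r"dnsmasq\[\d+\]: query\[\w+\] \S+ from (?P<client>\S+)")
FORWARD_RE = re.compile(r"dnsmasq\[\d+\]: forwarded \S+ to (?P<server>\S+)")


def queries_per_client(lines):
    """Count logged queries per LAN client (a very high count hints at a flood)."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if match:
            counts[match.group("client")] += 1
    return counts


def forwards_per_upstream(lines):
    """Count how often each upstream resolver was asked."""
    counts = Counter()
    for line in lines:
        match = FORWARD_RE.search(line)
        if match:
            counts[match.group("server")] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "Jun 18 08:01:02 efw dnsmasq[1234]: query[A] www.google.com from 192.168.0.10",
    "Jun 18 08:01:02 efw dnsmasq[1234]: forwarded www.google.com to 8.8.8.8",
    "Jun 18 08:01:02 efw dnsmasq[1234]: reply www.google.com is 192.0.2.1",
    "Jun 18 08:01:03 efw dnsmasq[1234]: query[A] intranet.example from 192.168.0.10",
    "Jun 18 08:01:04 efw dnsmasq[1234]: query[A] www.google.com from 192.168.0.11",
    "Jun 18 08:01:04 efw dnsmasq[1234]: cached www.google.com is 192.0.2.1",
]
```

On a real box the same functions could be fed the open log file instead of SAMPLE_LOG, for example queries_per_client(open("/var/log/messages")).most_common(5). Counting forwards only shows which resolvers are being asked; whether they answer still has to be read from the reply lines.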
(0001332)
jsalgado   
2008-06-18 19:36   
It is very odd, because I have experienced it on every single rc1 install I have done (over 8 now): 3 Xen, 3 AMD, 2 Intel, 1 dual Athlon. It has happened on T1, static IP, dynamic IP and PPPoE.

Every one of them loses DNS resolution now and then and has to be rebooted. I have tested on 8 different LANs (not just ours); they all failed on or near the first install, and then again over time (which might be consistent with a flood).

I tested the resolvers heavily before posting this bug, on over 8 different LANs with different Internet providers and hardware, and I have never come to feel comfortable about RC1 DNS resolving. It is really far from "rock solid": we are testing one RC1 box right here, with just Linux clients, 3 at most simultaneously, and we still get DNS dropped about once a week. The same goes for my Xen friends (they reboot almost daily just to stay ahead of the DNS stall). My T1 friend is in his first week; it has not failed since install day, when he went the whole day without DNS and we had to point the clients directly at the ISP's DNS servers (which were exactly the same ones configured on the Endian box).

As I mentioned before, when this fails the Endian box itself is able to resolve DNS without any hassle; it just stops providing DNS resolution to the LAN clients.

Thank you.
(0001347)
peter-endian   
2008-06-23 22:40   
can you try to restart the service with:
restartdnsmasq.py --force

if that does not help with:
/etc/init.d/dnsmasq restart

if that does not help please create the file:
/var/efw/dnsmasq/dnsmasq.custom.tmpl
with content:
log-queries

and restart with: restartdnsmasq.py

dnsmasq will then log messages for each query to /var/log/messages
please try to see if there is a problem with one of the dns servers.

there may be a problem with the restart of dnsmasq. i faced a similar situation, but i am unsure if that was only bad configuration, since it worked thereafter. your information would be much appreciated.
(0001349)
jsalgado   
2008-06-24 04:41   
(edited on: 2008-06-24 04:44)
Thank you very much for your help. I did test with /etc/init.d/dnsmasq once before, with my T1 friend: I had rebooted three times and was erroneously looking for the "bind" service. When I found dnsmasq I restarted it... and DNS came back to life, but by then we were already using the PDC DNS*. I have not been able to test again, because when this happens we usually reboot, but I will test at our next chance.

* I couldn't feel more betrayed by myself, I actually left DNS to WIN2003!!! (well, I will keep that a secret) =)

Thanks again. I do agree it seems to be a restart or start problem with dnsmasq. When I configure a regular Linux box as a router, I follow the same steps to find out where a problem is, and that is why I was looking for "BIND"; since it is not here, I will suspect its brother dnsmasq.

(0001351)
peter-endian   
2008-06-24 09:57   
aah ok, so there's probably another cause:

did you configure internal resolvers on your uplink?
this is not going to work.
we automatically create policy routes in order to force dns traffic for the respective dns resolver to exit to the uplink where that resolver is configured.

this way we can use all dns resolvers of all providers at the same time. otherwise providers may block dns traffic if it does not come from an ip address of their pool.

if you need to configure internal dns servers you can do that with dns proxy > custom nameserver
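
A quick way to spot the situation described in this note is to check whether any resolver entered on an uplink is a private (LAN) address. This is just an illustrative sketch using Python's standard ipaddress module; the function name and the example addresses are hypothetical and have nothing to do with Endian's own code.

```python
import ipaddress


def internal_resolvers(resolvers):
    """Return the resolver addresses that are not publicly routable.

    Note: is_private also covers loopback, link-local and reserved
    documentation ranges, not only the RFC 1918 LAN ranges.
    """
    return [r for r in resolvers if ipaddress.ip_address(r).is_private]


# Hypothetical uplink configuration: one LAN server mixed in with public ones.
uplink_resolvers = ["192.168.1.10", "8.8.8.8", "10.0.0.53"]
suspect = internal_resolvers(uplink_resolvers)
```

Any address this returns would be policy-routed out through the uplink and never reach the LAN server, so it belongs under dns proxy > custom nameserver instead.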
(0001356)
jvodan   
2008-06-25 02:50   
I got this problem post update (last few days)
I found that if I put in a system access rule for DNS on the firewall
DNS works again.
(0001368)
jsalgado   
2008-06-25 17:02   
Thank you peter, by "internal resolvers" do you mean a DNS server on my LAN?
Either way, I have used 2 kinds of DNS: the automatic one for PPPoE, and for T1/E1 I have always set the provider's own DNS servers (external to my network, of course).

I understand what you say about providers blocking DNS traffic that does not come from their pool. But I haven't set any DNS servers other than the provider's.

But this is interesting to know, and it gets me thinking that my provider uses DNS server addresses more or less outside their (at least expected) pool. My IP is 189.X.X.X and our DNS service is 200.X.X.X. Do you think this will cause trouble with the automatically created policies?
(0001371)
jsalgado   
2008-06-25 21:47   
Thank you jvodan, I haven't been able to test your solution, because DNS has been working for some days... But isn't the DNS system access rule a preset out of the box?
(0001372)
jvodan   
2008-06-26 00:37   
Using netstat I saw that something was listening on 0.0.0.0:53,
yet using tcpdump I could see no replies to dns requests. After that it was kind of instinctive to add a system rule for access to port 53.

I assume there is normally a rule in there (though hidden from the gui).
I guess somehow/someway, post recent updates, it doesn't get added when the software rebuilds the relevant files in /etc/ from /var/efw/.

Well that's my guess.
I haven't been using efw long.
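
The symptom described here (a listener on port 53 but no replies on the wire) can also be checked from a LAN client without tcpdump. The sketch below hand-builds a minimal DNS query following the RFC 1035 wire format and waits for an answer over UDP; the function names, the default host name and the timeout are illustrative choices, not part of any Endian tool.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def probe(server, hostname="www.google.com", timeout=3.0):
    """Send one query to server:53 and report 'reply', 'timeout' or 'error'."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            # Nothing came back: consistent with the query being dropped
            return "timeout"
        except OSError:
            return "error"
    # A genuine answer echoes the query ID in its first two bytes
    return "reply" if data[:2] == query[:2] else "unexpected"
```

Calling probe() with the firewall's LAN address and getting "timeout", while the same call against the provider's resolver returns "reply", would match what was seen with tcpdump. It cannot tell a dropped packet from a dead dnsmasq, so it complements rather than replaces the netstat check.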