0003170: Collectd Crash abnormally - MantisBT Endian Bugtracker
Endian Issue Tracker

View Issue Details
ID: 0003170
Project: Endian Firewall
Category: GUI
View Status: public
Date Submitted: 2010-10-05 10:16
Last Update: 2011-03-18 10:34
Reporter: ardit-endian
Assigned To: peter-endian
Priority: normal
Severity: feature
Reproducibility: always
Status: confirmed
Resolution: open
Platform:
OS:
OS Version:
Product Version: 2.3.1
Target Version:
Fixed in Version:
Summary: 0003170: Collectd Crash abnormally
Description:
------------------------------------------
root@endian:~ # /etc/init.d/collectd start
Starting collectd: [ OK ]
root@endian:~ # /etc/init.d/collectd status
collectd (pid 26008) is running...
root@endian:~ # /etc/init.d/collectd status
collectd (pid 26083) is running...
root@endian:~ # /etc/init.d/collectd status
collectd (pid 26083) is running...
root@endian:~ # /etc/init.d/collectd status
collectd dead but pid file exists

This issue keeps coming back. For unknown reasons, collectd crashes after about 5 seconds.

Here is the output of collectd -f:
------------------------------------------
Dispatching value to all write plugins failed with status -1.
uc_update: Value too old: name = 8c989399-24d4-4214-8566-f7c7a224c9cf/ntpd/time_dispersion-196.43.1.14; value time = 1286273031; last cache update = 1286273031;
rrdtool plugin: (rc->last_value = 1286273031) >= (value_time = 1286273031)
Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
uc_update: Value too old: name = 8c989399-24d4-4214-8566-f7c7a224c9cf/ntpd/delay-196.43.1.14; value time = 1286273031; last cache update = 1286273031;
rrdtool plugin: (rc->last_value = 1286273031) >= (value_time = 1286273031)
Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
uc_update: Value too old: name = 8c989399-24d4-4214-8566-f7c7a224c9cf/ntpd/time_offset-196.43.1.14; value time = 1286273036; last cache update = 1286273036;
rrdtool plugin: (rc->last_value = 1286273036) >= (value_time = 1286273036)
Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
uc_update: Value too old: name = 8c989399-24d4-4214-8566-f7c7a224c9cf/ntpd/time_dispersion-196.43.1.14; value time = 1286273036; last cache update = 1286273036;
rrdtool plugin: (rc->last_value = 1286273036) >= (value_time = 1286273036)
Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
uc_update: Value too old: name = 8c989399-24d4-4214-8566-f7c7a224c9cf/ntpd/delay-196.43.1.14; value time = 1286273036; last cache update = 1286273036;
rrdtool plugin: (rc->last_value = 1286273036) >= (value_time = 1286273036)
Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
Buss Error.
-------------------------------

The program crashes right after the bus error.
To work around this, I removed the directory

/var/lib/collectd/rrd/8c989399-24d4-4214-8566-f7c7a224c9cf

and after that the program no longer crashed, although it still displays the errors.

The bug is reproducible only on some machines.
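In the output above, every "Value too old" entry has a value time equal to the last cache update. The cached value is not older than the new one; it carries exactly the same second.

When going through longer logs such as the attached collectd-logs, a small parser can help. The sketch below pulls out (name, value time, last cache update) for each entry. It assumes the message format is exactly the one printed above, and the function name `too_old` is hypothetical.

```python
import re

# Matches the uc_update messages exactly as printed in the output above.
UC_UPDATE = re.compile(
    r"uc_update: Value too old: name = (?P<name>[^;]+); "
    r"value time = (?P<value_time>\d+); "
    r"last cache update = (?P<last_update>\d+);"
)


def too_old(log_text):
    """Return (name, value_time, last_update) for each 'Value too old' message."""
    return [
        (m["name"], int(m["value_time"]), int(m["last_update"]))
        for m in UC_UPDATE.finditer(log_text)
    ]


if __name__ == "__main__":
    excerpt = (
        "uc_update: Value too old: name = 8c989399-24d4-4214-8566-f7c7a224c9cf"
        "/ntpd/delay-196.43.1.14; value time = 1286273031; "
        "last cache update = 1286273031;"
        "rrdtool plugin: (rc->last_value = 1286273031) >= (value_time = 1286273031)"
    )
    for name, value_time, last_update in too_old(excerpt):
        # A difference of 0 means the cache already holds a value
        # with the very same timestamp for this identifier.
        print(name, value_time - last_update)
```

A difference of 0 for all the ntpd entries suggests that the same identifiers are being dispatched twice within one second. It does not point to a clock that jumped backwards.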
Tags: purple
Attached Files: collectd-logs (91,155 bytes) 2011-02-15 18:31

Relationships

Notes
(0004913)
ardit-endian (developer)
2010-10-05 10:18

Found this on the collectd wiki:
http://collectd.org/wiki/index.php/Target:Write
Maybe it helps.
(0004915)
peter-endian (administrator)
2010-10-05 10:29

Hmm, this happens when the new value has an older timestamp than the last entries in the RRD databases.

This will certainly happen if the clock changes abruptly (manually or with ntpdate).
ntpd should not cause this, since it adjusts the clock gradually by making it run slightly faster or slower.
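The difference between a slewed clock (ntpd) and a stepped clock (ntpdate or a manual change) can be simulated in a few lines. The model rests on the rule that the logs in this issue show: an update is accepted only if its time is strictly newer than the last one stored ("(rc->last_value = X) >= (value_time = X)" and rrdtool's "minimum one second step").

This is a simplified sketch under those assumptions. It uses integer seconds, a single RRD, and ignores collectd's cache and rrdtool's internals. The function names are hypothetical.

```python
def accepts(last_update, new_time):
    """Accept an update only if it is strictly newer than the last stored one."""
    return new_time > last_update


def rejected_samples(sample_times):
    """Feed sample times (UNIX seconds) into one RRD-like store and
    return the times that would be refused."""
    last = None
    rejected = []
    for t in sample_times:
        if last is None or accepts(last, t):
            last = t
        else:
            rejected.append(t)
    return rejected


if __name__ == "__main__":
    # Clock slewed by ntpd: timestamps keep increasing, nothing is refused.
    slewed = [1000, 1010, 1020, 1030]
    # Clock stepped back by 60 s (e.g. by ntpdate) right after t = 1020.
    stepped = [1000, 1010, 1020] + list(range(960, 1040, 10))
    print(rejected_samples(slewed))   # → []
    print(rejected_samples(stepped))  # → [960, 970, 980, 990, 1000, 1010, 1020]
```

After a backward step, every sample is refused until the clock passes the last stored time again. Slewing never produces a non-increasing timestamp.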

Could this have happened?

Of course, it should not crash when this happens.
Isn't monit restarting it automatically?

There is also a shell script:
/usr/local/bin/rrdfix.sh

which removes all RRD files whose last entry is newer than the current time.
It runs every 5 minutes; maybe it is not working correctly?
Or perhaps it is the source of the problem, since it removes the RRD database while collectd is writing to it (?)

Can you please check that?
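The contents of rrdfix.sh are not shown in this issue. The sketch below is therefore a hypothetical re-implementation of the rule as described: find RRD files whose last update lies in the future.

It assumes the directory layout /var/lib/collectd/rrd/<host>/<plugin-instance>/<type>.rrd, taken from the paths in the logs here. It reads each file's last update time with `rrdtool last <file>`. The names `rrd_last`, `future_dated` and `RRD_GLOB` are illustrative only.

```python
import glob
import subprocess
import time

# Assumed layout, inferred from paths in this issue's logs.
RRD_GLOB = "/var/lib/collectd/rrd/*/*/*.rrd"


def rrd_last(path):
    """Return the last update time of an RRD file via 'rrdtool last'."""
    out = subprocess.run(["rrdtool", "last", path],
                         capture_output=True, text=True, check=True).stdout
    return int(out.strip())


def future_dated(last_updates, now=None):
    """Return the paths whose last update is later than 'now'.

    last_updates maps an RRD path to its last update time in UNIX seconds.
    Such files refuse every new value until the clock catches up.
    """
    if now is None:
        now = int(time.time())
    return sorted(path for path, last in last_updates.items() if last > now)


if __name__ == "__main__":
    # Report only: removing files while collectd writes to them is the
    # race suspected above, so this sketch deletes nothing.
    updates = {p: rrd_last(p) for p in glob.glob(RRD_GLOB)}
    for path in future_dated(updates):
        print(path)
```

The sketch only reports candidate files. Whether they are safe to remove while collectd is running is exactly the open question in this note.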
(0004916)
ardit-endian (developer)
2010-10-05 14:45

Sorry, I'm unable to run these tests :( because I no longer have access to that machine, and I don't know in detail how collectd works.
(0005704)
ardit-endian (developer)
2011-02-15 16:22

Hi,

I had the same problem on another system; this one runs an up-to-date 2.3 and has 4 CPUs.
Again, collectd was unable to run. Before trying my fix above, I ran rrdfix; it executed correctly and returned to the console. I ran it several times to be sure.

After that, I tried again to start collectd, but it exited (again with a bus error).
I had to remove all directories under /var/lib/collectd/rrd to make it run "properly", and by that I mean with this output:

collectd -C /etc/collectd.conf -f
Found a configuration for the `iptables' plugin, but the plugin isn't loaded or didn't register a configuration callback.
Found a configuration for the `iptables' plugin, but the plugin isn't loaded or didn't register a configuration callback.
Found a configuration for the `iptables' plugin, but the plugin isn't loaded or didn't register a configuration callback.
Initialization complete, entering read-loop.
unixsock plugin: Removing stale socket file /var/run/collectd
read-function of plugin `openvpn' failed. Will suspend it for 10 seconds.
read-function of plugin `openvpn' failed. Will suspend it for 20 seconds.
read-function of plugin `openvpn' failed. Will suspend it for 40 seconds.
read-function of plugin `openvpn' failed. Will suspend it for 80 seconds.

The bus error was:

------------------------------------
root@EFW-VOB:/etc # collectd -C collectd.conf -f
Found a configuration for the `iptables' plugin, but the plugin isn't loaded or didn't register a configuration callback.
Found a configuration for the `iptables' plugin, but the plugin isn't loaded or didn't register a configuration callback.
Found a configuration for the `iptables' plugin, but the plugin isn't loaded or didn't register a configuration callback.
Initialization complete, entering read-loop.
unixsock plugin: Removing stale socket file /var/run/collectd
read-function of plugin `openvpn' failed. Will suspend it for 10 seconds.
Bus error
--------------------------------------

Now the graphs are generated, but 2 of the CPUs (the machine has 4) stay at 100% (one of them constantly, the other less so). top shows ntop eating the CPU; after restarting ntop, it's OK.
(0005706)
lorenzo-endian (manager)
2011-02-15 18:31

Hello everybody,

Attached are the logs of a fully up-to-date 2.4 system. Running rrdfix.sh does not change anything.

Hope this helps.

BTW, on my installation the collectd process does not crash.

Lo
(0005708)
ardit-endian (developer)
2011-02-16 08:39

Hello,

Not in mine either :)

For some reason, the way the RRDs are generated in /var/lib/collectd/rrd/ makes the daemon crash.
In any case, this message is quite persistent:

------------------------------------------
Found a configuration for the `iptables' plugin, but the plugin isn't loaded or didn't register a configuration callback.
Found a configuration for the `iptables' plugin, but the plugin isn't loaded or didn't register a configuration callback.
Found a configuration for the `iptables' plugin, but the plugin isn't loaded or didn't register a configuration callback.
filecount plugin: No directories have been configured.
Initialization of plugin `filecount' failed with status -1. Plugin will be unloaded.
tail plugin: File list is empty. Returning an error.
Initialization of plugin `tail' failed with status -1. Plugin will be unloaded.
Initialization complete, entering read-loop.
read-function of plugin `openvpn' failed. Will suspend it for 10 seconds.
unixsock plugin: Socket file /var/run/collectd is in use by another process.
read-function of plugin `openvpn' failed.
------------------------------------------

I ran it with: collectd -C /etc/collectd.conf -f

Why these errors?
(0005710)
ardit-endian (developer)
2011-02-16 08:51

Ah, as Peter said: "Hmm, this happens when the new value has an older timestamp than the last entries in the RRD databases."

-------------------------------------------------------------
failed: illegal attempt to update using time 1297845950 when last update time is 1297845989 (minimum one second step)
rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/d1087ad6-f595-4a70-8a12-a4ddadced983/disk-hda4/disk_time.rrd) failed: illegal attempt to update using time 1297845945 when last update time is 1297846044 (minimum one second step)
rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/d1087ad6-f595-4a70-8a12-a4ddadced983/disk-hda5/disk_octets.rrd) failed: illegal attempt to update using time 1297845945 when last update time is 1297845994 (minimum one second step)
rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/d1087ad6-f595-4a70-8a12-a4ddadced983/disk-hda5/disk_ops.rrd) failed: illegal attempt to update using time 1297845945 when last update time is 1297845999 (minimum one second step)
rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/d1087ad6-f595-4a70-8a12-a4ddadced983/disk-hda5/disk_merged.rrd) failed: illegal attempt to update using time 1297845945 when last update time is 1297845964 (minimum one second step)
rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/d1087ad6-f595-4a70-8a12-a4ddadced983/disk-hda6/disk_time.rrd) failed: illegal attempt to update using time 1297845945 when last update time is 1297845959 (minimum one second step)
rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/d1087ad6-f595-4a70-8a12-a4ddadced983/disk-hda1/disk_octets.rrd) failed: illegal attempt to update using time 1297845950 when last update time is 1297846029 (minimum one second step)
rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/d1087ad6-f595-4a70-8a12-a4ddadced983/disk-hda3/disk_octets.rrd) failed: illegal attempt to update using time 1297845950 when last update time is 1297845954 (minimum one second step)
rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/d1087ad6-f595-4a70-8a12-a4ddadced983/disk-hda4/disk_ops.rrd) failed: illegal attempt to update using time 1297845950 when last update time is 1297846034 (minimum one second step)
---------------------------------------------------------------
[this was on my test machine]
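Unlike the earlier "Value too old" entries, these rrdtool failures show values that are genuinely older than the last update stored in each RRD. The sketch below extracts how many seconds back each rejected value was, per RRD file.

It assumes the exact "rrd_update_r (...) failed: illegal attempt to update using time X when last update time is Y" wording shown above. The function name `backward_gaps` is hypothetical. Lines missing the "rrd_update_r (path)" prefix, like the truncated first one, are skipped.

```python
import re

# Matches rrdtool failure messages as printed in the log above.
RRD_FAIL = re.compile(
    r"rrd_update_r \((?P<path>[^)]+)\) failed: illegal attempt to update "
    r"using time (?P<new>\d+) when last update time is (?P<last>\d+)"
)


def backward_gaps(log_text):
    """Map each RRD path to how many seconds older its rejected value was
    (the latest occurrence wins if a path appears more than once)."""
    return {
        m["path"]: int(m["last"]) - int(m["new"])
        for m in RRD_FAIL.finditer(log_text)
    }


if __name__ == "__main__":
    excerpt = (
        "rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/host/disk-hda4/"
        "disk_time.rrd) failed: illegal attempt to update using time "
        "1297845945 when last update time is 1297846044 (minimum one second step)"
    )
    for path, gap in sorted(backward_gaps(excerpt).items(), key=lambda kv: -kv[1]):
        print(f"{gap:5d} s  {path}")
```

Applied to the excerpt above, the rejected values are between 4 and 99 seconds older than the last stored update.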

Just let the daemon run in the foreground (-f) on your system for a while and you will see this.

Thanks.
(0005894)
lorenzo-endian (manager)
2011-03-08 17:18

Hey ardit,

On the one hand, I can confirm the plugin problems (today I saw the same on my test system); on the other hand, my collectd never crashes :/

It is running now; I will keep you updated.

Thanks a lot

Lo
(0005898)
ardit-endian (developer)
2011-03-09 09:10
edited on: 2011-03-09 09:11

Hi,

Yes, the crash happened only on some systems [why would that be?], but there are errors with the plugins and also with the RRD timestamps.

Can you confirm this?

(0005899)
lorenzo-endian (manager)
2011-03-09 09:12
edited on: 2011-03-09 09:22

Yes, those errors are confirmed!


Issue History
Date Modified Username Field Change
2010-10-05 10:16 ardit-endian New Issue
2010-10-05 10:18 ardit-endian Note Added: 0004913
2010-10-05 10:20 luca-endian Tag Attached: purple
2010-10-05 10:29 peter-endian Note Added: 0004915
2010-10-05 10:29 peter-endian Status new => acknowledged
2010-10-05 14:45 ardit-endian Note Added: 0004916
2011-02-15 16:22 ardit-endian Note Added: 0005704
2011-02-15 16:22 ardit-endian Customer Occurencies => 2-3
2011-02-15 16:38 ra-endian Status acknowledged => new
2011-02-15 16:38 ra-endian Assigned To => lorenzo-endian
2011-02-15 18:31 lorenzo-endian Note Added: 0005706
2011-02-15 18:31 lorenzo-endian Status new => feedback
2011-02-15 18:31 lorenzo-endian File Added: collectd-logs
2011-02-16 08:39 ardit-endian Note Added: 0005708
2011-02-16 08:51 ardit-endian Note Added: 0005710
2011-03-08 17:18 lorenzo-endian Note Added: 0005894
2011-03-09 09:10 ardit-endian Note Added: 0005898
2011-03-09 09:11 ardit-endian Note Edited: 0005898
2011-03-09 09:12 lorenzo-endian Note Added: 0005899
2011-03-09 09:22 lorenzo-endian Note Edited: 0005899
2011-03-18 10:34 lorenzo-endian Assigned To lorenzo-endian => peter-endian
2011-03-18 10:34 lorenzo-endian Status feedback => confirmed

Copyright © 2005-2008 Endian, SRL. All rights reserved.


Copyright © 2000 - 2012 MantisBT Group
Powered by Mantis Bugtracker