touch-packages team mailing list archive
-
touch-packages team
-
Mailing list archive
-
Message #83430
[Bug 1429427] Re: Unexplainable time jumps in CRON
Gave it another 10 mins, this time checking the 'receiving' end, being
rsyslog in my case:
[pid 583] <... select resumed> ) = 1 (in [0])
[pid 583] recvmsg(0, {msg_name(0)=NULL, msg_iov(1)=[{"<86>Jun 9 10:01:01 CRON[4860]: pam_unix(cron:session): session opened for user sogo by (uid=0)", 8096}], msg_controllen=64, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, MSG_DONTWAIT) = 95
[pid 583] futex(0x73de74, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x73de70, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid 585] <... futex resumed> ) = 0
[pid 583] <... futex resumed> ) = 1
[pid 585] futex(0x73e060, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 583] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 585] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 583] <... futex resumed> ) = 0
[pid 585] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 583] select(4, [0 3], NULL, NULL, NULL <unfinished ...>
[pid 585] <... futex resumed> ) = 0
[pid 585] write(5, "Jun 9 10:01:01 mail1 CRON[4860]: pam_unix(cron:session): session opened for user sogo by (uid=0)\n", 98) = 98
[pid 585] write(1, "May 30 19:38:01 mail1 CRON[4860]: message repeated 3 times: [ (sogo) CMD (/usr/sbin/sogo-ealarms-notify && /usr/sbin/sogo-tool expire-sessions 60)]\n", 148) = 148
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC <unfinished ...>
[pid 583] <... select resumed> ) = 1 (in [0])
[pid 585] <... open resumed> ) = -1 ENOENT (No such file or directory)
[pid 583] recvmsg(0, <unfinished ...>
[pid 585] futex(0x73de74, FUTEX_WAIT_PRIVATE, 2668217, NULL <unfinished ...>
[pid 583] <... recvmsg resumed> {msg_name(0)=NULL, msg_iov(1)=[{"<78>Jun 9 10:01:01 CRON[4861]: (sogo) CMD (/usr/sbin/sogo-ealarms-notify && /usr/sbin/sogo-tool expire-sessions 60)", 8096}], msg_controllen=64, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, MSG_DONTWAIT) = 116
[pid 585] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 583] futex(0x73de74, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x73de70, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid 585] futex(0x73e060, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 583] <... futex resumed> ) = 0
[pid 585] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 583] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 585] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 583] <... futex resumed> ) = 0
[pid 585] <... futex resumed> ) = 0
[pid 583] select(4, [0 3], NULL, NULL, NULL <unfinished ...>
[pid 585] write(1, "Jun 9 10:01:01 mail1 CRON[4861]: (sogo) CMD (/usr/sbin/sogo-ealarms-notify && /usr/sbin/sogo-tool expire-sessions 60)\n", 119) = 119
[pid 585] futex(0x73de74, FUTEX_WAIT_PRIVATE, 2668219, NULL <unfinished ...>
[pid 583] <... select resumed> ) = 1 (in [0])
[pid 583] recvmsg(0, {msg_name(0)=NULL, msg_iov(1)=[{"<86>Jun 9 10:01:01 CRON[4860]: pam_unix(cron:session): session closed for user sogo", 8096}], msg_controllen=64, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, MSG_DONTWAIT) = 84
[pid 583] futex(0x73de74, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x73de70, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid 585] <... futex resumed> ) = 0
[pid 583] <... futex resumed> ) = 1
[pid 585] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 583] select(4, [0 3], NULL, NULL, NULL <unfinished ...>
[pid 585] <... futex resumed> ) = 0
[pid 585] write(5, "Jun 9 10:01:01 mail1 CRON[4860]: pam_unix(cron:session): session closed for user sogo\n", 87) = 87
[pid 585] futex(0x73de74, FUTEX_WAIT_PRIVATE, 2668221, NULL <unfinished ...>
>From this I can see I need to comment out this darn xconsole in
/etc/rsyslog.d/50-default.conf since this is a vm. But the more
interesting thing is of course:
recvmsg -> pam session open -> write the same to syslog files
recvmsg -> cron cmd -> write the same to syslog
recvmsg -> pam session closed -> write to syslog
But in this instance where things mess up, I'm seeing:
recvmsg -> pam session open -> write to syslog (still all good)
(no matching recvmsg) cron cmd -> write to syslog (wrong date/time, same cron pid from session open, repeated 3 times ???)
recvmsg -> cron cmd (sane pid again) -> write to syslog (all good)
recvmsg -> pam session closed -> write to syslog (all good)
So I doubt this has anything to do with cron, pretty much convinced it's in rsyslog.
And then the question is, is this some rsyslog bug or misconfiguration, which imho should still not cause anything like this to happen. With misconfig I mean the xconsole maybe causing some race but I'd have to monitor a bit further then for this to happen at the right time. Right time as in, most likely around the same time this particular cron starts.
And the reporter of this bug mentioned restarting cron did not help,
only rebooting helps. That would make sense since everything points at
rsyslog and not cron. So I will also try restarting just rsyslog and see
what that brings before things go nuts again ;-)
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to cron in Ubuntu.
https://bugs.launchpad.net/bugs/1429427
Title:
Unexplainable time jumps in CRON
Status in rsyslog package in Ubuntu:
Confirmed
Bug description:
On my main server I see unexplainable time jumps backwards in the syslog. Those jumps affect CRON.
Example:
Feb 10 06:48:01 nostromo CRON[20351]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
Feb 10 06:49:01 nostromo CRON[20364]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
Feb 10 06:50:01 nostromo CRON[20386]: (root) CMD ( /storage/exec/status-nostromo.sh >/dev/null 2>&1)
Feb 7 05:40:01 nostromo CRON[20389]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
Feb 10 06:50:01 nostromo CRON[20390]: (root) CMD ( /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
Feb 10 06:50:01 nostromo CRON[20391]: (root) CMD ( /storage/exec/checkip.sh 2>/dev/null 1>/dev/null)
For debugging I did the following:
Start xclock and watch xclock and tail -f /var/log/syslog in parallel. When CRON logged a wrong time, xclock did NOT show any time jump but seemed to freeze for a fraction of a second.
Open a screen and start a script that will once per second read the time (in unix seconds) and compare the read time with the time read a second ago. If the current time was smaller, the script would send an email with a process list from before and after the jump. The script also never detected any time jump.
In summary, my current impression is that there might be a bug in CRON because no other programm seems to be able to see the "wrong" time. The server in question is syslog server for 4 servers and 3 network devices. The time jumps exclusively show in syslog entries from the local CRON instance. Not in any remote syslog entry and not in any other local syslog entry, e.g. from DHCPD, bind, tftpd, etc. etc.
Also, after a reboot, things work ok for several days upto about 2 or 3 weeks. Then the "time jumps" start to occur with increasing frequency.
I don't use user crontabs but maintain all jobs in /etc/crontab. I
have number of jobs which are triggered every minute and another
number of jobs which are triggered every 5 minutes (maybe some CRON
internal counter overflow problem?).
Hardware:
Asus P9D-V
Intel Xeon E3-1240L V3
16GB ECC RAM
128GB SSD System
3x3TB ZFS RaidZ2 storage
1x3TB Misc. data
CMOS battery already changed and board inspected.
nostromo:~ # lsb_release -rd
Description: Ubuntu 14.04.2 LTS
Release: 14.04
nostromo:~ # apt-cache policy cron
cron:
Installed: 3.0pl1-124ubuntu2
Candidate: 3.0pl1-124ubuntu2
Version table:
*** 3.0pl1-124ubuntu2 0
500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
100 /var/lib/dpkg/status
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1429427/+subscriptions
References