openerp-india team mailing list archive
Message #23688
[Bug 1129967] Re: [6.1] Cron loop with separate OpenERP and PostgreSQL servers with out-of-sync clocks
Hi,
Your servers were drifting away progressively from each other in time,
and this is bound to cause issues in the long term, so it's best to
notice it quickly, one way or another. You really needed to synchronize
their clocks.
That said, it's clearly not an intended feature, and we could imagine
removing that "return" line to make the scheduling fair even in that
case. For the record, in 7.0 this was fixed at the same time as bug
1086396 at revision 4744
revid:odo@xxxxxxxxxxx-20121222011355-te4osnedofct0p1u, but this patch
cannot be backported to 6.1 as-is, as it would violate the stable
policy.
** Changed in: openobject-server
Importance: Undecided => Wishlist
** Changed in: openobject-server
Status: New => Opinion
** Changed in: openobject-server
Assignee: (unassigned) => OpenERP's Framework R&D (openerp-dev-framework)
--
You received this bug notification because you are a member of OpenERP
Indian Team, which is subscribed to OpenERP Server.
https://bugs.launchpad.net/bugs/1129967
Title:
[6.1] Cron loop with separate OpenERP and PostgreSQL servers with out-of-sync clocks
Status in OpenERP Server:
Opinion
Bug description:
We have been seeing an issue since migrating from a single server running both OpenERP and PostgreSQL to two separate servers, the first hosting OpenERP and the second PostgreSQL. The cron worker (openerp-cron-worker) goes into a loop and rapidly logs:
openerp.addons.base.ir.ir_cron: Starting job ...
After a few seconds or minutes the job is actually started correctly,
and then it gets back into this loop again. This always seems to occur
on jobs which run every minute and, as a side effect, prevents any other
jobs from running at all.
After some intensive debugging it appears that the issue is down to
the clocks being out of sync between the two servers, and as the
clocks drift further apart the problem does get worse.
The simple solution... use ntpd! But I am wondering whether there are
other cases where this problem would become apparent, or other
parts of the system which would also be impacted by it.
It turns out the problem occurs for the following reason:
* In '_acquire_job', an SQL query finds all cron jobs which need to run (uses the time from the PostgreSQL server).
* In '_process_job', datetime.now() is used to get the current time (uses the time from the OpenERP server).
* It then enters a loop which compares now with nextcall, and immediately aborts because in this case the nextcall<now condition does not hold on the OpenERP side.
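The mismatch described in the steps above can be sketched in isolation. This is only an illustration, not the code from ir_cron.py: the function names, the dates, and the exact comparison operators (<= for the database-side filter, < for the application-side check) are assumptions made for the example.

```python
from datetime import datetime, timedelta


def job_is_due_in_db(nextcall, db_now):
    # Hypothetical stand-in for the SQL filter in '_acquire_job',
    # which is evaluated against the PostgreSQL server's clock.
    return nextcall <= db_now


def job_runs_in_app(nextcall, app_now):
    # Hypothetical stand-in for the check in '_process_job',
    # which is evaluated against the OpenERP server's clock.
    return nextcall < app_now


def simulate(skew_seconds):
    """Return (acquired, executed) when the OpenERP clock lags the
    PostgreSQL clock by skew_seconds. The dates are arbitrary."""
    db_now = datetime(2013, 1, 1, 12, 0, 30)
    app_now = db_now - timedelta(seconds=skew_seconds)
    nextcall = datetime(2013, 1, 1, 12, 0, 0)
    acquired = job_is_due_in_db(nextcall, db_now)
    executed = acquired and job_runs_in_app(nextcall, app_now)
    return acquired, executed
```

With no skew the job is both acquired and executed. With the OpenERP clock 60 seconds behind, the job is acquired but never executed, so it is picked up again on the next pass, which would match the rapid "Starting job" logging.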
This was encountered on version 6.1 r4334. I checked for the same problem in 7.0: although it has the same issue with comparing dates, it does manage to run the other cron jobs as well, so the scheduling seems fairer.
I think that to make 6.1 fairer, in ir_cron.py line 363, the "return True" could be moved out by one indentation level. Currently it is part of the "for job in cr.dictfetchall()" loop; shouldn't it be outside the loop?
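To illustrate why the indentation matters, here is a simplified sketch of the two variants. It is not the actual ir_cron.py code; the function names and the 'try_run' callback are hypothetical and only model the control flow.

```python
def process_with_early_return(jobs, try_run):
    # "return True" inside the loop: only the first job is ever
    # considered on each pass, so a job that never gets rescheduled
    # starves every job that comes after it.
    for job in jobs:
        try_run(job)
        return True
    return False


def process_all(jobs, try_run):
    # "return True" moved outside the loop: every due job gets a
    # chance on each pass, even if one of them keeps being skipped.
    handled = False
    for job in jobs:
        try_run(job)
        handled = True
    return handled
```

For a list of three due jobs, the first variant only ever attempts the first one, while the second attempts all three.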
Any thoughts on how a date mismatch between Postgres and OpenERP
should be handled?
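One possible approach, sketched here only as an assumption and not as anything OpenERP does, would be to measure the skew between the two clocks and flag it once it exceeds a tolerance. The function names and the 5 second default are hypothetical. In practice 'db_now' would have to be read from the database (for example with a query such as SELECT now() AT TIME ZONE 'UTC') and 'app_now' from datetime.utcnow() on the OpenERP server.

```python
from datetime import datetime, timedelta


def clock_skew(db_now, app_now):
    # Positive when the application clock is ahead of the database
    # clock, negative when it is behind.
    return app_now - db_now


def skew_is_acceptable(db_now, app_now, tolerance=timedelta(seconds=5)):
    # Hypothetical tolerance; a cron worker could log a warning
    # whenever this returns False instead of looping silently.
    return abs(clock_skew(db_now, app_now)) <= tolerance
```

A check like this would not fix the scheduling itself, but it would make drifting clocks visible early.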
Thanks,
Craig
To manage notifications about this bug go to:
https://bugs.launchpad.net/openobject-server/+bug/1129967/+subscriptions